
Chronicling Germany: An Annotated Historical Newspaper Dataset

Authors: Christian Schultze, Niklas Kerkfeld, Kara Kuebart, Princilia Weber, Moritz Wolter, Felix Selgert

Abstract: The correct detection of article layout in historical newspaper pages remains challenging but is important for Natural Language Processing (NLP) and machine learning applications in the field of digital history. Digital newspaper portals typically provide Optical Character Recognition (OCR) text, albeit of varying quality. Unfortunately, layout information is often missing, limiting this rich source's scope. Our dataset is designed to address this issue for historic German-language newspapers. The Chronicling Germany dataset contains 581 annotated historical newspaper pages from the time period between 1852 and 1924. Historic domain experts have spent more than 1,500 hours annotating the dataset. The paper presents a processing pipeline and establishes baseline results on in- and out-of-domain test data using this pipeline. Both our dataset and the corresponding baseline code are freely available online. This work creates a starting point for future research in the field of digital history and historic German-language newspaper processing. Furthermore, it provides the opportunity to study a low-resource task in computer vision.

Submitted 7 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Dataset available at: https://gitlab.uni-bonn.de/digital-history/Chronicling-Germany-Dataset . Baseline code: https://github.com/Digital-History-Bonn/Chronicling-Germany-Code

arXiv:2305.10548 [pdf, other]

Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning

Authors: Daniel Waelchli, Pascal Weber, Petros Koumoutsakos

Abstract: The discovery of individual objectives in collective behavior of complex dynamical systems such as fish schools and bacteria colonies is a long-standing challenge. Inverse reinforcement learning is a potent approach for addressing this challenge, but its applicability to dynamical systems, involving continuous state-action spaces and multiple interacting agents, has been limited. In this study, we tackle this challenge by introducing an off-policy inverse multi-agent reinforcement learning algorithm (IMARL). Our approach combines ReF-ER techniques with guided cost learning. By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. Through extensive experimentation, we demonstrate that the proposed policy captures the behavior observed in the provided data and achieves promising results across problem domains, including single-agent models in the OpenAI Gym and multi-agent models of schooling behavior. The present study shows that the proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents, and showcases its value as a tool for studying complex physical systems exhibiting collective behavior.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2302.00325 [pdf]

Privacy Dashboards for Citizens and corresponding GDPR Services for Small Data Holders: A Literature Review

Authors: Nico Puhlmann, Alex Wiesmaier, Patrick Weber, Andreas Heinemann

Abstract: Citizens have gained many rights with the GDPR, e.g., the right to get a copy of their personal data. In practice, however, exercising these rights is fraught with problems for citizens and small data holders. We present a literature review on solutions promising relief in the form of privacy dashboards for citizens and GDPR services for small data holders. Covered topics are analyzed, categorized, and compared. This is intended as a step towards both enabling citizens to exercise their GDPR rights and supporting small data holders in complying with their GDPR duties.

Submitted 23 March, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

Comments: 29 pages

arXiv:2210.07063 [pdf, other]

doi 10.1109/ICDM54844.2022.00141

Deep Clustering With Consensus Representations

Authors: Lukas Miklautz, Martin Teuffenbach, Pascal Weber, Rona Perjuci, Walid Durani, Christian Böhm, Claudia Plant

Abstract: The field of deep clustering combines deep learning and clustering to learn representations that improve both the learned representation and the performance of the considered clustering method. Most existing deep clustering methods are designed for a single clustering method, e.g., k-means, spectral clustering, or Gaussian mixture models, but it is well known that no clustering algorithm works best in all circumstances. Consensus clustering tries to alleviate the individual weaknesses of clustering algorithms by building a consensus between members of a clustering ensemble. Currently, there is no deep clustering method that can include multiple heterogeneous clustering algorithms in an ensemble to update representations and clusterings together. To close this gap, we introduce the idea of a consensus representation that maximizes the agreement between ensemble members. Further, we propose DECCS (Deep Embedded Clustering with Consensus representationS), a deep consensus clustering method that learns a consensus representation by enhancing the embedded space to such a degree that all ensemble members agree on a common clustering result. Our contributions are the following: (1) We introduce the idea of learning consensus representations for heterogeneous clusterings, a novel notion to approach consensus clustering. (2) We propose DECCS, the first deep clustering method that jointly improves the representation and clustering results of multiple heterogeneous clustering algorithms. (3) We show in experiments that learning a consensus representation with DECCS outperforms several relevant baselines from deep clustering and consensus clustering. Our code can be found at https://gitlab.cs.univie.ac.at/lukas/deccs

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted by the IEEE International Conference on Data Mining (ICDM) 2022
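The abstract describes a consensus representation that maximizes agreement between ensemble members but does not say how agreement is scored. As a minimal sketch, assuming agreement is measured by the adjusted Rand index (ARI) averaged over all pairs of ensemble clusterings (our choice for illustration, not necessarily the measure DECCS uses), the quantity such a method would try to increase can be computed as follows. The function names `adjusted_rand_index` and `ensemble_agreement` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same points")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Point pairs that share a contingency cell, a row (cluster of a), a column (cluster of b).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put every point in one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


def ensemble_agreement(clusterings):
    """Mean pairwise ARI over all ensemble members (hypothetical agreement score)."""
    pairs = list(combinations(clusterings, 2))
    if not pairs:
        raise ValueError("need at least two clusterings")
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)
```

Because ARI is invariant to label names, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` agree perfectly (score 1.0), which matters when ensemble members such as k-means and spectral clustering number their clusters arbitrarily.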

arXiv:2203.13319 [pdf, other]

Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning

Authors: Pascal Weber, Daniel Wälchli, Mustafa Zeqiri, Petros Koumoutsakos

Abstract: We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.

Submitted 2 September, 2022; v1 submitted 24 March, 2022; originally announced March 2022.
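ReF-ER decides which replayed experiences contribute to the policy gradient through the importance weight rho = pi(a|s) / mu(a|s) between the current policy pi and the behavior policy mu that generated the stored action: samples with 1/c_max < rho < c_max are treated as near-policy, the others as far-policy and left out of the gradient (this follows the original single-agent ReF-ER description; c_max is a hyperparameter). The sketch below assumes one-dimensional Gaussian policies and shows two ways of forming the weights with several agents: each agent's own rho, or the joint product over agents. Reading the abstract's preference for ignoring the effects of other actions as favoring the per-agent variant is our interpretation, and all function names are hypothetical.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy at the given action."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def importance_weight(action, current, behavior):
    """rho = pi(a|s) / mu(a|s); current and behavior are (mean, std) tuples."""
    log_ratio = gaussian_log_prob(action, *current) - gaussian_log_prob(action, *behavior)
    return math.exp(log_ratio)


def is_near_policy(rho, c_max):
    """ReF-ER-style test: a sample counts as near-policy if 1/c_max < rho < c_max."""
    return 1.0 / c_max < rho < c_max


def near_policy_mask(agent_rhos, c_max, joint=False):
    """Per-agent near-policy flags, or a single flag for the product of all weights."""
    if joint:
        return [is_near_policy(math.prod(agent_rhos), c_max)]
    return [is_near_policy(rho, c_max) for rho in agent_rhos]
```

With c_max = 4, two agents whose individual weights are 2.0 and 2.5 are each near-policy on their own, but their product 5.0 is not, so the joint variant would discard the sample. Products of per-agent ratios drift away from 1 as agents are added, which is one reason the choice between the two variants matters.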

arXiv:2105.00771 [pdf, other]

doi 10.1103/PhysRevFluids.6.093101

Learning swimming escape patterns for larval fish under energy constraints

Authors: Ioannis Mandralis, Pascal Weber, Guido Novati, Petros Koumoutsakos

Abstract: Swimming organisms can escape their predators by creating and harnessing unsteady flow fields through their body motions. Stochastic optimization and flow simulations have identified escape patterns that are consistent with those observed in natural larval swimmers. However, these patterns have been limited by the specification of a particular cost function and depend on a prescribed functional form of the body motion. Here, we deploy reinforcement learning to discover swimmer escape patterns for larval fish under energy constraints. The identified patterns include the C-start mechanism, in addition to more energetically efficient escapes. We find that maximizing distance with limited energy requires swimming via short bursts of accelerating motion interlinked with phases of gliding. The present data-efficient reinforcement learning algorithm results in an array of patterns that reveal practical flow optimization principles for efficient swimming, and the methodology can be transferred to the control of aquatic robotic devices operating under energy constraints.

Submitted 28 June, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Journal ref: Phys. Rev. Fluids 6, 093101 (2021)

arXiv:2010.09269 [pdf]

A Reinforcement Learning Approach to Health Aware Control Strategy

Authors: Mayank Shekhar Jha, Philippe Weber, Didier Theilliol, Jean-Christophe Ponsart, Didier Maquin

Abstract: Health-aware control (HAC) has emerged as one of the domains where control synthesis is sought based upon the failure prognostics of a system or component, or the Remaining Useful Life (RUL) predictions of critical components. The fact that mathematical dynamic (transition) models of RUL are rarely available makes it difficult for RUL information to be incorporated into the control paradigm. A novel framework for health-aware control is presented in this paper, where a reinforcement learning-based approach is used to learn an optimal control policy in the face of component degradation by integrating global system transition data (generated by an analytical model that mimics the real system) and RUL predictions. The RUL prediction generated at each step is tracked to a desired value of RUL. The latter is integrated within a cost function which is maximized to learn the optimal control. The proposed method is studied using a simulation of a DC motor and shaft wear.

Submitted 19 October, 2020; originally announced October 2020.

Journal ref: Mediterranean Conference on Control and Automation (MED). IEEE, 2019, Jul 2019, Akko, Israel

arXiv:2002.11623 [pdf]

Trends of digitalization and adoption of big data & analytics among UK SMEs: Analysis and lessons drawn from a case study of 53 SMEs

Authors: Muhidin Mohamed, Philip Weber

Abstract: Small and Medium Enterprises (SMEs) now generate digital data at an unprecedented rate from online transactions, social media marketing and associated customer interactions, online product or service reviews and feedback, clinical diagnosis, Internet of Things (IoT) sensors, and production processes. All these forms of data can be transformed into monetary value if put into a proper data value chain. This requires both skills and IT investments for the long-term benefit of businesses. However, such spending is beyond the capacity of most SMEs due to their limited resources and restricted access to finance. This paper presents lessons learned from a case study of 53 UK SMEs, mostly from the West Midlands region of England, supported as part of a 3-year ERDF project, Big Data Corridor, in the areas of big data management, analytics, and related IT issues. Based on our study's sample companies, the paper presents several perspectives, including digital technology trends, the challenges facing UK SMEs, and the state of their adoption of data analytics and big data.

Submitted 4 March, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

arXiv:1911.00044 [pdf, ps, other]

Edge minimization in de Bruijn graphs

Authors: Uwe Baier, Thomas Büchler, Enno Ohlebusch, Pascal Weber

Abstract: This paper introduces the de Bruijn graph edge minimization problem, which is related to the compression of de Bruijn graphs: find the order-k de Bruijn graph with minimum edge count among all orders. We describe an efficient algorithm that solves this problem. Since the edge minimization problem is connected to the BWT compression technique called "tunneling", the paper also describes a way to minimize the length of a tunneled BWT in such a way that useful properties for sequence analysis are preserved. Although this addresses only a restricted case, it is significant progress towards a solution to the open problem of finding optimal disjoint blocks that minimize space, as stated in Alanko et al. (DCC 2019).

Submitted 17 January, 2020; v1 submitted 31 October, 2019; originally announced November 2019.

Comments: Accepted for Data Compression Conference 2020
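To make "order-k de Bruijn graph" and "edge count" concrete, the sketch below builds the graph of a single string under the textbook definition (nodes are the distinct k-mers, edges the distinct (k+1)-mers, each joining its k-prefix to its k-suffix) and tabulates edge counts across orders by brute force. This is an illustration only: it is not the efficient algorithm described in the paper, and the paper's exact graph definition (for example, its relation to a tunneled BWT) may differ. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(text, k):
    """Order-k de Bruijn graph of text as a map: k-mer -> set of successor k-mers."""
    graph = defaultdict(set)
    for i in range(len(text) - k):
        edge = text[i:i + k + 1]          # each distinct (k+1)-mer is one edge
        graph[edge[:-1]].add(edge[1:])    # from its k-prefix to its k-suffix
    return graph


def edge_count(text, k):
    """Number of distinct edges in the order-k graph."""
    return sum(len(successors) for successors in de_bruijn_graph(text, k).values())


def edge_counts_by_order(text, k_values):
    """Brute force: rebuild the graph and count its edges for every requested order."""
    return {k: edge_count(text, k) for k in k_values}
```

For example, `edge_counts_by_order("abab$", [1, 2, 3])` gives `{1: 3, 2: 3, 3: 2}`, and `min(counts, key=counts.get)` picks the smallest order among the minimizers. The paper's contribution is an efficient algorithm for this minimization, in contrast to this per-order brute force.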

arXiv:1905.02187 [pdf, other]

Principles of Information Storage in Small-Molecule Mixtures

Authors: Jacob K. Rosenstein, Christopher Rose, Sherief Reda, Peter M. Weber, Eunsuk Kim, Jason Sello, Joseph Geiser, Eamonn Kennedy, Christopher Arcadia, Amanda Dombroski, Kady Oakley, Shui Ling Chen, Hokchhay Tann, Brenda M. Rubenstein

Abstract: Molecular data systems have the potential to store information at dramatically higher density than existing electronic media. Some of the first experimental demonstrations of this idea have used DNA, but nature also uses a wide diversity of smaller non-polymeric molecules to preserve, process, and transmit information. In this paper, we present a general framework for quantifying chemical memory, which is not limited to polymers and extends to mixtures of molecules of all types. We show that the theoretical limit for molecular information is two orders of magnitude denser by mass than DNA, although this comes with different practical constraints on total capacity. We experimentally demonstrate kilobyte-scale information storage in mixtures of small synthetic molecules, and we consider some of the new perspectives that will be necessary to harness the information capacity available from the vast non-genomic chemical space.

Submitted 6 May, 2019; originally announced May 2019.
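As a minimal illustration of storing data in mixtures, assume the simplest presence/absence scheme: a library of M distinct compounds, where each mixture holds one bit per compound (present = 1, absent = 0), so N mixtures store N * M bits. This encoding is an assumption for illustration; the paper's framework for quantifying chemical memory is more general than this one scheme. The functions `encode`/`decode` and the library size used below are hypothetical.

```python
def encode(data: bytes, library_size: int) -> list[set[int]]:
    """Map data onto mixtures; each mixture is the set of compound indices present."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]  # MSB first
    mixtures = []
    for start in range(0, len(bits), library_size):
        chunk = bits[start:start + library_size]
        mixtures.append({index for index, bit in enumerate(chunk) if bit})
    return mixtures


def decode(mixtures: list[set[int]], library_size: int, n_bytes: int) -> bytes:
    """Read presence/absence back into bytes, dropping padding in the last mixture."""
    bits = [1 if index in mixture else 0
            for mixture in mixtures for index in range(library_size)]
    bits = bits[:8 * n_bytes]
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

With a 12-compound library, the two bytes of `b"Hi"` (16 bits) need two mixtures, the second only partly used; `decode(encode(b"Hi", 12), 12, 2)` returns `b"Hi"`.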

arXiv:1505.05625 [pdf]

Semantic Degrees for Industrie 4.0

Authors: Chih-Hong Cheng, Tuncay Guelfirat, Christian Messinger, Johannes Schmitt, Matthias Schnelte, Peter Weber

Abstract: In the context of Industrie 4.0 (I4.0), future production systems provide balanced operations between manufacturing flexibility and efficiency, realized in an autonomous, horizontal, and decentralized item-level production control framework. Structured interoperability via precise formulations at an appropriate degree is crucial to achieve engineering efficiency in the system life cycle. However, selecting the degree of formalization can be challenging, as it crucially depends on the desired common understanding (semantic degree) between multiple parties. In this paper, we categorize different semantic degrees and map a set of technologies in industrial automation to their associated degrees. Furthermore, we created guidelines to assist engineers in selecting appropriate semantic degrees in their design. We applied these guidelines to publicly available scenarios to examine the validity of the approach, and identified semantic elements in internally developed use cases targeting semantically enabled plug-and-produce.

Submitted 22 May, 2015; v1 submitted 21 May, 2015; originally announced May 2015.

Comments: Timestamp of work-in-progress; the paper has been circulated within standardization units

Showing 1–11 of 11 results for author: Weber, P