Towards Mobility Data Science (Vision Paper)

Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, et al. (23 additional authors not shown)

Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.

Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

arXiv:2305.19985 [pdf, other]

On the Existence of Reactive Strategies Resilient to Delay

Authors: Martin Fränzle, Paul Kröger, Sarah Winter, Martin Zimmermann

Abstract: We compare games under delayed control and delay games, two types of infinite games modelling asynchronicity in reactive synthesis. In games under delayed control both players suffer from partial informedness due to symmetrically delayed communication, while in delay games, the protagonist has to grant lookahead to the alter player. Our first main result, the interreducibility of the existence of sure winning strategies for the protagonist, allows to transfer known complexity results and bounds on the delay from delay games to games under delayed control, for which no such results had been known. We furthermore analyse existence of randomized strategies that win almost surely, where this correspondence between the two types of games breaks down. In this setting, some games surely won by the alter player in delay games can now be won almost surely by the protagonist in the corresponding game under delayed control, showing that it indeed makes a difference whether the protagonist has to grant lookahead or both players suffer from partial informedness. These results get even more pronounced when we finally address the quantitative goal of winning with a probability in $[0,1]$. We show that for any rational threshold $\theta \in [0,1]$ there is a game that can be won by the protagonist with exactly probability $\theta$ under delayed control, while being surely won by alter in the delay game setting. All these findings refine our original result that games under delayed control are not determined.

Submitted 12 March, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: Full version of arXiv:2310.01010, contains all proofs omitted in the conference version as well as a new section on winning games under delayed control with mixed strategies with respect to a fixed threshold

arXiv:2303.09841 [pdf, other]

GADformer: A Transparent Transformer Model for Group Anomaly Detection on Trajectories

Authors: Andreas Lohrer, Darpan Malik, Claudius Zelenka, Peer Kröger

Abstract: Group Anomaly Detection (GAD) identifies unusual patterns in groups where individual members might not be anomalous. This task is of major importance across multiple disciplines, in which also sequences like trajectories can be considered as a group. As groups become more diverse in heterogeneity and size, detecting group anomalies becomes challenging, especially without supervision. Though Recurrent Neural Networks are well established deep sequence models, their performance can decrease with increasing sequence lengths. Hence, this paper introduces GADformer, a BERT-based model for attention-driven GAD on trajectories in unsupervised and semi-supervised settings. We demonstrate how group anomalies can be detected by attention-based GAD. We also introduce the Block-Attention-anomaly-Score (BAS) to enhance model transparency by scoring attention patterns. In addition to that, synthetic trajectory generation allows various ablation studies. In extensive experiments we investigate our approach versus related works in their robustness for trajectory noise and novelties on synthetic data and three real world datasets.

Submitted 25 April, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: accepted at International Joint Conference on Neural Networks (IJCNN) 2024, Yokohama, Japan

arXiv:2302.12317 [pdf, other]

Fact or Artifact? Revise Layer-wise Relevance Propagation on various ANN Architectures

Authors: Marco Landt-Hayen, Willi Rath, Martin Claus, Peer Kröger

Abstract: Layer-wise relevance propagation (LRP) is a widely used and powerful technique to reveal insights into various artificial neural network (ANN) architectures. LRP is often used in the context of image classification. The aim is to understand, which parts of the input sample have highest relevance and hence most influence on the model prediction. Relevance can be traced back through the network to attribute a certain score to each input pixel. Relevance scores are then combined and displayed as heat maps and give humans an intuitive visual understanding of classification models. Opening the black box to understand the classification engine in great detail is essential for domain experts to gain trust in ANN models. However, there are pitfalls in terms of model-inherent artifacts included in the obtained relevance maps, that can easily be missed. But for a valid interpretation, these artifacts must not be ignored. Here, we apply and revise LRP on various ANN architectures trained as classifiers on geospatial and synthetic data. Depending on the network architecture, we show techniques to control model focus and give guidance to improve the quality of obtained relevance maps to separate facts from artifacts.

Submitted 30 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Fixed typo
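
Illustrative note (not taken from the paper): the abstract describes relevance being traced back through the network to the inputs. A minimal sketch of one widely used propagation rule, the epsilon rule for a single dense layer, is given below. The function name, the zero-bias assumption, and the toy numbers are all hypothetical; the paper's own rules and architectures may differ.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance onto the inputs of one dense layer.

    weights[i][j] connects input i to output j. Biases are assumed to be
    zero (an assumption of this sketch), so relevance is approximately
    conserved: sum(relevance_in) is close to sum(relevance_out).
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij
    z = [sum(activations[i] * weights[i][j] for i in range(n_in))
         for j in range(n_out)]
    relevance_in = []
    for i in range(n_in):
        r = 0.0
        for j in range(n_out):
            # Small stabilizer with the sign of z_j avoids division by zero
            stabilizer = eps if z[j] >= 0 else -eps
            contribution = activations[i] * weights[i][j]
            r += contribution / (z[j] + stabilizer) * relevance_out[j]
        relevance_in.append(r)
    return relevance_in


# Toy layer with two inputs and two outputs (made-up values)
toy_relevance = lrp_epsilon_dense(
    activations=[1.0, 2.0],
    weights=[[0.5, -1.0], [1.5, 2.0]],
    relevance_out=[1.0, 3.0],
)
```

Applying such a rule layer by layer, from the output back to the input, yields one relevance score per input feature, which is what a heat map then displays.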

arXiv:2211.13314 [pdf, other]

doi 10.1007/s10994-024-06521-2

CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD

Authors: Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger

Abstract: Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary, our approach can be seen as a robust alternative for outlier detection tasks.

Submitted 1 July, 2024; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: published in Springer Machine Learning Journal (MLJ)

MSC Class: 68T99; 68W25; 62H86; 62H25; 62G35

Journal ref: Machine Learning, Special Issue on Imbalanced Learning ISSN: 0885-6125 (Print) 1573-0565 (Online), 2024, Pages 1-75
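
Illustrative note (not taken from the paper): the comedian underlying "comedian PCA" is commonly defined as com(X, Y) = med((X - med X)(Y - med Y)), so com(X, X) equals the squared median absolute deviation. The sketch below shows only this robust co-dispersion measure next to the ordinary sample covariance. The function names and the toy data are hypothetical, and none of the CoMadOut scoring variants is reproduced here.

```python
from statistics import mean, median


def comedian(xs, ys):
    """Median of the products of deviations from the medians."""
    mx, my = median(xs), median(ys)
    return median([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def sample_covariance(xs, ys):
    """Classical (non-robust) sample covariance, for comparison."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def mad(xs):
    """Median absolute deviation (unscaled)."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


# Toy sample with one gross outlier (made-up values)
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
robust_spread = comedian(sample, sample)
classical_spread = sample_covariance(sample, sample)
```

On this toy sample the single outlier inflates the classical variance, while the comedian stays at the scale of the inliers, which is the property a robust PCA variant relies on.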

arXiv:2210.09958 [pdf, other]

Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability

Authors: Marco Landt-Hayen, Peer Kröger, Martin Claus, Willi Rath

Abstract: Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g. image classification, speech recognition or time series prediction). However, these models tend to produce black-box results and are often difficult to interpret. Layer-wise relevance propagation (LRP) is a widely used technique to understand how ANN models come to their conclusion and to understand what a model has learned. Here, we focus on Echo State Networks (ESNs) as a certain type of recurrent neural networks, also known as reservoir computing. ESNs are easy to train and only require a small number of trainable parameters, but are still black-box models. We show how LRP can be applied to ESNs in order to open the black-box. We also show how ESNs can be used not only for time series prediction but also for image classification: Our ESN model serves as a detector for El Nino Southern Oscillation (ENSO) from sea surface temperature anomalies. ENSO is actually a well-known problem and has been extensively discussed before. But here we use this simple problem to demonstrate how LRP can significantly enhance the explainability of ESNs.

Submitted 16 November, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Shortened title, corrected author affiliation, added citation reference: Accepted at 3rd International Conference on Machine Learning Techniques (MLTEC 2022), Zurich, Switzerland

arXiv:2207.06755 [pdf, other]

doi 10.4204/EPTCS.361.6

Verification of Sigmoidal Artificial Neural Networks using iSAT

Authors: Dominik Grundt, Sorin Liviu Jurj, Willem Hagemann, Paul Kröger, Martin Fränzle

Abstract: This paper presents an approach for verifying the behaviour of nonlinear Artificial Neural Networks (ANNs) found in cyber-physical safety-critical systems. We implement a dedicated interval constraint propagator for the sigmoid function into the SMT solver iSAT and compare this approach with a compositional approach encoding the sigmoid function by basic arithmetic features available in iSAT and an approximating approach. Our experimental results show that the dedicated and the compositional approach clearly outperform the approximating approach. Throughout all our benchmarks, the dedicated approach showed an equal or better performance compared to the compositional approach.

Submitted 14 July, 2022; originally announced July 2022.

Comments: In Proceedings SNR 2021, arXiv:2207.04391

Journal ref: EPTCS 361, 2022, pp. 45-60

arXiv:2207.00510 [pdf, other]

Enhancing cluster analysis via topological manifold learning

Authors: Moritz Herrmann, Daniyal Kazempour, Fabian Scheipl, Peer Kröger

Abstract: We discuss topological aspects of cluster analysis and show that inferring the topological structure of a dataset before clustering it can considerably enhance cluster detection: theoretical arguments and empirical evidence show that clustering embedding vectors, representing the structure of a data manifold instead of the observed feature vectors themselves, is highly beneficial. To demonstrate, we combine manifold learning method UMAP for inferring the topological structure with density-based clustering method DBSCAN. Synthetic and real data results show that this both simplifies and improves clustering in a diverse set of low- and high-dimensional problems including clusters of varying density and/or entangled shapes. Our approach simplifies clustering because topological pre-processing consistently reduces parameter sensitivity of DBSCAN. Clustering the resulting embeddings with DBSCAN can then even outperform complex methods such as SPECTACL and ClusterGAN. Finally, our investigation suggests that the crucial issue in clustering does not appear to be the nominal dimension of the data or how many irrelevant features it contains, but rather how separable the clusters are in the ambient observation space they are embedded in, which is usually the (high-dimensional) Euclidean space defined by the features of the data. Our approach is successful because we perform the cluster analysis after projecting the data into a more suitable space that is optimized for separability, in some sense.

Submitted 1 July, 2022; originally announced July 2022.

Comments: 43 pages, 10 figures

arXiv:2109.03655 [pdf, other]

doi 10.1109/BigData.2017.8258105

On Event-Driven Knowledge Graph Completion in Digital Factories

Authors: Martin Ringsquandl, Evgeny Kharlamov, Daria Stepanova, Steffen Lamparter, Raffaello Lepratti, Ian Horrocks, Peer Kröger

Abstract: Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation and maintenance of such knowledge is expensive and requires automation. In this work we show how machine learning that is specifically tailored towards industrial applications can help in knowledge graph completion. In particular, we show how knowledge completion can benefit from event logs that are common in smart factories. We evaluate this on the knowledge graph from a real world-inspired smart factory with encouraging results.

Submitted 8 September, 2021; originally announced September 2021.

Journal ref: 2017 IEEE International Conference on Big Data (BIGDATA)

arXiv:2011.01773 [pdf, other]

Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation

Authors: Sandra Obermeier, Max Berrendorf, Peer Kröger

Abstract: The reverse k-nearest neighbor (RkNN) query is an established query type with various applications reaching from identifying highly influential objects over incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit that the k-distances in real-world datasets often follow the power-law distribution, and bound them with linear lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we enable harnessing the power of the abundance of available Machine Learning models and profiting from their advancement. We propose a flexible approach which allows steering the performance-memory consumption trade-off, and in particular to find good solutions with a fixed memory budget crucial in the context of edge computing. Moreover, we show how to obtain and improve guaranteed bounds essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption, and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist.

Submitted 3 November, 2020; originally announced November 2020.
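
Illustrative note (not taken from the paper or its repository): the abstract mentions bounding k-distances with straight lines in log-log space. A minimal sketch of that baseline idea follows, assuming a least-squares line through the points (log k, log kdist) that is then shifted down and up by the extreme residuals so it encloses all observed k-distances. The function names and the toy distances are hypothetical; the regression models proposed in the paper are not reproduced here.

```python
import math


def fit_loglog_bounds(k_dists):
    """Fit log(kdist) = slope * log(k) + intercept and derive two bounds.

    k_dists[i] is the distance to the (i + 1)-th nearest neighbor; all
    values must be positive and at least two values are required.
    Returns (slope, lower_intercept, upper_intercept).
    """
    xs = [math.log(k) for k in range(1, len(k_dists) + 1)]
    ys = [math.log(d) for d in k_dists]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Shifting the line by the extreme residuals encloses every data point
    return slope, intercept + min(residuals), intercept + max(residuals)


def bounds_at(k, slope, lower_intercept, upper_intercept):
    """Evaluate the lower and upper bound on the k-distance at a given k."""
    log_k = math.log(k)
    return (math.exp(lower_intercept + slope * log_k),
            math.exp(upper_intercept + slope * log_k))


# Made-up k-distances of a single object for k = 1..5
toy_k_dists = [1.0, 1.5, 1.6, 2.4, 2.5]
toy_slope, toy_lower, toy_upper = fit_loglog_bounds(toy_k_dists)
```

Storing only three numbers per object is what makes such bounds memory-efficient. The gap between the two lines grows when the power-law assumption fits poorly, which is the situation the abstract says occurs in regions of changing density.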

arXiv:2001.05581 [pdf, other]

Complete and Sufficient Spatial Domination of Multidimensional Rectangles

Authors: Tobias Emrich, Hans-Peter Kriegel, Andreas Züfle, Peer Kröger, Matthias Renz

Abstract: Rectangles are used to approximate objects, or sets of objects, in a plethora of applications, systems and index structures. Many tasks, such as nearest neighbor search and similarity ranking, require to decide if objects in one rectangle A may, must, or must not be closer to objects in a second rectangle B, than objects in a third rectangle R. To decide this relation of "Spatial Domination" it can be shown that using minimum and maximum distances it is often impossible to detect spatial domination. This spatial gem provides a necessary and sufficient decision criterion for spatial domination that can be computed efficiently even in higher dimensional space. In addition, this spatial gem provides an example, pseudocode and an implementation in Python.

Submitted 15 January, 2020; originally announced January 2020.
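
Illustrative note (not the paper's pseudocode or implementation): the sketch below contrasts a min/max-distance test with a per-dimension test for axis-aligned rectangles under squared Euclidean distance. It uses its own role names: `near` dominates `far` with respect to `ref` if every point of `near` is closer to every point of `ref` than any point of `far` is. The per-dimension test evaluates, at the two interval endpoints of `ref` in each dimension, the difference between the squared maximum distance to `near` and the squared minimum distance to `far`, and sums the larger value over all dimensions. This formulation is an assumption of the sketch, and all function names are hypothetical; the paper itself is the authoritative source for the criterion.

```python
def max_dist_1d(lo, hi, c):
    """Largest distance from coordinate c to the interval [lo, hi]."""
    return max(abs(c - lo), abs(c - hi))


def min_dist_1d(lo, hi, c):
    """Smallest distance from coordinate c to the interval [lo, hi]."""
    if c < lo:
        return lo - c
    if c > hi:
        return c - hi
    return 0.0


def minmax_dominates(near, far, ref):
    """Sufficient but not necessary: MaxDist(near, ref) < MinDist(far, ref)."""
    max_near_sq = sum(
        max(abs(n_hi - r_lo), abs(r_hi - n_lo)) ** 2
        for (n_lo, n_hi), (r_lo, r_hi) in zip(near, ref)
    )
    min_far_sq = sum(
        max(0.0, f_lo - r_hi, r_lo - f_hi) ** 2
        for (f_lo, f_hi), (r_lo, r_hi) in zip(far, ref)
    )
    return max_near_sq < min_far_sq


def per_dimension_dominates(near, far, ref):
    """Per-dimension test (assumed formulation, squared Euclidean distance)."""
    total = 0.0
    for (n_lo, n_hi), (f_lo, f_hi), (r_lo, r_hi) in zip(near, far, ref):
        # Worst case over the two endpoints of ref in this dimension
        total += max(
            max_dist_1d(n_lo, n_hi, c) ** 2 - min_dist_1d(f_lo, f_hi, c) ** 2
            for c in (r_lo, r_hi)
        )
    return total < 0


# Rectangles as lists of (low, high) intervals, one per dimension (toy values)
ref_rect = [(0.0, 10.0), (0.0, 1.0)]
near_rect = [(12.0, 13.0), (0.0, 1.0)]
far_rect = [(15.0, 16.0), (0.0, 1.0)]
```

In this toy configuration `near_rect` lies between `ref_rect` and `far_rect` along the first axis, so it is always the closer of the two. The min/max test still fails because `ref_rect` is long, whereas the per-dimension test detects the domination.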

arXiv:1911.07290 [pdf, other]

doi 10.4204/EPTCS.308.4

Dynamic Conflict Resolution Using Justification Based Reasoning

Authors: Werner Damm, Martin Fränzle, Willem Hagemann, Paul Kröger, Astrid Rakow

Abstract: We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epistemic and modal logic, justification and temporal logic. Using this framework, we illustrate how conflicts can be identified and how we derive a chain of justifications leading to this conflict. We discuss how conflict resolution can be done when a vehicle has local, incomplete information, vehicle to vehicle communication (V2V) and partially ordered goals.

Submitted 30 October, 2019; originally announced November 2019.

Comments: In Proceedings CREST 2019, arXiv:1910.13641. arXiv admin note: substantial text overlap with arXiv:1905.11764

Journal ref: EPTCS 308, 2019, pp. 47-65

arXiv:1905.11764 [pdf, ps, other]

Justification Based Reasoning in Dynamic Conflict Resolution

Authors: Werner Damm, Martin Fränzle, Willem Hagemann, Paul Kröger, Astrid Rakow

Abstract: We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epistemic and modal logic, justification and temporal logic. Using this framework, we illustrate how conflicts can be identified and how we derive a chain of justifications leading to this conflict. We discuss how conflict resolution can be done when a vehicle has local, incomplete information, vehicle to vehicle communication (V2V) and partially ordered goals.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1412.5808 [pdf, other]

Minimizing the Number of Matching Queries for Object Retrieval

Authors: Johannes Niedermayer, Peer Kröger

Abstract: To increase the computational efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse of dimensionality), due to the vast amount of features stored in image databases (images are often represented by up to several thousand features), this ultimate goal demanded to trade query runtimes for query precision. In this paper we address an approach complementary to indexing in order to improve the runtimes of retrieval by querying only the most promising keypoint descriptors, as this affects matching runtimes linearly and can therefore lead to increased efficiency. As this reduction of kNN queries reduces the number of tentative correspondences, a loss of query precision is minimized by an additional image-level correspondence generation stage with a computational performance independent of the underlying indexing structure. We evaluate such an adaption of the standard recognition pipeline on a variety of datasets using both SIFT and state-of-the-art binary descriptors. Our results suggest that decreasing the number of queried descriptors does not necessarily imply a reduction in the result quality as long as alternative ways of increasing query recall (by thoroughly selecting k) and MAP (using image-level correspondence generation) are considered.

Submitted 18 August, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

Showing 1–14 of 14 results for author: Kröger, P