-
Towards Mobility Data Science (Vision Paper)
Authors:
Mohamed Mokbel,
Mahmoud Sakr,
Li Xiong,
Andreas Züfle,
Jussara Almeida,
Taylor Anderson,
Walid Aref,
Gennady Andrienko,
Natalia Andrienko,
Yang Cao,
Sanjay Chawla,
Reynold Cheng,
Panos Chrysanthis,
Xiqi Fei,
Gabriel Ghinita,
Anita Graser,
Dimitrios Gunopulos,
Christian Jensen,
Joon-Seok Kim,
Kyoung-Sook Kim,
Peer Kröger,
John Krumm,
Johannes Lauer,
Amr Magdy,
Mario Nascimento
, et al. (23 additional authors not shown)
Abstract:
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences…
▽ More
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.
△ Less
Submitted 7 March, 2024; v1 submitted 21 June, 2023;
originally announced July 2023.
-
On the Existence of Reactive Strategies Resilient to Delay
Authors:
Martin Fränzle,
Paul Kröger,
Sarah Winter,
Martin Zimmermann
Abstract:
We compare games under delayed control and delay games, two types of infinite games modelling asynchronicity in reactive synthesis. In games under delayed control both players suffer from partial informedness due to symmetrically delayed communication, while in delay games, the protagonist has to grant lookahead to the alter player. Our first main result, the interreducibility of the existence of…
▽ More
We compare games under delayed control and delay games, two types of infinite games modelling asynchronicity in reactive synthesis. In games under delayed control both players suffer from partial informedness due to symmetrically delayed communication, while in delay games, the protagonist has to grant lookahead to the alter player. Our first main result, the interreducibility of the existence of sure winning strategies for the protagonist, allows to transfer known complexity results and bounds on the delay from delay games to games under delayed control, for which no such results had been known. We furthermore analyse existence of randomized strategies that win almost surely, where this correspondence between the two types of games breaks down. In this setting, some games surely won by the alter player in delay games can now be won almost surely by the protagonist in the corresponding game under delayed control, showing that it indeed makes a difference whether the protagonist has to grant lookahead or both players suffer from partial informedness. These results get even more pronounced when we finally address the quantitative goal of winning with a probability in $[0,1]$. We show that for any rational threshold $θ\in [0,1]$ there is a game that can be won by the protagonist with exactly probability $θ$ under delayed control, while being surely won by alter in the delay game setting. All these findings refine our original result that games under delayed control are not determined.
△ Less
Submitted 12 March, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
-
GADformer: A Transparent Transformer Model for Group Anomaly Detection on Trajectories
Authors:
Andreas Lohrer,
Darpan Malik,
Claudius Zelenka,
Peer Kröger
Abstract:
Group Anomaly Detection (GAD) identifies unusual pattern in groups where individual members might not be anomalous. This task is of major importance across multiple disciplines, in which also sequences like trajectories can be considered as a group. As groups become more diverse in heterogeneity and size, detecting group anomalies becomes challenging, especially without supervision. Though Recurre…
▽ More
Group Anomaly Detection (GAD) identifies unusual pattern in groups where individual members might not be anomalous. This task is of major importance across multiple disciplines, in which also sequences like trajectories can be considered as a group. As groups become more diverse in heterogeneity and size, detecting group anomalies becomes challenging, especially without supervision. Though Recurrent Neural Networks are well established deep sequence models, their performance can decrease with increasing sequence lengths. Hence, this paper introduces GADformer, a BERT-based model for attention-driven GAD on trajectories in unsupervised and semi-supervised settings. We demonstrate how group anomalies can be detected by attention-based GAD. We also introduce the Block-Attention-anomaly-Score (BAS) to enhance model transparency by scoring attention patterns. In addition to that, synthetic trajectory generation allows various ablation studies. In extensive experiments we investigate our approach versus related works in their robustness for trajectory noise and novelties on synthetic data and three real world datasets.
△ Less
Submitted 25 April, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Fact or Artifact? Revise Layer-wise Relevance Propagation on various ANN Architectures
Authors:
Marco Landt-Hayen,
Willi Rath,
Martin Claus,
Peer Kröger
Abstract:
Layer-wise relevance propagation (LRP) is a widely used and powerful technique to reveal insights into various artificial neural network (ANN) architectures. LRP is often used in the context of image classification. The aim is to understand, which parts of the input sample have highest relevance and hence most influence on the model prediction. Relevance can be traced back through the network to a…
▽ More
Layer-wise relevance propagation (LRP) is a widely used and powerful technique to reveal insights into various artificial neural network (ANN) architectures. LRP is often used in the context of image classification. The aim is to understand, which parts of the input sample have highest relevance and hence most influence on the model prediction. Relevance can be traced back through the network to attribute a certain score to each input pixel. Relevance scores are then combined and displayed as heat maps and give humans an intuitive visual understanding of classification models. Opening the black box to understand the classification engine in great detail is essential for domain experts to gain trust in ANN models. However, there are pitfalls in terms of model-inherent artifacts included in the obtained relevance maps, that can easily be missed. But for a valid interpretation, these artifacts must not be ignored. Here, we apply and revise LRP on various ANN architectures trained as classifiers on geospatial and synthetic data. Depending on the network architecture, we show techniques to control model focus and give guidance to improve the quality of obtained relevance maps to separate facts from artifacts.
△ Less
Submitted 30 June, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD
Authors:
Andreas Lohrer,
Daniyal Kazempour,
Maximilian Hünemörder,
Peer Kröger
Abstract:
Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: th…
▽ More
Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks.
△ Less
Submitted 1 July, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability
Authors:
Marco Landt-Hayen,
Peer Kröger,
Martin Claus,
Willi Rath
Abstract:
Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g. image classification, speech recognition or time series prediction). However, these models tend to produce black-box results and are often difficult to interpret. Layer-wise relevance propagation (LRP) is a widely used technique to understand how ANN models come to their conclusion and to understand wha…
▽ More
Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g. image classification, speech recognition or time series prediction). However, these models tend to produce black-box results and are often difficult to interpret. Layer-wise relevance propagation (LRP) is a widely used technique to understand how ANN models come to their conclusion and to understand what a model has learned. Here, we focus on Echo State Networks (ESNs) as a certain type of recurrent neural networks, also known as reservoir computing. ESNs are easy to train and only require a small number of trainable parameters, but are still black-box models. We show how LRP can be applied to ESNs in order to open the black-box. We also show how ESNs can be used not only for time series prediction but also for image classification: Our ESN model serves as a detector for El Nino Southern Oscillation (ENSO) from sea surface temperature anomalies. ENSO is actually a well-known problem and has been extensively discussed before. But here we use this simple problem to demonstrate how LRP can significantly enhance the explainablility of ESNs.
△ Less
Submitted 16 November, 2022; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Verification of Sigmoidal Artificial Neural Networks using iSAT
Authors:
Dominik Grundt,
Sorin Liviu Jurj,
Willem Hagemann,
Paul Kröger,
Martin Fränzle
Abstract:
This paper presents an approach for verifying the behaviour of nonlinear Artificial Neural Networks (ANNs) found in cyber-physical safety-critical systems. We implement a dedicated interval constraint propagator for the sigmoid function into the SMT solver iSAT and compare this approach with a compositional approach encoding the sigmoid function by basic arithmetic features available in iSAT and a…
▽ More
This paper presents an approach for verifying the behaviour of nonlinear Artificial Neural Networks (ANNs) found in cyber-physical safety-critical systems. We implement a dedicated interval constraint propagator for the sigmoid function into the SMT solver iSAT and compare this approach with a compositional approach encoding the sigmoid function by basic arithmetic features available in iSAT and an approximating approach. Our experimental results show that the dedicated and the compositional approach clearly outperform the approximating approach. Throughout all our benchmarks, the dedicated approach showed an equal or better performance compared to the compositional approach.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
Enhancing cluster analysis via topological manifold learning
Authors:
Moritz Herrmann,
Daniyal Kazempour,
Fabian Scheipl,
Peer Kröger
Abstract:
We discuss topological aspects of cluster analysis and show that inferring the topological structure of a dataset before clustering it can considerably enhance cluster detection: theoretical arguments and empirical evidence show that clustering embedding vectors, representing the structure of a data manifold instead of the observed feature vectors themselves, is highly beneficial. To demonstrate,…
▽ More
We discuss topological aspects of cluster analysis and show that inferring the topological structure of a dataset before clustering it can considerably enhance cluster detection: theoretical arguments and empirical evidence show that clustering embedding vectors, representing the structure of a data manifold instead of the observed feature vectors themselves, is highly beneficial. To demonstrate, we combine manifold learning method UMAP for inferring the topological structure with density-based clustering method DBSCAN. Synthetic and real data results show that this both simplifies and improves clustering in a diverse set of low- and high-dimensional problems including clusters of varying density and/or entangled shapes. Our approach simplifies clustering because topological pre-processing consistently reduces parameter sensitivity of DBSCAN. Clustering the resulting embeddings with DBSCAN can then even outperform complex methods such as SPECTACL and ClusterGAN. Finally, our investigation suggests that the crucial issue in clustering does not appear to be the nominal dimension of the data or how many irrelevant features it contains, but rather how \textit{separable} the clusters are in the ambient observation space they are embedded in, which is usually the (high-dimensional) Euclidean space defined by the features of the data. Our approach is successful because we perform the cluster analysis after projecting the data into a more suitable space that is optimized for separability, in some sense.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
On Event-Driven Knowledge Graph Completion in Digital Factories
Authors:
Martin Ringsquandl,
Evgeny Kharlamov,
Daria Stepanova,
Steffen Lamparter,
Raffaello Lepratti,
Ian Horrocks,
Peer Kröger
Abstract:
Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation an…
▽ More
Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation and maintenance of such knowledge is expensive and requires automation. In this work we show how machine learning that is specifically tailored towards industrial applications can help in knowledge graph completion. In particular, we show how knowledge completion can benefit from event logs that are common in smart factories. We evaluate this on the knowledge graph from a real world-inspired smart factory with encouraging results.
△ Less
Submitted 8 September, 2021;
originally announced September 2021.
-
Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation
Authors:
Sandra Obermeier,
Max Berrendorf,
Peer Kröger
Abstract:
The reverse k-nearest neighbor (RkNN) query is an established query type with various applications reaching from identifying highly influential objects over incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit that the k-distances in real-world datasets often follow the power-law distribution, and bound them with linear line…
▽ More
The reverse k-nearest neighbor (RkNN) query is an established query type with various applications reaching from identifying highly influential objects over incrementally updating kNN graphs to optimizing sensor communication and outlier detection. State-of-the-art solutions exploit that the k-distances in real-world datasets often follow the power-law distribution, and bound them with linear lines in log-log space. In this work, we investigate this assumption and uncover that it is violated in regions of changing density, which we show are typical for real-life datasets. Towards a generic solution, we pose the estimation of k-distances as a regression problem. Thereby, we enable harnessing the power of the abundance of available Machine Learning models and profiting from their advancement. We propose a flexible approach which allows steering the performance-memory consumption trade-off, and in particular to find good solutions with a fixed memory budget crucial in the context of edge computing. Moreover, we show how to obtain and improve guaranteed bounds essential to exact query processing. In experiments on real-world datasets, we demonstrate how this framework can significantly reduce the index memory consumption, and strongly reduce the candidate set size. We publish our code at https://github.com/sobermeier/nonlinear-kdist.
△ Less
Submitted 3 November, 2020;
originally announced November 2020.
-
Complete and Sufficient Spatial Domination of Multidimensional Rectangles
Authors:
Tobias Emrich,
Hans-Peter Kriegel,
Andreas Züfle,
Peer Kröger,
Matthias Renz
Abstract:
Rectangles are used to approximate objects, or sets of objects, in a plethora of applications, systems and index structures. Many tasks, such as nearest neighbor search and similarity ranking, require to decide if objects in one rectangle A may, must, or must not be closer to objects in a second rectangle B, than objects in a third rectangle R. To decide this relation of "Spatial Domination" it ca…
▽ More
Rectangles are used to approximate objects, or sets of objects, in a plethora of applications, systems and index structures. Many tasks, such as nearest neighbor search and similarity ranking, require to decide if objects in one rectangle A may, must, or must not be closer to objects in a second rectangle B, than objects in a third rectangle R. To decide this relation of "Spatial Domination" it can be shown that using minimum and maximum distances it is often impossible to detect spatial domination. This spatial gem provides a necessary and sufficient decision criterion for spatial domination that can be computed efficiently even in higher dimensional space. In addition, this spatial gem provides an example, pseudocode and an implementation in Python.
△ Less
Submitted 15 January, 2020;
originally announced January 2020.
-
Dynamic Conflict Resolution Using Justification Based Reasoning
Authors:
Werner Damm,
Martin Fränzle,
Willem Hagemann,
Paul Kröger,
Astrid Rakow
Abstract:
We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epi…
▽ More
We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epistemic and modal logic, justification and temporal logic. Using this framework, we illustrate how conflicts can be identified and how we derive a chain of justifications leading to this conflict. We discuss how conflict resolution can be done when a vehicle has local, incomplete information, vehicle to vehicle communication (V2V) and partially ordered goals.
△ Less
Submitted 30 October, 2019;
originally announced November 2019.
-
Justification Based Reasoning in Dynamic Conflict Resolution
Authors:
Werner Damm,
Martin Fränzle,
Willem Hagemann,
Paul Kröger,
Astrid Rakow
Abstract:
We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epi…
▽ More
We study conflict situations that dynamically arise in traffic scenarios, where different agents try to achieve their set of goals and have to decide on what to do based on their local perception. We distinguish several types of conflicts for this setting. In order to enable modelling of conflict situations and the reasons for conflicts, we present a logical framework that adopts concepts from epistemic and modal logic, justification and temporal logic. Using this framework, we illustrate how conflicts can be identified and how we derive a chain of justifications leading to this conflict. We discuss how conflict resolution can be done when a vehicle has local, incomplete information, vehicle to vehicle communication (V2V) and partially ordered goals.
△ Less
Submitted 28 May, 2019;
originally announced May 2019.
-
Minimizing the Number of Matching Queries for Object Retrieval
Authors:
Johannes Niedermayer,
Peer Kröger
Abstract:
To increase the computational efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse…
▽ More
To increase the computational efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse of dimensionality), due to the vast amount of features stored in image databases (images are often represented by up to several thousand features), this ultimate goal demanded to trade query runtimes for query precision. In this paper we address an approach complementary to indexing in order to improve the runtimes of retrieval by querying only the most promising keypoint descriptors, as this affects matching runtimes linearly and can therefore lead to increased efficiency. As this reduction of kNN queries reduces the number of tentative correspondences, a loss of query precision is minimized by an additional image-level correspondence generation stage with a computational performance independent of the underlying indexing structure. We evaluate such an adaption of the standard recognition pipeline on a variety of datasets using both SIFT and state-of-the-art binary descriptors. Our results suggest that decreasing the number of queried descriptors does not necessarily imply a reduction in the result quality as long as alternative ways of increasing query recall (by thoroughly selecting k) and MAP (using image-level correspondence generation) are considered.
△ Less
Submitted 18 August, 2015; v1 submitted 18 December, 2014;
originally announced December 2014.