subscribe to arXiv mailings

Generalizability of experimental studies

Authors: Federico Matteucci, Vadim Arzamasov, Jose Cribeiro-Ramallo, Marco Heyden, Konstantin Ntounas, Klemens Böhm

Abstract: Experimental studies are a cornerstone of machine learning (ML) research. A common, but often implicit, assumption is that the results of a study will generalize beyond the study itself, e.g. to new data. That is, there is a high probability that repeating the study under different conditions will yield similar results. Despite the importance of the concept, the problem of measuring generalizabili… ▽ More Experimental studies are a cornerstone of machine learning (ML) research. A common, but often implicit, assumption is that the results of a study will generalize beyond the study itself, e.g. to new data. That is, there is a high probability that repeating the study under different conditions will yield similar results. Despite the importance of the concept, the problem of measuring generalizability remains open. This is probably due to the lack of a mathematical formalization of experimental studies. In this paper, we propose such a formalization and develop a quantifiable notion of generalizability. This notion allows to explore the generalizability of existing studies and to estimate the number of experiments needed to achieve the generalizability of new studies. To demonstrate its usefulness, we apply it to two recently published benchmarks to discern generalizable and non-generalizable results. We also publish a Python module that allows our analysis to be repeated for other experimental studies. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.10421 [pdf, other]

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -… ▽ More With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4\% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading. △ Less

Submitted 12 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2404.14451 [pdf, other]

Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data

Authors: Jose Cribeiro-Ramallo, Vadim Arzamasov, Federico Matteucci, Denis Wambold, Klemens Böhm

Abstract: Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including inlier assumption (IA), curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAA… ▽ More Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including inlier assumption (IA), curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling the IA and CD, being the only method to do so. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular OD methods, especially in MV scenarios. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: 16 pages, Pre-print

arXiv:2402.03846 [pdf, other]

Efficient Generation of Hidden Outliers for Improved Outlier Detection

Authors: Jose Cribeiro-Ramallo, Vadim Arzamasov, Klemens Böhm

Abstract: Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier g… ▽ More Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier generation method that creates realistic outliers mimicking said property. To do so, BISECT employs a novel proposition introduced in this article stating how to efficiently generate said realistic outliers. Our method has better guarantees and complexity than the current methodology for recreating 'multiple views'. We use the synthetic outliers generated by BISECT to effectively enhance outlier detection in diverse datasets, for multiple use cases. For instance, oversampling with BISECT reduced the error by up to 3 times when compared with the baselines. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.00592 [pdf, other]

Partial-Label Learning with a Reject Option

Authors: Tobias Fuchs, Florian Kalinke, Klemens Böhm

Abstract: In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they… ▽ More In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions instead. When evaluated without the reject option, our nearest neighbor-based approach also achieves competitive prediction performance. △ Less

Submitted 5 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2307.09191 [pdf, other]

A benchmark of categorical encoders for binary classification

Authors: Federico Matteucci, Vadim Arzamasov, Klemens Boehm

Abstract: Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper… ▽ More Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 36 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions -- aspects disregarded in previous encoder benchmarks. △ Less

Submitted 20 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: To be published in the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

arXiv:2306.12974 [pdf, other]

Adaptive Bernstein Change Detector for High-Dimensional Data Streams

Authors: Marco Heyden, Edouard Fouché, Vadim Arzamasov, Tanja Fenn, Florian Kalinke, Klemens Böhm

Abstract: Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify whe… ▽ More Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein's inequality to detect deviations in terms of accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by up to 20% in F1-score on average. It can also accurately estimate changes' subspace, together with a severity measure that correlates with the ground truth. △ Less

Submitted 14 January, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

MSC Class: 68T05 ACM Class: I.2.6

arXiv:2306.07071 [pdf, other]

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

Authors: Marco Heyden, Vadim Arzamasov, Edouard Fouché, Klemens Böhm

Abstract: We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate.… ▽ More We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $ω$-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tight estimation of the reward-cost ratio compared to our competitors. We show that our approach has logarithmic regret and consistently outperforms existing policies in synthetic and real settings. △ Less

Submitted 15 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

MSC Class: 68T37; 68T05 (Primary) 68W27; 68Q32 (Secondary) ACM Class: I.2.6; H.4.2; G.3

arXiv:2205.12706 [pdf, other]

Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

Authors: Florian Kalinke, Marco Heyden, Edouard Fouché, Klemens Böhm

Abstract: Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD; also called energy distance) is a well-known (semi-)metric on t… ▽ More Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD; also called energy distance) is a well-known (semi-)metric on the space of probability distributions. MMD gives rise to powerful non-parametric two-sample tests on kernel-enriched domains under mild conditions, which makes its deployment for change detection desirable. However, the classic MMD estimators suffer quadratic complexity, which prohibits their application in the online change detection setting. We propose a general-purpose change detection algorithm, Maximum Mean Discrepancy on Exponential Windows (MMDEW), which leverages the MMD two-sample test, facilitates its efficient online computation on any kernel-enriched domain, and is able to detect any disparity between distributions. Our experiments and analysis show that (1) MMDEW achieves better detection quality than state-of-the-art competitors and that (2) the algorithm has polylogarithmic runtime and logarithmic memory requirements, which allow its deployment to the streaming setting. △ Less

Submitted 13 March, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

ACM Class: I.2.6; H.1.1

arXiv:2112.13285 [pdf, other]

Pedagogical Rule Extraction to Learn Interpretable Models - an Empirical Study

Authors: Vadim Arzamasov, Benjamin Jochum, Klemens Böhm

Abstract: Machine-learning models are ubiquitous. In some domains, for instance, in medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models presenting knowledge in the form of interpretable rules. The accuracy of these models learned from small datasets is usually low. Obtaining lar… ▽ More Machine-learning models are ubiquitous. In some domains, for instance, in medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models presenting knowledge in the form of interpretable rules. The accuracy of these models learned from small datasets is usually low. Obtaining larger datasets is often hard to impossible. Pedagogical rule extraction methods could help to learn better rules from small data by augmenting a dataset employing statistical models and using it to learn a rule-based model. However, existing evaluation of these methods is often inconclusive, and they were not compared so far. Our framework PRELIM unifies existing pedagogical rule extraction techniques. In the extensive experiments, we identified promising PRELIM configurations not studied before. △ Less

Submitted 28 April, 2022; v1 submitted 25 December, 2021; originally announced December 2021.

arXiv:2011.06959 [pdf, other]

doi 10.1016/j.is.2020.101705

Efficient Subspace Search in Data Streams

Authors: Edouard Fouché, Florian Kalinke, Klemens Böhm

Abstract: In the real world, data streams are ubiquitous -- think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the spec… ▽ More In the real world, data streams are ubiquitous -- think of network traffic or sensor data. Mining patterns, e.g., outliers or clusters, from such data must take place in real time. This is challenging because (1) streams often have high dimensionality, and (2) the data characteristics may change over time. Existing approaches tend to focus on only one aspect, either high dimensionality or the specifics of the streaming setting. For static data, a common approach to deal with high dimensionality -- known as subspace search -- extracts low-dimensional, `interesting' projections (subspaces), in which patterns are easier to find. In this paper, we address both Challenge (1) and (2) by generalising subspace search to data streams. Our approach, Streaming Greedy Maximum Random Deviation (SGMRD), monitors interesting subspaces in high-dimensional data streams. It leverages novel multivariate dependency estimators and monitoring techniques based on bandit theory. We show that the benefits of SGMRD are twofold: (i) It monitors subspaces efficiently, and (ii) this improves the results of downstream data mining tasks, such as outlier detection. Our experiments, performed against synthetic and real-world data, demonstrate that SGMRD outperforms its competitors by a large margin. △ Less

Submitted 7 January, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: Accepted Manuscript to Information Systems, Volume 97, Elsevier. Final authenticated version: https://doi.org/10.1016/j.is.2020.101705

Journal ref: In: Information Systems 97 (2021), p. 101705. ISSN: 0306-4379

arXiv:2009.13853 [pdf, other]

Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary

Authors: Adrian Englhardt, Holger Trittenbach, Daniel Kottke, Bernhard Sick, Klemens Böhm

Abstract: Support Vector Data Description (SVDD) is a popular one-class classifiers for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the li… ▽ More Support Vector Data Description (SVDD) is a popular one-class classifiers for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations that SVDD would select as support vectors on the full data set. However, non-boundary observations also are essential to not fragment contiguous inlier regions and avoid poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. But existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample considering these points. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID does not require any tuning of parameters, is easy to implement and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one for SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime. △ Less

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2006.03646 [pdf, other]

doi 10.1145/3447822

Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey

Authors: Georg Steinbuss, Klemens Böhm

Abstract: By definition, outliers are rarely observed in reality, making them difficult to detect or analyse. Artificial outliers approximate such genuine outliers and can, for instance, help with the detection of genuine outliers or with benchmarking outlier-detection algorithms. The literature features different approaches to generate artificial outliers. However, systematic comparison of these approaches… ▽ More By definition, outliers are rarely observed in reality, making them difficult to detect or analyse. Artificial outliers approximate such genuine outliers and can, for instance, help with the detection of genuine outliers or with benchmarking outlier-detection algorithms. The literature features different approaches to generate artificial outliers. However, systematic comparison of these approaches remains absent. This surveys and compares these approaches. We start by clarifying the terminology in the field, which varies from publication to publication, and we propose a general problem formulation. Our description of the connection of generating outliers to other research fields like experimental design or generative models frames the field of artificial outliers. Along with offering a concise description, we group the approaches by their general concepts and how they make use of genuine instances. An extensive experimental study reveals the differences between the generation approaches when ultimately being used for outlier detection. This survey shows that the existing approaches already cover a wide range of concepts underlying the generation, but also that the field still has potential for further development. Our experimental study does confirm the expectation that the quality of the generation approaches varies widely, for example, in terms of the data set they are used on. Ultimately, to guide the choice of the generation approach in a specific context, we propose an appropriate general-decision process. In summary, this survey comprises, describes, and connects all relevant work regarding the generation of artificial outliers and may serve as a basis to guide further research in the field. △ Less

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2005.12178 [pdf, other]

doi 10.1145/3432230

Incremental Real-Time Personalization in Human Activity Recognition Using Domain Adaptive Batch Normalization

Authors: Alan Mazankiewicz, Klemens Böhm, Mario Bergés

Abstract: Human Activity Recognition (HAR) from devices like smartphone accelerometers is a fundamental problem in ubiquitous computing. Machine learning based recognition models often perform poorly when applied to new users that were not part of the training data. Previous work has addressed this challenge by personalizing general recognition models to the unique motion pattern of a new user in a static b… ▽ More Human Activity Recognition (HAR) from devices like smartphone accelerometers is a fundamental problem in ubiquitous computing. Machine learning based recognition models often perform poorly when applied to new users that were not part of the training data. Previous work has addressed this challenge by personalizing general recognition models to the unique motion pattern of a new user in a static batch setting. They require target user data to be available upfront. The more challenging online setting has received less attention. No samples from the target user are available in advance, but they arrive sequentially. Additionally, the motion pattern of users may change over time. Thus, adapting to new and forgetting old information must be traded off. Finally, the target user should not have to do any work to use the recognition system by, say, labeling any activities. Our work addresses all of these challenges by proposing an unsupervised online domain adaptation algorithm. Both classification and personalization happen continuously and incrementally in real time. Our solution works by aligning the feature distributions of all subjects, be they sources or the target, in hidden neural network layers. To this end, we normalize the input of a layer with user-specific mean and variance statistics. During training, these statistics are computed over user-specific batches. In the online phase, they are estimated incrementally for any new target user. △ Less

Submitted 21 December, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: Updated version of the preprint from 05/2020 after going through revision. The content (experiments, results, proposed method) has not changed. The explanations changed. Certain sentences have been added/removed/rephrased to be clearer. Removed Figure 3. Added Discussion section. Renamed "Description of Approach" Section. Added a reference to related work

Journal ref: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4, 4, Article 144 (December 2020), 20 pages

arXiv:2004.06947 [pdf, other]

doi 10.1145/3441453

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

Authors: Georg Steinbuss, Klemens Böhm

Abstract: Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instance with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to inc… ▽ More Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instance with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty to arrive at a good coverage of different domains with synthetic data. In this work we propose a generic process for the generation of data sets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. This allows both for a good coverage of domains and for helpful interpretations of results. We also describe three instantiations of the generic process that generate outliers with specific characteristics, like local outliers. A benchmark with state-of-the-art detection methods confirms that our generic process is indeed practical. △ Less

Submitted 15 April, 2020; originally announced April 2020.

arXiv:1912.01927 [pdf, other]

Active Learning of SVDD Hyperparameter Values

Authors: Holger Trittenbach, Klemens Böhm, Ira Assent

Abstract: Support Vector Data Description is a popular method for outlier detection. However, its usefulness largely depends on selecting good hyperparameter values -- a difficult problem that has received significant attention in literature. Existing methods to estimate hyperparameter values are purely heuristic, and the conditions under which they work well are unclear. In this article, we propose LAMA (L… ▽ More Support Vector Data Description is a popular method for outlier detection. However, its usefulness largely depends on selecting good hyperparameter values -- a difficult problem that has received significant attention in literature. Existing methods to estimate hyperparameter values are purely heuristic, and the conditions under which they work well are unclear. In this article, we propose LAMA (Local Active Min-Max Alignment), the first principled approach to estimate SVDD hyperparameter values by active learning. The core idea bases on kernel alignment, which we adapt to active learning with small sample sizes. In contrast to many existing approaches, LAMA provides estimates for both SVDD hyperparameters. These estimates are evidence-based, i.e., rely on actual class labels, and come with a quality score. This eliminates the need for manual validation, an issue with current heuristics. LAMA outperforms state-of-the-art competitors in extensive experiments on real-world data. In several cases, LAMA even yields results close to the empirical upper bound. △ Less

Submitted 4 December, 2019; originally announced December 2019.

arXiv:1910.01713 [pdf, other]

doi 10.1145/3448016.3457301

REDS: Rule Extraction for Discovering Scenarios

Authors: Vadim Arzamasov, Klemens Böhm

Abstract: Scenario discovery is the process of finding areas of interest, known as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions, i.e., inputs of the simulation model, where the system is unstable. Subgroup discovery methods are commonly used for scenario discovery. They find scenarios in the form of hyperboxes, which are easy to comprehend. Given a comp… ▽ More Scenario discovery is the process of finding areas of interest, known as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions, i.e., inputs of the simulation model, where the system is unstable. Subgroup discovery methods are commonly used for scenario discovery. They find scenarios in the form of hyperboxes, which are easy to comprehend. Given a computational budget, results tend to get worse as the number of inputs of the simulation model and the cost of simulations increase. We propose a new procedure for scenario discovery from few simulations, dubbed REDS. A key ingredient is using an intermediate machine learning model to label data for subsequent use by conventional subgroup discovery methods. We provide statistical arguments why this is an improvement. In our experiments, REDS reduces the number of simulations required by 50--75\% on average, depending on the quality measure. It is also useful as a semi-supervised subgroup discovery method and for discovering better scenarios from third-party data, when a simulation model is not available. △ Less

Submitted 5 May, 2022; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1810.02112 [pdf, other]

Monte Carlo Dependency Estimation

Authors: Edouard Fouché, Klemens Böhm

Abstract: Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better data understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte… ▽ More Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better data understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a theoretical framework to estimate multivariate dependency in static and dynamic data. MCDE quantifies dependency as the average discrepancy between marginal and conditional distributions via Monte Carlo simulations. Based on this framework, we present Mann-Whitney P (MWP), a novel dependency estimator. We show that MWP satisfies a number of desirable properties and can accommodate any kind of numerical data. We demonstrate the superiority of our estimator by comparing it to the state-of-the-art multivariate dependency measures. △ Less

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1808.04759 [pdf, other]

An Overview and a Benchmark of Active Learning for Outlier Detection with One-Class Classifiers

Authors: Holger Trittenbach, Adrian Englhardt, Klemens Böhm

Abstract: Active learning methods increase classification quality by means of user feedback. An important subcategory is active learning for outlier detection with one-class classifiers. While various methods in this category exist, selecting one for a given application scenario is difficult. This is because existing methods rely on different assumptions, have different objectives, and often are tailored to… ▽ More Active learning methods increase classification quality by means of user feedback. An important subcategory is active learning for outlier detection with one-class classifiers. While various methods in this category exist, selecting one for a given application scenario is difficult. This is because existing methods rely on different assumptions, have different objectives, and often are tailored to a specific use case. All this calls for a comprehensive comparison, the topic of this article. This article starts with a categorization of the various methods. We then propose ways to evaluate active learning results. Next, we run extensive experiments to compare existing methods, for a broad variety of scenarios. Based on our results, we formulate guidelines on how to select active learning methods for outlier detection with one-class classifiers. △ Less

Submitted 14 May, 2019; v1 submitted 14 August, 2018; originally announced August 2018.

Comments: Change history: update to more specific title; restructure of experimental section: added additional data sets and heuristic to select kernel parameter; add guidelines and decision rules (Section 4.4). Further minor changes: additional references; discussion of split strategies now is in Section 3.4; fixed typos

Showing 1–19 of 19 results for author: Böhm, K