-
Approximating Optimum Online for Capacitated Resource Allocation
Authors:
Alexander Braun,
Thomas Kesselheim,
Tristan Pollner,
Amin Saberi
Abstract:
We study online capacitated resource allocation, a natural generalization of online stochastic max-weight bipartite matching. This problem is motivated by ride-sharing and Internet advertising applications, where online arrivals may have the capacity to serve multiple offline users.
Our main result is a polynomial-time online algorithm which is $(1/2 + κ)$-approximate to the optimal online algor…
▽ More
We study online capacitated resource allocation, a natural generalization of online stochastic max-weight bipartite matching. This problem is motivated by ride-sharing and Internet advertising applications, where online arrivals may have the capacity to serve multiple offline users.
Our main result is a polynomial-time online algorithm which is $(1/2 + κ)$-approximate to the optimal online algorithm for $κ= 0.0115$. This can be contrasted to the (tight) $1/2$-competitive algorithms to the optimum offline benchmark from the prophet inequality literature. Optimum online is a recently popular benchmark for online Bayesian problems which can use unbounded computation, but not "prophetic" knowledge of future inputs.
Our algorithm (which also works for the case of stochastic rewards) rounds a generalized LP relaxation from the unit-capacity case via a two-proposal algorithm, as in previous works in the online matching literature. A key technical challenge in deriving our guarantee is bounding the positive correlation among users introduced when rounding our LP relaxation online. Unlike in the case of unit capacities, this positive correlation is unavoidable for guarantees beyond $1/2$. Conceptually, our results show that the study of optimum online as a benchmark can reveal problem-specific insights that are irrelevant to competitive analysis.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Decoupling of neural network calibration measures
Authors:
Dominik Werner Wolf,
Prasannavenkatesh Balaji,
Alexander Braun,
Markus Ulrich
Abstract:
A lot of effort is currently invested in safeguarding autonomous driving systems, which heavily rely on deep neural networks for computer vision. We investigate the coupling of different neural network calibration measures with a special focus on the Area Under the Sparsification Error curve (AUSE) metric. We elaborate on the well-known inconsistency in determining optimal calibration using the Ex…
▽ More
A lot of effort is currently invested in safeguarding autonomous driving systems, which heavily rely on deep neural networks for computer vision. We investigate the coupling of different neural network calibration measures with a special focus on the Area Under the Sparsification Error curve (AUSE) metric. We elaborate on the well-known inconsistency in determining optimal calibration using the Expected Calibration Error (ECE) and we demonstrate similar issues for the AUSE, the Uncertainty Calibration Score (UCS), as well as the Uncertainty Calibration Error (UCE). We conclude that the current methodologies leave a degree of freedom, which prevents a unique model calibration for the homologation of safety-critical functionalities. Furthermore, we propose the AUSE as an indirect measure for the residual uncertainty, which is irreducible for a fixed network architecture and is driven by the stochasticity in the underlying data generation process (aleatoric contribution) as well as the limitation in the hypothesis space (epistemic contribution).
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
A Point-Based Approach to Efficient LiDAR Multi-Task Perception
Authors:
Christopher Lang,
Alexander Braun,
Lars Schillingmann,
Abhinav Valada
Abstract:
Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an eff…
▽ More
Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that only relies on a point-based representation. The network builds on transformer-based feature encoders using neighborhood attention and grid-pooling and a query-based detection decoder using a novel 3D deformable-attention detection head design. Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for multiple task-specific point cloud representations, resulting in a network that is 3x smaller and 1.4x faster while achieving competitive performance on the nuScenes and KITTI benchmarks for autonomous driving perception. Our extensive evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIou and 3D object detection by +1.7% in mAP on the nuScenes benchmark compared to the single-task models.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
Automatic Bat Call Classification using Transformer Networks
Authors:
Frank Fundel,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystem they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data…
▽ More
Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystem they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data due to being trained on single call datasets and, moreover, are often too slow for real-time classification. Here, we propose a Transformer architecture for multi-label classification with potential applications in real-time classification scenarios. We train our model on synthetically generated multi-species recordings by merging multiple bats calls into a single recording with multiple simultaneous calls. Our approach achieves a single species accuracy of 88.92% (F1-score of 84.23%) and a multi species macro F1-score of 74.40% on our test set. In comparison to three other tools on the independent and publicly available dataset ChiroVox, our model achieves at least 25.82% better accuracy for single species classification and at least 6.9% better macro F1-score for multi species classification.
△ Less
Submitted 20 September, 2023;
originally announced September 2023.
-
Classification robustness to common optical aberrations
Authors:
Patrick Müller,
Alexander Braun,
Margret Keuper
Abstract:
Computer vision using deep neural networks (DNNs) has brought about seminal changes in people's lives. Applications range from automotive, face recognition in the security industry, to industrial process monitoring. In some cases, DNNs infer even in safety-critical situations. Therefore, for practical applications, DNNs have to behave in a robust way to disturbances such as noise, pixelation, or b…
▽ More
Computer vision using deep neural networks (DNNs) has brought about seminal changes in people's lives. Applications range from automotive, face recognition in the security industry, to industrial process monitoring. In some cases, DNNs infer even in safety-critical situations. Therefore, for practical applications, DNNs have to behave in a robust way to disturbances such as noise, pixelation, or blur. Blur directly impacts the performance of DNNs, which are often approximated as a disk-shaped kernel to model defocus. However, optics suggests that there are different kernel shapes depending on wavelength and location caused by optical aberrations. In practice, as the optical quality of a lens decreases, such aberrations increase. This paper proposes OpticsBench, a benchmark for investigating robustness to realistic, practically relevant optical blur effects. Each corruption represents an optical aberration (coma, astigmatism, spherical, trefoil) derived from Zernike Polynomials. Experiments on ImageNet show that for a variety of different pre-trained DNNs, the performance varies strongly compared to disk-shaped kernels, indicating the necessity of considering realistic image degradations. In addition, we show on ImageNet-100 with OpticsAugment that robustness can be increased by using optical kernels as data augmentation. Compared to a conventionally trained ResNeXt50, training with OpticsAugment achieves an average performance gain of 21.7% points on OpticsBench and 6.8% points on 2D common corruptions.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Occupancy Grid Map to Pose Graph-based Map: Robust BIM-based 2D-LiDAR Localization for Lifelong Indoor Navigation in Changing and Dynamic Environments
Authors:
Miguel Arturo Vega Torres,
Alexander Braun,
André Borrmann
Abstract:
Several studies rely on the de facto standard Adaptive Monte Carlo Localization (AMCL) method to localize a robot in an Occupancy Grid Map (OGM) extracted from a building information model (BIM model). However, most of these studies assume that the BIM model precisely represents the real world, which is rarely true. Discrepancies between the reference BIM model and the real world (Scan-BIM deviati…
▽ More
Several studies rely on the de facto standard Adaptive Monte Carlo Localization (AMCL) method to localize a robot in an Occupancy Grid Map (OGM) extracted from a building information model (BIM model). However, most of these studies assume that the BIM model precisely represents the real world, which is rarely true. Discrepancies between the reference BIM model and the real world (Scan-BIM deviations) are not only due to furniture or clutter but also the usual as-planned and as-built deviations that exist with any model created in the design phase. These deviations affect the accuracy of AMCL drastically. This paper proposes an open-source method to generate appropriate Pose Graph-based maps from BIM models for robust 2D-LiDAR localization in changing and dynamic environments. First, 2D OGMs are automatically generated from complex BIM models. These OGMs only represent structural elements allowing indoor autonomous robot navigation. Then, an efficient technique converts these 2D OGMs into Pose Graph-based maps enabling more accurate robot pose tracking. Finally, we leverage the different map representations for accurate, robust localization with a combination of state-of-the-art algorithms. Moreover, we provide a quantitative comparison of various state-of-the-art localization algorithms in three simulated scenarios with varying levels of Scan-BIM deviations and dynamic agents. More precisely, we compare two Particle Filter (PF) algorithms: AMCL and General Monte Carlo Localization (GMCL); and two Graph-based Localization (GBL) methods: Google's Cartographer and SLAM Toolbox, solving the global localization and pose tracking problems. The numerous experiments demonstrate that the proposed method contributes to a robust localization with an as-designed BIM model or a sparse OGM in changing and dynamic environments, outperforming the conventional AMCL in accuracy and robustness.
△ Less
Submitted 10 August, 2023;
originally announced August 2023.
-
ECS -- an Interactive Tool for Data Quality Assurance
Authors:
Christian Sieberichs,
Simon Geerkens,
Alexander Braun,
Thomas Waschulzik
Abstract:
With the increasing capabilities of machine learning systems and their potential use in safety-critical systems, ensuring high-quality data is becoming increasingly important. In this paper we present a novel approach for the assurance of data quality. For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples. This results in the detection…
▽ More
With the increasing capabilities of machine learning systems and their potential use in safety-critical systems, ensuring high-quality data is becoming increasingly important. In this paper we present a novel approach for the assurance of data quality. For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples. This results in the detection of data points with potentially harmful properties for the use in safety-critical systems.
△ Less
Submitted 17 July, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
QI2 -- an Interactive Tool for Data Quality Assurance
Authors:
Simon Geerkens,
Christian Sieberichs,
Alexander Braun,
Thomas Waschulzik
Abstract:
The importance of high data quality is increasing with the growing impact and distribution of ML systems and big data. Also the planned AI Act from the European commission defines challenging legal requirements for data quality especially for the market introduction of safety relevant ML systems. In this paper we introduce a novel approach that supports the data quality assurance process of multip…
▽ More
The importance of high data quality is increasing with the growing impact and distribution of ML systems and big data. Also the planned AI Act from the European commission defines challenging legal requirements for data quality especially for the market introduction of safety relevant ML systems. In this paper we introduce a novel approach that supports the data quality assurance process of multiple data quality aspects. This approach enables the verification of quantitative data quality requirements. The concept and benefits are introduced and explained on small example data sets. How the method is applied is demonstrated on the well known MNIST data set based an handwritten digits.
△ Less
Submitted 10 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Windscreen Optical Quality for AI Algorithms: Refractive Power and MTF not Sufficient
Authors:
Dominik Werner Wolf,
Markus Ulrich,
Alexander Braun
Abstract:
Windscreen optical quality is an important aspect of any advanced driver assistance system, and also for future autonomous driving, as today at least some cameras of the sensor suite are situated behind the windscreen. Automotive mass production processes require measurement systems that characterize the optical quality of the windscreens in a meaningful way, which for modern perception stacks imp…
▽ More
Windscreen optical quality is an important aspect of any advanced driver assistance system, and also for future autonomous driving, as today at least some cameras of the sensor suite are situated behind the windscreen. Automotive mass production processes require measurement systems that characterize the optical quality of the windscreens in a meaningful way, which for modern perception stacks implies meaningful for artificial intelligence (AI) algorithms. The measured optical quality needs to be linked to the performance of these algorithms, such that performance limits - and thus production tolerance limits - can be defined. In this article we demonstrate that the main metric established in the industry - refractive power - is fundamentally not capable of capturing relevant optical properties of windscreens. Further, as the industry is moving towards the modulation transfer function (MTF) as an alternative, we mathematically show that this metric cannot be used on windscreens alone, but that the windscreen forms a novel optical system together with the optics of the camera system. Hence, the required goal of a qualification system that is installed at the windscreen supplier and independently measures the optical quality cannot be achieved using MTF. We propose a novel concept to determine the optical quality of windscreens and to use simulation to link this optical quality to the performance of AI algorithms, which can hopefully lead to novel inspection systems.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales
Authors:
Christopher Lang,
Alexander Braun,
Lars Schillingmann,
Abhinav Valada
Abstract:
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficien…
▽ More
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficient visual appearance variations to facilitate learning consistent re-identification features for autonomous driving when the frame rate is low or object dynamics are high. In this work, we propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames by enforcing consistent association scores across short and long timescales. We perform extensive evaluations demonstrating that re-identification features trained from longer sequences significantly reduce ID switches on standard autonomous driving datasets compared to existing self-supervised learning methods, which are limited to training on frame pairs. Using our proposed SubCo loss function, we set the new state-of-the-art among self-supervised methods and even perform on par with fully supervised learning methods.
△ Less
Submitted 21 September, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences
Authors:
Christopher Lang,
Alexander Braun,
Lars Schillingmann,
Karsten Haug,
Abhinav Valada
Abstract:
Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pret…
▽ More
Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide. While video-level self-supervised learning approaches have shown strong generalizability on classification tasks, the potential to learn dense representations from sequential data has been relatively unexplored. In this work, we propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks. We embed each frame by an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems, and formulate the sequential ordering by predicting frame transition probabilities in a transformer-based multi-frame architecture whose complexity scales less than quadratic with respect to the sequence length. Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods as well as supervised transfer learning initialization strategies, achieving an improvement of +0.7% in mAP for object detection and +2.0% in the HOTA score for multi-object tracking.
△ Less
Submitted 8 November, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Hierarchically Structured Task-Agnostic Continual Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learnin…
▽ More
One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method on continual reinforcement learning and variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Simplified Prophet Inequalities for Combinatorial Auctions
Authors:
Alexander Braun,
Thomas Kesselheim
Abstract:
We consider prophet inequalities for XOS and MPH-$k$ combinatorial auctions and give a simplified proof for the existence of static and anonymous item prices which recover the state-of-the-art competitive ratios.
Our proofs make use of a linear programming formulation which has a non-negative objective value if there are prices which admit a given competitive ratio $α\geq 1$. Changing our perspe…
▽ More
We consider prophet inequalities for XOS and MPH-$k$ combinatorial auctions and give a simplified proof for the existence of static and anonymous item prices which recover the state-of-the-art competitive ratios.
Our proofs make use of a linear programming formulation which has a non-negative objective value if there are prices which admit a given competitive ratio $α\geq 1$. Changing our perspective to dual space by an application of strong LP duality, we use an interpretation of the dual variables as probabilities to directly obtain our result. In contrast to previous work, our proofs do not require to argue about specific values of buyers for bundles, but only about the presence or absence of items.
As a side remark, for any $k \geq 2$, this simplification also leads to a tiny improvement in the best competitive ratio for MPH-$k$ combinatorial auctions from $4k-2$ to $2k + 2 \sqrt{k(k-1)} -1$.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
Change Detection for Local Explainability in Evolving Data Streams
Authors:
Johannes Haug,
Alexander Braun,
Stefan Zürn,
Gjergji Kasneci
Abstract:
As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model…
▽ More
As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Countability constraints in order-theoretic approaches to computability
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Computability on uncountable sets has no standard formalization, unlike that on countable sets, which is given by Turing machines. Some of the approaches to define computability in these sets rely on order-theoretic structures to translate such notions from Turing machines to uncountable spaces. Since these machines are used as a baseline for computability in these approaches, countability restric…
▽ More
Computability on uncountable sets has no standard formalization, unlike that on countable sets, which is given by Turing machines. Some of the approaches to define computability in these sets rely on order-theoretic structures to translate such notions from Turing machines to uncountable spaces. Since these machines are used as a baseline for computability in these approaches, countability restrictions on the ordered structures are fundamental. Here, we show several relations between the usual countability restrictions in order-theoretic theories of computability and some more common order-theoretic countability constraints, like order density properties and functional characterizations of the order structure in terms of multi-utilities. As a result, we show how computability can be introduced in some order structures via countability order density and multi-utility constraints.
△ Less
Submitted 28 May, 2024; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Computation as uncertainty reduction: a simplified order-theoretic framework
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Although there is a somewhat standard formalization of computability on countable sets given by Turing machines, the same cannot be said about uncountable sets. Among the approaches to define computability in these sets, order-theoretic structures have proven to be useful. Here, we discuss the mathematical structure needed to define computability using order-theoretic concepts. In particular, we i…
▽ More
Although there is a somewhat standard formalization of computability on countable sets given by Turing machines, the same cannot be said about uncountable sets. Among the approaches to define computability in these sets, order-theoretic structures have proven to be useful. Here, we discuss the mathematical structure needed to define computability using order-theoretic concepts. In particular, we introduce a more general framework and discuss its limitations compared to the previous one in domain theory. We expose four features in which the stronger requirements in the domain-theoretic structure allow to improve upon the more general framework: computable elements, computable functions, model dependence of computability and complexity theory. Crucially, we show computability of elements in uncountable spaces can be defined in this new setup, and argue why this is not the case for computable functions. Moreover, we show the stronger setup diminishes the dependence of computability on the chosen order-theoretic structure and that, although a suitable complexity theory can be defined in the stronger framework and the more general one posesses a notion of computable elements, there appears to be no proper notion of element complexity in the latter.
△ Less
Submitted 6 September, 2022; v1 submitted 28 June, 2022;
originally announced June 2022.
-
On a geometrical notion of dimension for partially ordered sets
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
The well-known notion of dimension for partial orders by Dushnik and Miller allows to quantify the degree of incomparability and, thus, is regarded as a measure of complexity for partial orders. However, despite its usefulness, its definition is somewhat disconnected from the geometrical idea of dimension, where, essentially, the number of dimensions indicates how many real lines are required to r…
▽ More
The well-known notion of dimension for partial orders by Dushnik and Miller allows to quantify the degree of incomparability and, thus, is regarded as a measure of complexity for partial orders. However, despite its usefulness, its definition is somewhat disconnected from the geometrical idea of dimension, where, essentially, the number of dimensions indicates how many real lines are required to represent the underlying partially ordered set.
Here, we introduce a variation of the Dushnik-Miller notion of dimension that is closer to geometry, the Debreu dimension, and show the following main results: (i) how to construct its building blocks under some countability restrictions, (ii) its relation to other notions of dimension in the literature, and (iii), as an application of the above, we improve on the classification of preordered spaces through real-valued monotones.
△ Less
Submitted 2 September, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
On Hyperbolic Embeddings in 2D Object Detection
Authors:
Christopher Lang,
Alexander Braun,
Abhinav Valada
Abstract:
Object detection, for the most part, has been formulated in the euclidean space, where euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype. In this work, we study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier in two-stage, keypoint-based, a…
▽ More
Object detection, for the most part, has been formulated in the euclidean space, where euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype. In this work, we study whether a hyperbolic geometry better matches the underlying structure of the object classification space. We incorporate a hyperbolic classifier in two-stage, keypoint-based, and transformer-based object detection architectures and evaluate them on large-scale, long-tailed, and zero-shot object detection benchmarks. In our extensive experimental evaluations, we observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance.
△ Less
Submitted 18 March, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
The classification of preordered spaces in terms of monotones: complexity and optimization
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
The study of complexity and optimization in decision theory involves both partial and complete characterizations of preferences over decision spaces in terms of real-valued monotones. With this motivation, and following the recent introduction of new classes of monotones, like injective monotones or strict monotone multi-utilities, we present the classification of preordered spaces in terms of bot…
▽ More
The study of complexity and optimization in decision theory involves both partial and complete characterizations of preferences over decision spaces in terms of real-valued monotones. With this motivation, and following the recent introduction of new classes of monotones, like injective monotones or strict monotone multi-utilities, we present the classification of preordered spaces in terms of both the existence and cardinality of real-valued monotones and the cardinality of the quotient space. In particular, we take advantage of a characterization of real-valued monotones in terms of separating families of increasing sets in order to obtain a more complete classification consisting of classes that are strictly different from each other. As a result, we gain new insight into both complexity and optimization, and clarify their interplay in preordered spaces.
△ Less
Submitted 14 August, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
ExAID: A Multimodal Explanation Framework for Computer-Aided Diagnosis of Skin Lesions
Authors:
Adriano Lucieri,
Muhammad Naseer Bajwa,
Stephan Alexander Braun,
Muhammad Imran Malik,
Andreas Dengel,
Sheraz Ahmed
Abstract:
One principal impediment in the successful deployment of AI-based Computer-Aided Diagnosis (CAD) systems in clinical workflows is their lack of transparent decision making. Although commonly used eXplainable AI methods provide some insight into opaque algorithms, such explanations are usually convoluted and not readily comprehensible except by highly trained experts. The explanation of decisions r…
▽ More
One principal impediment in the successful deployment of AI-based Computer-Aided Diagnosis (CAD) systems in clinical workflows is their lack of transparent decision making. Although commonly used eXplainable AI methods provide some insight into opaque algorithms, such explanations are usually convoluted and not readily comprehensible except by highly trained experts. The explanation of decisions regarding the malignancy of skin lesions from dermoscopic images demands particular clarity, as the underlying medical problem definition is itself ambiguous. This work presents ExAID (Explainable AI for Dermatology), a novel framework for biomedical image analysis, providing multi-modal concept-based explanations consisting of easy-to-understand textual explanations supplemented by visual maps justifying the predictions. ExAID relies on Concept Activation Vectors to map human concepts to those learnt by arbitrary Deep Learning models in latent space, and Concept Localization Maps to highlight concepts in the input space. This identification of relevant concepts is then used to construct fine-grained textual explanations supplemented by concept-wise location information to provide comprehensive and coherent multi-modal explanations. All information is comprehensively presented in a diagnostic interface for use in clinical routines. An educational mode provides dataset-level explanation statistics and tools for data and model exploration to aid medical research and education. Through rigorous quantitative and qualitative evaluation of ExAID, we show the utility of multi-modal explanations for CAD-assisted scenarios even in case of wrong predictions. We believe that ExAID will provide dermatologists an effective screening tool that they both understand and trust. Moreover, it will be the basis for similar applications in other biomedical imaging fields.
△ Less
Submitted 4 January, 2022;
originally announced January 2022.
-
Contrastive Object Detection Using Knowledge Graph Embeddings
Authors:
Christopher Lang,
Alexander Braun,
Abhinav Valada
Abstract:
Object recognition for the most part has been approached as a one-hot problem that treats classes to be discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities in the object types. In this work, we compare the error statistics of the class embeddings learned from a one-hot approach with semantically…
▽ More
Object recognition for the most part has been approached as a one-hot problem that treats classes to be discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities in the object types. In this work, we compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs that are widely applied in open world object detection. Extensive experimental results on multiple knowledge-embeddings as well as distance metrics indicate that knowledge-based class representations result in more semantically grounded misclassifications while performing on par compared to one-hot methods on the challenging COCO and Cityscapes object detection benchmarks. We generalize our findings to multiple object detection architectures by proposing a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Mixture-of-Variational-Experts for Continual Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and devel…
▽ More
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method in continual supervised learning and in continual reinforcement learning.
△ Less
Submitted 1 March, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Representing preorders with injective monotones
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
We introduce a new class of real-valued monotones in preordered spaces, injective monotones. We show that the class of preorders for which they exist lies in between the class of preorders with strict monotones and preorders with countable multi-utilities, improving upon the known classification of preordered spaces through real-valued monotones. We extend several well-known results for strict mon…
▽ More
We introduce a new class of real-valued monotones in preordered spaces, injective monotones. We show that the class of preorders for which they exist lies in between the class of preorders with strict monotones and preorders with countable multi-utilities, improving upon the known classification of preordered spaces through real-valued monotones. We extend several well-known results for strict monotones (Richter-Peleg functions) to injective monotones, we provide a construction of injective monotones from countable multi-utilities, and relate injective monotones to classic results concerning Debreu denseness and order separability. Along the way, we connect our results to Shannon entropy and the uncertainty preorder, obtaining new insights into how they are related. In particular, we show how injective montones can be used to generalize some appealing properties of Jaynes' maximum entropy principle, which is considered a basis for statistical inference and serves as a justification for many regularization techniques that appear throughout machine learning and decision theory.
△ Less
Submitted 24 November, 2021; v1 submitted 30 July, 2021;
originally announced July 2021.
-
A Comparison of Methods for OOV-word Recognition on a New Public Dataset
Authors:
Rudolf A. Braun,
Srikanth Madikeri,
Petr Motlicek
Abstract:
A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a n…
▽ More
A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs, and how much benefit one can get from incorporating OOV-word information into an existing system by modifying WFSTs. Additionally, we propose a new method for modifying a subword-based language model so as to better recognize OOV-words. We showcase very large improvements in OOV-word recognition and make both the data and code available.
△ Less
Submitted 16 July, 2021;
originally announced July 2021.
-
Asymptotically Optimal Welfare of Posted Pricing for Multiple Items with MHR Distributions
Authors:
Alexander Braun,
Matthias Buttkus,
Thomas Kesselheim
Abstract:
We consider the problem of posting prices for unit-demand buyers if all $n$ buyers have identically distributed valuations drawn from a distribution with monotone hazard rate. We show that even with multiple items asymptotically optimal welfare can be guaranteed.
Our main results apply to the case that either a buyer's value for different items are independent or that they are perfectly correlat…
▽ More
We consider the problem of posting prices for unit-demand buyers if all $n$ buyers have identically distributed valuations drawn from a distribution with monotone hazard rate. We show that even with multiple items asymptotically optimal welfare can be guaranteed.
Our main results apply to the case that either a buyer's value for different items are independent or that they are perfectly correlated. We give mechanisms using dynamic prices that obtain a $1 - Θ\left( \frac{1}{\log n}\right)$-fraction of the optimal social welfare in expectation. Furthermore, we devise mechanisms that only use static item prices and are $1 - Θ\left( \frac{\log\log\log n}{\log n}\right)$-competitive compared to the optimal social welfare. As we show, both guarantees are asymptotically optimal, even for a single item and exponential distributions.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Truthful Mechanisms for Two-Sided Markets via Prophet Inequalities
Authors:
Alexander Braun,
Thomas Kesselheim
Abstract:
We design novel mechanisms for welfare-maximization in two-sided markets. That is, there are buyers willing to purchase items and sellers holding items initially, both acting rationally and strategically in order to maximize utility. Our mechanisms are designed based on a powerful correspondence between two-sided markets and prophet inequalities. They satisfy individual rationality, dominant-strat…
▽ More
We design novel mechanisms for welfare-maximization in two-sided markets. That is, there are buyers willing to purchase items and sellers holding items initially, both acting rationally and strategically in order to maximize utility. Our mechanisms are designed based on a powerful correspondence between two-sided markets and prophet inequalities. They satisfy individual rationality, dominant-strategy incentive compatibility, budget-balance constraints and give constant-factor approximations to the optimal social welfare.
We improve previous results in several settings: Our main focus is on matroid double auctions, where the set of buyers who obtain an item needs to be independent in a matroid. We construct two mechanisms, the first being a $1/3$-approximation of the optimal social welfare satisfying strong budget-balance and requiring the agents to trade in a customized order, the second being a $1/2$-approximation, weakly budget-balanced and able to deal with online arrival determined by an adversary. In addition, we construct constant-factor approximations in two-sided markets when buyers need to fulfill a knapsack constraint. Also, in combinatorial double auctions, where buyers have valuation functions over item bundles instead of being interested in only one item, using similar techniques, we design a mechanism which is a $1/2$-approximation of the optimal social welfare, strongly budget-balanced and can deal with online arrival of agents in an adversarial order.
△ Less
Submitted 31 May, 2021;
originally announced May 2021.
-
SLPC: a VRNN-based approach for stochastic lidar prediction and completion in autonomous driving
Authors:
George Eskandar,
Alexander Braun,
Martin Meinke,
Karim Armanious,
Bin Yang
Abstract:
Predicting future 3D LiDAR pointclouds is a challenging task that is useful in many applications in autonomous driving such as trajectory prediction, pose forecasting and decision making. In this work, we propose a new LiDAR prediction framework that is based on generative models namely Variational Recurrent Neural Networks (VRNNs), titled Stochastic LiDAR Prediction and Completion (SLPC). Our alg…
▽ More
Predicting future 3D LiDAR pointclouds is a challenging task that is useful in many applications in autonomous driving such as trajectory prediction, pose forecasting and decision making. In this work, we propose a new LiDAR prediction framework that is based on generative models namely Variational Recurrent Neural Networks (VRNNs), titled Stochastic LiDAR Prediction and Completion (SLPC). Our algorithm is able to address the limitations of previous video prediction frameworks when dealing with sparse data by spatially inpainting the depth maps in the upcoming frames. Our contributions can thus be summarized as follows: we introduce the new task of predicting and completing depth maps from spatially sparse data, we present a sparse version of VRNNs and an effective self-supervised training method that does not require any labels. Experimental results illustrate the effectiveness of our framework in comparison to the state of the art methods in video prediction.
△ Less
Submitted 19 February, 2021;
originally announced February 2021.
-
Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts
Authors:
Peter Bellmann,
Heinke Hihn,
Daniel A. Braun,
Friedhelm Schwenker
Abstract:
In many real-world pattern recognition scenarios, such as in medical applications, the corresponding classification tasks can be of an imbalanced nature. In the current study, we focus on binary, imbalanced classification tasks, i.e.~binary classification tasks in which one of the two classes is under-represented (minority class) in comparison to the other class (majority class). In the literature…
▽ More
In many real-world pattern recognition scenarios, such as in medical applications, the corresponding classification tasks can be of an imbalanced nature. In the current study, we focus on binary, imbalanced classification tasks, i.e.~binary classification tasks in which one of the two classes is under-represented (minority class) in comparison to the other class (majority class). In the literature, many different approaches have been proposed, such as under- or oversampling, to counter class imbalance. In the current work, we introduce a novel method, which addresses the issues of class imbalance. To this end, we first transfer the binary classification task to an equivalent regression task. Subsequently, we generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced, with respect to the redefined target label set. We evaluate our approach on a number of publicly available data sets in combination with Support Vector Machines. Moreover, we compare our proposed method to one of the most popular oversampling techniques (SMOTE). Based on the detailed discussion of the presented outcomes of our experimental evaluation, we provide promising ideas for future research directions.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
Specialization in Hierarchical Learning Systems
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but requires to address the questions of division of labor and specialization. We investigate in how far information constraints in hierarchies of experts not only provide a principled method for regularization but also to enforce specialization. In particular, we devise an information…
▽ More
Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but requires to address the questions of division of labor and specialization. We investigate in how far information constraints in hierarchies of experts not only provide a principled method for regularization but also to enforce specialization. In particular, we devise an information-theoretically motivated on-line learning rule that allows partitioning of the problem space into multiple sub-problems that can be solved by the individual experts. We demonstrate two different ways to apply our method: (i) partitioning problems based on individual data samples and (ii) based on sets of data samples representing tasks. Approach (i) equips the system with the ability to solve complex decision-making problems by finding an optimal combination of local expert decision-makers. Approach (ii) leads to decision-makers specialized in solving families of tasks, which equips the system with the ability to solve meta-learning problems. We show the broad applicability of our approach on a range of problems including classification, regression, density estimation, and reinforcement learning problems, both in the standard machine learning setup and in a meta-learning setting.
△ Less
Submitted 3 November, 2020;
originally announced November 2020.
-
Very High Resolution Land Cover Mapping of Urban Areas at Global Scale with Convolutional Neural Networks
Authors:
Thomas Tilak,
Arnaud Braun,
David Chandler,
Nicolas David,
Sylvain Galopin,
Amélie Lombard,
Michaël Michaud,
Camille Parisel,
Matthieu Porte,
Marjorie Robert
Abstract:
This paper describes a methodology to produce a 7-classes land cover map of urban areas from very high resolution images and limited noisy labeled data. The objective is to make a segmentation map of a large area (a french department) with the following classes: asphalt, bare soil, building, grassland, mineral material (permeable artificialized areas), forest and water from 20cm aerial images and…
▽ More
This paper describes a methodology to produce a 7-classes land cover map of urban areas from very high resolution images and limited noisy labeled data. The objective is to make a segmentation map of a large area (a french department) with the following classes: asphalt, bare soil, building, grassland, mineral material (permeable artificialized areas), forest and water from 20cm aerial images and Digital Height Model. We created a training dataset on a few areas of interest aggregating databases, semi-automatic classification, and manual annotation to get a complete ground truth in each class. A comparative study of different encoder-decoder architectures (U-Net, U-Net with Resnet encoders, Deeplab v3+) is presented with different loss functions. The final product is a highly valuable land cover map computed from model predictions stitched together, binarized, and refined before vectorization.
△ Less
Submitted 12 May, 2020;
originally announced May 2020.
-
On Interpretability of Deep Learning based Skin Lesion Classifiers using Concept Activation Vectors
Authors:
Adriano Lucieri,
Muhammad Naseer Bajwa,
Stephan Alexander Braun,
Muhammad Imran Malik,
Andreas Dengel,
Sheraz Ahmed
Abstract:
Deep learning based medical image classifiers have shown remarkable prowess in various application areas like ophthalmology, dermatology, pathology, and radiology. However, the acceptance of these Computer-Aided Diagnosis (CAD) systems in real clinical setups is severely limited primarily because their decision-making process remains largely obscure. This work aims at elucidating a deep learning b…
▽ More
Deep learning based medical image classifiers have shown remarkable prowess in various application areas like ophthalmology, dermatology, pathology, and radiology. However, the acceptance of these Computer-Aided Diagnosis (CAD) systems in real clinical setups is severely limited primarily because their decision-making process remains largely obscure. This work aims at elucidating a deep learning based medical image classifier by verifying that the model learns and utilizes similar disease-related concepts as described and employed by dermatologists. We used a well-trained and high performing neural network developed by REasoning for COmplex Data (RECOD) Lab for classification of three skin tumours, i.e. Melanocytic Naevi, Melanoma and Seborrheic Keratosis and performed a detailed analysis on its latent space. Two well established and publicly available skin disease datasets, PH2 and derm7pt, are used for experimentation. Human understandable concepts are mapped to RECOD image classification model with the help of Concept Activation Vectors (CAVs), introducing a novel training and significance testing paradigm for CAVs. Our results on an independent evaluation set clearly shows that the classifier learns and encodes human understandable concepts in its latent representation. Additionally, TCAV scores (Testing with CAVs) suggest that the neural network indeed makes use of disease-related concepts in the correct way when making predictions. We anticipate that this work can not only increase confidence of medical practitioners on CAD but also serve as a stepping stone for further development of CAV-based neural network interpretation methods.
△ Less
Submitted 5 May, 2020;
originally announced May 2020.
-
The Two Kinds of Free Energy and the Bayesian Revolution
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
The concept of free energy has its origins in 19th century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle of understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current mo…
▽ More
The concept of free energy has its origins in 19th century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle of understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current models of intelligent agency, that can both be considered as applications of Bayesian inference to the problem of action selection: one that appears when trading off accuracy and uncertainty based on a general maximum entropy principle, and one that formulates action selection in terms of minimizing an error measure that quantifies deviations of beliefs and policies from given reference models. The first approach provides a normative rule for action selection in the face of model uncertainty or when information processing capabilities are limited. The second approach directly aims to formulate the action selection problem as an inference problem in the context of Bayesian brain theories, also known as Active Inference in the literature. We elucidate the main ideas and discuss critical technical and conceptual issues revolving around these two notions of free energy that both claim to apply at all levels of decision-making, from the high-level deliberation of reasoning down to the low-level information processing of perception.
△ Less
Submitted 6 December, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Hierarchical Expert Networks for Meta-Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
The goal of meta-learning is to train a model on a variety of learning tasks, such that it can adapt to new problems within only a few iterations. Here we propose a principled information-theoretic model that optimally partitions the underlying problem space such that specialized expert decision-makers solve the resulting sub-problems. To drive this specialization we impose the same kind of inform…
▽ More
The goal of meta-learning is to train a model on a variety of learning tasks, such that it can adapt to new problems within only a few iterations. Here we propose a principled information-theoretic model that optimally partitions the underlying problem space such that specialized expert decision-makers solve the resulting sub-problems. To drive this specialization we impose the same kind of information processing constraints both on the partitioning and the expert decision-makers. We argue that this specialization leads to efficient adaptation to new tasks. To demonstrate the generality of our approach we evaluate three meta-learning domains: image classification, regression, and reinforcement learning.
△ Less
Submitted 9 September, 2020; v1 submitted 31 October, 2019;
originally announced November 2019.
-
An Information-theoretic On-line Learning Principle for Specialization in Hierarchical Decision-Making Systems
Authors:
Heinke Hihn,
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an informati…
▽ More
Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an information-theoretic principle that drives division of labor and specialization when decision-makers with information constraints are joined together. We devise an on-line learning rule of this principle that learns a partitioning of the problem space such that it can be solved by specialized linear policies. We demonstrate the approach for decision-making problems whose complexity exceeds the capabilities of individual decision-makers, but can be solved by combining the decision-makers optimally. The strength of the model is that it is abstract and principled, yet has direct applications in classification, regression, reinforcement learning and adaptive control.
△ Less
Submitted 5 December, 2019; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Bounded rational decision-making from elementary computations that reduce uncertainty
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
In its most basic form, decision-making can be viewed as a computational process that progressively eliminates alternatives, thereby reducing uncertainty. Such processes are generally costly, meaning that the amount of uncertainty that can be reduced is limited by the amount of available computational resources. Here, we introduce the notion of elementary computation based on a fundamental princip…
▽ More
In its most basic form, decision-making can be viewed as a computational process that progressively eliminates alternatives, thereby reducing uncertainty. Such processes are generally costly, meaning that the amount of uncertainty that can be reduced is limited by the amount of available computational resources. Here, we introduce the notion of elementary computation based on a fundamental principle for probability transfers that reduce uncertainty. Elementary computations can be considered as the inverse of Pigou-Dalton transfers applied to probability distributions, closely related to the concepts of majorization, T-transforms, and generalized entropies that induce a preorder on the space of probability distributions. As a consequence we can define resource cost functions that are order-preserving and therefore monotonic with respect to the uncertainty reduction. This leads to a comprehensive notion of decision-making processes with limited resources. Along the way, we prove several new results on majorization theory, as well as on entropy and divergence measures.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.
-
Systems of bounded rational agents with information-theoretic constraints
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Specialization and hierarchical organization are important features of efficient collaboration in economical, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a Free Energy principle, in order to computationally…
▽ More
Specialization and hierarchical organization are important features of efficient collaboration in economical, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a Free Energy principle, in order to computationally analyze systems of bounded rational agents that deal with such limitations optimally. We find that specialization allows to focus on fewer tasks, thus leading to a more efficient execution, but in turn requires coordination in hierarchical structures of specialized experts and coordinating units. Our results suggest that hierarchical architectures of specialized units at lower levels that are coordinated by units at higher levels are optimal, given that each unit's information-processing capability is limited and conforms to constraints on complexity costs.
△ Less
Submitted 16 September, 2018;
originally announced September 2018.
-
Bounded Rational Decision-Making with Adaptive Neural Network Priors
Authors:
Heinke Hihn,
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler Divergence between the agents' prior and posterior policy. Between prior and posterior lies an anytime deliberati…
▽ More
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler Divergence between the agents' prior and posterior policy. Between prior and posterior lies an anytime deliberation process that can be instantiated by sample-based evaluations of the utility function through Markov Chain Monte Carlo (MCMC) optimization. The most simple model assumes a fixed prior and can relate abstract information-theoretic processing costs to the number of sample evaluations. However, more advanced models would also address the question of learning, that is how the prior is adapted over time such that generated prior proposals become more efficient. In this work we investigate generative neural networks as priors that are optimized concurrently with anytime sample-based decision-making processes such as MCMC. We evaluate this approach on toy examples.
△ Less
Submitted 4 September, 2018;
originally announced September 2018.
-
An information-theoretic on-line update principle for perception-action coupling
Authors:
Zhen Peng,
Tim Genewein,
Felix Leibfried,
Daniel A. Braun
Abstract:
Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems [Bogh et al., 2016]. Here we consider perception and action as two serial information channels with limited information-processing capacity. We follow [Genewein et al., 2015] and formulate a constrained optimization prob…
▽ More
Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems [Bogh et al., 2016]. Here we consider perception and action as two serial information channels with limited information-processing capacity. We follow [Genewein et al., 2015] and formulate a constrained optimization problem that maximizes utility under limited information-processing capacity in the two channels. As a solution we obtain an optimal perceptual channel and an optimal action channel that are coupled such that perceptual information is optimized with respect to downstream processing in the action module. The main novelty of this study is that we propose an online optimization procedure to find bounded-optimal perception and action channels in parameterized serial perception-action systems. In particular, we implement the perceptual channel as a multi-layer neural network and the action channel as a multinomial distribution. We illustrate our method in a NAO robot simulator with a simplified cup lifting task.
△ Less
Submitted 16 April, 2018;
originally announced April 2018.
-
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Authors:
Jordi Grau-Moya,
Felix Leibfried,
Tim Genewein,
Daniel A. Braun
Abstract:
Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we c…
▽ More
Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
△ Less
Submitted 7 April, 2016;
originally announced April 2016.
-
Bounded Rational Decision-Making in Feedforward Neural Networks
Authors:
Felix Leibfried,
Daniel Alexander Braun
Abstract:
Bounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each…
▽ More
Bounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each neuron is considered as a bounded rational decision-maker or the network as a whole. In the update rules, bounded rationality translates into information-theoretically motivated types of regularization in weight space. In experiments on the MNIST benchmark classification task for handwritten digits, we show that such information-theoretic regularization successfully prevents overfitting across different architectures and attains results that are competitive with other recent techniques like dropout, dropconnect and Bayes by backprop, for both ordinary and convolutional neural networks.
△ Less
Submitted 23 May, 2016; v1 submitted 26 February, 2016;
originally announced February 2016.
-
Information-Theoretic Bounded Rationality
Authors:
Pedro A. Ortega,
Daniel A. Braun,
Justin Dyer,
Kee-Eung Kim,
Naftali Tishby
Abstract:
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the…
▽ More
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
△ Less
Submitted 21 December, 2015;
originally announced December 2015.
-
Adaptive information-theoretic bounded rational decision-making with parametric priors
Authors:
Jordi Grau-Moya,
Daniel A. Braun
Abstract:
Deviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing…
▽ More
Deviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing costs are measured by the relative entropy between a posterior decision strategy and a given fixed prior strategy. In the case of multiple environments, it can be shown that there is an optimal prior rendering the bounded rationality problem equivalent to the rate distortion problem for lossy compression in information theory. Accordingly, the optimal prior and posterior strategies can be computed by the well-known Blahut-Arimoto algorithm which requires the computation of partition sums over all possible outcomes and cannot be applied straightforwardly to continuous problems. Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory. Importantly, the update rule avoids typical infeasible operations such as the computation of partition sums. We show in simulations a proof of concept for discrete action and environment domains. This approach is not only interesting as a generic computational method, but might also provide a more realistic model of human decision-making processes occurring on a fast and a slow time scale.
△ Less
Submitted 5 November, 2015;
originally announced November 2015.
-
Robust and Efficient Elimination of Cache and Timing Side Channels
Authors:
Benjamin A. Braun,
Suman Jana,
Dan Boneh
Abstract:
Timing and cache side channels provide powerful attacks against many sensitive operations including cryptographic implementations. Existing defenses cannot protect against all classes of such attacks without incurring prohibitive performance overhead. A popular strategy for defending against all classes of these attacks is to modify the implementation so that the timing and cache access patterns o…
▽ More
Timing and cache side channels provide powerful attacks against many sensitive operations including cryptographic implementations. Existing defenses cannot protect against all classes of such attacks without incurring prohibitive performance overhead. A popular strategy for defending against all classes of these attacks is to modify the implementation so that the timing and cache access patterns of every hardware instruction is independent of the secret inputs. However, this solution is architecture-specific, brittle, and difficult to get right. In this paper, we propose and evaluate a robust low-overhead technique for mitigating timing and cache channels. Our solution requires only minimal source code changes and works across multiple languages/platforms. We report the experimental results of applying our solution to protect several C, C++, and Java programs. Our results demonstrate that our solution successfully eliminates the timing and cache side-channel leaks while incurring significantly lower performance overhead than existing approaches.
△ Less
Submitted 31 August, 2015; v1 submitted 30 May, 2015;
originally announced June 2015.
-
Bounded Rational Decision-Making in Changing Environments
Authors:
Jordi Grau-Moya,
Daniel A. Braun
Abstract:
A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility.…
▽ More
A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in thermodynamic systems. This similarity has been recently used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and illustrate with simulations its relationship with computational resources.
△ Less
Submitted 23 December, 2013;
originally announced December 2013.
-
Abstraction in decision-makers with limited information processing capabilities
Authors:
Tim Genewein,
Daniel A. Braun
Abstract:
A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly for…
▽ More
A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for decision making and rate-distortion theory and demonstrate how the application of rate-distortion for decision-making leads to the emergence of abstractions. We argue that abstractions are induced due to a limit in information processing capacity.
△ Less
Submitted 19 December, 2013; v1 submitted 16 December, 2013;
originally announced December 2013.
-
Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior p…
▽ More
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
△ Less
Submitted 18 March, 2013;
originally announced March 2013.
-
A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
Authors:
Pedro A. Ortega,
Jordi Grau-Moya,
Tim Genewein,
David Balduzzi,
Daniel A. Braun
Abstract:
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly…
▽ More
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. We illustrate the effectiveness of our model by optimizing a noisy, high-dimensional, non-convex objective function.
△ Less
Submitted 10 November, 2012; v1 submitted 8 June, 2012;
originally announced June 2012.
-
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized se…
▽ More
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision-rules such as Expectimax, Minimax and Expectiminimax. We show how these decision-rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the amount of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
△ Less
Submitted 17 May, 2012;
originally announced May 2012.
-
Information, Utility & Bounded Rationality
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades…
▽ More
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered.
△ Less
Submitted 28 July, 2011;
originally announced July 2011.
-
An axiomatic formalization of bounded rationality based on a utility-information equivalence
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigne…
▽ More
Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigned a corresponding conjugate utility function based on three axioms: utilities should be real-valued, additive and monotonic mappings of probabilities. We show that these axioms enforce a unique conversion law between utility and probability (and thereby, information). Moreover, we show that this relation can be characterized as a variational principle: given a utility function, its conjugate probability measure maximizes a free utility functional. Transformations of probability measures can then be formalized as a change in free utility due to the addition of new constraints expressed by a target utility function. Accordingly, one obtains a criterion to choose a probability measure that trades off the maximization of a target utility function and the cost of the deviation from a reference distribution. We show that optimal control, adaptive estimation and adaptive control problems can be solved this way in a resource-efficient way. When resource costs are ignored, the MEU principle is recovered. Our formalization might thus provide a principled approach to bounded rationality that establishes a close link to information theory.
△ Less
Submitted 6 July, 2010;
originally announced July 2010.