Embedded Hierarchical MPC for Autonomous Navigation

Authors: Dennis Benders, Johannes Köhler, Thijs Niesten, Robert Babuška, Javier Alonso-Mora, Laura Ferranti

Abstract: To efficiently deploy robotic systems in society, mobile robots need to autonomously and safely move through complex environments. Nonlinear model predictive control (MPC) methods provide a natural way to find a dynamically feasible trajectory through the environment without colliding with nearby obstacles. However, the limited computation power available on typical embedded robotic systems, such as quadrotors, poses a challenge to running MPC in real-time, including its most expensive tasks: constraints generation and optimization. To address this problem, we propose a novel hierarchical MPC scheme that interconnects a planning and a tracking layer. The planner constructs a trajectory with a long prediction horizon at a slow rate, while the tracker ensures trajectory tracking at a relatively fast rate. We prove that the proposed framework avoids collisions and is recursively feasible. Furthermore, we demonstrate its effectiveness in simulations and lab experiments with a quadrotor that needs to reach a goal position in a complex static environment. The code is efficiently implemented on the quadrotor's embedded computer to ensure real-time feasibility. Compared to a state-of-the-art single-layer MPC formulation, this allows us to increase the planning horizon by a factor of 5, which results in significantly better performance.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 18 pages, 14 figures (excluding biography entries)

arXiv:2405.19243 [pdf]

Challenge-Device-Synthesis: A multi-disciplinary approach for the development of social innovation competences for students of Artificial Intelligence

Authors: Matías Bilkis, Joan Moya Kohler, Fernando Vilariño

Abstract: The advent of Artificial Intelligence is expected to imply profound changes in the short term. It is therefore imperative for Academia, and particularly for the Computer Science scope, to develop cross-disciplinary tools that bond AI developments to their social dimension. To this aim, we introduce the Challenge-Device-Synthesis methodology (CDS), in which a specific challenge is presented to the students of AI, who are required to develop a device as a solution for the challenge. The device becomes the object of study for the different dimensions of social transformation, and the conclusions addressed by the students during the discussion around the device are presented in a synthesis piece in the shape of a 10-page scientific paper. The latter is evaluated taking into account both the depth of analysis and the level to which it genuinely reflects the social transformations associated with the proposed AI-based device. We provide data obtained during the pilot for the implementation phase of CDS within the subject of Social Innovation, a 6-ECTS subject from the 6th semester of the Degree of Artificial Intelligence, UAB-Barcelona. We provide details on temporalisation, task distribution, methodological tools used and assessment delivery procedure, as well as qualitative analysis of the results obtained.

Submitted 29 May, 2024; originally announced May 2024.

Comments: accepted as contribution for EDULEARN24 - 16th annual International Conference on Education and New Learning Technologies

arXiv:2405.05224 [pdf, other]

Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

Authors: Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet

Abstract: Diffusion models are a powerful generative framework, but come with expensive inference. Existing acceleration methods often compromise image quality or fail under complex conditioning when operating in an extremely low-step regime. In this work, we propose a novel distillation framework tailored to enable high-fidelity, diverse sample generation using just one to three steps. Our approach comprises three key components: (i) Backward Distillation, which mitigates training-inference discrepancies by calibrating the student on its own backward trajectory; (ii) Shifted Reconstruction Loss that dynamically adapts knowledge transfer based on the current time step; and (iii) Noise Correction, an inference-time technique that enhances sample quality by addressing singularities in noise prediction. Through extensive experiments, we demonstrate that our method outperforms existing competitors in quantitative metrics and human evaluations. Remarkably, it achieves performance comparable to the teacher model using only three denoising steps, enabling efficient high-quality generation.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03243 [pdf, other]

Mind the Gap Between Synthetic and Real: Utilizing Transfer Learning to Probe the Boundaries of Stable Diffusion Generated Data

Authors: Leonhard Hennicke, Christian Medeiros Adriano, Holger Giese, Jan Mathias Koehler, Lukas Schott

Abstract: Generative foundation models like Stable Diffusion comprise a diverse spectrum of knowledge in computer vision with the potential for transfer learning, e.g., via generating data to train student models for downstream tasks. This could circumvent the necessity of collecting labeled real-world data, thereby presenting a form of data-free knowledge distillation. However, the resultant student models show a significant drop in accuracy compared to models trained on real data. We investigate possible causes for this drop and focus on the role of the different layers of the student model. By training these layers using either real or synthetic data, we reveal that the drop mainly stems from the model's final layers. Further, we briefly investigate other factors, such as differences in data-normalization between synthetic and real, the impact of data augmentations, texture vs. shape learning, and assuming oracle prompts. While we find that some of those factors can have an impact, they are not sufficient to close the gap towards real data. Building upon our insights that mainly later layers are responsible for the drop, we investigate the data-efficiency of fine-tuning a synthetically trained model with real data applied to only those last layers. Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy. Our findings contribute to the understanding of the gap between synthetic and real data and indicate solutions to mitigate the scarcity of labeled real data.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19020 [pdf]

Information literacy development and assessment at school level: a systematic review of the literature

Authors: Luz Chourio-Acevedo, Jacqueline Köhler, Carla Coscarelli, Daniel Gacitúa, Verónica Proaño-Ríos, Roberto González-Ibáñez

Abstract: Information literacy (IL) involves a group of competences and fundamental skills in the 21st century. Today, society operates around information, which is challenging considering the vast amount of content available online. People must be capable of searching, critically assessing, making sense of, and communicating information. This set of competences must be properly developed since childhood, especially if considering early age access to online resources. To better understand the evolution and current status of IL development and assessment at school (K-12) level, we conducted a systematic literature review based on the guidelines established by the PRISMA statement. Our review led us to an initial set of 1,234 articles, from which 53 passed the inclusion criteria. These articles were used to address six research questions focused on IL definitions, skills, standards, and assessment tools. Our review shows IL evolution over the years and how it has been formalised through definitions and standards. These findings reveal key gaps that must be addressed in order to advance the field further. Keywords: Elementary education, Information literacy, Secondary education, 21st Century abilities.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.06219 [pdf, other]

doi 10.5220/0011986300003497

Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector

Authors: Bach Ha, Birgit Schalter, Laura White, Joachim Koehler

Abstract: Maintaining sewer systems in large cities is important, but also time- and effort-consuming, because visual inspections are currently done manually. To reduce the amount of aforementioned manual work, defects within sewer pipes should be located and classified automatically. In the past, multiple works have attempted solving this problem using classical image processing, machine learning, or a combination of those. However, each provided solution only focuses on detecting a limited set of defect/structure types, such as fissure, root, and/or connection. Furthermore, due to the use of hand-crafted features and small training datasets, generalization is also problematic. In order to overcome these deficits, a sizable dataset with 14.7 km of various sewer pipes was annotated by sewer maintenance experts in the scope of this work. On top of that, an object detector (EfficientDet-D0) was trained for automatic defect detection. From the results of several experiments, peculiar natures of defects in the context of object detection, which greatly affect the annotation and training process, are found and discussed. At the end, the final detector was able to detect 83% of defects in the test set; out of the missing 17%, only 0.77% are very severe defects. This work provides an example of applying deep learning-based object detection into an important but quiet engineering field. It also gives some practical pointers on how to annotate peculiar "objects", such as defects.

Submitted 9 April, 2024; originally announced April 2024.

Journal ref: (2023) In Proceedings of the 3rd International Conference on Image Processing and Vision Engineering - IMPROVE; ISBN 978-989-758-642-2; ISSN 2795-4943, SciTePress, pages 188-198

arXiv:2404.01550 [pdf, other]

Perfecting Periodic Trajectory Tracking: Model Predictive Control with a Periodic Observer ($Π$-MPC)

Authors: Luis Pabon, Johannes Köhler, John Irvin Alora, Patrick Benito Eberhard, Andrea Carron, Melanie N. Zeilinger, Marco Pavone

Abstract: In Model Predictive Control (MPC), discrepancies between the actual system and the predictive model can lead to substantial tracking errors and significantly degrade performance and reliability. While such discrepancies can be alleviated with more complex models, this often complicates controller design and implementation. By leveraging the fact that many trajectories of interest are periodic, we show that perfect tracking is possible when incorporating a simple observer that estimates and compensates for periodic disturbances. We present the design of the observer and the accompanying tracking MPC scheme, proving that their combination achieves zero tracking error asymptotically, regardless of the complexity of the unmodelled dynamics. We validate the effectiveness of our method, demonstrating asymptotically perfect tracking on a high-dimensional soft robot with nearly 10,000 states and a fivefold reduction in tracking errors compared to a baseline MPC on small-scale autonomous race car experiments.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures, Submitted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2402.06562 [pdf, other]

Safe Guaranteed Exploration for Non-linear Systems

Authors: Manish Prajapat, Johannes Köhler, Matteo Turchetta, Andreas Krause, Melanie N. Zeilinger

Abstract: Safely exploring environments with a-priori unknown constraints is a fundamental challenge that restricts the autonomy of robots. While safety is paramount, guarantees on sufficient exploration are also crucial for ensuring autonomous task completion. To address these challenges, we propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results: guaranteed exploration for non-linear systems with finite time sample complexity bounds, while being provably safe with arbitrarily high probability. The framework is general and applicable to many real-world scenarios with complex non-linear dynamics and unknown domains. Based on this framework we propose an efficient algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control. SageMPC improves efficiency by incorporating three techniques: i) exploiting a Lipschitz bound, ii) goal-directed exploration, and iii) receding horizon style re-planning, all while maintaining the desired sample complexity, safety and exploration guarantees of the framework. Lastly, we demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2312.16109 [pdf, other]

fMPI: Fast Novel View Synthesis in the Wild with Layered Scene Representations

Authors: Jonas Kohler, Nicolas Griffiths Sanchez, Luca Cavalli, Catherine Herold, Albert Pumarola, Alberto Garcia Garcia, Ali Thabet

Abstract: In this study, we propose two novel input processing paradigms for novel view synthesis (NVS) methods based on layered scene representations that significantly improve their runtime without compromising quality. Our approach identifies and mitigates the two most time-consuming aspects of traditional pipelines: building and processing the so-called plane sweep volume (PSV), which is a high-dimensional tensor of planar re-projections of the input camera views. In particular, we propose processing this tensor in parallel groups for improved compute efficiency as well as super-sampling adjacent input planes to generate denser, and hence more accurate scene representation. The proposed enhancements offer significant flexibility, allowing for a balance between performance and speed, thus making substantial steps toward real-time applications. Furthermore, they are very general in the sense that any PSV-based method can make use of them, including methods that employ multiplane images, multisphere images, and layered depth images. In a comprehensive set of experiments, we demonstrate that our proposed paradigms enable the design of an NVS method that achieves state-of-the-art on public benchmarks while being up to $50\times$ faster than existing state-of-the-art methods. It also beats the current forerunner in terms of speed by over $3\times$, while achieving significantly better rendering quality.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.12487 [pdf, other]

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet

Abstract: This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG, that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
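
Illustration (not from the paper): the idea of skipping the unconditional network evaluation once guidance has "converged" can be sketched as below. This is a toy under stated assumptions. The convergence test (cosine similarity between the conditional and unconditional predictions exceeding a threshold), the function names `cond_fn` and `uncond_fn`, and the default `scale` and `threshold` values are all hypothetical choices made for this sketch; the abstract only states that evaluations are omitted when the denoising process displays convergence. The toy predictors depend only on the step index, whereas a real sampler would also pass the current latent.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def guided_predictions(
    cond_fn: Callable[[float], Vector],    # hypothetical conditional predictor
    uncond_fn: Callable[[float], Vector],  # hypothetical unconditional predictor
    steps: Sequence[float],
    scale: float = 7.5,        # assumed guidance scale
    threshold: float = 0.99,   # assumed convergence threshold
) -> Tuple[List[Vector], int]:
    """Return the per-step predictions and the number of network evaluations."""
    converged = False
    evaluations = 0
    outputs: List[Vector] = []
    for t in steps:
        eps_c = cond_fn(t)
        evaluations += 1
        if converged:
            # Guidance is skipped: use the conditional prediction alone.
            outputs.append(eps_c)
            continue
        eps_u = uncond_fn(t)
        evaluations += 1
        # Standard classifier-free guidance combination.
        outputs.append([u + scale * (c - u) for c, u in zip(eps_c, eps_u)])
        if cosine(eps_c, eps_u) >= threshold:
            converged = True
    return outputs, evaluations
```

With full CFG, four steps would cost eight evaluations; in the toy call `guided_predictions(lambda t: [1.0, 0.0], lambda t: [1.0, t], [1.0, 0.5, 0.0, 0.0])` the two predictions coincide at the third step, so the fourth step uses the conditional prediction only.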

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.10199 [pdf, other]

Automatic nonlinear MPC approximation with closed-loop guarantees

Authors: Abdullah Tokmak, Christian Fiedler, Melanie N. Zeilinger, Sebastian Trimpe, Johannes Köhler

Abstract: Safety guarantees are vital in many control applications, such as robotics. Model predictive control (MPC) provides a constructive framework for controlling safety-critical systems, but is limited by its computational complexity. We address this problem by presenting a novel algorithm that automatically computes an explicit approximation to nonlinear MPC schemes while retaining closed-loop guarantees. Specifically, the problem can be reduced to a function approximation problem, which we then tackle by proposing ALKIA-X, the Adaptive and Localized Kernel Interpolation Algorithm with eXtrapolated reproducing kernel Hilbert space norm. ALKIA-X is a non-iterative algorithm that ensures numerically well-conditioned computations, a fast-to-evaluate approximating function, and the guaranteed satisfaction of any desired bound on the approximation error. Hence, ALKIA-X automatically computes an explicit function that approximates the MPC, yielding a controller suitable for safety-critical systems and high sampling rates. We apply ALKIA-X to approximate two nonlinear MPC schemes, demonstrating reduced computational demand and applicability to realistic problems.

Submitted 11 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Submitted to IEEE Transactions on Automatic Control. Compared to the previously uploaded version, this version contains an additional numerical example

arXiv:2312.03209 [pdf, other]

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Authors: Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

Abstract: Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly. A large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce the number of required steps, they generally treat the underlying denoising network as a black box. In this work, we investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We hypothesize that many layer computations in the denoising network are redundant. Leveraging this, we introduce block caching, in which we reuse outputs from layer blocks of previous steps to speed up inference. Furthermore, we propose a technique to automatically determine caching schedules based on each block's changes over timesteps. In our experiments, we show through FID, human evaluation and qualitative analysis that Block Caching allows generating images with higher visual quality at the same computational cost. We demonstrate this for different state-of-the-art models (LDM and EMU) and solvers (DDIM and DPM).

Submitted 12 January, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Project page: https://fwmb.github.io/blockcaching/

arXiv:2311.04698 [pdf, other]

Examining Common Paradigms in Multi-Task Learning

Authors: Cathrin Elich, Lukas Kirchdorfer, Jan M. Köhler, Lukas Schott

Abstract: While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods did not yield consistent performance improvements over single task learning (STL) baselines, underscoring the importance of gaining more profound insights about challenges specific to MTL. In our study, we investigate paradigms in MTL in the context of STL: First, the impact of the choice of optimizer has only been mildly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL empirically in various experiments. To further investigate Adam's effectiveness, we theoretically derive a partial loss-scale invariance under mild assumptions. Second, the notion of gradient conflicts has often been phrased as a specific problem in MTL. We delve into the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment we find no evidence that this is a unique problem in MTL. We emphasize differences in gradient magnitude as the main distinguishing factor. Overall, we find surprising similarities between STL and MTL suggesting to consider methods from both fields in a broader context.

Submitted 27 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.00604 [pdf, other]

A Systematic Review of Approximability Results for Traveling Salesman Problems leveraging the TSP-T3CO Definition Scheme

Authors: Sophia Saller, Jana Koehler, Andreas Karrenbauer

Abstract: The traveling salesman (or salesperson) problem, short TSP, is a problem of strong interest to many researchers from mathematics, economics, and computer science. Manifold TSP variants occur in nearly every scientific field and application domain: engineering, physics, biology, life sciences, and manufacturing just to name a few. Several thousand papers are published on theoretical research or application-oriented results each year. This paper provides the first systematic survey on the best currently known approximability and inapproximability results for well-known TSP variants such as the "standard" TSP, Path TSP, Bottleneck TSP, Maximum Scatter TSP, Generalized TSP, Clustered TSP, Traveling Purchaser Problem, Profitable Tour Problem, Quota TSP, Prize-Collecting TSP, Orienteering Problem, Time-dependent TSP, TSP with Time Windows, and the Orienteering Problem with Time Windows. The foundation of our survey is the definition scheme T3CO, which we propose as a uniform, easy-to-use and extensible means for the formal and precise definition of TSP variants. Applying T3CO to formally define the variant studied by a paper reveals subtle differences within the same named variant and also brings out the differences between the variants more clearly. We achieve the first comprehensive, concise, and compact representation of approximability results by using T3CO definitions. This makes it easier to understand the approximability landscape and the assumptions under which certain results hold. Open gaps become more evident and results can be compared more easily.

Submitted 27 January, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.05746 [pdf, other]

Robust Nonlinear Reduced-Order Model Predictive Control

Authors: John Irvin Alora, Luis A. Pabon, Johannes Köhler, Mattia Cenedese, Ed Schmerling, Melanie N. Zeilinger, George Haller, Marco Pavone

Abstract: Real-world systems are often characterized by high-dimensional nonlinear dynamics, making them challenging to control in real time. While reduced-order models (ROMs) are frequently employed in model-based control schemes, dimensionality reduction introduces model uncertainty which can potentially compromise the stability and safety of the original high-dimensional system. In this work, we propose a novel reduced-order model predictive control (ROMPC) scheme to solve constrained optimal control problems for nonlinear, high-dimensional systems. To address the challenges of using ROMs in predictive control schemes, we derive an error bounding system that dynamically accounts for model reduction error. Using these bounds, we design a robust MPC scheme that ensures robust constraint satisfaction, recursive feasibility, and asymptotic stability. We demonstrate the effectiveness of our proposed method in simulations on a high-dimensional soft robot with nearly 10,000 states.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 9 pages, 3 figures, To be presented at Conference for Decision and Control 2023

arXiv:2304.09575 [pdf, other]

Approximate non-linear model predictive control with safety-augmented neural networks

Authors: Henrik Hose, Johannes Köhler, Melanie N. Zeilinger, Sebastian Trimpe

Abstract: Model predictive control (MPC) achieves stability and constraint satisfaction for general nonlinear systems, but requires computationally expensive online optimization. This paper studies approximations of such MPC controllers via neural networks (NNs) to achieve fast online evaluation. We propose safety augmentation that yields deterministic guarantees for convergence and constraint satisfaction despite approximation inaccuracies. We approximate the entire input sequence of the MPC with NNs, which allows us to verify online if it is a feasible solution to the MPC problem. We replace the NN solution by a safe candidate based on standard MPC techniques whenever it is infeasible or has worse cost. Our method requires a single evaluation of the NN and forward integration of the input sequence online, which is fast to compute on resource-constrained systems. The proposed control framework is illustrated on three non-linear MPC benchmarks of different complexity, demonstrating computational speedups orders of magnitude higher than online optimization. In the examples, we achieve deterministic safety through the safety-augmented NNs, where naive NN implementation fails.
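
Illustration (not from the paper): the check-and-fallback logic described in the abstract can be sketched on a toy problem. Everything concrete here is an assumption made for the sketch: the scalar model x_{k+1} = x_k + dt * u_k, the box constraints, the terminal bound, the quadratic cost, and the names `rollout`, `is_feasible`, `cost` and `safe_select`. In particular, how the safe candidate is constructed is left outside the sketch and simply passed in.

```python
from typing import List, Sequence


def rollout(x0: float, inputs: Sequence[float], dt: float = 1.0) -> List[float]:
    """Forward-integrate the assumed toy model x_{k+1} = x_k + dt * u_k."""
    states = [x0]
    for u in inputs:
        states.append(states[-1] + dt * u)
    return states


def is_feasible(
    states: Sequence[float],
    inputs: Sequence[float],
    x_max: float = 1.0,   # assumed state bound
    u_max: float = 1.0,   # assumed input bound
    x_term: float = 0.5,  # assumed terminal-set bound
) -> bool:
    """Check state, input and terminal constraints along a predicted trajectory."""
    return (
        all(abs(x) <= x_max for x in states)
        and all(abs(u) <= u_max for u in inputs)
        and abs(states[-1]) <= x_term
    )


def cost(states: Sequence[float], inputs: Sequence[float]) -> float:
    """Assumed quadratic cost over states and inputs."""
    return sum(x * x for x in states) + sum(u * u for u in inputs)


def safe_select(
    x0: float,
    nn_inputs: Sequence[float],
    candidate_inputs: Sequence[float],
) -> List[float]:
    """Use the NN input sequence only if it is feasible and no worse in cost
    than the safe candidate; otherwise fall back to the candidate."""
    nn_states = rollout(x0, nn_inputs)
    cand_states = rollout(x0, candidate_inputs)
    nn_ok = is_feasible(nn_states, nn_inputs)
    if nn_ok and cost(nn_states, nn_inputs) <= cost(cand_states, candidate_inputs):
        return list(nn_inputs)
    return list(candidate_inputs)
```

In this toy, an NN proposal that violates the input bound (for example `[2.0, 0.0, 0.0]`) is rejected in favour of the candidate, while a feasible proposal with lower cost is accepted. The online work is one rollout per sequence plus a few comparisons, which mirrors the abstract's point that only forward integration is needed.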

Submitted 19 April, 2023; originally announced April 2023.
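
The selection rule described in the abstract above (keep the NN input sequence only if it is feasible and no worse in cost than a safe candidate) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the scalar dynamics, the bounds, the quadratic cost, the terminal condition, and all function names (`simulate`, `is_feasible`, `cost`, `select_sequence`) are assumptions made for the example. How the safe candidate is constructed is not specified in the abstract, so it is simply passed in.

```python
from typing import List, Sequence


def simulate(x0: float, inputs: Sequence[float]) -> List[float]:
    """Forward-integrate an assumed toy system x+ = 0.9 * x + u."""
    states = [x0]
    for u in inputs:
        states.append(0.9 * states[-1] + u)
    return states


def is_feasible(x0: float, inputs: Sequence[float],
                x_max: float = 1.0, u_max: float = 0.5,
                x_term: float = 0.2) -> bool:
    """Check input bounds, state bounds, and an assumed terminal bound."""
    states = simulate(x0, inputs)
    return (all(abs(u) <= u_max for u in inputs)
            and all(abs(x) <= x_max for x in states)
            and abs(states[-1]) <= x_term)


def cost(x0: float, inputs: Sequence[float]) -> float:
    """Assumed quadratic cost over the predicted states and inputs."""
    states = simulate(x0, inputs)
    return sum(x * x for x in states) + sum(u * u for u in inputs)


def select_sequence(x0: float, nn_inputs: Sequence[float],
                    safe_candidate: Sequence[float]) -> List[float]:
    """Return the NN sequence if it is feasible and not worse in cost
    than the safe candidate; otherwise fall back to the safe candidate."""
    if (is_feasible(x0, nn_inputs)
            and cost(x0, nn_inputs) <= cost(x0, safe_candidate)):
        return list(nn_inputs)
    return list(safe_candidate)
```

In this sketch the online work is one forward simulation per candidate plus a comparison, which mirrors the abstract's point that no optimization problem has to be solved online.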

arXiv:2301.11355 [pdf, other]

Rigid Body Flows for Sampling Molecular Crystal Structures

Authors: Jonas Köhler, Michele Invernizzi, Pim de Haan, Frank Noé

Abstract: Normalizing flows (NF) are a class of powerful generative models that have gained popularity in recent years due to their ability to model complex distributions with high flexibility and expressiveness. In this work, we introduce a new type of normalizing flow that is tailored for modeling positions and orientations of multiple objects in three-dimensional space, such as molecules in a crystal. Our approach is based on two key ideas: first, we define smooth and expressive flows on the group of unit quaternions, which allows us to capture the continuous rotational motion of rigid bodies; second, we use the double cover property of unit quaternions to define a proper density on the rotation group. This ensures that our model can be trained using standard likelihood-based methods or variational inference with respect to a thermodynamic target density. We evaluate the method by training Boltzmann generators for two molecular examples, namely the multi-modal density of a tetrahedral system in an external field and the ice XI phase in the TIP4P water model. Our flows can be combined with flows operating on the internal degrees of freedom of molecules and constitute an important step towards the modeling of distributions of many interacting molecules.

Submitted 7 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: International Conference on Machine Learning, 2023

arXiv:2210.16106 [pdf, other]

doi 10.1109/TAC.2023.3303168

Motion Planning using Reactive Circular Fields: A 2D Analysis of Collision Avoidance and Goal Convergence

Authors: Marvin Becker, Johannes Köhler, Sami Haddadin, Matthias A. Müller

Abstract: Recently, many reactive trajectory planning approaches were suggested in the literature because of their inherent immediate adaptation in the ever more demanding cluttered and unpredictable environments of robotic systems. However, typically those approaches are only locally reactive without considering global path planning, and no guarantees for simultaneous collision avoidance and goal convergence can be given. In this paper, we study a recently developed circular field (CF)-based motion planner that combines local reactive control with global trajectory generation by adapting an artificial magnetic field such that multiple trajectories around obstacles can be evaluated. In particular, we provide a mathematically rigorous analysis of this planner in a planar environment to ensure safe motion of the controlled robot. Contrary to existing results, the derived collision avoidance analysis covers the entire CF motion planning algorithm including attractive forces for goal convergence and is not limited to a specific choice of the rotation field, i.e., our guarantees are not limited to a specific potentially suboptimal trajectory. Our Lyapunov-type collision avoidance analysis is based on the definition of an (equivalent) two-dimensional auxiliary system, which enables us to provide tight if-and-only-if conditions for the case of a collision with point obstacles. Furthermore, we show how this analysis naturally extends to multiple obstacles and we specify sufficient conditions for goal convergence. Finally, we provide a challenging simulation scenario with multiple non-convex point cloud obstacles and demonstrate collision avoidance and goal convergence.

Submitted 3 November, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: Published in IEEE Transactions on Automatic Control (Early Access)

arXiv:2206.02479 [pdf, other]

Easy, adaptable and high-quality Modelling with domain-specific Constraint Patterns

Authors: Sophia Saller, Jana Koehler

Abstract: Domain-specific constraint patterns are introduced, which form the counterpart to design patterns in software engineering for the constraint programming setting. These patterns describe the expert knowledge and best-practice solution to recurring problems and include example implementations. We aim to reach a stage where, for common problems, the modelling process consists of simply picking the applicable patterns from a library of patterns and combining them in a model. This vastly simplifies the modelling process and makes the models simple to adapt. By making the patterns domain-specific we can further include problem-specific modelling ideas, including specific global constraints and search strategies that are known for the problem, into the pattern description. This ensures that the model we obtain from patterns is not only correct but also of high quality. We introduce domain-specific constraint patterns on the example of job shop and flow shop, discuss their advantages and show how the occurrence of patterns can automatically be checked in an event log.

Submitted 6 June, 2022; originally announced June 2022.

Comments: 15 pages

Journal ref: Twentieth International Workshop on Constraint Modelling and Reformulation, ModRef, 2021

arXiv:2203.11167 [pdf, other]

doi 10.1021/acs.jctc.3c00016

Flow-matching -- efficient coarse-graining of molecular dynamics without forces

Authors: Jonas Köhler, Yaoyi Chen, Andreas Krämer, Cecilia Clementi, Frank Noé

Abstract: Coarse-grained (CG) molecular simulations have become a standard tool to study molecular processes on time- and length-scales inaccessible to all-atom simulations. Parameterizing CG force fields to match all-atom simulations has mainly relied on force-matching or relative entropy minimization, which require many samples from costly simulations with all-atom or CG resolutions, respectively. Here we present flow-matching, a new training method for CG force fields that combines the advantages of both methods by leveraging normalizing flows, a generative deep learning method. Flow-matching first trains a normalizing flow to represent the CG probability density, which is equivalent to minimizing the relative entropy without requiring iterative CG simulations. Subsequently, the flow generates samples and forces according to the learned distribution in order to train the desired CG free energy model via force matching. Even without requiring forces from the all-atom simulations, flow-matching outperforms classical force-matching by an order of magnitude in terms of data efficiency, and produces CG models that can capture the folding and unfolding transitions of small proteins.

Submitted 5 February, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

Journal ref: J. Chem. Theory Comput. 2023, XXXX, XXX, XXX-XXX

arXiv:2201.06868 [pdf, other]

A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis

Authors: Michael Gref, Nike Matthiesen, Sreenivasa Hikkal Venugopala, Shalaka Satheesh, Aswinkumar Vijayananth, Duc Bach Ha, Sven Behnke, Joachim Köhler

Abstract: For research in audiovisual interview archives, it is often of interest not only what is said but also how. Sentiment analysis and emotion recognition can help capture, categorize and make these different facets searchable. In particular, for oral history archives, such indexing technologies can be of great interest. These technologies can help understand the role of emotions in historical remembering. However, humans often perceive sentiments and emotions ambiguously and subjectively. Moreover, oral history interviews have multi-layered levels of complex, sometimes contradictory, sometimes very subtle facets of emotions. Therefore, the question arises of what chance machines and humans have of capturing these and assigning them to predefined categories. This paper investigates the ambiguity in human perception of emotions and sentiment in German oral history interviews and the impact on machine learning systems. Our experiments reveal substantial differences in human perception for different emotions. Furthermore, we report from ongoing machine learning experiments with different modalities. We show that the human perceptual ambiguity and other challenges, such as class imbalance and lack of training data, currently limit the opportunities of these technologies for oral history archives. Nonetheless, our work uncovers promising observations and possibilities for further research.

Submitted 18 January, 2022; originally announced January 2022.

Comments: Submitted to LREC 2022

arXiv:2201.06841 [pdf, other]

Human and Automatic Speech Recognition Performance on German Oral History Interviews

Authors: Michael Gref, Nike Matthiesen, Christoph Schmidt, Sven Behnke, Joachim Köhler

Abstract: Automatic speech recognition systems have accomplished remarkable improvements in transcription accuracy in recent years. In some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions of three humans on a new oral history data set. We estimate a human word error rate (WER) of 8.7% for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model achieving near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history interviews. We improve our acoustic models by 5 to 8% relative for this task and achieve a WER of 23.9% on noisy and 15.6% on clean oral history interviews.

Submitted 18 January, 2022; originally announced January 2022.

Comments: Submitted to LREC 2022
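
For readers unfamiliar with the metric reported in the abstract above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function names are hypothetical, whitespace tokenization is an assumption, and this is not the evaluation code used in the paper. The second helper shows what a "relative" improvement means arithmetically.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.08 means an 8% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer
```

Because insertions count as errors, the rate computed this way can exceed 1.0 when the hypothesis is much longer than the reference.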

arXiv:2111.01457 [pdf, other]

doi 10.51628/001c.57524

Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework

Authors: Jonas Kohler, Maarten C. Ottenhoff, Sophocles Goulis, Miguel Angrick, Albert J. Colon, Louis Wagner, Simon Tousseyn, Pieter L. Kubben, Christian Herff

Abstract: Speech neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants, namely stereotactic EEG (sEEG), which provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations up to 0.8 from these minimally invasive recordings, despite limited amounts of training data. In particular, the architecture we employ naturally picks up on the temporal nature of the data and thereby outperforms an existing benchmark based on non-regressive convolutional neural networks.

Submitted 31 October, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.00351 [pdf, other]

Smooth Normalizing Flows

Authors: Jonas Köhler, Andreas Krämer, Frank Noé

Abstract: Normalizing flows are a promising tool for modeling probability distributions in physical systems. While state-of-the-art flows accurately approximate distributions and energies, applications in physics additionally require smooth energies to compute forces and higher-order derivatives. Furthermore, such densities are often defined on non-trivial topologies. A recent example is Boltzmann Generators for generating 3D-structures of peptides and small proteins. These generative models leverage the space of internal coordinates (dihedrals, angles, and bonds), which is a product of hypertori and compact intervals. In this work, we introduce a class of smooth mixture transformations working on both compact intervals and hypertori. Mixture transformations employ root-finding methods to invert them in practice, which has so far prevented bi-directional flow training. To this end, we show that parameter gradients and forces of such inverses can be computed from forward evaluations via the inverse function theorem. We demonstrate two advantages of such smooth flows: they allow training by force matching to simulation data and can be used as potentials in molecular dynamics simulations.

Submitted 30 November, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: Neural Information Processing Systems (NeurIPS) 2021
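
The idea mentioned in the abstract above, obtaining gradients of a numerically inverted map from forward evaluations via the inverse function theorem, can be illustrated on a one-dimensional toy problem. For a monotone map y = f(x; θ), the inverse x(y, θ) satisfies ∂x/∂y = 1 / (∂f/∂x) and ∂x/∂θ = -(∂f/∂θ) / (∂f/∂x), both evaluated at the root. The sketch below is an assumption-laden example, not the paper's mixture transformations: the map `forward`, the bisection bracket, and all names are hypothetical, and monotonicity requires |θ| < 1.

```python
import math


def forward(x: float, theta: float) -> float:
    """Assumed toy map f(x; theta) = x + theta * sin(x), monotone for |theta| < 1."""
    return x + theta * math.sin(x)


def invert(y: float, theta: float, lo: float = -10.0, hi: float = 10.0,
           tol: float = 1e-12) -> float:
    """Invert the monotone map by bisection; y must lie in [f(lo), f(hi)]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if forward(mid, theta) < y:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def inverse_gradients(y: float, theta: float):
    """Return x = f^{-1}(y) together with dx/dy and dx/dtheta.

    The derivatives use only forward-map partials at the root,
    so the root finder itself is never differentiated.
    """
    x = invert(y, theta)
    df_dx = 1.0 + theta * math.cos(x)
    df_dtheta = math.sin(x)
    return x, 1.0 / df_dx, -df_dtheta / df_dx
```

The design point is that the cost of the gradient is independent of how many root-finding iterations were needed, since only the partial derivatives of the forward map at the solution enter the formulas.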

arXiv:2108.03952 [pdf, other]

Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous Action Spaces

Authors: Ziyad Sheebaelhamd, Konstantinos Zisis, Athina Nisioti, Dimitris Gkouletsos, Dario Pavllo, Jonas Kohler

Abstract: Multi-agent control problems constitute an interesting area of application for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. In order to ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee constraint satisfaction of the soft constraints under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.

Submitted 11 August, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: ICML 2021 Workshop on Reinforcement Learning for Real Life

arXiv:2107.05007 [pdf, other]

Generating stable molecules using imitation and reinforcement learning

Authors: Søren Ager Meldgaard, Jonas Köhler, Henrik Lund Mortensen, Mads-Peter V. Christiansen, Frank Noé, Bjørk Hammer

Abstract: Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.

Submitted 11 July, 2021; originally announced July 2021.

arXiv:2106.03763 [pdf, other]

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

Authors: Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

Abstract: This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks. Leveraging an in-depth analysis of neural chains, we first show that vanishing gradients cannot be circumvented when the network width scales with less than O(depth), even when initialized with the popular Xavier and He initializations. Second, we extend the analysis to second-order derivatives and show that random i.i.d. initialization also gives rise to Hessian matrices with eigenspectra that vanish as networks grow in depth. Whenever this happens, optimizers are initialized in a very flat, saddle point-like plateau, which is particularly hard to escape with stochastic gradient descent (SGD) as its escaping time is inversely related to curvature. We believe that this observation is crucial for fully understanding (a) historical difficulties of training deep nets with vanilla SGD, (b) the success of adaptive gradient methods (which naturally adapt to curvature and thus quickly escape flat plateaus) and (c) the effectiveness of modern architectural components like residual connections and normalization layers.

Submitted 7 June, 2021; originally announced June 2021.

arXiv:2105.11879 [pdf, other]

Flexible Table Recognition and Semantic Interpretation System

Authors: Marcin Namysl, Alexander M. Esser, Sven Behnke, Joachim Köhler

Abstract: Table extraction is an important but still unsolved problem. In this paper, we introduce a flexible and modular table extraction system. We develop two rule-based algorithms that perform the complete table recognition process, including table detection and segmentation, and support the most frequent table formats. Moreover, to incorporate the extraction of semantic information, we develop a graph-based table interpretation method. We conduct extensive experiments on the challenging table recognition benchmarks ICDAR 2013 and ICDAR 2019, achieving results competitive with state-of-the-art approaches. Our complete information extraction system exhibited a high F1 score of 0.7380. To support future research on information extraction from documents, we make the resources (ground-truth annotations, evaluation scripts, algorithm parameters) from our table interpretation experiment publicly available.

Submitted 2 December, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Accepted for publication in Proceedings of the 17th International Conference on Computer Vision Theory and Applications (VISAPP 2022)

arXiv:2105.11872 [pdf, other]

Empirical Error Modeling Improves Robustness of Noisy Neural Sequence Labeling

Authors: Marcin Namysl, Sven Behnke, Joachim Köhler

Abstract: Despite recent advances, standard sequence labeling systems often fail when processing noisy user-generated text or consuming the output of an Optical Character Recognition (OCR) process. In this paper, we improve the noise-aware training method by proposing an empirical error generation approach that employs a sequence-to-sequence model trained to perform translation from error-free to erroneous text. Using an OCR engine, we generated a large parallel text corpus for training and produced several real-world noisy sequence labeling benchmarks for evaluation. Moreover, to overcome the data sparsity problem that is exacerbated in the case of imperfect textual input, we learned noisy language model-based embeddings. Our approach outperformed the baseline noise generation and error correction techniques on the erroneous sequence labeling data sets. To facilitate future research on robustness, we make our code, embeddings, and data conversion scripts publicly available.

Submitted 25 May, 2021; originally announced May 2021.

Comments: Accepted to appear in Findings of ACL 2021 (camera-ready version)

arXiv:2105.02968 [pdf, other]

This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks

Authors: Adrian Hoffmann, Claudio Fanconi, Rahul Rade, Jonas Kohler

Abstract: Deep neural networks that yield human interpretable decisions by architectural design have lately become an increasingly popular alternative to post hoc interpretation of traditional black-box models. Among these networks, the arguably most widespread approach is so-called prototype learning, where similarities to learned latent prototypes serve as the basis of classifying an unseen data point. In this work, we point to an important shortcoming of such approaches. Namely, there is a semantic gap between similarity in latent space and similarity in input space, which can corrupt interpretability. We design two experiments that exemplify this issue on the so-called ProtoPNet. Specifically, we find that this network's interpretability mechanism can be led astray by intentionally crafted or even JPEG compression artefacts, which can produce incomprehensible decisions. We argue that practitioners ought to have this shortcoming in mind when deploying prototype-based models in practice.

Submitted 23 June, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

Journal ref: ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI

arXiv:2103.15627 [pdf, other]

Learning Generative Models of Textured 3D Meshes from Real-World Images

Authors: Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi

Abstract: Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections. These models natively disentangle pose and appearance, enable downstream applications in computer graphics, and improve the ability of generative models to understand the concept of image formation. Although there has been prior work on learning such models from collections of 2D images, these approaches require a delicate pose estimation step that exploits annotated keypoints, thereby restricting their applicability to a few specific datasets. In this work, we propose a GAN framework for generating textured triangle meshes without relying on such annotations. We show that the performance of our approach is on par with prior work that relies on ground-truth keypoints, and more importantly, we demonstrate the generality of our method by setting new baselines on a larger set of categories from ImageNet - for which keypoints are not available - without any class-specific hyperparameter tuning. We release our code at https://github.com/dariopavllo/textured-3d-gan

Submitted 17 August, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: ICCV 2021

arXiv:2011.14006 [pdf, ps, other]

Offset-free setpoint tracking using neural network controllers

Authors: Patricia Pauli, Johannes Köhler, Julian Berberich, Anne Koch, Frank Allgöwer

Abstract: In this paper, we present a method to analyze local and global stability in offset-free setpoint tracking using neural network controllers and we provide ellipsoidal inner approximations of the corresponding region of attraction. We consider a feedback interconnection of a linear plant in connection with a neural network controller and an integrator, which allows for offset-free tracking of a desired piecewise constant reference that enters the controller as an external input. Exploiting the fact that activation functions used in neural networks are slope-restricted, we derive linear matrix inequalities to verify stability using Lyapunov theory. After stating a global stability result, we present less conservative local stability conditions (i) for a given reference and (ii) for any reference from a certain set. The latter result even enables guaranteed tracking under setpoint changes using a reference governor which can lead to a significant increase of the region of attraction. Finally, we demonstrate the applicability of our analysis by verifying stability and offset-free tracking of a neural network controller that was trained to stabilize a linearized inverted pendulum.

Submitted 29 April, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

arXiv:2011.12862 [pdf, other]

Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints

Authors: Jana Koehler, Joseph Bürgler, Urs Fontana, Etienne Fux, Florian Herzog, Marc Pouly, Sophia Saller, Anastasia Salyaeva, Peter Scheiblechner, Kai Waelti

Abstract: Cable trees are used in industrial products to transmit energy and information between different product parts. To date, they are mostly assembled by humans and only a few automated manufacturing solutions exist using complex robotic machines. For these machines, the wiring plan has to be translated into a wiring sequence of cable plugging operations to be followed by the machine. In this paper, we study and formalize the problem of deriving the optimal wiring sequence for a given layout of a cable tree. We summarize our investigations to model this cable tree wiring (CTW) problem as a traveling salesman problem with atomic, soft atomic, and disjunctive precedence constraints as well as tour-dependent edge costs such that it can be solved by state-of-the-art constraint programming (CP), Optimization Modulo Theories (OMT), and mixed-integer programming (MIP) solvers. It is further shown how the CTW problem can be viewed as a soft version of the coupled tasks scheduling problem. We discuss various modeling variants for the problem, prove its NP-hardness, and empirically compare CP, OMT, and MIP solvers on a benchmark set of 278 instances. The complete benchmark set with all models and instance data is available on GitHub and is accepted for inclusion in the MiniZinc challenge 2020.

Submitted 25 November, 2020; originally announced November 2020.

arXiv:2011.00573 [pdf, other]

Two-Level K-FAC Preconditioning for Deep Learning

Authors: Nikolaos Tselepidis, Jonas Kohler, Antonio Orvieto

Abstract: In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Stochastic Gradient Descent. In particular, starting with Adagrad, a seemingly endless line of research advocates the use of diagonal approximations of the so-called empirical Fisher matrix in stochastic gradient-based algorithms, with the most prominent one arguably being Adam. However, in recent years, several works cast doubt on the theoretical basis of preconditioning with the empirical Fisher matrix, and it has been shown that more sophisticated approximations of the actual Fisher matrix more closely resemble the theoretically well-motivated Natural Gradient Descent. One particularly successful variant of such methods is the so-called K-FAC optimizer, which uses a Kronecker-factored block-diagonal Fisher approximation as preconditioner. In this work, drawing inspiration from two-level domain decomposition methods used as preconditioners in the field of scientific computing, we extend K-FAC by enriching it with off-diagonal (i.e. global) curvature information in a computationally efficient way. We achieve this by adding a coarse-space correction term to the preconditioner, which captures the global Fisher information matrix at a coarser scale. We present a small set of experimental results suggesting improved convergence behaviour of our proposed method.

Submitted 6 December, 2020; v1 submitted 1 November, 2020; originally announced November 2020.

arXiv:2010.07033 [pdf, other]

Training Invertible Linear Layers through Rank-One Perturbations

Authors: Andreas Krämer, Jonas Köhler, Frank Noé

Abstract: Many types of neural network layers rely on matrix properties such as invertibility or orthogonality. Retaining such properties during optimization with gradient-based stochastic optimizers is a challenging task, which is usually addressed by either reparameterization of the affected parameters or by directly optimizing on the manifold. This work presents a novel approach for training invertible linear layers. In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently. This P$^4$Inv update allows keeping track of inverses and determinants without ever explicitly computing them. We show how such invertible blocks improve the mixing and thus the mode separation of the resulting normalizing flows. Furthermore, we outline how the P$^4$ concept can be utilized to retain properties other than invertibility.

Submitted 30 November, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: 17 pages, 10 figures

MSC Class: 68T07; 82-10

arXiv:2006.02425 [pdf, other]

Equivariant Flows: Exact Likelihood Generative Learning for Symmetric Densities

Authors: Jonas Köhler, Leon Klein, Frank Noé

Abstract: Normalizing flows are exact-likelihood generative neural networks which approximately transform samples from a simple prior distribution to samples of the probability distribution of interest. Recent work showed that such generative models can be utilized in statistical mechanics to sample equilibrium states of many-body systems in physics and chemistry. To scale and generalize these results, it is essential that the natural symmetries in the probability density -- in physics defined by the invariances of the target potential -- are built into the flow. We provide a theoretical sufficient criterion showing that the distribution generated by equivariant normalizing flows is invariant with respect to these symmetries by design. Furthermore, we propose building blocks for flows which preserve symmetries which are usually found in physical/chemical many-body particle systems. Using benchmark systems motivated from molecular physics, we demonstrate that those symmetry preserving flows can provide better generalization capabilities and sampling efficiency.

Submitted 26 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

arXiv:2005.12562 [pdf, other]

Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications -- A Case Study on German Oral History Interviews

Authors: Michael Gref, Oliver Walter, Christoph Schmidt, Sven Behnke, Joachim Köhler

Abstract: While recent automatic speech recognition systems achieve remarkable performance when large amounts of adequate, high-quality annotated speech data are used for training, the same systems often only achieve an unsatisfactory result for tasks in domains that greatly deviate from the conditions represented by the training data. For many real-world applications, there is a lack of sufficient data that can be directly used for training robust speech recognition systems. To address this issue, we propose and investigate an approach that performs a robust acoustic model adaption to a target domain in a cross-lingual, multi-staged manner. Our approach enables the exploitation of large-scale training data from other domains in both the same and other languages. We evaluate our approach using the challenging task of German oral history interviews, where we achieve a relative reduction of the word error rate by more than 30% compared to a model trained from scratch only on the target domain, and 6-7% relative compared to a model trained robustly on 1000 hours of same-language out-of-domain training data.

Submitted 26 May, 2020; originally announced May 2020.

Comments: Published version of the paper can be accessed via https://www.aclweb.org/anthology/2020.lrec-1.780

Journal ref: 12th International Conference on Language Resources and Evaluation (LREC 2020), pages 6354-6362

arXiv:2005.07162 [pdf, other]

NAT: Noise-Aware Training for Robust Neural Sequence Labeling

Authors: Marcin Namysl, Sven Behnke, Joachim Köhler

Abstract: Sequence labeling systems should perform reliably not only under ideal conditions but also with corrupted inputs - as these systems often process user-generated text or follow an error-prone upstream component. To this end, we formulate the noisy sequence labeling problem, where the input may undergo an unknown noising process and propose two Noise-Aware Training (NAT) objectives that improve robustness of sequence labeling performed on perturbed input: Our data augmentation method trains a neural model using a mixture of clean and noisy samples, whereas our stability training algorithm encourages the model to create a noise-invariant latent representation. We employ a vanilla noise model at training time. For evaluation, we use both the original data and its variants perturbed with real OCR errors and misspellings. Extensive experiments on English and German named entity recognition benchmarks confirmed that NAT consistently improved robustness of popular sequence labeling models, preserving accuracy on the original input. We make our code and data publicly available for the research community.

Submitted 14 May, 2020; originally announced May 2020.

Comments: Accepted to appear at ACL 2020

arXiv:2004.08355 [pdf]

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

Authors: Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš

Abstract: With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

Submitted 17 April, 2020; originally announced April 2020.

Comments: Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear

arXiv:2003.13833 [pdf]

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

Authors: Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, et al. (22 additional authors not shown)

Abstract: Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

Submitted 30 March, 2020; originally announced March 2020.

Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

arXiv:2003.01652 [pdf, other]

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

Authors: Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Abstract: Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures.

Submitted 11 June, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.06707 [pdf, other]

Stochastic Normalizing Flows

Authors: Hao Wu, Jonas Köhler, Frank Noé

Abstract: The sampling of probability distributions specified up to a normalization constant is an important problem in both machine learning and statistical mechanics. While classical stochastic sampling methods such as Markov Chain Monte Carlo (MCMC) or Langevin Dynamics (LD) can suffer from slow mixing times, there is a growing interest in using normalizing flows in order to learn the transformation of a simple prior distribution to the given target distribution. Here we propose a generalized and combined approach to sample target densities: Stochastic Normalizing Flows (SNF) -- an arbitrary sequence of deterministic invertible functions and stochastic sampling blocks. We show that stochasticity overcomes expressivity limitations of normalizing flows resulting from the invertibility constraint, whereas trainable transformations between sampling steps improve efficiency of pure MCMC/LD along the flow. By invoking ideas from non-equilibrium statistical mechanics we derive an efficient training procedure by which both the sampler's and the flow's parameters can be optimized end-to-end, and by which we can compute exact importance weights without having to marginalize out the randomness of the stochastic blocks. We illustrate the representational power, sampling efficiency and asymptotic correctness of SNFs on several benchmarks including applications to sampling molecular systems in equilibrium.

Submitted 26 October, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

arXiv:1912.10360 [pdf, other]

doi 10.1109/LRA.2020.2975727

Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control

Authors: Julian Nubert, Johannes Köhler, Vincent Berenz, Frank Allgöwer, Sebastian Trimpe

Abstract: Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers, by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with an NN controller. The NN is trained and validated from offline samples of the MPC, yielding statistical guarantees, and used in lieu thereof at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems.

Submitted 2 March, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: 8 pages, 4 figures

Journal ref: Robotics and Automation Letters, 2020

arXiv:1911.10367 [pdf, other]

A Sub-sampled Tensor Method for Non-convex Optimization

Authors: Aurelien Lucchi, Jonas Kohler

Abstract: We present a stochastic optimization method that uses a fourth-order regularized model to find local minima of smooth and potentially non-convex objective functions with a finite-sum structure. This algorithm uses sub-sampled derivatives instead of exact quantities. The proposed approach is shown to find an $(ε_1,ε_2,ε_3)$-third-order critical point in at most $\mathcal{O}\left(\max\left(ε_1^{-4/3}, ε_2^{-2}, ε_3^{-4}\right)\right)$ iterations, thereby matching the rate of deterministic approaches. In order to prove this result, we derive a novel tensor concentration inequality for sums of tensors of any order that makes explicit use of the finite-sum structure of the objective function.

Submitted 15 July, 2023; v1 submitted 23 November, 2019; originally announced November 2019.

Comments: Initial title: A Stochastic Tensor Method for Non-convex Optimization

arXiv:1910.06924 [pdf, other]

DP-MAC: The Differentially Private Method of Auxiliary Coordinates for Deep Learning

Authors: Frederik Harder, Jonas Köhler, Max Welling, Mijung Park

Abstract: Developing a differentially private deep learning algorithm is challenging, due to the difficulty in analyzing the sensitivity of objective functions that are typically used to train deep neural networks. Many existing methods resort to the stochastic gradient descent algorithm and apply a pre-defined sensitivity to the gradients for privatizing weights. However, their slow convergence typically yields a high cumulative privacy loss. Here, we take a different route by employing the method of auxiliary coordinates, which allows us to independently update the weights per layer by optimizing a per-layer objective function. This objective function can be well approximated by a low-order Taylor expansion, in which sensitivity analysis becomes tractable. We perturb the coefficients of the expansion for privacy, which we optimize using more advanced optimization routines than SGD for faster convergence. We empirically show that our algorithm provides a decent trained model quality under a modest privacy budget.

Submitted 15 October, 2019; originally announced October 2019.

arXiv:1910.00753 [pdf, other]

Equivariant Flows: sampling configurations for multi-body systems with symmetric energies

Authors: Jonas Köhler, Leon Klein, Frank Noé

Abstract: Flows are exact-likelihood generative neural networks that transform samples from a simple prior distribution to the samples of the probability distribution of interest. Boltzmann Generators (BG) combine flows and statistical mechanics to sample equilibrium states of strongly interacting many-body systems such as proteins with 1000 atoms. In order to scale and generalize these results, it is essential that the natural symmetries of the probability density - in physics defined by the invariances of the energy function - are built into the flow. Here we develop theoretical tools for constructing such equivariant flows and demonstrate that a BG that is equivariant with respect to rotations and particle permutations can generalize to sampling nontrivially new configurations where a nonequivariant BG cannot.

Submitted 1 October, 2019; originally announced October 2019.

arXiv:1908.06709 [pdf, other]

doi 10.1109/ICME.2019.00142

Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

Authors: Michael Gref, Christoph Schmidt, Sven Behnke, Joachim Köhler

Abstract: In automatic speech recognition, often little training data is available for specific challenging tasks, but training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved.

Submitted 19 August, 2019; originally announced August 2019.

Comments: Accepted for IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 2019

Journal ref: IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 2019

arXiv:1908.02686 [pdf, other]

Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks

Authors: Jörg Wagner, Jan Mathias Köhler, Tobias Gindele, Leon Hetzel, Jakob Thaddäus Wiedemer, Sven Behnke

Abstract: To verify and validate networks, it is essential to gain insight into their decisions, limitations as well as possible shortcomings of training data. In this work, we propose a post-hoc, optimization-based visual explanation method, which highlights the evidence in the input image for a specific prediction. Our approach is based on a novel technique to defend against adversarial evidence (i.e. faulty evidence due to artefacts) by filtering gradients during optimization. The defense does not depend on human-tuned parameters. It enables explanations which are both fine-grained and preserve the characteristics of images, such as edges and colors. The explanations are interpretable, suited for visualizing detailed evidence and can be tested as they are valid model inputs. We qualitatively and quantitatively evaluate our approach on a multitude of models and datasets.

Submitted 7 August, 2019; originally announced August 2019.

Comments: In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 2019

arXiv:1907.01678 [pdf, other]

The Role of Memory in Stochastic Optimization

Authors: Antonio Orvieto, Jonas Kohler, Aurelien Lucchi

Abstract: The choice of how to retain information about past gradients dramatically affects the convergence properties of state-of-the-art stochastic optimization methods, such as Heavy-ball, Nesterov's momentum, RMSprop and Adam. Building on this observation, we use stochastic differential equations (SDEs) to explicitly study the role of memory in gradient-based algorithms. We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings. We provide convergence guarantees for this SDE for weakly-quasi-convex and quadratically growing functions. We then demonstrate how to discretize this SDE to get a flexible discrete-time algorithm that can implement a broad spectrum of memories ranging from short- to long-term. Not only does this algorithm increase the degrees of freedom in algorithmic choice for practitioners but it also comes with better stability properties than classical momentum in the convex stochastic setting. In particular, no iterate averaging is needed for convergence. Interestingly, our analysis also provides a novel interpretation of Nesterov's momentum as stable gradient amplification and highlights a possible reason for its unstable behavior in the (convex) stochastic setting. Furthermore, we discuss the use of long-term memory for second-moment estimation in adaptive methods, such as Adam and RMSprop. Finally, we provide an extensive experimental study of the effect of different types of memory in both convex and nonconvex settings.

Submitted 11 March, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: Accepted paper at the 35th Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, 2019

arXiv:1906.11876 [pdf, other]

Uncertainty Based Detection and Relabeling of Noisy Image Labels

Authors: Jan M. Köhler, Maximilian Autenrieth, William H. Beluch

Abstract: Deep neural networks (DNNs) are powerful tools in computer vision tasks. However, in many realistic scenarios label noise is prevalent in the training images, and overfitting to these noisy labels can significantly harm the generalization performance of DNNs. We propose a novel technique to identify data with noisy labels based on the different distributions of the predictive uncertainties from a DNN over the clean and noisy data. Additionally, the behavior of the uncertainty over the course of training helps to identify the network weights that can best be used to relabel the noisy labels. Data with noisy labels can therefore be cleaned in an iterative process. Our proposed method can be easily implemented, and shows promising performance on the task of noisy label detection on CIFAR-10 and CIFAR-100.

Submitted 29 May, 2019; originally announced June 2019.

Comments: Uncertainty and Robustness in Deep Visual Learning Workshop at CVPR 2019

Showing 1–50 of 62 results for author: Köhler, J