
The Shortcomings of Force-from-Motion in Robot Learning

Authors: Elie Aljalbout, Felix Frank, Patrick van der Smagt, Alexandros Paraschos

Abstract: Robotic manipulation requires accurate motion and physical interaction control. However, current robot learning approaches focus on motion-centric action spaces that do not explicitly give the policy control over the interaction. In this paper, we discuss the repercussions of this choice and argue for more interaction-explicit action spaces in robot learning.

Submitted 3 July, 2024; originally announced July 2024.

ACM Class: I.2.6; I.2.8; I.2.9

arXiv:2405.13191

Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems

Authors: Djalel Benbouzid, Christiane Plociennik, Laura Lucaj, Mihai Maftei, Iris Merget, Aljoscha Burchardt, Marc P. Hauer, Abdeldjallil Naceri, Patrick van der Smagt

Abstract: The growing adoption and deployment of Machine Learning (ML) systems came with its share of ethical incidents and societal concerns. It also unveiled the necessity to properly audit these systems in light of ethical principles. For such a novel type of algorithmic auditing to become standard practice, two main prerequisites need to be available: a lifecycle model that is tailored towards transparency and accountability, and a principled risk assessment procedure that allows the proper scoping of the audit. Aiming to make a pragmatic step towards a wider adoption of ML auditing, we present such a procedure, extending the AI-HLEG guidelines published by the European Commission. Our audit procedure is based on an ML lifecycle model that explicitly focuses on documentation, accountability, and quality assurance, and serves as a common ground for alignment between the auditors and the audited organisation. We describe two pilots conducted on real-world use cases from two different organisations and discuss the shortcomings of ML algorithmic auditing as well as future directions.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.18896

Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models

Authors: Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl

Abstract: Incorporating the successful paradigm of pretraining and finetuning from Computer Vision and Natural Language Processing into decision-making has become increasingly popular in recent years. In this paper, we study Imitation Learning from Observation with pretrained models and find that existing approaches such as BCO and AIME face knowledge barriers, specifically the Embodiment Knowledge Barrier (EKB) and the Demonstration Knowledge Barrier (DKB), which greatly limit their performance. The EKB arises when pretrained models lack knowledge about unseen observations, leading to errors in action inference. The DKB results from policies trained on limited demonstrations, hindering adaptability to diverse scenarios. We thoroughly analyse the underlying mechanism of these barriers and propose AIME-v2, built upon AIME, as a solution. AIME-v2 uses online interactions with a data-driven regulariser to alleviate the EKB and mitigates the DKB by introducing a surrogate reward function to enhance policy training. Experimental results on tasks from the DeepMind Control Suite and Meta-World benchmarks demonstrate the effectiveness of these modifications in improving both sample efficiency and converged performance. The study contributes valuable insights into resolving knowledge barriers for enhanced decision-making in pretraining-based approaches. Code will be available at https://github.com/argmax-ai/aime-v2.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures

arXiv:2404.03253

A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we introduce the first comprehensive NPC MRI dataset, encompassing MR axial imaging of 277 primary NPC patients. This dataset includes T1-weighted, T2-weighted, and contrast-enhanced T1-weighted sequences, totaling 831 scans. In addition to the corresponding clinical data, manually annotated and labeled segmentations by experienced radiologists offer high-quality data resources from untreated primary NPC.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.15239

Guided Decoding for Robot Motion Generation and Adaption

Authors: Nutan Chen, Elie Aljalbout, Botond Cseke, Patrick van der Smagt

Abstract: We address motion generation for high-DoF robot arms in complex settings with obstacles, via points, etc. A significant advancement in this domain is achieved by integrating Learning from Demonstration (LfD) into the motion generation process. This integration facilitates rapid adaptation to new tasks and optimizes the utilization of accumulated expertise by allowing robots to learn and generalize from demonstrated trajectories. We train a transformer architecture on a large dataset of simulated trajectories. This architecture, based on a conditional variational autoencoder transformer, learns essential motion generation skills and adapts these to meet auxiliary tasks and constraints. Our auto-regressive approach enables real-time integration of feedback from the physical system, enhancing the adaptability and efficiency of motion generation. We show that our model can not only generate motion from initial and target points but also adapt trajectories when navigating complex tasks, including obstacle avoidance, via points, and meeting velocity and acceleration constraints, across platforms.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 pages

arXiv:2401.11447

Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). Enhancing patient adherence to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom scores over three years of SCIT. Methods: The research develops and analyzes two models, the sequential latent-variable model (SLVM) of Sequential Latent Actor-Critic (SLAC) and a Long Short-Term Memory (LSTM) network, evaluating them on their scoring and adherence prediction capabilities. Results: Excluding the biased samples at the first time step, the predictive adherence accuracy of the SLAC models ranges from 60% to 72%, and that of the LSTM models from 66% to 84%, varying according to the time steps. The Root Mean Square Error (RMSE) for SLAC models ranges between 0.93 and 2.22, while for LSTM models it ranges between 1.09 and 1.77. Notably, these RMSEs are significantly lower than the random prediction error of 4.55. Conclusion: We creatively apply sequential models in the long-term management of SCIT with promising accuracy in the prediction of SCIT non-adherence in AR patients. While LSTM outperforms SLAC in adherence prediction, SLAC excels in score prediction for patients undergoing SCIT for AR. The state-action-based SLAC adds flexibility, presenting a novel and effective approach for managing long-term AIT.

Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence
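The accuracy and RMSE figures reported above are standard metrics. As a minimal sketch (not the authors' evaluation code; the function names `rmse` and `accuracy` and the toy inputs are hypothetical), they can be computed as follows:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences of scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(labels, predictions):
    """Fraction of binary adherence labels (e.g. 1 = adherent) predicted correctly."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


# Toy values for illustration only; they are not data from the study.
print(rmse([0, 0], [3, 4]))  # sqrt((9 + 16) / 2)
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
```

Reporting a range across time steps, as the abstract does, would amount to applying these functions separately to the predictions at each step.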

arXiv:2312.03673

doi 10.1109/LRA.2024.3398428

On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer

Authors: Elie Aljalbout, Felix Frank, Maximilian Karl, Patrick van der Smagt

Abstract: We study the choice of action space in robot manipulation learning and sim-to-real transfer. We define metrics that assess performance and examine the emerging properties of the different action spaces. We train over 250 reinforcement learning (RL) agents in simulated reaching and pushing tasks, using 13 different control spaces. The choice of spaces spans combinations of common action space design characteristics. We evaluate the training performance in simulation and the transfer to a real-world environment. We identify good and bad characteristics of robotic action spaces and make recommendations for future designs. Our findings have important implications for the design of RL algorithms for robot manipulation tasks, and highlight the need for careful consideration of action spaces when training and transferring RL agents for real-world robotics.

Submitted 29 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02019

Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models

Authors: Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl

Abstract: Unlike most reinforcement learning agents, which require an unrealistic number of environment interactions to learn a new behaviour, humans excel at learning quickly by merely observing and imitating others. This ability highly depends on the fact that humans have a model of their own embodiment that allows them to infer the most likely actions that led to the observed behaviour. In this paper, we propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models. AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO. In the second phase, the agent is given some observation-only demonstrations of an expert performing a novel task and tries to imitate the expert's behaviour. AIME achieves this by defining a policy as an inference model and maximising the evidence of the demonstration under the policy and world model. Our method is "zero-shot" in the sense that it does not require further training for the world model or online interactions with the environment after being given the demonstration. We empirically validate the zero-shot imitation performance of our method on the Walker and Cheetah embodiments of the DeepMind Control Suite and find that it outperforms the state-of-the-art baselines. Code is available at: https://github.com/argmax-ai/aime.

Submitted 4 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2304.10246

Filter-Aware Model-Predictive Control

Authors: Baris Kayalibay, Atanas Mirchev, Ahmed Agha, Patrick van der Smagt, Justin Bayer

Abstract: Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then planning in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle ground between planning in belief space and completely ignoring its dynamics by only reasoning about its future accuracy. Our approach, filter-aware MPC, penalises the loss of information by what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which allows fast planning. In experiments involving visual navigation, realistic everyday environments and a two-link robot arm, we show that filter-aware MPC vastly improves over regular MPC.

Submitted 20 April, 2023; originally announced April 2023.
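To make the idea of penalising the expected estimator error concrete, the toy sketch below plans by exhaustive search over short action sequences for a hypothetical 1-D system whose sensor is precise only near a landmark at x = 0. It uses the scalar Kalman-filter posterior variance as a stand-in for "trackability"; the paper instead condenses trackability into a learned neural network, so the dynamics, noise model, names (`rollout`, `plan`) and weight `lam` are all assumptions. With `lam = 0` the planner reduces to regular MPC.

```python
import itertools

PROCESS_VAR = 0.1  # hypothetical process-noise variance of x' = x + u + w


def obs_noise_var(x):
    """Hypothetical sensor: observations are precise near the landmark at x = 0."""
    return 0.05 + x * x


def kalman_variance(var, x):
    """Scalar Kalman filter: predict (add process noise), then update with one observation."""
    prior = var + PROCESS_VAR
    r = obs_noise_var(x)
    return prior * r / (prior + r)


def rollout(x0, var0, goal, actions):
    """Return (task cost, accumulated posterior variance) along an action sequence."""
    x, var, task, uncertainty = x0, var0, 0.0, 0.0
    for u in actions:
        x = x + u  # propagate the mean state
        var = kalman_variance(var, x)
        task += (x - goal) ** 2
        uncertainty += var
    return task, uncertainty


def plan(x0, var0, goal, lam=0.0, horizon=3, choices=(-1.0, 0.0, 1.0)):
    """Exhaustive-search MPC; lam > 0 adds a filter-aware penalty on estimator error."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(choices, repeat=horizon):
        task, uncertainty = rollout(x0, var0, goal, seq)
        cost = task + lam * uncertainty
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


regular = plan(0.0, 1.0, goal=3.0, lam=0.0)
aware = plan(0.0, 1.0, goal=3.0, lam=50.0)
print("regular MPC:", regular, rollout(0.0, 1.0, 3.0, regular))
print("filter-aware MPC:", aware, rollout(0.0, 1.0, 3.0, aware))
```

In a receding-horizon loop only the first action of the returned sequence would be executed before replanning. Because the penalty is non-negative, raising `lam` can only keep or lower the accumulated variance of the chosen plan, at the price of a task cost that is no lower.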

arXiv:2212.02988

PRISM: Probabilistic Real-Time Inference in Spatial World Models

Authors: Atanas Mirchev, Baris Kayalibay, Ahmed Agha, Patrick van der Smagt, Daniel Cremers, Justin Bayer

Abstract: We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs in real time at 10 Hz and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).

Submitted 6 December, 2022; originally announced December 2022.

Comments: Will appear in PMLR, CoRL 2022

arXiv:2211.15824

CLAS: Coordinating Multi-Robot Manipulation with Central Latent Action Spaces

Authors: Elie Aljalbout, Maximilian Karl, Patrick van der Smagt

Abstract: Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Learning to naively solve such tasks with reinforcement learning is often infeasible due to the sample complexity and exploration requirements growing with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.

Submitted 28 November, 2022; originally announced November 2022.

ACM Class: I.2.6; I.2.8; I.2.9

arXiv:2209.09453

Probabilistic Dalek -- Emulator framework with probabilistic prediction for supernova tomography

Authors: Wolfgang Kerzendorf, Nutan Chen, Jack O'Brien, Johannes Buchner, Patrick van der Smagt

Abstract: Supernova spectral time series can be used to reconstruct a spatially resolved explosion model, known as supernova tomography. In addition to an observed spectral time series, supernova tomography requires a radiative transfer model to perform the inverse problem with uncertainty quantification for a reconstruction. The smallest parametrizations of supernova tomography models have roughly a dozen parameters, with realistic ones requiring more than 100. Realistic radiative transfer models require tens of CPU minutes for a single evaluation, making the problem computationally intractable with traditional means, which require millions of MCMC samples for such a problem. A new method for accelerating simulations, known as surrogate models or emulators, uses machine learning techniques and offers a solution for such problems and a way to understand progenitors/explosions from spectral time series. There exist emulators for the TARDIS supernova radiative transfer code, but they only perform well on simplistic low-dimensional models (roughly a dozen parameters) with a small number of applications for knowledge gain in the supernova field. In this work, we present a new emulator for the radiative transfer code TARDIS that not only outperforms existing emulators but also provides uncertainties in its prediction. It offers the foundation for future active-learning-based machinery that will be able to emulate very high-dimensional spaces of hundreds of parameters, crucial for unraveling urgent questions in supernovae and related fields.

Submitted 20 September, 2022; originally announced September 2022.

Comments: 7 pages, accepted at ICML 2022 Workshop on Machine Learning for Astrophysics

arXiv:2206.05909

Local Distance Preserving Auto-encoders using Continuous k-Nearest Neighbours Graphs

Authors: Nutan Chen, Patrick van der Smagt, Botond Cseke

Abstract: Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss that is based on the continuous k-nearest neighbours graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders, thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.

Submitted 30 September, 2022; v1 submitted 13 June, 2022; originally announced June 2022.
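The continuous k-nearest neighbours (CkNN) graph mentioned above is commonly defined by connecting points x_i and x_j whenever ||x_i - x_j|| < delta * sqrt(rho_k(x_i) * rho_k(x_j)), where rho_k(x) is the distance from x to its k-th nearest neighbour. The NumPy sketch below builds only this adjacency matrix; the distance-preserving loss and the constrained optimisation from the paper are not reproduced, and the function name and the parameters `k` and `delta` are illustrative assumptions.

```python
import numpy as np


def cknn_adjacency(X, k=1, delta=1.0):
    """Boolean adjacency matrix of the continuous k-nearest-neighbours graph."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbour; column 0 of each sorted row is the point itself.
    rho = np.sort(D, axis=1)[:, k]
    # Connect i and j when their distance is small relative to both local scales.
    A = D < delta * np.sqrt(np.outer(rho, rho))
    np.fill_diagonal(A, False)
    return A


# Two well-separated 1-D clusters of evenly spaced points.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
A = cknn_adjacency(X, k=1, delta=1.5)
print(A.astype(int))
```

Because the threshold is scaled by the product of the two points' local distances, it adapts to the local point density rather than using a single global radius.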

arXiv:2202.12243

Flat Latent Manifolds for Human-machine Co-creation of Music

Authors: Nutan Chen, Djalel Benbouzid, Francesco Ferroni, Mathis Nitschke, Luciano Pinna, Patrick van der Smagt

Abstract: The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is to lead to new experiences, both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. Standard VAEs, however, do not guarantee any form of smoothness in their latent representation. This translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space yields realistic and smooth musical changes that fit the type of machine-musician interactions we aim for. We provide empirical evidence for our method via a set of experiments on music datasets, and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay.
Beyond the musical application, our approach showcases an instance of human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.

Submitted 10 August, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 3rd Conference on AI Music Creativity (AIMC 2022)

arXiv:2201.10335

Tracking and Planning with Spatial World Models

Authors: Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer

Abstract: We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate on six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve a navigation success rate of up to 92% at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.

Submitted 25 January, 2022; originally announced January 2022.

arXiv:2101.12561

doi 10.1109/TRO.2021.3127108

Constrained Probabilistic Movement Primitives for Robot Trajectory Adaptation

Authors: Felix Frank, Alexandros Paraschos, Patrick van der Smagt, Botond Cseke

Abstract: Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles or the placement of additional robots in the workspace, and the modification of the joint range due to faults or range-of-motion constraints, are typical cases where the adaptation capabilities play a key role in safely performing the robot's task. Probabilistic movement primitives (ProMPs) have been proposed for representing adaptable movement skills, which are modelled as Gaussian distributions over trajectories. These are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and the subsequent approaches only provide solutions to specific movement adaptation problems, e.g., obstacle avoidance, and a generic, unifying, probabilistic approach to adaptation is missing. In this paper we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, for example, various types of obstacle avoidance, via-points, and mutual avoidance, in one single framework and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance.
We formulate adaptation as a constrained optimisation problem where we minimise the Kullback-Leibler divergence between the adapted distribution and the distribution of the original primitive while we constrain the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems on simulated planar robot arms and 7-DOF Franka-Emika robots in a dual robot arm setting.

Submitted 5 January, 2022; v1 submitted 29 January, 2021; originally announced January 2021.

Comments: There is a supplementary video accompanying the paper. It can be found at https://youtu.be/7UI6QX-eZ3I
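In the standard ProMP formulation, a trajectory is y = Phi w with a basis-function matrix Phi and Gaussian weights w ~ N(mu_w, Sigma_w), so the trajectory distribution is Gaussian with mean Phi mu_w and covariance Phi Sigma_w Phi^T (observation noise is omitted here). The KL divergence named in the abstract then has a closed form between Gaussians. The NumPy sketch below computes both pieces; the normalised radial-basis features, their width, the "adapted" parameters and all function names are assumptions, and the constraint on the probability mass of undesired trajectories is not implemented. The KL is evaluated in weight space, where the covariance is full rank.

```python
import numpy as np


def rbf_features(t, n_basis=5, width=0.02):
    """Normalised Gaussian basis functions at phases t in [0, 1]; shape (len(t), n_basis)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)


def trajectory_distribution(Phi, mu_w, Sigma_w):
    """Mean and covariance of y = Phi @ w for w ~ N(mu_w, Sigma_w)."""
    return Phi @ mu_w, Phi @ Sigma_w @ Phi.T


def gaussian_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for full-rank covariance matrices."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(S1, S0))
    quad_term = diff @ np.linalg.solve(S1, diff)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (trace_term + quad_term - d + logdet1 - logdet0)


t = np.linspace(0.0, 1.0, 50)
Phi = rbf_features(t)
mu_w, Sigma_w = np.zeros(5), np.eye(5)
mean_y, cov_y = trajectory_distribution(Phi, mu_w, Sigma_w)

# Hypothetical "adapted" primitive: shift the weight mean and shrink its covariance.
mu_adapted, Sigma_adapted = mu_w + 0.5, 0.5 * np.eye(5)
print(gaussian_kl(mu_adapted, Sigma_adapted, mu_w, Sigma_w))
```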

arXiv:2101.07046

Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models

Authors: Justin Bayer, Maximilian Soelch, Atanas Mirchev, Baris Kayalibay, Patrick van der Smagt

Abstract: Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned. While the true posteriors depend, e.g., on the entire sequence of observations, approximate posteriors are only informed by past observations. This mimics the Bayesian filter, a mixture of smoothing posteriors. Yet, we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead. Consequently, the learned generative model is compromised. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. Using fully-conditioned approximate posteriors, performance improves in terms of generative modelling and multi-step prediction.

Submitted 17 March, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: Published as a conference paper at ICLR 2021 (Poster)
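The statement that the Bayesian filter is a mixture of smoothing posteriors follows from marginalising over the future observations. Writing x_{1:T} for the observation sequence and z_t for the latent state at time t (notation assumed here, not taken from the paper):

```latex
p(z_t \mid x_{1:t})
  = \int p(z_t, x_{t+1:T} \mid x_{1:t}) \, \mathrm{d}x_{t+1:T}
  = \int \underbrace{p(z_t \mid x_{1:T})}_{\text{smoothing posterior}}
         \, p(x_{t+1:T} \mid x_{1:t}) \, \mathrm{d}x_{t+1:T}
```

The filter therefore averages smoothing posteriors over plausible futures, weighted by their predictive probability, whereas the abstract states that the ELBO drives a partially-conditioned amortised posterior towards a product of them.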

arXiv:2007.01868

doi 10.3847/2041-8213/abeb1b

Dalek -- a deep-learning emulator for TARDIS

Authors: Wolfgang E. Kerzendorf, Christian Vogl, Johannes Buchner, Gabriella Contardo, Marc Williamson, Patrick van der Smagt

Abstract: Supernova spectral time series contain a wealth of information about the progenitor and explosion process of these energetic events. The modeling of these data requires the exploration of very high-dimensional posterior probabilities with expensive radiative transfer codes. Even modest parametrizations of supernovae contain more than ten parameters, and a detailed exploration demands at least several million function evaluations. Physically realistic models require at least tens of CPU minutes per evaluation, putting a detailed reconstruction of the explosion out of reach of traditional methodology. The advent of widely available libraries for the training of neural networks, combined with their ability to approximate almost arbitrary functions with high precision, allows for a new approach to this problem. Instead of evaluating the radiative transfer model itself, one can build a neural network proxy trained on the simulations but evaluating orders of magnitude faster. Such a framework is called an emulator or surrogate model. In this work, we present an emulator for the TARDIS supernova radiative transfer code applied to Type Ia supernova spectra. We show that we can train an emulator for this problem given a modest training set of a hundred thousand spectra (easily calculable on modern supercomputers). The results show an accuracy on the percent level (with errors dominated by the Monte Carlo nature of TARDIS rather than by the emulator) with a speedup of several orders of magnitude. This method has a much broader set of applications and is not limited to the presented problem.

Submitted 3 July, 2020; originally announced July 2020.

Comments: 6 pages, 5 figures; submitted to AAS Journals. Constructive criticism invited.

arXiv:2006.14904 [pdf, other]

doi 10.1007/s42484-020-00036-4

Layerwise learning for quantum neural networks

Authors: Andrea Skolik, Jarrod R. McClean, Masoud Mohseni, Patrick van der Smagt, Martin Leib

Abstract: With the increased focus on quantum circuit learning for near-term applications on quantum devices, in conjunction with unique challenges presented by cost function landscapes of parametrized quantum circuits, strategies for effective training are becoming increasingly important. In order to ameliorate some of these challenges, we investigate a layerwise learning strategy for parametrized quantum circuits. The circuit depth is incrementally grown during optimization, and only subsets of parameters are updated in each training step. We show that when considering sampling noise, this strategy can help avoid the problem of barren plateaus of the error surface due to the low depth of circuits, low number of parameters trained in one step, and larger magnitude of gradients compared to training the full circuit. These properties make our algorithm preferable for execution on noisy intermediate-scale quantum devices. We demonstrate our approach on an image-classification task on handwritten digits, and show that layerwise learning attains an 8% lower generalization error on average in comparison to standard learning schemes for training quantum circuits of the same size. Additionally, the percentage of runs that reach lower test errors is up to 40% larger compared to training the full circuit, which is susceptible to creeping onto a plateau during training.

Submitted 26 June, 2020; originally announced June 2020.

Comments: 11 pages, 7 figures

Journal ref: Quantum Machine Intelligence Vol. 3, No. 5 (2021)

arXiv:2006.10178 [pdf, other]

Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF

Authors: Atanas Mirchev, Baris Kayalibay, Patrick van der Smagt, Justin Bayer

Abstract: We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. Our approach leverages both learning and domain knowledge from multiple-view geometry and rigid-body dynamics. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. The combination of variational inference, neural networks and a differentiable raycaster ensures that our model is amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems. We demonstrate the applicability of the model to generative prediction and planning.

Submitted 15 March, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Update for ICLR2021

arXiv:2003.08876 [pdf, other]

Learning to Fly via Deep Model-Based Reinforcement Learning

Authors: Philip Becker-Ehmck, Maximilian Karl, Jan Peters, Patrick van der Smagt

Abstract: Learning to control robots without requiring engineered models has been a long-term goal, promising diverse and novel applications. Yet, reinforcement learning has only achieved limited impact on real-time robot control due to its high demand for real-world interactions. In this work, by leveraging a learnt probabilistic model of drone dynamics, we learn a thrust-attitude controller for a quadrotor through model-based reinforcement learning. No prior knowledge of the flight dynamics is assumed; instead, a sequential latent variable model, used generatively and as an online filter, is learnt from raw sensory input. The controller and value function are optimised entirely by propagating stochastic analytic gradients through generated latent trajectories. We show that "learning to fly" can be achieved with less than 30 minutes of experience with a single drone, and can be deployed solely using onboard computational resources and sensors, on a self-built drone.

Submitted 4 August, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

arXiv:2002.04881 [pdf, other]

Learning Flat Latent Manifolds with VAEs

Authors: Nutan Chen, Alexej Klushyn, Francesco Ferroni, Justin Bayer, Patrick van der Smagt

Abstract: Measuring the similarity between data points often requires domain knowledge, which can in part be compensated for by relying on unsupervised methods such as latent-variable models, where similarity/distance is estimated in a more compact latent space. Prevalent is the use of the Euclidean metric, which has the drawback of ignoring information about similarity of data stored in the decoder, as captured by the framework of Riemannian geometry. We propose an extension to the framework of variational auto-encoders that allows learning flat latent manifolds, where the Euclidean metric is a proxy for the similarity between data points. This is achieved by defining the latent space as a Riemannian manifold and by regularising the metric tensor to be a scaled identity matrix. Additionally, we replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one, and formulate the learning problem as a constrained optimisation problem. We evaluate our method on a range of data-sets, including a video-tracking benchmark, where the performance of our unsupervised approach nears that of state-of-the-art supervised approaches, while retaining the computational efficiency of straight-line-based approaches.
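To make the flatness regulariser concrete: a decoder g induces a metric tensor G(z) = J(z)^T J(z), where J is the decoder Jacobian, and a flat manifold asks for G(z) to be close to c·I. The sketch below evaluates the squared Frobenius distance between G(z) and c·I at a single latent point. The two toy decoders, the finite-difference Jacobian and the fixed scale `c` are assumptions for illustration; the constrained VAE objective in which the paper uses such a term is not reproduced.

```python
import math


def folded_decoder(z):
    """Hypothetical decoder mapping a 2-D latent point to 3-D observation space."""
    return [z[0], z[1], math.sin(z[0]) * math.cos(z[1])]


def linear_decoder(z):
    """Hypothetical decoder that scales the latent plane uniformly by 2."""
    return [2.0 * z[0], 2.0 * z[1]]


def jacobian(f, z, eps=1e-5):
    """Central finite-difference Jacobian, J[i][j] = d f_i / d z_j."""
    cols = []
    for j in range(len(z)):
        z_plus = list(z)
        z_plus[j] += eps
        z_minus = list(z)
        z_minus[j] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        cols.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*cols)]  # columns -> rows


def metric_tensor(f, z):
    """Pull-back metric G = J^T J induced by the decoder at z."""
    J = jacobian(f, z)
    d = len(z)
    return [[sum(row[i] * row[j] for row in J) for j in range(d)] for i in range(d)]


def flatness_penalty(f, z, c):
    """Squared Frobenius distance between G(z) and the scaled identity c * I."""
    G = metric_tensor(f, z)
    d = len(G)
    return sum((G[i][j] - (c if i == j else 0.0)) ** 2 for i in range(d) for j in range(d))


print(flatness_penalty(folded_decoder, [0.0, 0.0], 1.0))
print(flatness_penalty(linear_decoder, [0.3, -0.7], 4.0))
```

At z = (0, 0) the folded decoder gives G = diag(2, 1), so no single scale c makes it flat there, while the linear decoder is flat everywhere for c = 4. In training, such a term would presumably be averaged over latent samples and differentiated with automatic differentiation rather than finite differences.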

Submitted 12 August, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Thirty-seventh International Conference on Machine Learning (ICML) 2020

Journal ref: International Conference on Machine Learning 2020

arXiv:1911.00756 [pdf, other]

Beta DVBF: Learning State-Space Models for Control from High Dimensional Observations

Authors: Neha Das, Maximilian Karl, Philip Becker-Ehmck, Patrick van der Smagt

Abstract: Learning a model of dynamics from high-dimensional images can be a core ingredient for success in many applications across different domains, especially in sequential decision making. However, currently prevailing methods based on latent-variable models are limited to working with low resolution images only. In this work, we show that some of the issues with using high-dimensional observations arise from the discrepancy between the dimensionality of the latent and observable space, and propose solutions to overcome them.

Submitted 2 November, 2019; originally announced November 2019.

arXiv:1910.06205 [pdf, other]

Variational Tracking and Prediction with Generative Disentangled State-Space Models

Authors: Adnan Akhundov, Maximilian Soelch, Justin Bayer, Patrick van der Smagt

Abstract: We address tracking and prediction of multiple moving objects in visual data streams as inference and sampling in a disentangled latent state-space model. By encoding objects separately and including explicit position information in the latent state space, we perform tracking via amortized variational Bayesian inference of the respective latent positions. Inference is implemented in a modular neural framework tailored towards our disentangled latent space. Generative and inference models are jointly learned from observations only. Compared to related prior work, we empirically show that our Markovian state-space assumption enables faithful and much improved long-term prediction well beyond the training horizon. Further, our inference model correctly decomposes frames into objects, even in the presence of occlusions. Tracking performance is increased significantly over prior art.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1909.05659 [pdf, other]

Estimating Fingertip Forces, Torques, and Local Curvatures from Fingernail Images

Authors: Nutan Chen, Göran Westling, Benoni B. Edin, Patrick van der Smagt

Abstract: The study of dexterous manipulation has provided important insights into human sensorimotor control as well as inspiration for manipulation strategies in robotic hands. Previous work focused on restricted experimental environments. Here we describe a method that uses the deformation and color distribution of the fingernail and its surrounding skin to estimate the fingertip forces, torques and contact surface curvatures for various objects, including the shape and material of the contact surfaces and the weight of the objects. The proposed method circumvents limitations associated with sensorized objects, gloves or fixed contact surface types. In addition, compared with previous single-finger estimation in an experimental environment, we extend the approach to multiple-finger force estimation, which can be used for applications such as human grasping analysis. Four algorithms, namely Gaussian process (GP), Convolutional Neural Networks (CNN), Neural Networks with Fast Dropout (NN-FD) and Recurrent Neural Networks with Fast Dropout (RNN-FD), are used to model a mapping from images to the corresponding labels. The results further show that the proposed method predicts force, torque and contact surface with high accuracy.

Submitted 9 September, 2019; originally announced September 2019.

Comments: Robotica

arXiv:1908.08750 [pdf, other]

Increasing the Generalisation Capacity of Conditional VAEs

Authors: Alexej Klushyn, Nutan Chen, Botond Cseke, Justin Bayer, Patrick van der Smagt

Abstract: We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior that enables the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation capability.

Submitted 10 September, 2019; v1 submitted 23 August, 2019; originally announced August 2019.

arXiv:1905.12434 [pdf, other]

Switching Linear Dynamics for Variational Bayes Filtering

Authors: Philip Becker-Ehmck, Jan Peters, Patrick van der Smagt

Abstract: System identification of complex and nonlinear systems is a central problem for model predictive control and model-based reinforcement learning. Despite their complexity, such systems can often be approximated well by a set of linear dynamical systems if broken into appropriate subsequences. This mechanism not only helps us find good approximations of dynamics, but also gives us deeper insight into the underlying system. Leveraging Bayesian inference, Variational Autoencoders and Concrete relaxations, we show how to learn a richer and more meaningful state space, e.g. encoding joint constraints and collisions with walls in a maze, from partial and high-dimensional observations. This representation translates into a gain of accuracy of learned dynamics showcased on various simulated tasks.
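The core of switching linear dynamics can be sketched as a transition that mixes K linear systems with weights drawn from a Concrete (Gumbel-softmax) relaxation, which keeps the discrete choice of regime differentiable. In this sketch the matrices `A_list`, `B_list`, the fixed switching logits and the temperature are hypothetical; in the paper the switching variables are inferred from observations inside a variational model, which is not reproduced here.

```python
import math
import random


def concrete_sample(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    scores = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        gumbel = -math.log(-math.log(u))
        scores.append((logit + gumbel) / temperature)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def switching_transition(z, u, A_list, B_list, alpha):
    """z_next = sum_k alpha_k * (A_k z + B_k u): a convex mixture of K linear systems."""
    z_next = [0.0] * len(z)
    for a_k, A, B in zip(alpha, A_list, B_list):
        Az, Bu = matvec(A, z), matvec(B, u)
        for i in range(len(z)):
            z_next[i] += a_k * (Az[i] + Bu[i])
    return z_next


# Two hypothetical regimes: a 90-degree rotation and a uniform damping.
A_list = [[[0.0, -1.0], [1.0, 0.0]], [[0.5, 0.0], [0.0, 0.5]]]
B_list = [[[1.0], [0.0]], [[1.0], [0.0]]]
z, u = [1.0, 0.0], [0.1]

rng = random.Random(0)
alpha = concrete_sample([2.0, -1.0], temperature=0.5, rng=rng)
print(alpha, switching_transition(z, u, A_list, B_list, alpha))
```

Because each system is linear, mixing the outputs is equivalent to applying the mixed matrices sum_k alpha_k A_k and sum_k alpha_k B_k; as the temperature approaches zero the weights approach a one-hot selection of a single regime.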

Submitted 29 May, 2019; originally announced May 2019.

Comments: Appears in Proceedings of the 36th International Conference on Machine Learning (ICML)

arXiv:1905.04982 [pdf, other]

Learning Hierarchical Priors in VAEs

Authors: Alexej Klushyn, Nutan Chen, Richard Kurle, Botond Cseke, Patrick van der Smagt

Abstract: We propose to learn a hierarchical prior in the context of variational autoencoders to avoid the over-regularisation resulting from a standard normal prior distribution. To incentivise an informative latent representation of the data, we formulate the learning problem as a constrained optimisation problem by extending the Taming VAEs framework to two-level hierarchical models. We introduce a graph-based interpolation method, which shows that the topology of the learned latent representation corresponds to the topology of the data manifold, and present several examples where desired properties of the latent representation, such as smoothness and simple explanatory factors, are learned by the prior.

Submitted 5 October, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

Comments: Published at NeurIPS 2019 (spotlight)

arXiv:1903.07348 [pdf, other]

doi 10.1007/978-3-030-30487-4_35

On Deep Set Learning and the Choice of Aggregations

Authors: Maximilian Soelch, Adnan Akhundov, Patrick van der Smagt, Justin Bayer

Abstract: Recently, it has been shown that many functions on sets can be represented by sum decompositions. These decompositions easily lend themselves to neural approximations, extending the applicability of neural nets to set-valued inputs (Deep Set learning). This work investigates a core component of the Deep Set architecture: aggregation functions. We suggest and examine alternatives to commonly used aggregation functions, including learnable recurrent aggregation functions. Empirically, we show that Deep Set networks are highly sensitive to the choice of aggregation functions: beyond improved performance, we find that learnable aggregations lower hyper-parameter sensitivity and generalize better to out-of-distribution input sizes.
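A sum-decomposable set function has the form rho(AGG over x in X of phi(x)). The sketch below uses hypothetical hand-written `phi` and `rho` in place of learned networks and swaps the aggregation between sum, mean and max. Each choice is permutation invariant, but they react differently to a change in set size. The learnable recurrent aggregations studied in the paper are not reproduced.

```python
def phi(x):
    """Hypothetical per-element encoder: scalar -> 2-D feature vector."""
    return [x, x * x]


def rho(h):
    """Hypothetical readout applied to the aggregated feature vector."""
    return h[0] + 0.5 * h[1]


# Symmetric aggregations over a list of feature vectors (column-wise).
AGGREGATIONS = {
    "sum": lambda feats: [sum(col) for col in zip(*feats)],
    "mean": lambda feats: [sum(col) / len(col) for col in zip(*feats)],
    "max": lambda feats: [max(col) for col in zip(*feats)],
}


def deep_set(xs, aggregation):
    """rho(AGG(phi(x) for x in xs)); invariant to the order of xs for any symmetric AGG."""
    feats = [phi(x) for x in xs]
    return rho(AGGREGATIONS[aggregation](feats))


xs = [1.0, -2.0, 3.0]
shuffled = [3.0, 1.0, -2.0]
doubled = xs + xs  # same elements, twice the set size
for name in AGGREGATIONS:
    print(name, deep_set(xs, name), deep_set(shuffled, name), deep_set(doubled, name))
```

Duplicating every element leaves the mean and max results unchanged but doubles the summed features, one simple way in which the choice of aggregation interacts with set sizes not seen during training.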

Submitted 8 April, 2020; v1 submitted 18 March, 2019; originally announced March 2019.

arXiv:1901.04436 [pdf, other]

Bayesian Learning of Neural Network Architectures

Authors: Georgi Dikov, Patrick van der Smagt, Justin Bayer

Abstract: In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead to a minimum.

Submitted 27 January, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

arXiv:1812.08284 [pdf, other]

Fast Approximate Geodesics for Deep Generative Models

Authors: Nutan Chen, Francesco Ferroni, Alexej Klushyn, Alexandros Paraschos, Justin Bayer, Patrick van der Smagt

Abstract: The length of the geodesic between two data points along a Riemannian manifold, induced by a deep generative model, yields a principled measure of similarity. Current approaches are limited to low-dimensional latent spaces, due to the computational complexity of solving a non-convex optimisation problem. We propose finding shortest paths in a finite graph of samples from the aggregate approximate posterior, which can be solved exactly, at greatly reduced runtime, and without a notable loss in quality. Our approach is therefore applicable to high-dimensional problems, e.g., in the visual domain. We validate our approach empirically on a series of experiments using variational autoencoders applied to image data, including the Chair, FashionMNIST, and human movement data sets.
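The graph-based idea can be sketched in a few lines: build a nearest-neighbour graph over latent samples and run Dijkstra's algorithm between two nodes. This is a simplified sketch, not the paper's implementation: the samples, the decoder `decode`, the neighbourhood size `k`, and the choice to weight each edge by the distance between its decoded endpoints are all illustrative assumptions.

```python
import heapq
import math


def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_graph(latents, decode, k):
    """Undirected k-nearest-neighbour graph over latent samples.

    Neighbours are picked by latent distance; each edge is weighted by the
    distance between the decoded points (an assumed local length estimate).
    """
    decoded = [decode(z) for z in latents]
    graph = {i: {} for i in range(len(latents))}
    for i, zi in enumerate(latents):
        candidates = sorted((euclid(zi, zj), j) for j, zj in enumerate(latents) if j != i)
        for _, j in candidates[:k]:
            weight = euclid(decoded[i], decoded[j])
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (path length, list of node indices)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return math.inf, []  # target unreachable in this graph
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy usage: four collinear latent samples and an assumed decoder embedding them in 2-D.
latents = [[float(x)] for x in range(4)]
graph = knn_graph(latents, decode=lambda z: [z[0], 0.0], k=1)
length, path = shortest_path(graph, 0, 3)
print(length, path)  # → 3.0 [0, 1, 2, 3]
```

Dijkstra's search is exact on the graph, so in a sketch like this the approximation lies entirely in how densely the samples cover the manifold and how the edges are weighted.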

Submitted 23 May, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

Comments: 28th International Conference on Artificial Neural Networks, 2019

Journal ref: 28th International Conference on Artificial Neural Networks, 2019

arXiv:1811.04451 [pdf, other]

Multi-Source Neural Variational Inference

Authors: Richard Kurle, Stephan Günnemann, Patrick van der Smagt

Abstract: Learning from multiple sources of information is an important problem in machine-learning research. The key challenges are learning representations and formulating inference methods that take into account the complementarity and redundancy of various information sources. In this paper we formulate a variational-autoencoder-based multi-source learning framework in which each encoder is conditioned on a different information source. This allows us to relate the sources via the shared latent variables by computing divergence measures between the individual sources' posterior approximations. We explore a variety of options to learn these encoders and to integrate the beliefs they compute into a consistent posterior approximation. We visualise learned beliefs on a toy dataset and evaluate our methods for learning shared representations and structured output prediction, showing trade-offs of learning separate encoders for each information source. Furthermore, we demonstrate how conflict detection and redundancy can increase robustness of inference in a multi-source setting.
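One common way to integrate per-source Gaussian beliefs into a single posterior is a product of experts, and a KL divergence between two sources' beliefs gives a simple disagreement signal. The sketch below shows both for diagonal Gaussians. It is one assumed option among the variety the paper explores, not the paper's specific integration rule, and the numbers are made up.

```python
import math


def product_of_gaussians(means, variances):
    """Fuse per-source diagonal Gaussians N(mu_s, var_s) by multiplying their densities.

    The (renormalised) product is Gaussian; its precision is the sum of the
    sources' precisions and its mean is the precision-weighted mean.
    """
    fused_mean, fused_var = [], []
    for d in range(len(means[0])):
        precision = sum(1.0 / v[d] for v in variances)
        fused_mean.append(sum(m[d] / v[d] for m, v in zip(means, variances)) / precision)
        fused_var.append(1.0 / precision)
    return fused_mean, fused_var


def kl_diag_gaussians(m1, v1, m2, v2):
    """KL( N(m1, diag v1) || N(m2, diag v2) ), summed over dimensions."""
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(m1, v1, m2, v2)
    )


# Two hypothetical sources that disagree about a 1-D latent variable.
mean_a, var_a = [0.0], [1.0]
mean_b, var_b = [2.0], [1.0]
fused_mean, fused_var = product_of_gaussians([mean_a, mean_b], [var_a, var_b])
kl_ab = kl_diag_gaussians(mean_a, var_a, mean_b, var_b)
print(fused_mean, fused_var, kl_ab)
```

The fused variance is smaller than either source's, reflecting redundancy, while a large KL between the sources could be used to flag a conflict before fusing.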

Submitted 17 November, 2018; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: AAAI 2019, Association for the Advancement of Artificial Intelligence (AAAI) 2019

arXiv:1808.02026 [pdf, other]

Active Learning based on Data Uncertainty and Model Sensitivity

Authors: Nutan Chen, Alexej Klushyn, Alexandros Paraschos, Djalel Benbouzid, Patrick van der Smagt

Abstract: Robots can rapidly acquire new skills from demonstrations. However, during generalisation of skills or transitioning across fundamentally different skills, it is unclear whether the robot has the necessary knowledge to perform the task. Failing to detect missing information often leads to abrupt movements or to collisions with the environment. Active learning can quantify the uncertainty of performing the task and, in general, locate regions of missing information. We introduce a novel algorithm for active learning and demonstrate its utility for generating smooth trajectories. Our approach is based on deep generative models and metric learning in latent spaces. It relies on the Jacobian of the likelihood to detect non-smooth transitions in the latent space, i.e., transitions that lead to abrupt changes in the movement of the robot. When non-smooth transitions are detected, our algorithm asks for an additional demonstration from that specific region. The newly acquired knowledge modifies the data manifold and allows for learning a latent representation for generating smooth movements. We demonstrate the efficacy of our approach on generalising elementary skills, transitioning across different skills, and implicitly avoiding collisions with the environment. For our experiments, we use a simulated pendulum where we observe its motion from images and a 7-DoF anthropomorphic arm.

Submitted 6 August, 2018; originally announced August 2018.

Comments: Published at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1805.07206 [pdf, other]

Approximate Bayesian inference in spatial environments

Authors: Atanas Mirchev, Baris Kayalibay, Maximilian Soelch, Patrick van der Smagt, Justin Bayer

Abstract: Model-based approaches bear great promise for decision making of agents interacting with the physical world. In the context of spatial environments, different types of problems such as localisation, mapping, navigation or autonomous exploration are typically addressed with specialised methods, often relying on detailed knowledge of the system at hand. We express these tasks as probabilistic inference and planning under the umbrella of deep sequential generative models. Using the frameworks of variational inference and neural networks, our method inherits favourable properties such as flexibility, scalability and the ability to learn from data. The method performs comparably to specialised state-of-the-art methodology in two distinct simulated environments.

Submitted 20 June, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

Comments: Preprint of publication at RSS 2019

arXiv:1711.11059 [pdf, other]

Gaussian Process Neurons Learn Stochastic Activation Functions

Authors: Sebastian Urban, Marcus Basalla, Patrick van der Smagt

Abstract: We propose stochastic, non-parametric activation functions that are fully learnable and individual to each neuron. Complexity and the risk of overfitting are controlled by placing a Gaussian process prior over these functions. The result is the Gaussian process neuron, a probabilistic unit that can be used as the basic building block for probabilistic graphical models that resemble the structure of neural networks. The proposed model can intrinsically handle uncertainties in its inputs and self-estimate the confidence of its predictions. Using variational Bayesian inference and the central limit theorem, a fully deterministic loss function is derived, allowing it to be trained as efficiently as a conventional neural network using mini-batch gradient descent. The posterior distribution of activation functions is inferred from the training data alongside the weights of the network. The proposed model compares favorably to deep Gaussian processes, both in model complexity and efficiency of inference. It can be directly applied to recurrent or convolutional network structures, allowing its use in audio and image processing tasks. As a preliminary empirical evaluation, we present experiments on regression and classification tasks, in which our model achieves performance comparable to or better than a Dropout-regularized neural network with a fixed activation function. Experiments are ongoing and results will be added as they become available.

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1711.01348 [pdf, ps, other]

Automatic Differentiation for Tensor Algebras

Authors: Sebastian Urban, Patrick van der Smagt

Abstract: Kjolstad et al. proposed a tensor algebra compiler. It takes expressions that define a tensor element-wise, such as $f_{ij}(a,b,c,d) = \exp\left[-\sum_{k=0}^4 \left((a_{ik}+b_{jk})^2\, c_{ii} + d_{i+k}^3 \right) \right]$, and generates the corresponding compute kernel code. For machine learning, especially deep learning, it is often necessary to compute the gradient of a loss function $l(a,b,c,d)=l(f(a,b,c,d))$ with respect to the parameters $a,b,c,d$. If tensor compilers are to be applied in this field, expressions must be derived for the derivatives of element-wise defined tensors, i.e., expressions for $(da)_{ik}=\partial l/\partial a_{ik}$. When the mapping between function indices and argument indices is not 1:1, special attention is required. For the function $f_{ij}(x) = x_i^2$, the derivative of the loss is $(dx)_i=\partial l/\partial x_i=\sum_j (df)_{ij}\,2x_i$; the sum is necessary because index $j$ does not appear in the indices of $x$. Another example is $f_{i}(x)=x_{ii}^2$, where $x$ is a matrix; here $(dx)_{ij}=\delta_{ij}\,(df)_i\,2x_{ii}$, and the Kronecker delta is necessary because the derivative is zero for off-diagonal elements. Yet another indexing scheme is used by $f_{ij}(x)=\exp x_{i+j}$; here the correct derivative is $(dx)_{k}=\sum_i (df)_{i,k-i} \exp x_{k}$, where the range of the sum must be chosen appropriately. In this publication we present an algorithm that can handle any case in which the indices of an argument are an arbitrary linear combination of the indices of the function, so all of the above examples are covered.
Sums (and their ranges) and Kronecker deltas are automatically inserted into the derivatives as necessary. Additionally, the indices are transformed if required (as in the last example). The algorithm outputs a symbolic expression that can subsequently be fed into a tensor algebra compiler. Source code is provided.
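The three worked derivatives in this abstract can be checked numerically. The following is a minimal NumPy sketch, not the paper's algorithm: it hard-codes each derivative (including the summed-out index, the Kronecker delta, and the index substitution $k = i + j$ with its sum range) and compares it against central finite differences of an assumed loss $l = \sum (df \odot f)$, chosen so that $\partial l/\partial f = df$ exactly. The helper names `finite_diff_grad` and `check` are illustrative only.

```python
import numpy as np

def finite_diff_grad(loss, x, eps=1e-6):
    """Central-difference gradient of a scalar loss with respect to every entry of x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        grad[idx] = (loss(xp) - loss(xm)) / (2 * eps)
    return grad

def check(f, x, df, dx):
    """Compare a hand-derived dx with finite differences of l = sum(df * f(x))."""
    numeric = finite_diff_grad(lambda z: np.sum(df * f(z)), x)
    return np.allclose(dx, numeric, atol=1e-5)

rng = np.random.default_rng(0)

# f_ij(x) = x_i^2: j does not index x, so it is summed out.
x1, df1 = rng.normal(size=3), rng.normal(size=(3, 4))
def f1(x):
    return np.repeat((x ** 2)[:, None], 4, axis=1)
dx1 = np.sum(df1 * 2 * x1[:, None], axis=1)

# f_i(x) = x_ii^2: only diagonal entries receive gradient (Kronecker delta).
x2, df2 = rng.normal(size=(3, 3)), rng.normal(size=3)
def f2(x):
    return np.diag(x) ** 2
dx2 = np.diag(df2 * 2 * np.diag(x2))

# f_ij(x) = exp(x_{i+j}): substitute k = i + j and sum i over its valid range.
n_i, n_j = 3, 4
x3, df3 = rng.normal(size=n_i + n_j - 1), rng.normal(size=(n_i, n_j))
def f3(x):
    i, j = np.meshgrid(np.arange(n_i), np.arange(n_j), indexing="ij")
    return np.exp(x[i + j])
dx3 = np.zeros_like(x3)
for k in range(x3.size):
    for i in range(max(0, k - n_j + 1), min(k, n_i - 1) + 1):  # keeps 0 <= k - i < n_j
        dx3[k] += df3[i, k - i] * np.exp(x3[k])

print(check(f1, x1, df1, dx1), check(f2, x2, df2, dx2), check(f3, x3, df3, dx3))
```

The sum range in the third case is exactly the "appropriately chosen" range the abstract mentions: dropping the `max`/`min` bounds would index `df3` out of range or wrap around.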

Submitted 3 November, 2017; originally announced November 2017.

Comments: Technical Report

arXiv:1711.01204 [pdf, other]

Metrics for Deep Generative Models

Authors: Nutan Chen, Alexej Klushyn, Richard Kurle, Xueyan Jiang, Justin Bayer, Patrick van der Smagt

Abstract: Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source---the latent space---to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the length of the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and offers an alternative to linear interpolation in latent space. In addition, it can be applied to robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth, on a simulated robot arm dataset, on human motion capture data, and on a generative model of handwritten digits.
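The core quantity here is the length of a latent path measured after mapping it through the generator; the Riemannian distance is the minimum of this length over all paths. A minimal sketch, assuming a hypothetical hand-written 2D-to-3D "decoder" with a steep ridge in place of a trained VAE/GAN, shows that two latent pairs at the same Euclidean distance can have very different induced lengths. It only evaluates the length functional on straight latent paths; it does not perform the shortest-path optimisation of the paper.

```python
import numpy as np

def decoder(z):
    """Hypothetical toy generator g: R^2 -> R^3 with a steep ridge at z[0] = 0,
    standing in for a trained VAE decoder or GAN generator."""
    return np.array([z[0], z[1], 5.0 * np.tanh(10.0 * z[0])])

def curve_length(path, g=decoder):
    """Length of a discretised latent path measured in observation space:
    the sum of Euclidean distances between consecutive decoded points."""
    decoded = np.array([g(z) for z in path])
    return float(np.sum(np.linalg.norm(np.diff(decoded, axis=0), axis=1)))

def straight_path(a, b, n=200):
    """Linear interpolation between latent points a and b with n samples."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - t) * a + t * b

# Both pairs are 2.0 apart in latent space ...
a, b = np.array([-1.0, 0.0]), np.array([1.0, 0.0])   # crosses the ridge
c, d = np.array([0.5, -1.0]), np.array([0.5, 1.0])   # stays on one side of it
# ... but the crossing path is more than five times longer after decoding.
len_ab = curve_length(straight_path(a, b))
len_cd = curve_length(straight_path(c, d))
print(len_ab, len_cd)
```

A geodesic search would replace `straight_path` with a set of free intermediate points and minimise `curve_length` over them.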

Submitted 8 February, 2018; v1 submitted 3 November, 2017; originally announced November 2017.

Comments: Published at the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018

Journal ref: The 21st International Conference on Artificial Intelligence and Statistics, 2018

arXiv:1703.09783 [pdf, ps, other]

doi 10.1109/IROS.2017.8206288

Two-Stream RNN/CNN for Action Recognition in 3D Videos

Authors: Rui Zhao, Haider Ali, Patrick van der Smagt

Abstract: The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still a subject of research. We demonstrate superior results with a system that combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well suited to distinguishing actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM, which then classifies the movement. With this architecture, our method improves on the recognition rates of state-of-the-art methods by 14% on standard datasets.
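The final fusion step (features from two streams merged and classified by an SVM) can be sketched as follows. This is an illustrative NumPy sketch under several assumptions: the stream features are random stand-ins rather than real GRU or 3D-CNN outputs, fusion is plain concatenation, the task is binary, and the SVM is a linear one trained with the Pegasos sub-gradient method instead of whatever solver and kernel the paper used.

```python
import numpy as np

def fuse(gru_feats, cnn_feats):
    """Fusion by concatenating per-sequence features from the two streams."""
    X = np.hstack([gru_feats, cnn_feats])
    return np.hstack([X, np.ones((X.shape[0], 1))])  # constant column acts as the bias

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    y must contain labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (X[i] @ w) < 1.0   # margin check uses the current w
            w *= 1.0 - eta * lam
            if violated:
                w += eta * y[i] * X[i]
    return w

def predict(w, X):
    return np.where(X @ w >= 0.0, 1, -1)

rng = np.random.default_rng(1)
n = 400
y = rng.choice([-1, 1], size=n)
gru_feats = y[:, None] + rng.normal(size=(n, 8))    # hypothetical GRU-stream features
cnn_feats = y[:, None] + rng.normal(size=(n, 16))   # hypothetical 3D-CNN-stream features
X = fuse(gru_feats, cnn_feats)
w = train_linear_svm(X[:300], y[:300])
accuracy = np.mean(predict(w, X[300:]) == y[300:])
print(f"held-out accuracy: {accuracy:.2f}")
```

Concatenation lets the SVM weight each stream's evidence jointly; a multi-class version would train one such classifier per action (one-vs-rest).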

Submitted 2 October, 2018; v1 submitted 22 March, 2017; originally announced March 2017.

Comments: Published in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1701.03056 [pdf, other]

CNN-based Segmentation of Medical Imaging Data

Authors: Baris Kayalibay, Grady Jensen, Patrick van der Smagt

Abstract: Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recent advances in semantic segmentation have enabled their application to medical image segmentation. While most CNNs use two-dimensional kernels, recent CNN-based publications on medical image segmentation have featured three-dimensional kernels, allowing full access to the three-dimensional structure of medical images. Though closely related to semantic segmentation, medical image segmentation poses specific challenges that need to be addressed, such as the scarcity of labelled data, the high class imbalance found in the ground truth, and the high memory demand of three-dimensional images. In this work, a CNN-based method with three-dimensional filters is demonstrated and applied to hand and brain MRI. Two modifications to an existing CNN architecture are discussed, along with methods for addressing the aforementioned challenges. While most of the existing literature on medical image segmentation focuses on soft tissue and the major organs, this work is validated on data from both the central nervous system and the bones of the hand.
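Class imbalance is one of the challenges named above. One common remedy in volumetric segmentation is a soft Dice loss; the sketch below uses it as an illustration and does not claim it is the exact loss used in this work. On a toy 8×8×8 volume with 8 foreground voxels, predicting "all background" reaches about 98% voxel accuracy, yet the mean Dice loss stays near 0.5 because the foreground class scores zero overlap.

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """1 - mean soft Dice over classes; probs and onehot have shape (C, D, H, W)."""
    axes = (1, 2, 3)
    intersection = np.sum(probs * onehot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(onehot, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - dice.mean())

labels = np.zeros((8, 8, 8), dtype=int)
labels[3:5, 3:5, 3:5] = 1                       # 8 foreground voxels out of 512
onehot = np.stack([labels == 0, labels == 1]).astype(float)

perfect = onehot.copy()
all_background = np.stack([np.ones_like(onehot[0]), np.zeros_like(onehot[0])])

voxel_acc = np.mean(all_background.argmax(axis=0) == labels)   # 504 / 512
print(voxel_acc, soft_dice_loss(perfect, onehot), soft_dice_loss(all_background, onehot))
```

Because each class contributes equally to the mean, the small foreground class is not drowned out by the many background voxels, which is what makes this family of losses attractive under heavy imbalance.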

Submitted 25 July, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

Comments: 24 pages, Code available at https://github.com/BRML/CNNbasedMedicalSegmentation

arXiv:1606.07312 [pdf, other]

Unsupervised preprocessing for Tactile Data

Authors: Maximilian Karl, Justin Bayer, Patrick van der Smagt

Abstract: Tactile information is important for gripping, stable grasping, and in-hand manipulation, yet the complexity of tactile data prevents widespread use of such sensors. We use an unsupervised learning algorithm that transforms the complex tactile data into a compact, latent representation without the need to record ground-truth reference data. These compact representations can be used either directly in a reinforcement-learning-based controller or to calibrate the tactile sensor to physical quantities with only a few data points. We demonstrate the quality of our latent representation by predicting important features and on a simple control task.
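The two-stage idea (unsupervised compression, then calibration with only a few labelled points) can be sketched as below. This is a minimal illustration, not the paper's method: PCA via SVD is swapped in for the unsupervised learner, the 24-taxel sensor data are simulated from an assumed linear model driven by force and contact position, and the calibration is an affine least-squares fit on five labelled samples.

```python
import numpy as np

def fit_pca(X, k):
    """Unsupervised step: mean and top-k principal directions of raw taxel data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def encode(X, mean, components):
    return (X - mean) @ components.T

def calibrate(Z, targets):
    """Few-shot step: least-squares affine map from latent codes to a physical quantity."""
    A = np.hstack([Z, np.ones((len(Z), 1))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coef

def apply_calibration(Z, coef):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ coef

# Simulated data: 24 taxels driven by force and contact position, plus noise.
rng = np.random.default_rng(0)
n, taxels = 500, 24
force = rng.uniform(0.0, 5.0, n)
position = rng.uniform(-1.0, 1.0, n)
X = (force[:, None] * rng.normal(size=taxels)
     + position[:, None] * rng.normal(size=taxels)
     + 0.05 * rng.normal(size=(n, taxels)))

mean, comps = fit_pca(X, k=2)              # uses no force labels
Z = encode(X, mean, comps)
coef = calibrate(Z[:5], force[:5])         # only five labelled points
pred = apply_calibration(Z[5:], coef)
r2 = 1.0 - np.sum((pred - force[5:]) ** 2) / np.sum((force[5:] - force[5:].mean()) ** 2)
print(f"R^2 on the remaining {n - 5} samples: {r2:.3f}")
```

The few-shot fit works here because the simulated sensor is linear; with real, nonlinear tactile responses a learned nonlinear encoder would be needed, which is the role the unsupervised model plays in the abstract.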

Submitted 23 June, 2016; originally announced June 2016.

arXiv:1606.06588 [pdf, other]

ML-based tactile sensor calibration: A universal approach

Authors: Maximilian Karl, Artur Lohrer, Dhananjay Shah, Frederik Diehl, Max Fiedler, Saahil Ognawala, Justin Bayer, Patrick van der Smagt

Abstract: We study the responses of two tactile sensors, the fingertip sensor of the iCub and the BioTac, under different external stimuli. The question of interest is to what degree both sensors i) allow the estimation of the force exerted on the sensor and ii) enable the recognition of differing degrees of curvature. Using a force-controlled linear motor to stimulate the tactile sensors, we acquire several high-quality datasets that allow both sensors to be studied under exactly the same conditions. We also examine the structure of the representation of tactile stimuli in the recorded sensor data using t-SNE embeddings. The experiments show that the iCub and the BioTac each excel in different settings.
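Comparing force estimation across sensors "under exactly the same conditions" amounts to fitting a regressor per sensor on one shared stimulus schedule and comparing held-out errors. A minimal sketch, assuming closed-form ridge regression as the estimator and two hypothetical simulated sensors (`sensor_a`, `sensor_b`) with arbitrary channel counts and noise levels; none of these numbers describe the iCub or the BioTac.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def rmse(X, y, w, b):
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

rng = np.random.default_rng(0)
n = 400
force = rng.uniform(0.0, 3.0, n)              # one stimulus schedule shared by both sensors

def simulate(channels, noise):
    """Hypothetical linear sensor model: per-channel gain times force, plus noise."""
    gain = rng.normal(size=channels)
    return force[:, None] * gain + noise * rng.normal(size=(n, channels))

sensors = {"sensor_a": simulate(12, 0.5), "sensor_b": simulate(19, 0.05)}
errors = {}
for name, X in sensors.items():
    w, b = ridge_fit(X[:300], force[:300])    # train on the first 300 presses
    errors[name] = rmse(X[300:], force[300:], w, b)
print(errors)
```

Reusing the same `force` array for both sensors is the software analogue of driving both with the same force-controlled motor: differences in error then reflect the sensors, not the stimuli.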

Submitted 21 June, 2016; originally announced June 2016.

arXiv:1605.06432 [pdf, other]

Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data

Authors: Maximilian Karl, Maximilian Soelch, Justin Bayer, Patrick van der Smagt

Abstract: We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies, such as image sequences, without domain knowledge. Our experiments show that enabling backpropagation through transitions enforces state space assumptions and significantly improves the information content of the latent embedding. This also enables realistic long-term prediction.
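Two ingredients behind "backpropagation through transitions" can be illustrated without an autodiff framework: a latent rollout whose stochasticity enters only through reparameterised noise, and the closed-form KL term of the evidence lower bound. This NumPy sketch assumes a single linear transition $z_{t+1} = A z_t + B u_t + C w_t$ with hand-picked matrices and random stand-ins for recognition-network outputs; it is a simplification, not the paper's transition or inference model.

```python
import numpy as np

def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over all entries."""
    return float(0.5 * np.sum(logvar_p - logvar_q
                              + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
                              - 1.0))

def rollout(z0, u, w_mu, w_logvar, A, B, C, eps):
    """Latent rollout z_{t+1} = A z_t + B u_t + C w_t with reparameterised noise
    w_t = mu_t + sigma_t * eps_t. Given eps, every z_t is a deterministic,
    differentiable function of (w_mu, w_logvar) and the transition matrices,
    so gradients of a reconstruction loss can flow back through all transitions."""
    w = w_mu + np.exp(0.5 * w_logvar) * eps
    zs = [z0]
    for t in range(len(u)):
        zs.append(A @ zs[-1] + B @ u[t] + C @ w[t])
    return np.stack(zs)

rng = np.random.default_rng(0)
T, dz, du = 5, 2, 1
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # hypothetical transition parameters
B = np.array([[0.0], [0.1]])
C = 0.1 * np.eye(dz)
u = rng.normal(size=(T, du))
w_mu, w_logvar = rng.normal(size=(T, dz)), np.full((T, dz), -2.0)  # stand-ins for encoder outputs
eps = rng.standard_normal((T, dz))
z = rollout(np.zeros(dz), u, w_mu, w_logvar, A, B, C, eps)
# KL term of the lower bound: approximate posterior over w against a standard-normal prior.
kl = kl_diag_gauss(w_mu, w_logvar, np.zeros_like(w_mu), np.zeros_like(w_logvar))
print(z.shape, kl)
```

In a full model, `z` would be decoded back to observations, and the reconstruction log-likelihood minus `kl` would be maximised with a gradient-based optimiser.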

Submitted 3 March, 2017; v1 submitted 20 May, 2016; originally announced May 2016.

Comments: Published as a conference paper at ICLR 2017

arXiv:1604.03736 [pdf, other]

A Differentiable Transition Between Additive and Multiplicative Neurons

Authors: Wiebke Köpp, Patrick van der Smagt, Sebastian Urban

Abstract: Existing approaches to combine both additive and multiplicative neural units either use a fixed assignment of operations or require discrete optimization to determine what function a neuron should perform. The latter, however, leads to a considerable increase in the computational complexity of the training procedure. We present a novel, parameterizable transfer function based on the mathematical concept of non-integer functional iteration that allows the operation each neuron performs to be smoothly and, most importantly, differentiably adjusted between addition and multiplication. This allows the decision between addition and multiplication to be integrated into the standard backpropagation training procedure.
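The two endpoints that such a transfer function interpolates between can be written with iterates of the exponential: applying $\exp^{(-n)}$ to the inputs, taking a weighted sum, and applying $\exp^{(n)}$ gives addition for $n = 0$ and a weighted product for $n = 1$, since $\exp\left(\sum_i w_i \log x_i\right) = \prod_i x_i^{w_i}$. The sketch below implements only these integer iterates; the non-integer iterates that make $n$ continuous (the paper's actual contribution) are not implemented here, and the function names are illustrative.

```python
import numpy as np

def exp_iter(x, n):
    """Integer iterate of exp: exp applied n times for n > 0, log applied -n times for n < 0."""
    x = np.asarray(x, dtype=float)
    for _ in range(abs(n)):
        x = np.exp(x) if n > 0 else np.log(x)
    return x

def unit(x, w, n):
    """y = exp^(n)( sum_i w_i * exp^(-n)(x_i) ): n = 0 is a weighted sum,
    n = 1 a weighted product prod_i x_i**w_i (positive inputs only)."""
    return float(exp_iter(np.dot(w, exp_iter(x, -n)), n))

x, w = np.array([2.0, 3.0]), np.array([1.0, 1.0])
print(unit(x, w, 0), unit(x, w, 1))   # additive vs. multiplicative
```

Replacing the integer `n` with a real-valued, trainable parameter is what would let backpropagation choose between the two behaviours.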

Submitted 13 April, 2016; originally announced April 2016.

Comments: ICLR 2016 extended abstract

arXiv:1602.07109 [pdf, other]

Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series

Authors: Maximilian Soelch, Justin Bayer, Marvin Ludersdorfer, Patrick van der Smagt

Abstract: Approximate variational inference has been shown to be a powerful tool for modeling unknown, complex probability distributions. Recent advances in the field allow us to learn probabilistic models of sequences that actively exploit spatial and temporal structure. We apply a Stochastic Recurrent Network (STORN) to learn robot time-series data. Our evaluation demonstrates that we can robustly detect anomalies both off-line and on-line.
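On-line detection with a probabilistic sequence model typically means scoring each new sample by its negative log-likelihood under the model's prediction and flagging high scores. A minimal sketch of that loop, assuming a running per-channel Gaussian (Welford updates) as a stand-in for STORN, a simulated 3-channel signal with one injected fault, and a hand-picked threshold of 18 nats.

```python
import numpy as np

class RunningGaussian:
    """Stand-in for a learned sequence model: per-channel running mean and variance
    (Welford's algorithm) giving a Gaussian predictive density for the next sample."""

    def __init__(self, dim, min_var=1e-6):
        self.n, self.mean, self.m2, self.min_var = 0, np.zeros(dim), np.zeros(dim), min_var

    def nll(self, x):
        var = np.maximum(self.m2 / max(self.n - 1, 1), self.min_var)
        return float(0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - self.mean) ** 2 / var))

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

def detect_online(stream, threshold, warmup=20):
    """Score each sample before updating the model; flag it if its NLL exceeds threshold."""
    model = RunningGaussian(stream.shape[1])
    flags = np.zeros(len(stream), dtype=bool)
    for t, x in enumerate(stream):
        flags[t] = t >= warmup and model.nll(x) > threshold
        if not flags[t]:
            model.update(x)   # keep flagged samples out of the "normal" model
    return flags

rng = np.random.default_rng(0)
stream = rng.normal(size=(300, 3))   # hypothetical 3-channel robot signal
stream[200] += 8.0                    # injected fault
flags = detect_online(stream, threshold=18.0)
print(np.flatnonzero(flags))
```

A recurrent latent-variable model would replace `RunningGaussian` with a prediction that depends on the recent history, which is what allows context-dependent anomalies to be caught.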

Submitted 14 June, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

Comments: Accepted as workshop paper at ICLR 2016; accepted as workshop paper for anomaly detection workshop at ICML 2016

arXiv:1601.04862 [pdf, other]

doi 10.1109/MRA.2016.2535081

Scalability in Neural Control of Musculoskeletal Robots

Authors: Christoph Richter, Sören Jentzsch, Rafael Hostettler, Jesús A. Garrido, Eduardo Ros, Alois C. Knoll, Florian Röhrbein, Patrick van der Smagt, Jörg Conradt

Abstract: Anthropomimetic robots are robots that sense, behave, interact and feel like humans. By this definition, anthropomimetic robots require human-like physical hardware and actuation, but also brain-like control and sensing. The most self-evident realization to meet those requirements would be a human-like musculoskeletal robot with a brain-like neural controller. While both musculoskeletal robotic hardware and neural control software have existed for decades, a scalable approach that could be used to build and control an anthropomimetic human-scale robot has not been demonstrated yet. Combining Myorobotics, a framework for musculoskeletal robot development, with SpiNNaker, a neuromorphic computing platform, we present a proof of principle of a system that can scale to dozens of neurally-controlled, physically compliant joints. At its core, it implements a closed-loop cerebellar model which provides real-time low-level neural control at minimal power consumption and maximal extensibility: higher-order (e.g., cortical) neural networks and neuromorphic sensors like silicon-retinae or -cochleae can naturally be incorporated.

Submitted 19 January, 2016; originally announced January 2016.

Comments: Accepted at IEEE Robotics and Automation Magazine on 2015-12-31

arXiv:1509.08455 [pdf, other]

Efficient Empowerment

Authors: Maximilian Karl, Justin Bayer, Patrick van der Smagt

Abstract: Empowerment quantifies the influence an agent has on its environment. Formally, it is the maximum of the expected KL divergence between the distribution of the successor state conditioned on a specific action and a distribution in which the actions are marginalised out. This makes it a natural candidate for an intrinsic reward signal in reinforcement learning: the agent will place itself in a situation where its actions have maximum stability and maximum influence on the future. The limiting factor so far has been the computational complexity of the method: the only way of calculating it has been a brute-force algorithm, restricting the method to environments with a small set of discrete states. In this work, we propose an efficient approximation for marginalising out the actions in the case of continuous environments. This allows fast evaluation of empowerment, paving the way towards challenging environments such as real-world robotics. The method is demonstrated on a pendulum swing-up problem.
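For small discrete environments, the definition above is the capacity of the channel from actions to successor states, $\max_{p(a)} \mathbb{E}_{p(a)}\left[\mathrm{KL}\left(p(s' \mid a)\,\|\,p(s')\right)\right]$, which the classical Blahut-Arimoto iteration computes. The sketch below is that discrete baseline, not the continuous approximation proposed in the paper; the transition matrices are toy examples.

```python
import numpy as np

def _kl_rows(P, p_a):
    """KL(p(s'|a) || p(s')) for every action a, in nats, with p(s') = sum_a p(a) p(s'|a)."""
    p_s = p_a @ P
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(P > 0, np.log(P / p_s), 0.0)
    return np.sum(P * log_ratio, axis=1)

def empowerment_bits(P, iters=500, tol=1e-12):
    """Empowerment of one state as the capacity max_{p(a)} I(A; S') of the channel
    p(s'|a), computed with the Blahut-Arimoto algorithm. P: (n_actions, n_states)."""
    P = np.asarray(P, dtype=float)
    p_a = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        new_p_a = p_a * np.exp(_kl_rows(P, p_a))   # reweight actions by their KL
        new_p_a /= new_p_a.sum()
        converged = np.max(np.abs(new_p_a - p_a)) < tol
        p_a = new_p_a
        if converged:
            break
    return float(p_a @ _kl_rows(P, p_a) / np.log(2.0))

deterministic = np.eye(4)                       # 4 actions reach 4 distinct successors
powerless = np.tile([0.25, 0.25, 0.25, 0.25], (4, 1))  # actions have no effect
noisy = np.array([[0.9, 0.1], [0.1, 0.9]])     # binary symmetric channel
print(empowerment_bits(deterministic), empowerment_bits(powerless), empowerment_bits(noisy))
```

Each iteration touches every (action, successor) pair, which illustrates why this route does not scale to continuous state and action spaces.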

Submitted 28 September, 2015; originally announced September 2015.

arXiv:1507.05331 [pdf, ps, other]

Fast Adaptive Weight Noise

Authors: Justin Bayer, Maximilian Karl, Daniela Korhammer, Patrick van der Smagt

Abstract: Marginalising out uncertain quantities within the internal representations or parameters of neural networks is of central importance for a wide range of learning techniques, such as empirical, variational, or full Bayesian methods. We set out to generalise fast dropout (Wang & Manning, 2013) to cover a wider variety of noise processes in neural networks. This leads to an efficient calculation of the marginal likelihood and predictive distribution that avoids sampling and the consequent increase in training time caused by high-variance gradient estimates. This allows us to approximate variational Bayes for the parameters of feed-forward neural networks. Inspired by the minimum description length principle, we also propose and experimentally verify the direct optimisation of the regularised predictive distribution. The methods yield results competitive with previous neural-network-based approaches and Gaussian processes on a wide range of regression tasks.
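The starting point being generalised is fast dropout's moment matching: instead of sampling dropout masks, the pre-activation $a = \sum_i w_i x_i z_i$ with $z_i \sim \mathrm{Bernoulli}(p)$ is approximated by a Gaussian with the exact mean and variance. A minimal NumPy sketch of the Bernoulli case only (the other noise processes covered by the paper are not shown), checked against the Monte Carlo sampling it replaces.

```python
import numpy as np

def fast_dropout_moments(x, w, keep=0.5):
    """Gaussian approximation of a = sum_i w_i x_i z_i with z_i ~ Bernoulli(keep):
    E[a] = keep * sum_i w_i x_i,  Var[a] = keep * (1 - keep) * sum_i (w_i x_i)^2."""
    wx = w * x
    return keep * wx.sum(), keep * (1.0 - keep) * np.sum(wx ** 2)

rng = np.random.default_rng(0)
x, w = rng.normal(size=20), rng.normal(size=20)
mean, var = fast_dropout_moments(x, w)

# Monte Carlo reference: the sampling that the closed-form moments avoid.
z = rng.random((100_000, 20)) < 0.5
samples = (z * (w * x)).sum(axis=1)
print(mean, samples.mean(), var, samples.var())
```

The variance formula relies on the masks being independent across inputs, so the per-term Bernoulli variances simply add.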

Submitted 19 July, 2015; originally announced July 2015.

arXiv:1504.06852 [pdf, other]

FlowNet: Learning Optical Flow with Convolutional Networks

Authors: Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

Abstract: Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially those linked to recognition. Optical flow estimation has not been among the tasks where CNNs were successful. In this paper we construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground-truth datasets are not sufficiently large to train a CNN, we generate a synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.
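The layer that "correlates feature vectors at different image locations" can be sketched as a dot product between a feature vector in the first map and feature vectors at displaced positions in the second map. This NumPy sketch assumes single-pixel patches, no striding, zero padding, and normalisation by the channel count; the actual FlowNet layer uses larger patches and strides. On a feature map shifted by a known amount, the peak of the correlation recovers the shift.

```python
import numpy as np

def correlation(f1, f2, max_disp):
    """Correlate feature maps f1, f2 of shape (C, H, W) over all displacements
    (dy, dx) in [-max_disp, max_disp]^2 using single-pixel patches:
    out[dy, dx, y, x] = <f1[:, y, x], f2[:, y + dy, x + dx]> / C, zero-padded."""
    C, H, W = f1.shape
    D = 2 * max_disp + 1
    f2p = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = np.zeros((D, D, H, W))
    for i in range(D):
        for j in range(D):
            # Window offset (i, j) in the padded map equals displacement (i - max_disp, j - max_disp).
            out[i, j] = np.sum(f1 * f2p[:, i:i + H, j:j + W], axis=0) / C
    return out

rng = np.random.default_rng(0)
f1 = rng.normal(size=(16, 12, 12))
f2 = np.roll(f1, shift=(2, -1), axis=(1, 2))      # content moves 2 px down, 1 px left
corr = correlation(f1, f2, max_disp=3)
scores = corr[:, :, 4:8, 4:8].mean(axis=(2, 3))   # average over interior pixels
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print("estimated displacement (dy, dx):", i - 3, j - 3)
```

In a network, the `(D, D)` displacement axes are flattened into channels and passed to subsequent convolutions, which regress the flow field from these matching scores.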

Submitted 4 May, 2015; v1 submitted 26 April, 2015; originally announced April 2015.

Comments: Added supplementary material

ACM Class: I.2.6; I.4.8

arXiv:1503.05724 [pdf, other]

A Neural Transfer Function for a Smooth and Differentiable Transition Between Additive and Multiplicative Interactions

Authors: Sebastian Urban, Patrick van der Smagt

Abstract: Existing approaches to combine both additive and multiplicative neural units either use a fixed assignment of operations or require discrete optimization to determine what function a neuron should perform. This leads either to an inefficient distribution of computational resources or to an extensive increase in the computational complexity of the training procedure. We present a novel, parameterizable transfer function based on the mathematical concept of non-integer functional iteration that allows the operation each neuron performs to be smoothly and, most importantly, differentiably adjusted between addition and multiplication. This allows the decision between addition and multiplication to be integrated into the standard backpropagation training procedure.

Submitted 29 March, 2016; v1 submitted 19 March, 2015; originally announced March 2015.

arXiv:1311.0701 [pdf, other]

On Fast Dropout and its Applicability to Recurrent Networks

Authors: Justin Bayer, Christian Osendorfer, Daniela Korhammer, Nutan Chen, Sebastian Urban, Patrick van der Smagt

Abstract: Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of vanishing and exploding gradients. The control of overfitting has received considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation-inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions, and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are based exclusively on the training error signal. One consequence of this is the absence of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. Experiments on four musical datasets support the hypothesis that this improves the performance of RNNs.

Submitted 5 March, 2014; v1 submitted 4 November, 2013; originally announced November 2013.

Comments: The experiments for the Penn Treebank corpus were erroneous and have been stripped from this version
