
Variational Distillation of Diffusion Policies into Mixture of Experts

Authors: Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov

Abstract: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. Across nine complex behavior learning tasks, VDD demonstrates that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.08234 [pdf, other]

MaIL: Improving Imitation Learning with Mamba

Authors: Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

Abstract: This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability to handle human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the drawback of large models that complicate effective training. While state space models (SSMs) have been known for their efficiency, they were not able to match the performance of Transformers. Mamba significantly improves the performance of SSMs and rivals Transformers, positioning it as an appealing alternative for IL policies. MaIL leverages Mamba as a backbone and introduces a formalism that allows using Mamba in the encoder-decoder structure. This formalism makes it a versatile architecture that can be used as a standalone policy or as part of a more advanced architecture, such as a diffuser in the diffusion process. Extensive evaluations on the LIBERO IL benchmark and three real robot experiments show that MaIL: i) outperforms Transformers in all LIBERO tasks, ii) achieves good performance even with small datasets, iii) is able to effectively process multi-modal sensory inputs, and iv) is more robust to input noise compared to Transformers.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07423 [pdf, other]

Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

Authors: Denis Blessing, Xiaogang Jia, Johannes Esslinger, Francisco Vargas, Gerhard Neumann

Abstract: Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments. The code is publicly available here.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2402.14606 [pdf, other]

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

Authors: Xiaogang Jia, Denis Blessing, Xinkai Jiang, Moritz Reuss, Atalay Donat, Rudolf Lioutikov, Gerhard Neumann

Abstract: Imitation learning with human data has demonstrated remarkable success in teaching robots a wide range of skills. However, the inherent diversity in human behavior leads to the emergence of multi-modal data distributions, thereby presenting a formidable challenge for existing imitation learning algorithms. Quantifying a model's capacity to capture and replicate this diversity effectively is still an open problem. In this work, we introduce simulation benchmark environments and the corresponding Datasets with Diverse human Demonstrations for Imitation Learning (D3IL), designed explicitly to evaluate a model's ability to learn multi-modal behavior. Our environments involve multiple sub-tasks that need to be solved, require manipulation of multiple objects, which increases the diversity of the behavior, and can only be solved by policies that rely on closed-loop sensory feedback. Other available datasets are missing at least one of these challenging properties. To address the challenge of diversity quantification, we introduce tractable metrics that provide valuable insights into a model's ability to acquire and reproduce diverse behaviors. These metrics offer a practical means to assess the robustness and versatility of imitation learning algorithms. Furthermore, we conduct a thorough evaluation of state-of-the-art methods on the proposed task suite. This evaluation serves as a benchmark for assessing their capability to learn diverse behaviors. Our findings shed light on the effectiveness of these methods in tackling the intricate problem of capturing and generalizing multi-modal human behaviors, offering a valuable reference for the design of future imitation learning algorithms.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2307.01050 [pdf, other]

Transport meets Variational Inference: Controlled Monte Carlo Diffusions

Authors: Francisco Vargas, Shreyas Padhy, Denis Blessing, Nikolas Nüsken

Abstract: Connecting optimal transport and variational inference, we present a principled and systematic framework for sampling and generative modelling centred around divergences on path space. Our work culminates in the development of the Controlled Monte Carlo Diffusion sampler (CMCD) for Bayesian computation, a score-based annealing technique that crucially adapts both forward and backward dynamics in a diffusion model. On the way, we clarify the relationship between the EM-algorithm and iterative proportional fitting (IPF) for Schrödinger bridges, deriving as well a regularised objective that bypasses the iterative bottleneck of standard IPF-updates. Finally, we show that CMCD has a strong foundation in the Jarzynski and Crooks identities from statistical physics, and that it convincingly outperforms competing approaches across a wide array of experiments.

Submitted 3 July, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Workshop on New Frontiers in Learning, Control, and Dynamical Systems at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023

arXiv:2306.16597 [pdf, other]

Weighted Birkhoff Averages and the Parameterization Method

Authors: David Blessing, J. D. Mireles James

Abstract: This work provides a systematic recipe for computing accurate high-order Fourier expansions of quasiperiodic invariant circles in area-preserving maps. The recipe requires only a finite data set sampled from the quasiperiodic circle. Our approach, being based on the parameterization method, uses a Newton scheme to iteratively solve a conjugacy equation describing the invariant circle. A critical step in properly formulating the conjugacy equation is to determine the rotation number of the quasiperiodic subsystem. For this we exploit the weighted Birkhoff averaging method. This approach facilitates accurate computation of the rotation number given nothing but the already mentioned orbit data. The weighted Birkhoff averages also facilitate the computation of other integral observables like Fourier coefficients of the parameterization of the invariant circle. Since the parameterization method is based on a Newton scheme, we only need to approximate a small number of Fourier coefficients with low accuracy to find a good enough initial approximation so that Newton converges. Moreover, the Fourier coefficients may be computed independently, so we can sample the higher modes to guess the decay rate of the Fourier coefficients. This allows us to choose, a priori, an appropriate number of modes in the truncation. We illustrate the utility of the approach for explicit example systems including the area-preserving Hénon map and the standard map. We present example computations for invariant circles with period as low as 1 and up to more than 100. We also employ a numerical continuation scheme to compute large numbers of quasiperiodic circles in these systems. During the continuation we monitor the Sobolev norm of the parameterization to automatically detect the breakdown of the family.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 38 pages, 15 figures
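
The weighted Birkhoff averaging mentioned in the abstract above can be illustrated with a small sketch. The abstract does not state which weight function is used, so the bump weight exp(-1/(t(1-t))) below is an assumption (it is a common choice for this technique), and the test orbit, a rigid rotation by the golden mean, is a hypothetical stand-in for data sampled from an invariant circle. All function names are illustrative and not taken from the paper.

```python
import math


def bump_weight(t):
    """Assumed weight: smooth, and flat to all orders at t = 0 and t = 1."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return math.exp(-1.0 / (t * (1.0 - t)))


def weighted_birkhoff_average(samples):
    """Average the samples with bump weights evaluated at k / N."""
    n = len(samples)
    weights = [bump_weight(k / n) for k in range(n)]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total


def plain_average(samples):
    """Ordinary (unweighted) Birkhoff average, for comparison."""
    return sum(samples) / len(samples)


# Hypothetical orbit data: rigid rotation of the circle by the golden mean.
RHO = (math.sqrt(5.0) - 1.0) / 2.0
N = 1000
orbit = [(0.1 + k * RHO) % 1.0 for k in range(N)]

# Observable cos(2*pi*theta); its exact average over the circle is 0.
observable = [math.cos(2.0 * math.pi * theta) for theta in orbit]

weighted_error = abs(weighted_birkhoff_average(observable))
plain_error = abs(plain_average(observable))
print("weighted error:", weighted_error)
print("plain error:   ", plain_error)
```

On this toy orbit the weighted average lands much closer to the exact value 0 than the plain average for the same number of samples, which is the property that makes the technique attractive when only a finite orbit segment is available.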

arXiv:2304.05171 [pdf, other]

doi 10.1109/ICRA48891.2023.10160543

Curriculum-Based Imitation of Versatile Skills

Authors: Maximilian Xiling Li, Onur Celik, Philipp Becker, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

Abstract: Learning skills by imitation is a promising concept for the intuitive teaching of robots. A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations. Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways, which is a major challenge for most imitation learning methods that are based on such a maximum likelihood (ML) objective. The ML objective forces the model to cover all data; it prevents specialization in the context space and can cause mode-averaging in the behavior space, leading to suboptimal or potentially catastrophic behavior. Here, we alleviate those issues by introducing a curriculum using a weight for each data point, allowing the model to specialize on data it can represent while incentivizing it to cover as much data as possible by an entropy bonus. We extend our algorithm to a Mixture of (linear) Experts (MoE) such that the single components can specialize on local context regions, while the MoE covers all data points. We evaluate our approach in complex simulated and real robot control tasks and show it learns from versatile human demonstrations and significantly outperforms current SOTA methods. A reference implementation can be found at https://github.com/intuitive-robots/ml-cur

Submitted 11 April, 2023; originally announced April 2023.

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2303.15349 [pdf, other]

Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills

Authors: Denis Blessing, Onur Celik, Xiaogang Jia, Moritz Reuss, Maximilian Xiling Li, Rudolf Lioutikov, Gerhard Neumann

Abstract: Imitation learning uses data for training policies to solve complex tasks. However, when the training data is collected from human demonstrators, it often leads to multimodal distributions because of the variability in human actions. Most imitation learning methods rely on a maximum likelihood (ML) objective to learn a parameterized policy, but this can result in suboptimal or unsafe behavior due to the mode-averaging property of the ML objective. In this work, we propose Information Maximizing Curriculum, a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent, effectively mitigating the mode-averaging problem by allowing the model to ignore data from modes it cannot represent. To cover all modes and thus enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning. A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution. We demonstrate the effectiveness of our approach on complex simulated control tasks using diverse human demonstrations, achieving superior performance compared to state-of-the-art methods.

Submitted 31 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:1401.2499 [pdf, ps, other]

On (t,r) Broadcast Domination Numbers of Grids

Authors: David Blessing, Erik Insko, Katie Johnson, Christie Mauretour

Abstract: The domination number of a graph $G = (V,E)$ is the minimum cardinality of any subset $S \subset V$ such that every vertex in $V$ is in $S$ or adjacent to an element of $S$. Finding the domination numbers of $m$ by $n$ grids was an open problem for nearly 30 years and was finally solved in 2011 by Goncalves, Pinlou, Rao, and Thomassé. Many variants of domination number on graphs have been defined and studied, but exact values have not yet been obtained for grids. We will define a family of domination theories parameterized by pairs of positive integers $(t,r)$ where $1 \leq r \leq t$ which generalize domination and distance domination theories for graphs. We call these domination numbers the $(t,r)$ broadcast domination numbers. We give the exact values of $(t,r)$ broadcast domination numbers for small grids, and we identify upper bounds for the $(t,r)$ broadcast domination numbers for large grids and conjecture that these bounds are tight for sufficiently large grids.

Submitted 10 January, 2014; originally announced January 2014.

Comments: 28 pages, 43 figures

MSC Class: 05C69; 05C12; 05C30; 68R05; 68R10
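
The definition of the ordinary domination number given in the abstract above can be checked directly on small grids. The following is a minimal brute-force sketch of that definition only; it does not implement the $(t,r)$ broadcast variant, whose precise rules are not spelled out in the abstract, and the function name is illustrative rather than taken from the paper.

```python
from itertools import combinations


def grid_domination_number(m, n):
    """Smallest |S| such that every vertex of the m-by-n grid graph
    is in S or adjacent to a vertex of S (exhaustive search)."""
    vertices = [(i, j) for i in range(m) for j in range(n)]

    def closed_neighborhood(v):
        # The vertex itself plus its horizontal and vertical neighbors.
        i, j = v
        candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return {(a, b) for a, b in candidates if 0 <= a < m and 0 <= b < n}

    closed = {v: closed_neighborhood(v) for v in vertices}
    everything = set(vertices)

    # Try subset sizes in increasing order; the first size that works is minimal.
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            covered = set().union(*(closed[v] for v in subset))
            if covered == everything:
                return k


for shape in [(1, 4), (2, 2), (3, 3), (4, 4)]:
    print(shape, grid_domination_number(*shape))
```

The search is exponential in the number of vertices, so it is only practical for very small grids; it is meant to make the definition concrete, not to reproduce the results of the paper.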

Showing 1–9 of 9 results for author: Blessing, D