Showing 1–17 of 17 results for author: Grathwohl, W

  1. arXiv:2302.13834  [pdf, other]

    cs.LG stat.ML

    Denoising Diffusion Samplers

    Authors: Francisco Vargas, Will Grathwohl, Arnaud Doucet

    Abstract: Denoising diffusion models are a popular class of generative models providing state-of-the-art results in many domains. One gradually adds noise to data using a diffusion to transform the data distribution into a Gaussian distribution. Samples from the generative model are then obtained by simulating an approximation of the time-reversal of this diffusion initialized by Gaussian samples. Practical…

    Submitted 16 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: In The Eleventh International Conference on Learning Representations, 2023

    Journal ref: In The Eleventh International Conference on Learning Representations, 2023

  2. arXiv:2302.11552  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

    Authors: Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

    Abstract: Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build up…

    Submitted 18 November, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: ICML 2023, Project Webpage: https://energy-based-model.github.io/reduce-reuse-recycle/

  3. arXiv:2211.15089  [pdf, other]

    cs.CL cs.LG

    Continuous diffusion for categorical data

    Authors: Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler

    Abstract: Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous natur…

    Submitted 15 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures; corrections and additional information about hyperparameters

  4. arXiv:2211.04236  [pdf, other]

    cs.CL cs.LG

    Self-conditioned Embedding Diffusion for Text Generation

    Authors: Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

    Abstract: Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 15 pages

  5. arXiv:2211.00177  [pdf, other]

    cs.LG cs.IR cs.SI

    Learning to Navigate Wikipedia by Taking Random Walks

    Authors: Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

    Abstract: A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this pa…

    Submitted 31 October, 2022; originally announced November 2022.

    Journal ref: NeurIPS 2022

  6. arXiv:2208.07698  [pdf, other]

    stat.ML cs.LG

    Score-Based Diffusion meets Annealed Importance Sampling

    Authors: Arnaud Doucet, Will Grathwohl, Alexander G. D. G. Matthews, Heiko Strathmann

    Abstract: More than twenty years after its introduction, Annealed Importance Sampling (AIS) remains one of the most effective methods for marginal likelihood estimation. It relies on a sequence of distributions interpolating between a tractable initial distribution and the target distribution of interest which we simulate from approximately using a non-homogeneous Markov chain. To obtain an importance sampl…

    Submitted 24 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: accepted at NeurIPS 2022

  7. arXiv:2108.04227  [pdf, other]

    cs.CV cs.LG

    Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data

    Authors: Jacob Kelly, Richard Zemel, Will Grathwohl

    Abstract: Multi-attribute classification generalizes classification, presenting new challenges for making accurate predictions and quantifying uncertainty. We build upon recent work and show that architectures for multi-attribute prediction can be reinterpreted as energy-based models (EBMs). While existing EBM approaches achieve strong discriminative performance, they are unable to generate samples conditio…

    Submitted 19 July, 2021; originally announced August 2021.

  8. arXiv:2102.04509  [pdf, other]

    cs.LG

    Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

    Authors: Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison

    Abstract: We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, re…

    Submitted 6 June, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Energy-Based Models, Deep generative models, MCMC sampling

  9. arXiv:2010.04230  [pdf, other]

    cs.LG cs.AI

    No MCMC for me: Amortized sampling for fast and stable training of energy-based models

    Authors: Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

    Abstract: Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty. Despite recent advances, training EBMs on high-dimensional data remains a challenging problem as the state-of-the-art approaches are costly, unstable, and require considerable tuning and domain expertise to apply successfully. In this work, we present a simple method for training EBMs at scale which uses an e…

    Submitted 6 June, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  10. arXiv:2002.05616  [pdf, other]

    stat.ML cs.LG

    Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling

    Authors: Will Grathwohl, Kuan-Chieh Wang, Jorn-Henrik Jacobsen, David Duvenaud, Richard Zemel

    Abstract: We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$ defined by a vector function of the data. We parameterize this function with a neural network and fit its parameters to maximize the…

    Submitted 14 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  11. arXiv:1912.03263  [pdf, other]

    cs.LG cs.CV stat.ML

    Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

    Authors: Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky

    Abstract: We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x,y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate tha…

    Submitted 15 September, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

  12. arXiv:1906.01171  [pdf, other]

    cs.LG stat.ML

    Understanding the Limitations of Conditional Generative Models

    Authors: Ethan Fetaya, Jörn-Henrik Jacobsen, Will Grathwohl, Richard Zemel

    Abstract: Class-conditional generative models hold promise to overcome the shortcomings of their discriminative counterparts. They are a natural choice to solve discriminative tasks in a robust manner as they jointly optimize for predictive performance and accurate modeling of the input distribution. In this work, we investigate robust classification with likelihood-based generative models from a theoretica…

    Submitted 17 February, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

  13. arXiv:1811.00995  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Invertible Residual Networks

    Authors: Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen

    Abstract: We show that standard ResNet architectures can be made invertible, allowing the same model to be used for classification, density estimation, and generation. Typically, enforcing invertibility requires partitioning dimensions or restricting network architectures. In contrast, our approach only requires adding a simple normalization step during training, already available in standard frameworks. In…

    Submitted 18 May, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Journal ref: Proceedings of the International Conference on Machine Learning (ICML), 2019

  14. arXiv:1810.01367  [pdf, other]

    cs.LG cs.CV stat.ML

    FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

    Authors: Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud

    Abstract: A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In…

    Submitted 22 October, 2018; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: 8 Pages, 6 figures

  15. arXiv:1802.10440  [pdf, other]

    cs.LG q-bio.TO

    Precision medicine as a control problem: Using simulation and deep reinforcement learning to discover adaptive, personalized multi-cytokine therapy for sepsis

    Authors: Brenden K. Petersen, Jiachen Yang, Will S. Grathwohl, Chase Cockrell, Claudio Santiago, Gary An, Daniel M. Faissol

    Abstract: Sepsis is a life-threatening condition affecting one million people per year in the US in which dysregulation of the body's own immune system causes damage to its tissues, resulting in a 28-50% mortality rate. Clinical trials for sepsis treatment over the last 20 years have failed to produce a single currently FDA-approved drug treatment. In this study, we attempt to discover an effective cytoki…

    Submitted 8 February, 2018; originally announced February 2018.

  16. arXiv:1711.00123  [pdf, other]

    cs.LG

    Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

    Authors: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud

    Abstract: Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our m…

    Submitted 23 February, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Published at ICLR 2018

  17. arXiv:1612.04440  [pdf, other]

    cs.CV cs.LG stat.ML

    Disentangling Space and Time in Video with Hierarchical Variational Auto-encoders

    Authors: Will Grathwohl, Aaron Wilson

    Abstract: There are many forms of feature information present in video data. Principal among them are object identity information, which is largely static across multiple video frames, and object pose and style information, which continuously transforms from frame to frame. Most existing models confound these two types of representation by mapping them to a shared feature space. In this paper we propose a pro…

    Submitted 19 December, 2016; v1 submitted 13 December, 2016; originally announced December 2016.

    Comments: fixed typo in equation 16