Showing 1–26 of 26 results for author: Hoffman, M D

  1. arXiv:2402.01915  [pdf, other]

    cs.CV stat.CO

    Robust Inverse Graphics via Probabilistic Inference

    Authors: Tuan Anh Le, Pavel Sountsov, Matthew D. Hoffman, Ben Lee, Brian Patton, Rif A. Saurous

    Abstract: How do we infer a 3D scene from a single image in the presence of corruptions like rain, snow or fog? Straightforward domain randomization relies on knowing the family of corruptions ahead of time. Here, we propose a Bayesian approach, dubbed robust inverse graphics (RIG), that relies on a strong scene prior and an uninformative uniform corruption prior, making it applicable to a wide range of corru…

    Submitted 11 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML submission. Reworked main body, new appendix figures

  2. arXiv:2312.02179  [pdf, other]

    cs.LG cs.AI cs.CL

    Training Chain-of-Thought via Latent-Variable Inference

    Authors: Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous

    Abstract: Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training se…

    Submitted 28 November, 2023; originally announced December 2023.

    Comments: 23 pages, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  3. arXiv:2307.09607  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Sequential Monte Carlo Learning for Time Series Structure Discovery

    Authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka

    Abstract: This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 17 pages, 8 figures, 2 tables. Appearing in ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29473-29489, 2023

  4. arXiv:2210.17415  [pdf, other]

    cs.CV cs.LG

    ProbNeRF: Uncertainty-Aware Inference of 3D Shapes from 2D Images

    Authors: Matthew D. Hoffman, Tuan Anh Le, Pavel Sountsov, Christopher Suter, Ben Lee, Vikash K. Mansinghka, Rif A. Saurous

    Abstract: The problem of inferring object shape from a single 2D image is underconstrained. Prior knowledge about what objects are plausible can help, but even given such prior knowledge there may still be uncertainty about the shapes of occluded parts of objects. Recently, conditional neural radiance field (NeRF) models have been developed that can learn to infer good point estimates of 3D models from sing…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 18 pages, 18 figures, 1 table; submitted to the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

    MSC Class: 62F15 (Primary) 68T45 (Secondary) ACM Class: G.3; I.5.1; I.4.10

  5. arXiv:2206.08889  [pdf, other]

    stat.ML cs.IT cs.LG

    Lossy Compression with Gaussian Diffusion

    Authors: Lucas Theis, Tim Salimans, Matthew D. Hoffman, Fabian Mentzer

    Abstract: We consider a novel lossy compression approach based on unconditional diffusion generative models, which we call DiffC. Unlike modern compression schemes which rely on transform coding and quantization to restrict the transmitted information, DiffC relies on the efficient communication of pixels corrupted by Gaussian noise. We implement a proof of concept and find that it works surprisingly well d…

    Submitted 31 December, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

  6. arXiv:2104.14421  [pdf, other]

    cs.LG stat.ML

    What Are Bayesian Neural Network Posteriors Really Like?

    Authors: Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

    Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch H…

    Submitted 29 April, 2021; originally announced April 2021.

  7. arXiv:2011.03395  [pdf, other]

    cs.LG stat.ML

    Underspecification Presents Challenges for Credibility in Modern Machine Learning

    Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, et al. (15 additional authors not shown)

    Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predict…

    Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Updates: Updated statistical analysis in Section 6; Additional citations

  8. arXiv:2002.01184  [pdf, ps, other]

    stat.CO cs.PL stat.ML

    tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware

    Authors: Junpeng Lao, Christopher Suter, Ian Langmore, Cyril Chimisov, Ashish Saxena, Pavel Sountsov, Dave Moore, Rif A. Saurous, Matthew D. Hoffman, Joshua V. Dillon

    Abstract: Markov chain Monte Carlo (MCMC) is widely regarded as one of the most important algorithms of the 20th century. Its guarantees of asymptotic convergence, stability, and estimator-variance bounds using only unnormalized probability functions make it indispensable to probabilistic programming. In this paper, we introduce the TensorFlow Probability MCMC toolkit, and discuss some of the considerations…

    Submitted 4 February, 2020; originally announced February 2020.

    Comments: Based on extended abstract submitted to PROBPROG 2020

  9. arXiv:1910.11141  [pdf, other]

    cs.DC cs.LG cs.PL

    Automatically Batching Control-Intensive Programs for Modern Accelerators

    Authors: Alexey Radul, Brian Patton, Dougal Maclaurin, Matthew D. Hoffman, Rif A. Saurous

    Abstract: We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transfo…

    Submitted 12 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 10 pages; Machine Learning and Systems 2020

  10. arXiv:1906.03028  [pdf, other]

    stat.ML cs.LG cs.PL

    Automatic Reparameterisation of Probabilistic Programs

    Authors: Maria I. Gorinova, Dave Moore, Matthew D. Hoffman

    Abstract: Probabilistic programming has emerged as a powerful paradigm in statistics, applied science, and machine learning: by decoupling modelling from inference, it promises to allow modellers to directly reason about the processes generating data. However, the performance of inference algorithms can be dramatically affected by the parameterisation used to express a model, requiring users to transform th…

    Submitted 7 June, 2019; originally announced June 2019.

  11. arXiv:1811.11926  [pdf, other]

    cs.LG cs.PL stat.ML

    Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language

    Authors: Matthew D. Hoffman, Matthew J. Johnson, Dustin Tran

    Abstract: Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, we propose a strategy for automating such derivations. Unlike previous systems which focus on relationships between pairs of random variables, our system (which we call Autoconj) operates directly on Python functions that compute log-joint distribution functions. Auto…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: Appears in Neural Information Processing Systems, 2018. Code available at https://github.com/google-research/autoconj

  12. arXiv:1810.06891  [pdf, other]

    cs.LG stat.ML

    The LORACs prior for VAEs: Letting the Trees Speak for the Data

    Authors: Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson

    Abstract: In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The "default" prior is an isotropic normal, but…

    Submitted 16 October, 2018; originally announced October 2018.

  13. arXiv:1809.04281  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Music Transformer

    Authors: Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

    Abstract: Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence.…

    Submitted 12 December, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

    Comments: Improved skewing section and accompanying figures. Previous titles are "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation" and "Music Transformer"

  14. arXiv:1802.05814  [pdf, other]

    stat.ML cs.IR cs.LG

    Variational Autoencoders for Collaborative Filtering

    Authors: Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, Tony Jebara

    Abstract: We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: 10 pages, 3 figures. WWW 2018

  15. arXiv:1711.09268  [pdf, other]

    stat.ML cs.AI cs.LG

    Generalizing Hamiltonian Monte Carlo with Neural Networks

    Authors: Daniel Levy, Matthew D. Hoffman, Jascha Sohl-Dickstein

    Abstract: We present a general-purpose method to train Markov chain Monte Carlo kernels, parameterized by deep neural networks, that converge and mix quickly to their target distribution. Our method generalizes Hamiltonian Monte Carlo and is trained to maximize expected squared jumped distance, a proxy for mixing speed. We demonstrate large empirical gains on a collection of simple but challenging distribut…

    Submitted 2 March, 2018; v1 submitted 25 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  16. arXiv:1704.04997  [pdf, other]

    stat.ML cs.LG

    Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

    Authors: Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams

    Abstract: Professional-grade software applications are powerful but complicated: expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no singl…

    Submitted 17 April, 2017; originally announced April 2017.

  17. arXiv:1704.04289  [pdf, other]

    stat.ML cs.LG

    Stochastic Gradient Descent as Approximate Bayesian Inference

    Authors: Stephan Mandt, Matthew D. Hoffman, David M. Blei

    Abstract: Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution…

    Submitted 19 January, 2018; v1 submitted 13 April, 2017; originally announced April 2017.

    Comments: 35 pages, published version (JMLR 2017)

    Journal ref: Journal of Machine Learning Research 18 (2017) 1-35

  18. arXiv:1701.03757  [pdf, ps, other]

    stat.ML cs.AI cs.LG cs.PL stat.CO

    Deep Probabilistic Programming

    Authors: Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

    Abstract: We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations: random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same…

    Submitted 7 March, 2017; v1 submitted 13 January, 2017; originally announced January 2017.

    Comments: Appears in International Conference on Learning Representations, 2017. A companion webpage for this paper is available at http://edwardlib.org/iclr2017

  19. arXiv:1602.02666  [pdf, other]

    stat.ML cs.LG

    A Variational Analysis of Stochastic Gradient Algorithms

    Authors: Stephan Mandt, Matthew D. Hoffman, David M. Blei

    Abstract: Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to a…

    Submitted 8 February, 2016; originally announced February 2016.

    Comments: 8 pages, 3 figures

    Journal ref: International Conference on Machine Learning (ICML 2016), p. 354--363

  20. arXiv:1509.07164  [pdf, other]

    cs.MS

    The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

    Authors: Bob Carpenter, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, Michael Betancourt

    Abstract: As computational challenges in optimization and statistical inference grow ever harder, algorithms that utilize derivatives are becoming increasingly more important. The implementation of the derivatives that make these algorithms so powerful, however, is a substantial user burden and the practicality of these algorithms depends critically on tools like automatic differentiation that remove the im…

    Submitted 23 September, 2015; originally announced September 2015.

    Comments: 96 pages, 9 figures

    ACM Class: G.1.0; G.1.3; G.1.4; F.2.1

  21. arXiv:1411.6909  [pdf, other]

    cs.CV

    Image Classification and Retrieval from User-Supplied Tags

    Authors: Hamid Izadinia, Ali Farhadi, Aaron Hertzmann, Matthew D. Hoffman

    Abstract: This paper proposes direct learning of image classification from user-supplied tags, without filtering. Each tag is supplied by the user who shared the image online. Enormous numbers of these tags are freely available online, and they give insight about the image categories important to users and to image classification. Our approach is complementary to the conventional approach of manual annotati…

    Submitted 25 November, 2014; originally announced November 2014.

  22. arXiv:1411.1804  [pdf, other]

    stat.ML cs.LG

    Beta Process Non-negative Matrix Factorization with Stochastic Structured Mean-Field Variational Inference

    Authors: Dawen Liang, Matthew D. Hoffman

    Abstract: Beta process is the standard nonparametric Bayesian prior for latent factor model. In this paper, we derive a structured mean-field variational inference algorithm for a beta process non-negative matrix factorization (NMF) model with Poisson likelihood. Unlike the linear Gaussian model, which is well-studied in the nonparametric Bayesian literature, NMF model with beta process prior does not enjoy…

    Submitted 2 December, 2014; v1 submitted 6 November, 2014; originally announced November 2014.

    Comments: 6 pages, 1 figure

  23. arXiv:1404.4114  [pdf, other]

    cs.LG

    Structured Stochastic Variational Inference

    Authors: Matthew D. Hoffman, David M. Blei

    Abstract: Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approximation limits the fidelity of the posterior approximation, and introduces local optima. We show how to relax the mean-f…

    Submitted 25 November, 2014; v1 submitted 15 April, 2014; originally announced April 2014.

  24. arXiv:1312.5857  [pdf, ps, other]

    stat.ML cs.LG

    A Generative Product-of-Filters Model of Audio

    Authors: Dawen Liang, Matthew D. Hoffman, Gautham J. Mysore

    Abstract: We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based…

    Submitted 25 November, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: ICLR 2014 conference-track submission. Added link to the source code

  25. arXiv:1111.4246  [pdf, other]

    stat.CO cs.LG

    The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo

    Authors: Matthew D. Hoffman, Andrew Gelman

    Abstract: Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metro…

    Submitted 17 November, 2011; originally announced November 2011.

    Comments: 30 pages, 7 figures

  26. arXiv:1009.5761  [pdf, ps, other]

    cs.SD cs.LG

    Approximate Maximum A Posteriori Inference with Entropic Priors

    Authors: Matthew D. Hoffman

    Abstract: In certain applications it is useful to fit multinomial distributions to observed data with a penalty term that encourages sparsity. For example, in probabilistic latent audio source decomposition one may wish to encode the assumption that only a few latent sources are active at any given time. The standard heuristic of applying an L1 penalty is not an option when fitting the parameters to a multi…

    Submitted 28 September, 2010; originally announced September 2010.