subscribe to arXiv mailings

Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

Authors: Charles C. Margossian, Loucas Pillaud-Vivien, Lawrence K. Saul

Abstract: Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though~$p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any fac… ▽ More Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though~$p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any factorized approximation $q\in Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general Rényi divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We provide a thorough theoretical analysis in the setting where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. We show that all the considered divergences can be \textit{ordered} based on the estimates of uncertainty they yield as objective functions for~VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian. △ Less

Submitted 7 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.19455 [pdf, other]

Listening to the Noise: Blind Denoising with Gibbs Diffusion

Authors: David Heurtel-Depeiges, Charles C. Margossian, Ruben Ohana, Bruno Régaldo-Saint Blancard

Abstract: In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincide with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. W… ▽ More In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincide with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the family of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and spectral index, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe. △ Less

Submitted 25 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 12+9 pages, 7+5 figures, 1+1 tables; accepted to 2024 International Conference on Machine Learning; code: https://github.com/rubenohana/Gibbs-Diffusion

arXiv:2402.14758 [pdf, other]

Batch and match: black-box variational inference with a score-based divergence

Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Not… ▽ More Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization. △ Less

Submitted 12 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 49 pages, 14 figures. To appear in the Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

arXiv:2307.11018 [pdf, other]

Amortized Variational Inference: When and Why?

Authors: Charles C. Margossian, David M. Blei

Abstract: In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variationa… ▽ More In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variational autoencoders, however it stands to reason that A-VI could also be used as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive conditions on a latent variable model which are necessary, sufficient, and verifiable under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of AVI's inference function to improve its solution, and we provide examples, e.g. hidden Markov models, where the amortization gap cannot be closed. △ Less

Submitted 23 May, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:1811.05031 [pdf, other]

doi 10.1002/WIDM.1305

A Review of automatic differentiation and its efficient implementation

Authors: Charles C. Margossian

Abstract: Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementa… ▽ More Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementation of automatic differentiation however requires some care to insure efficiency. Modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management. Among these techniques are operation overloading, region based memory, and expression templates. There also exist several mathematical techniques which can yield high performance gains when applied to complex algorithms. For example, semi-analytical derivatives can reduce by orders of magnitude the runtime required to numerically solve and differentiate an algebraic equation. Open problems include the extension of current packages to provide more specialized routines, and efficient methods to perform higher-order differentiation. △ Less

Submitted 14 January, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

Comments: 32 pages, 5 figures, submitted for publication. WIREs Data Mining Knowl Discov, March 2019

Showing 1–5 of 5 results for author: Margossian, C C