-
Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks
Authors:
Evan Becker,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Generative Adversarial Networks (GANs) are a popular formulation to train generative models for complex high dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with…
▽ More
Generative Adversarial Networks (GANs) are a popular formulation to train generative models for complex high dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with a kernel-based discriminator. This convergence analysis is based on a linearization of a non-linear dynamical system that describes the GDA iterations, under an \textit{isolated points model} assumption from [Becker et al. 2022]. Our analysis brings out the effect of the learning rates, regularization, and the bandwidth of the kernel discriminator, on the local convergence rate of GDA. Importantly, we show phase transitions that indicate when the system converges, oscillates, or diverges. We also provide numerical simulations that verify our claims.
△ Less
Submitted 29 May, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
Proceedings of AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges
Authors:
Feras A. Batarseh,
Priya L. Donti,
Ján Drgoňa,
Kristen Fletcher,
Pierre-Adrien Hanania,
Melissa Hatton,
Srinivasan Keshav,
Bran Knowles,
Raphaela Kotsch,
Sean McGinnis,
Peetak Mitra,
Alex Philp,
Jim Spohrer,
Frank Stein,
Meghna Tare,
Svitlana Volkov,
Gege Wen
Abstract:
Climate change is one of the most pressing challenges of our time, requiring rapid action across society. As artificial intelligence tools (AI) are rapidly deployed, it is therefore crucial to understand how they will impact climate action. On the one hand, AI can support applications in climate change mitigation (reducing or preventing greenhouse gas emissions), adaptation (preparing for the effe…
▽ More
Climate change is one of the most pressing challenges of our time, requiring rapid action across society. As artificial intelligence tools (AI) are rapidly deployed, it is therefore crucial to understand how they will impact climate action. On the one hand, AI can support applications in climate change mitigation (reducing or preventing greenhouse gas emissions), adaptation (preparing for the effects of a changing climate), and climate science. These applications have implications in areas ranging as widely as energy, agriculture, and finance. At the same time, AI is used in many ways that hinder climate action (e.g., by accelerating the use of greenhouse gas-emitting fossil fuels). In addition, AI technologies have a carbon and energy footprint themselves. This symposium brought together participants from across academia, industry, government, and civil society to explore these intersections of AI with climate change, as well as how each of these sectors can contribute to solutions.
△ Less
Submitted 29 January, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Instability and Local Minima in GAN Training with Kernel Discriminators
Authors:
Evan Becker,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is k…
▽ More
Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is kernel-based. A simple yet expressive framework for analyzing training called the $\textit{Isolated Points Model}$ is introduced. In the proposed model, the distance between true samples greatly exceeds the kernel width, so each generated point is influenced by at most one true point. Our model enables precise characterization of the conditions for convergence, both to good and bad minima. In particular, the analysis explains two common failure modes: (i) an approximate mode collapse and (ii) divergence. Numerical simulations are provided that predictably replicate these behaviors.
△ Less
Submitted 21 August, 2022;
originally announced August 2022.
-
Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
Authors:
Mojtaba Sahraee-Ardakan,
Melikasadat Emami,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of sampl…
▽ More
Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e. proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model where the relationship between input and the response could be very nonlinear, we show that linear models are in fact optimal, i.e. linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex models for the data other than independent features are needed for high-dimensional analysis.
△ Less
Submitted 20 January, 2022;
originally announced January 2022.
-
Asymptotics of Ridge Regression in Convolutional Models
Authors:
Mojtaba Sahraee-Ardakan,
Tung Mai,
Anup Rao,
Ryan Rossi,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Understanding generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This…
▽ More
Understanding generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This phenomenon is captured by the so called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works tried to explain such phenomenon for simple models. In this work, we analyze the asymptotics of estimation error in ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, naturally arise in different fields such as seismology, imaging, and acoustics among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for estimation error of ridge estimators that hold in a certain high-dimensional regime. We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.
△ Less
Submitted 8 March, 2021;
originally announced March 2021.
-
Implicit Bias of Linear RNNs
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditional…
▽ More
Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently-developed kernel regime analysis, our main result shows that linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias to elements with smaller time lags in the convolution and hence, shorter memory. The degree of this bias depends on the variance of the transition kernel matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated in both synthetic and real data experiments.
△ Less
Submitted 19 January, 2021;
originally announced January 2021.
-
Low-Rank Nonlinear Decoding of $μ$-ECoG from the Primary Auditory Cortex
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Alyson K. Fletcher,
Sundeep Rangan,
Michael Trumpis,
Brinnae Bent,
Chia-Han Chiang,
Jonathan Viventi
Abstract:
This paper considers the problem of neural decoding from parallel neural measurements systems such as micro-electrocorticography ($μ$-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples i…
▽ More
This paper considers the problem of neural decoding from parallel neural measurements systems such as micro-electrocorticography ($μ$-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on $μ$-ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA).
△ Less
Submitted 6 May, 2020;
originally announced May 2020.
-
Generalization Error of Generalized Linear Models in High Dimensions
Authors:
Melikasadat Emami,
Mojtaba Sahraee-Ardakan,
Parthe Pandit,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
At the heart of machine learning lies the question of generalizability of learned rules over previously unseen data. While over-parameterized models based on neural networks are now ubiquitous in machine learning applications, our understanding of their generalization capabilities is incomplete. This task is made harder by the non-convexity of the underlying learning problems. We provide a general…
▽ More
At the heart of machine learning lies the question of generalizability of learned rules over previously unseen data. While over-parameterized models based on neural networks are now ubiquitous in machine learning applications, our understanding of their generalization capabilities is incomplete. This task is made harder by the non-convexity of the underlying learning problems. We provide a general framework to characterize the asymptotic generalization error for single-layer neural networks (i.e., generalized linear models) with arbitrary non-linearities, making it applicable to regression as well as classification problems. This framework enables analyzing the effect of (i) over-parameterization and non-linearity during modeling; and (ii) choices of loss function, initialization, and regularizer during learning. Our model also captures mismatch between training and test distributions. As examples, we analyze a few special cases, namely linear regression and logistic regression. We are also able to rigorously and analytically explain the \emph{double descent} phenomenon in generalized linear models.
△ Less
Submitted 30 April, 2020;
originally announced May 2020.
-
Inference in Multi-Layer Networks with Matrix-Valued Unknowns
Authors:
Parthe Pandit,
Mojtaba Sahraee-Ardakan,
Sundeep Rangan,
Philip Schniter,
Alyson K. Fletcher
Abstract:
We consider the problem of inferring the input and hidden variables of a stochastic multi-layer neural network from an observation of the output. The hidden variables in each layer are represented as matrices. This problem applies to signal recovery via deep generative prior models, multi-task and mixed regression and learning certain classes of two-layer neural networks. A unified approximation a…
▽ More
We consider the problem of inferring the input and hidden variables of a stochastic multi-layer neural network from an observation of the output. The hidden variables in each layer are represented as matrices. This problem applies to signal recovery via deep generative prior models, multi-task and mixed regression and learning certain classes of two-layer neural networks. A unified approximation algorithm for both MAP and MMSE inference is proposed by extending a recently-developed Multi-Layer Vector Approximate Message Passing (ML-VAMP) algorithm to handle matrix-valued unknowns. It is shown that the performance of the proposed Multi-Layer Matrix VAMP (ML-Mat-VAMP) algorithm can be exactly predicted in a certain random large-system limit, where the dimensions $N\times d$ of the unknown quantities grow as $N\rightarrow\infty$ with $d$ fixed. In the two-layer neural-network learning problem, this scaling corresponds to the case where the number of input features and training samples grow to infinity but the number of hidden nodes stays fixed. The analysis enables a precise prediction of the parameter and test error of the learning.
△ Less
Submitted 25 January, 2020;
originally announced January 2020.
-
Inference with Deep Generative Priors in High Dimensions
Authors:
Parthe Pandit,
Mojtaba Sahraee-Ardakan,
Sundeep Rangan,
Philip Schniter,
Alyson K. Fletcher
Abstract:
Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nat…
▽ More
Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nature of the underlying optimization problems. This paper presents a novel algorithm, Multi-Layer Vector Approximate Message Passing (ML-VAMP), for inference in multi-layer stochastic neural networks. ML-VAMP can be configured to compute maximum a priori (MAP) or approximate minimum mean-squared error (MMSE) estimates for these networks. We show that the performance of ML-VAMP can be exactly predicted in a certain high-dimensional random limit. Furthermore, under certain conditions, ML-VAMP yields estimates that achieve the minimum (i.e., Bayes-optimal) MSE as predicted by the replica method. In this way, ML-VAMP provides a computationally efficient method for multi-layer inference with an exact performance characterization and testable conditions for optimality in the large-system limit.
△ Less
Submitted 8 November, 2019;
originally announced November 2019.
-
Input-Output Equivalence of Unitary and Contractive RNNs
Authors:
M. Emami,
M. Sahraee-Ardakan,
S. Rangan,
A. K. Fletcher
Abstract:
Unitary recurrent neural networks (URNNs) have been proposed as a method to overcome the vanishing and exploding gradient problem in modeling data with long-term dependencies. A basic question is how restrictive is the unitary constraint on the possible input-output mappings of such a network? This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice th…
▽ More
Unitary recurrent neural networks (URNNs) have been proposed as a method to overcome the vanishing and exploding gradient problem in modeling data with long-term dependencies. A basic question is how restrictive is the unitary constraint on the possible input-output mappings of such a network? This work shows that for any contractive RNN with ReLU activations, there is a URNN with at most twice the number of hidden states and the identical input-output mapping. Hence, with ReLU activations, URNNs are as expressive as general RNNs. In contrast, for certain smooth activations, it is shown that the input-output mapping of an RNN cannot be matched with a URNN, even with an arbitrary number of states. The theoretical results are supported by experiments on modeling of slowly-varying dynamical systems.
△ Less
Submitted 30 October, 2019;
originally announced October 2019.
-
High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence
Authors:
Parthe Pandit,
Mojtaba Sahraee-Ardakan,
Arash A. Amini,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to…
▽ More
We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to a lag $p$, for a general $p \ge 1$, allowing for more realistic modeling in many applications. We propose and analyze an $\ell_1$-regularized maximum likelihood estimator (MLE) under the assumption that the parameter tensor is approximately sparse. Rigorous analysis of such estimators is made challenging by the dependent and non-Gaussian nature of the process as well as the presence of the nonlinearities and multi-level feedback. We derive precise upper bounds on the mean-squared estimation error in terms of the number of samples, dimensions of the process, the lag $p$ and other key statistical properties of the model. The ideas presented can be used in the high-dimensional analysis of regularized $M$-estimators for other sparse nonlinear and non-Gaussian processes with long-range dependence.
△ Less
Submitted 19 March, 2019;
originally announced March 2019.
-
Asymptotics of MAP Inference in Deep Networks
Authors:
Parthe Pandit,
Mojtaba Sahraee,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
Deep generative priors are a powerful tool for reconstruction problems with complex data such as images and text. Inverse problems using such models require solving an inference problem of estimating the input and hidden units of the multi-layer network from its output. Maximum a priori (MAP) estimation is a widely-used inference method as it is straightforward to implement, and has been successfu…
▽ More
Deep generative priors are a powerful tool for reconstruction problems with complex data such as images and text. Inverse problems using such models require solving an inference problem of estimating the input and hidden units of the multi-layer network from its output. Maximum a priori (MAP) estimation is a widely-used inference method as it is straightforward to implement, and has been successful in practice. However, rigorous analysis of MAP inference in multi-layer networks is difficult. This work considers a recently-developed method, multi-layer vector approximate message passing (ML-VAMP), to study MAP inference in deep networks. It is shown that the mean squared error of the ML-VAMP estimate can be exactly and rigorously characterized in a certain high-dimensional random limit. The proposed method thus provides a tractable method for MAP inference with exact performance guarantees.
△ Less
Submitted 1 March, 2019;
originally announced March 2019.
-
Bilinear Recovery using Adaptive Vector-AMP
Authors:
Subrata Sarkar,
Alyson K. Fletcher,
Sundeep Rangan,
Philip Schniter
Abstract:
We consider the problem of jointly recovering the vector $\boldsymbol{b}$ and the matrix $\boldsymbol{C}$ from noisy measurements $\boldsymbol{Y} = \boldsymbol{A}(\boldsymbol{b})\boldsymbol{C} + \boldsymbol{W}$, where $\boldsymbol{A}(\cdot)$ is a known affine linear function of $\boldsymbol{b}$ (i.e., $\boldsymbol{A}(\boldsymbol{b})=\boldsymbol{A}_0+\sum_{i=1}^Q b_i \boldsymbol{A}_i$ with known ma…
▽ More
We consider the problem of jointly recovering the vector $\boldsymbol{b}$ and the matrix $\boldsymbol{C}$ from noisy measurements $\boldsymbol{Y} = \boldsymbol{A}(\boldsymbol{b})\boldsymbol{C} + \boldsymbol{W}$, where $\boldsymbol{A}(\cdot)$ is a known affine linear function of $\boldsymbol{b}$ (i.e., $\boldsymbol{A}(\boldsymbol{b})=\boldsymbol{A}_0+\sum_{i=1}^Q b_i \boldsymbol{A}_i$ with known matrices $\boldsymbol{A}_i$). This problem has applications in matrix completion, robust PCA, dictionary learning, self-calibration, blind deconvolution, joint-channel/symbol estimation, compressive sensing with matrix uncertainty, and many other tasks. To solve this bilinear recovery problem, we propose the Bilinear Adaptive Vector Approximate Message Passing (BAd-VAMP) algorithm. We demonstrate numerically that the proposed approach is competitive with other state-of-the-art approaches to bilinear recovery, including lifted VAMP and Bilinear GAMP.
△ Less
Submitted 8 March, 2019; v1 submitted 31 August, 2018;
originally announced September 2018.
-
Plug-in Estimation in High-Dimensional Linear Inverse Problems: A Rigorous Analysis
Authors:
Alyson K. Fletcher,
Sundeep Rangan,
Subrata Sarkar,
Philip Schniter
Abstract:
Estimating a vector $\mathbf{x}$ from noisy linear measurements $\mathbf{Ax}+\mathbf{w}$ often requires use of prior knowledge or structural constraints on $\mathbf{x}$ for accurate reconstruction. Several recent works have considered combining linear least-squares estimation with a generic or "plug-in" denoiser function that can be designed in a modular manner based on the prior knowledge about…
▽ More
Estimating a vector $\mathbf{x}$ from noisy linear measurements $\mathbf{Ax}+\mathbf{w}$ often requires use of prior knowledge or structural constraints on $\mathbf{x}$ for accurate reconstruction. Several recent works have considered combining linear least-squares estimation with a generic or "plug-in" denoiser function that can be designed in a modular manner based on the prior knowledge about $\mathbf{x}$. While these methods have shown excellent performance, it has been difficult to obtain rigorous performance guarantees. This work considers plug-in denoising combined with the recently-developed Vector Approximate Message Passing (VAMP) algorithm, which is itself derived via Expectation Propagation techniques. It shown that the mean squared error of this "plug-and-play" VAMP can be exactly predicted for high-dimensional right-rotationally invariant random $\mathbf{A}$ and Lipschitz denoisers. The method is demonstrated on applications in image recovery and parametric bilinear estimation.
△ Less
Submitted 1 August, 2018; v1 submitted 27 June, 2018;
originally announced June 2018.
-
Inference in Deep Networks in High Dimensions
Authors:
Alyson K. Fletcher,
Sundeep Rangan
Abstract:
Deep generative networks provide a powerful tool for modeling complex data in a wide range of applications. In inverse problems that use these networks as generative priors on data, one must often perform inference of the inputs of the networks from the outputs. Inference is also required for sampling during stochastic training on these generative models. This paper considers inference in a deep s…
▽ More
Deep generative networks provide a powerful tool for modeling complex data in a wide range of applications. In inverse problems that use these networks as generative priors on data, one must often perform inference of the inputs of the networks from the outputs. Inference is also required for sampling during stochastic training on these generative models. This paper considers inference in a deep stochastic neural network where the parameters (e.g., weights, biases and activation functions) are known and the problem is to estimate the values of the input and hidden units from the output. While several approximate algorithms have been proposed for this task, there are few analytic tools that can provide rigorous guarantees in the reconstruction error. This work presents a novel and computationally tractable output-to-input inference method called Multi-Layer Vector Approximate Message Passing (ML-VAMP). The proposed algorithm, derived from expectation propagation, extends earlier AMP methods that are known to achieve the replica predictions for optimality in simple linear inverse problems. Our main contribution shows that the mean-squared error (MSE) of ML-VAMP can be exactly predicted in a certain large system limit (LSL) where the numbers of layers is fixed and weight matrices are random and orthogonally-invariant with dimensions that grow to infinity. ML-VAMP is thus a principled method for output-to-input inference in deep networks with a rigorous and precise performance achievability result in high dimensions.
△ Less
Submitted 20 June, 2017;
originally announced June 2017.
-
Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems
Authors:
Alyson K. Fletcher,
Mojtaba Sahraee-Ardakan,
Philip Schniter,
Sundeep Rangan
Abstract:
The problem of estimating a random vector x from noisy linear measurements y = A x + w with unknown parameters on the distributions of x and w, which must also be learned, arises in a wide range of statistical learning and linear inverse problems. We show that a computationally simple iterative message-passing algorithm can provably obtain asymptotically consistent estimates in a certain high-dime…
▽ More
The problem of estimating a random vector x from noisy linear measurements y = A x + w with unknown parameters on the distributions of x and w, which must also be learned, arises in a wide range of statistical learning and linear inverse problems. We show that a computationally simple iterative message-passing algorithm can provably obtain asymptotically consistent estimates in a certain high-dimensional large-system limit (LSL) under very general parameterizations. Previous message passing techniques have required i.i.d. sub-Gaussian A matrices and often fail when the matrix is ill-conditioned. The proposed algorithm, called adaptive vector approximate message passing (Adaptive VAMP) with auto-tuning, applies to all right-rotationally random A. Importantly, this class includes matrices with arbitrarily poor conditioning. We show that the parameter estimates and mean squared error (MSE) of x in each iteration converge to deterministic limits that can be precisely predicted by a simple set of state evolution (SE) equations. In addition, a simple testable condition is provided in which the MSE matches the Bayes-optimal value predicted by the replica method. The paper thus provides a computationally simple method with provable guarantees of optimality and consistency over a large class of linear inverse problems.
△ Less
Submitted 19 June, 2017;
originally announced June 2017.
-
Vector Approximate Message Passing for the Generalized Linear Model
Authors:
Philip Schniter,
Sundeep Rangan,
Alyson K. Fletcher
Abstract:
The generalized linear model (GLM), where a random vector $\boldsymbol{x}$ is observed through a noisy, possibly nonlinear, function of a linear transform output $\boldsymbol{z}=\boldsymbol{Ax}$, arises in a range of applications such as robust regression, binary classification, quantized compressed sensing, phase retrieval, photon-limited imaging, and inference from neural spike trains. When…
▽ More
The generalized linear model (GLM), where a random vector $\boldsymbol{x}$ is observed through a noisy, possibly nonlinear, function of a linear transform output $\boldsymbol{z}=\boldsymbol{Ax}$, arises in a range of applications such as robust regression, binary classification, quantized compressed sensing, phase retrieval, photon-limited imaging, and inference from neural spike trains. When $\boldsymbol{A}$ is large and i.i.d. Gaussian, the generalized approximate message passing (GAMP) algorithm is an efficient means of MAP or marginal inference, and its performance can be rigorously characterized by a scalar state evolution. For general $\boldsymbol{A}$, though, GAMP can misbehave. Damping and sequential-updating help to robustify GAMP, but their effects are limited. Recently, a "vector AMP" (VAMP) algorithm was proposed for additive white Gaussian noise channels. VAMP extends AMP's guarantees from i.i.d. Gaussian $\boldsymbol{A}$ to the larger class of rotationally invariant $\boldsymbol{A}$. In this paper, we show how VAMP can be extended to the GLM. Numerical experiments show that the proposed GLM-VAMP is much more robust to ill-conditioning in $\boldsymbol{A}$ than damped GAMP.
△ Less
Submitted 4 December, 2016;
originally announced December 2016.
-
Vector Approximate Message Passing
Authors:
Sundeep Rangan,
Philip Schniter,
Alyson K. Fletcher
Abstract:
The standard linear regression (SLR) problem is to recover a vector $\mathbf{x}^0$ from noisy linear observations $\mathbf{y}=\mathbf{Ax}^0+\mathbf{w}$. The approximate message passing (AMP) algorithm recently proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to SLR that has a remarkable property: for large i.i.d.\ sub-Gaussian matrices $\mathbf{A}$, its p…
▽ More
The standard linear regression (SLR) problem is to recover a vector $\mathbf{x}^0$ from noisy linear observations $\mathbf{y}=\mathbf{Ax}^0+\mathbf{w}$. The approximate message passing (AMP) algorithm recently proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to SLR that has a remarkable property: for large i.i.d.\ sub-Gaussian matrices $\mathbf{A}$, its per-iteration behavior is rigorously characterized by a scalar state-evolution whose fixed points, when unique, are Bayes optimal. The AMP algorithm, however, is fragile in that even small deviations from the i.i.d.\ sub-Gaussian model can cause the algorithm to diverge. This paper considers a "vector AMP" (VAMP) algorithm and shows that VAMP has a rigorous scalar state-evolution that holds under a much broader class of large random matrices $\mathbf{A}$: those that are right-orthogonally invariant. After performing an initial singular value decomposition (SVD) of $\mathbf{A}$, the per-iteration complexity of VAMP can be made similar to that of AMP. In addition, the fixed points of VAMP's state evolution are consistent with the replica prediction of the minimum mean-squared error recently derived by Tulino, Caire, Verdú, and Shamai. Numerical experiments are used to confirm the effectiveness of VAMP and its consistency with state-evolution predictions.
△ Less
Submitted 12 June, 2018; v1 submitted 10 October, 2016;
originally announced October 2016.
-
Learning and Free Energies for Vector Approximate Message Passing
Authors:
Alyson K. Fletcher,
Philip Schniter
Abstract:
Vector approximate message passing (VAMP) is a computationally simple approach to the recovery of a signal $\mathbf{x}$ from noisy linear measurements $\mathbf{y}=\mathbf{Ax}+\mathbf{w}$. Like the AMP proposed by Donoho, Maleki, and Montanari in 2009, VAMP is characterized by a rigorous state evolution (SE) that holds under certain large random matrices and that matches the replica prediction of o…
▽ More
Vector approximate message passing (VAMP) is a computationally simple approach to the recovery of a signal $\mathbf{x}$ from noisy linear measurements $\mathbf{y}=\mathbf{Ax}+\mathbf{w}$. Like the AMP proposed by Donoho, Maleki, and Montanari in 2009, VAMP is characterized by a rigorous state evolution (SE) that holds under certain large random matrices and that matches the replica prediction of optimality. But while AMP's SE holds only for large i.i.d. sub-Gaussian $\mathbf{A}$, VAMP's SE holds under the much larger class: right-rotationally invariant $\mathbf{A}$. To run VAMP, however, one must specify the statistical parameters of the signal and noise. This work combines VAMP with Expectation-Maximization to yield an algorithm, EM-VAMP, that can jointly recover $\mathbf{x}$ while learning those statistical parameters. The fixed points of the proposed EM-VAMP algorithm are shown to be stationary points of a certain constrained free-energy, providing a variational interpretation of the algorithm. Numerical simulations show that EM-VAMP is robust to highly ill-conditioned $\mathbf{A}$ with performance nearly matching oracle-parameter VAMP.
△ Less
Submitted 8 March, 2018; v1 submitted 26 February, 2016;
originally announced February 2016.
-
Expectation Consistent Approximate Inference: Generalizations and Convergence
Authors:
Alyson K. Fletcher,
Mojtaba Sahraee-Ardakan,
Sundeep Rangan,
Philip Schniter
Abstract:
Approximations of loopy belief propagation, including expectation propagation and approximate message passing, have attracted considerable attention for probabilistic inference problems. This paper proposes and analyzes a generalization of Opper and Winther's expectation consistent (EC) approximate inference method. The proposed method, called Generalized Expectation Consistency (GEC), can be appl…
▽ More
Approximations of loopy belief propagation, including expectation propagation and approximate message passing, have attracted considerable attention for probabilistic inference problems. This paper proposes and analyzes a generalization of Opper and Winther's expectation consistent (EC) approximate inference method. The proposed method, called Generalized Expectation Consistency (GEC), can be applied to both maximum a posteriori (MAP) and minimum mean squared error (MMSE) estimation. Here we characterize its fixed points, convergence, and performance relative to the replica prediction of optimality.
△ Less
Submitted 24 January, 2017; v1 submitted 25 February, 2016;
originally announced February 2016.
-
Inference for Generalized Linear Models via Alternating Directions and Bethe Free Energy Minimization
Authors:
Sundeep Rangan,
Alyson K. Fletcher,
Philip Schniter,
Ulugbek Kamilov
Abstract:
Generalized Linear Models (GLMs), where a random vector $\mathbf{x}$ is observed through a noisy, possibly nonlinear, function of a linear transform $\mathbf{z}=\mathbf{Ax}$ arise in a range of applications in nonlinear filtering and regression. Approximate Message Passing (AMP) methods, based on loopy belief propagation, are a promising class of approaches for approximate inference in these model…
▽ More
Generalized Linear Models (GLMs), where a random vector $\mathbf{x}$ is observed through a noisy, possibly nonlinear, function of a linear transform $\mathbf{z}=\mathbf{Ax}$ arise in a range of applications in nonlinear filtering and regression. Approximate Message Passing (AMP) methods, based on loopy belief propagation, are a promising class of approaches for approximate inference in these models. AMP methods are computationally simple, general, and admit precise analyses with testable conditions for optimality for large i.i.d. transforms $\mathbf{A}$. However, the algorithms can easily diverge for general $\mathbf{A}$. This paper presents a convergent approach to the generalized AMP (GAMP) algorithm based on direct minimization of a large-system limit approximation of the Bethe Free Energy (LSL-BFE). The proposed method uses a double-loop procedure, where the outer loop successively linearizes the LSL-BFE and the inner loop minimizes the linearized LSL-BFE using the Alternating Direction Method of Multipliers (ADMM). The proposed method, called ADMM-GAMP, is similar in structure to the original GAMP method, but with an additional least-squares minimization. It is shown that for strictly convex, smooth penalties, ADMM-GAMP is guaranteed to converge to a local minima of the LSL-BFE, thus providing a convergent alternative to GAMP that is stable under arbitrary transforms. Simulations are also presented that demonstrate the robustness of the method for non-convex penalties as well.
△ Less
Submitted 2 May, 2016; v1 submitted 8 January, 2015;
originally announced January 2015.
-
Scalable Inference for Neuronal Connectivity from Calcium Imaging
Authors:
Alyson K. Fletcher,
Sundeep Rangan
Abstract:
Fluorescent calcium imaging provides a potentially powerful tool for inferring connectivity in neural circuits with up to thousands of neurons. However, a key challenge in using calcium imaging for connectivity detection is that current systems often have a temporal response and frame rate that can be orders of magnitude slower than the underlying neural spiking process. Bayesian inference methods…
▽ More
Fluorescent calcium imaging provides a potentially powerful tool for inferring connectivity in neural circuits with up to thousands of neurons. However, a key challenge in using calcium imaging for connectivity detection is that current systems often have a temporal response and frame rate that can be orders of magnitude slower than the underlying neural spiking process. Bayesian inference methods based on expectation-maximization (EM) have been proposed to overcome these limitations, but are often computationally demanding since the E-step in the EM procedure typically involves state estimation for a high-dimensional nonlinear dynamical system. In this work, we propose a computationally fast method for the state estimation based on a hybrid of loopy belief propagation and approximate message passing (AMP). The key insight is that a neural system as viewed through calcium imaging can be factorized into simple scalar dynamical systems for each neuron with linear interconnections between the neurons. Using the structure, the updates in the proposed hybrid AMP methodology can be computed by a set of one-dimensional state estimation procedures and linear transforms with the connectivity matrix. This yields a computationally scalable method for inferring connectivity of large neural circuits. Simulations of the method on realistic neural networks demonstrate good accuracy with computation times that are potentially significantly faster than current approaches based on Markov Chain Monte Carlo methods.
△ Less
Submitted 3 September, 2014; v1 submitted 1 September, 2014;
originally announced September 2014.
-
On the Convergence of Approximate Message Passing with Arbitrary Matrices
Authors:
Sundeep Rangan,
Philip Schniter,
Alyson K. Fletcher,
Subrata Sarkar
Abstract:
Approximate message passing (AMP) methods and their variants have attracted considerable recent attention for the problem of estimating a random vector $\mathbf{x}$ observed through a linear transform $\mathbf{A}$. In the case of large i.i.d. zero-mean Gaussian $\mathbf{A}$, the methods exhibit fast convergence with precise analytic characterizations on the algorithm behavior. However, the converg…
▽ More
Approximate message passing (AMP) methods and their variants have attracted considerable recent attention for the problem of estimating a random vector $\mathbf{x}$ observed through a linear transform $\mathbf{A}$. In the case of large i.i.d. zero-mean Gaussian $\mathbf{A}$, the methods exhibit fast convergence with precise analytic characterizations on the algorithm behavior. However, the convergence of AMP under general transforms $\mathbf{A}$ is not fully understood. In this paper, we provide sufficient conditions for the convergence of a damped version of the generalized AMP (GAMP) algorithm in the case of quadratic cost functions (i.e., Gaussian likelihood and prior). It is shown that, with sufficient damping, the algorithm is guaranteed to converge, although the amount of damping grows with peak-to-average ratio of the squared singular values of the transforms $\mathbf{A}$. This result explains the good performance of AMP on i.i.d. Gaussian transforms $\mathbf{A}$, but also their difficulties with ill-conditioned or non-zero-mean transforms $\mathbf{A}$. A related sufficient condition is then derived for the local stability of the damped GAMP method under general cost functions, assuming certain strict convexity conditions.
△ Less
Submitted 2 March, 2018; v1 submitted 13 February, 2014;
originally announced February 2014.
-
Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning
Authors:
Ulugbek S. Kamilov,
Sundeep Rangan,
Alyson K. Fletcher,
Michael Unser
Abstract:
We consider the estimation of an i.i.d. (possibly non-Gaussian) vector $\xbf \in \R^n$ from measurements $\ybf \in \R^m$ obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. A novel method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of…
▽ More
We consider the estimation of an i.i.d. (possibly non-Gaussian) vector $\xbf \in \R^n$ from measurements $\ybf \in \R^m$ obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. A novel method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector $\xbf$ is presented. The proposed algorithm is a generalization of a recently-developed EM-GAMP that uses expectation-maximization (EM) iterations where the posteriors in the E-steps are computed via approximate message passing. The methodology can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. In addition, we show that when a certain maximum-likelihood estimation can be performed in each step, the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. Remarkably, this result applies to essentially arbitrary parametrizations of the unknown distributions, including ones that are nonlinear and non-Gaussian. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees.
△ Less
Submitted 1 December, 2012; v1 submitted 16 July, 2012;
originally announced July 2012.
-
Iterative Reconstruction of Rank-One Matrices in Noise
Authors:
Alyson K. Fletcher,
Sundeep Rangan
Abstract:
We consider the problem of estimating a rank-one matrix in Gaussian noise under a probabilistic model for the left and right factors of the matrix. The probabilistic model can impose constraints on the factors including sparsity and positivity that arise commonly in learning problems. We propose a family of algorithms that reduce the problem to a sequence of scalar estimation computations. These a…
▽ More
We consider the problem of estimating a rank-one matrix in Gaussian noise under a probabilistic model for the left and right factors of the matrix. The probabilistic model can impose constraints on the factors including sparsity and positivity that arise commonly in learning problems. We propose a family of algorithms that reduce the problem to a sequence of scalar estimation computations. These algorithms are similar to approximate message passing techniques based on Gaussian approximations of loopy belief propagation that have been used recently in compressed sensing. Leveraging analysis methods by Bayati and Montanari, we show that the asymptotic behavior of the algorithm is described by a simple scalar equivalent model, where the distribution of the estimates at each iteration is identical to certain scalar estimates of the variables in Gaussian noise. Moreover, the effective Gaussian noise level is described by a set of state evolution equations. The proposed approach to deriving algorithms thus provides a computationally simple and general method for rank-one estimation problems with a precise analysis in certain high-dimensional settings.
△ Less
Submitted 15 September, 2015; v1 submitted 13 February, 2012;
originally announced February 2012.
-
Hybrid Approximate Message Passing
Authors:
Sundeep Rangan,
Alyson K. Fletcher,
Vivek K. Goyal,
Evan Byrne,
Philip Schniter
Abstract:
Gaussian and quadratic approximations of message passing algorithms on graphs have attracted considerable recent attention due to their computational simplicity, analytic tractability, and wide applicability in optimization and statistical inference problems. This paper presents a systematic framework for incorporating such approximate message passing (AMP) methods in general graphical models. The…
▽ More
Gaussian and quadratic approximations of message passing algorithms on graphs have attracted considerable recent attention due to their computational simplicity, analytic tractability, and wide applicability in optimization and statistical inference problems. This paper presents a systematic framework for incorporating such approximate message passing (AMP) methods in general graphical models. The key concept is a partition of dependencies of a general graphical model into strong and weak edges, with the weak edges representing interactions through aggregates of small, linearizable couplings of variables. AMP approximations based on the Central Limit Theorem can be readily applied to aggregates of many weak edges and integrated with standard message passing updates on the strong edges. The resulting algorithm, which we call hybrid generalized approximate message passing (HyGAMP), can yield significantly simpler implementations of sum-product and max-sum loopy belief propagation. By varying the partition of strong and weak edges, a performance--complexity trade-off can be achieved. Group sparsity and multinomial logistic regression problems are studied as examples of the proposed methodology.
△ Less
Submitted 23 March, 2017; v1 submitted 10 November, 2011;
originally announced November 2011.
-
Ranked Sparse Signal Support Detection
Authors:
Alyson K. Fletcher,
Sundeep Rangan,
Vivek K Goyal
Abstract:
This paper considers the problem of detecting the support (sparsity pattern) of a sparse vector from random noisy measurements. Conditional power of a component of the sparse vector is defined as the energy conditioned on the component being nonzero. Analysis of a simplified version of orthogonal matching pursuit (OMP) called sequential OMP (SequOMP) demonstrates the importance of knowledge of the…
▽ More
This paper considers the problem of detecting the support (sparsity pattern) of a sparse vector from random noisy measurements. Conditional power of a component of the sparse vector is defined as the energy conditioned on the component being nonzero. Analysis of a simplified version of orthogonal matching pursuit (OMP) called sequential OMP (SequOMP) demonstrates the importance of knowledge of the rankings of conditional powers. When the simple SequOMP algorithm is applied to components in nonincreasing order of conditional power, the detrimental effect of dynamic range on thresholding performance is eliminated. Furthermore, under the most favorable conditional powers, the performance of SequOMP approaches maximum likelihood performance at high signal-to-noise ratio.
△ Less
Submitted 27 October, 2011;
originally announced October 2011.
-
Orthogonal Matching Pursuit: A Brownian Motion Analysis
Authors:
Alyson K. Fletcher,
Sundeep Rangan
Abstract:
A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a k-sparse n-dimensional real vector from 4 k log(n) noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as n approaches infinity. This work strengthens this result by showing that a lower number of measurements, 2 k log(n - k),…
▽ More
A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a k-sparse n-dimensional real vector from 4 k log(n) noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as n approaches infinity. This work strengthens this result by showing that a lower number of measurements, 2 k log(n - k), is in fact sufficient for asymptotic recovery. More generally, when the sparsity level satisfies kmin <= k <= kmax but is unknown, 2 kmax log(n - kmin) measurements is sufficient. Furthermore, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors provided the signal-to-noise ratio (SNR) scales to infinity. The scaling 2 k log(n - k) exactly matches the number of measurements required by the more complex lasso method for signal recovery with a similar SNR scaling.
△ Less
Submitted 29 May, 2011;
originally announced May 2011.
-
Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing
Authors:
Sundeep Rangan,
Alyson K. Fletcher,
Vivek K Goyal
Abstract:
The replica method is a non-rigorous but well-known technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method, under the assumption of replica symmetry, to study estimators that are maximum a posteriori (MAP) under a postulated prior distribution. It is shown that with random linear measurements and Gaussian noise…
▽ More
The replica method is a non-rigorous but well-known technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method, under the assumption of replica symmetry, to study estimators that are maximum a posteriori (MAP) under a postulated prior distribution. It is shown that with random linear measurements and Gaussian noise, the replica-symmetric prediction of the asymptotic behavior of the postulated MAP estimate of an n-dimensional vector "decouples" as n scalar postulated MAP estimators. The result is based on applying a hardening argument to the replica analysis of postulated posterior mean estimators of Tanaka and of Guo and Verdu.
The replica-symmetric postulated MAP analysis can be readily applied to many estimators used in compressed sensing, including basis pursuit, lasso, linear estimation with thresholding, and zero norm-regularized estimation. In the case of lasso estimation the scalar estimator reduces to a soft-thresholding operator, and for zero norm-regularized estimation it reduces to a hard-threshold. Among other benefits, the replica method provides a computationally-tractable method for precisely predicting various performance metrics including mean-squared error and sparsity pattern recovery probability.
△ Less
Submitted 8 October, 2011; v1 submitted 17 June, 2009;
originally announced June 2009.
-
On-Off Random Access Channels: A Compressed Sensing Framework
Authors:
Alyson K. Fletcher,
Sundeep Rangan,
Vivek K Goyal
Abstract:
This paper considers a simple on-off random multiple access channel, where n users communicate simultaneously to a single receiver over m degrees of freedom. Each user transmits with probability lambda, where typically lambda n < m << n, and the receiver must detect which users transmitted. We show that when the codebook has i.i.d. Gaussian entries, detecting which users transmitted is mathemati…
▽ More
This paper considers a simple on-off random multiple access channel, where n users communicate simultaneously to a single receiver over m degrees of freedom. Each user transmits with probability lambda, where typically lambda n < m << n, and the receiver must detect which users transmitted. We show that when the codebook has i.i.d. Gaussian entries, detecting which users transmitted is mathematically equivalent to a certain sparsity detection problem considered in compressed sensing. Using recent sparsity results, we derive upper and lower bounds on the capacities of these channels. We show that common sparsity detection algorithms, such as lasso and orthogonal matching pursuit (OMP), can be used as tractable multiuser detection schemes and have significantly better performance than single-user detection. These methods do achieve some near-far resistance but--at high signal-to-noise ratios (SNRs)--may achieve capacities far below optimal maximum likelihood detection. We then present a new algorithm, called sequential OMP, that illustrates that iterative detection combined with power ordering or power shaping can significantly improve the high SNR performance. Sequential OMP is analogous to successive interference cancellation in the classic multiple access channel. Our results thereby provide insight into the roles of power control and multiuser detection on random-access signalling.
△ Less
Submitted 9 March, 2009; v1 submitted 5 March, 2009;
originally announced March 2009.
-
Necessary and Sufficient Conditions on Sparsity Pattern Recovery
Authors:
Alyson K. Fletcher,
Sundeep Rangan,
Vivek K. Goyal
Abstract:
The problem of detecting the sparsity pattern of a k-sparse vector in R^n from m random noisy measurements is of interest in many areas such as system identification, denoising, pattern recognition, and compressed sensing. This paper addresses the scaling of the number of measurements m, with signal dimension n and sparsity-level nonzeros k, for asymptotically-reliable detection. We show a neces…
▽ More
The problem of detecting the sparsity pattern of a k-sparse vector in R^n from m random noisy measurements is of interest in many areas such as system identification, denoising, pattern recognition, and compressed sensing. This paper addresses the scaling of the number of measurements m, with signal dimension n and sparsity-level nonzeros k, for asymptotically-reliable detection. We show a necessary condition for perfect recovery at any given SNR for all algorithms, regardless of complexity, is m = Omega(k log(n-k)) measurements. Conversely, it is shown that this scaling of Omega(k log(n-k)) measurements is sufficient for a remarkably simple ``maximum correlation'' estimator. Hence this scaling is optimal and does not require more sophisticated techniques such as lasso or matching pursuit. The constants for both the necessary and sufficient conditions are precisely defined in terms of the minimum-to-average ratio of the nonzero components and the SNR. The necessary condition improves upon previous results for maximum likelihood estimation. For lasso, it also provides a necessary condition at any SNR and for low SNR improves upon previous work. The sufficient condition provides the first asymptotically-reliable detection guarantee at finite SNR.
△ Less
Submitted 11 April, 2008;
originally announced April 2008.