Showing 1–18 of 18 results for author: Maheswaranathan, N

  1. arXiv:2203.11860  [pdf, other]

    cs.LG cs.NE math.OC stat.ML

    Practical tradeoffs between memory, compute, and performance in learned optimizers

    Authors: Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

    Abstract: Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned opti…

    Submitted 16 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  2. arXiv:2110.15253  [pdf, other]

    cs.LG stat.ML

    Understanding How Encoder-Decoder Architectures Attend

    Authors: Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

    Abstract: Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 10+14 pages, 16 figures. NeurIPS 2021

  3. arXiv:2101.07367  [pdf, other]

    cs.LG cs.NE

    Training Learned Optimizers with Randomly Initialized Learned Optimizers

    Authors: Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein

    Abstract: Learned optimizers are increasingly effective, with performance exceeding that of hand-designed optimizers such as Adam (Kingma and Ba, 2014) on specific tasks (Metz et al., 2019). Despite the potential gains available, in current work the meta-training (or 'outer-training') of the learned optimizer is performed by a hand-designed optimizer, or by an optimizer trained by a hand-designed…

    Submitted 14 January, 2021; originally announced January 2021.

  4. arXiv:2011.02159  [pdf, other]

    cs.LG cs.NE stat.ML

    Reverse engineering learned optimizers reveals known and novel mechanisms

    Authors: Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

    Abstract: Learned optimizers are algorithms that can themselves be trained to solve optimization problems. In contrast to baseline optimizers (such as momentum or Adam) that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations. Although this can lead to better performance in certain settings, their inner workings remain a…

    Submitted 7 December, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Thirty-Fifth Conference on Neural Information Processing Systems. 2021

  5. arXiv:2010.15114  [pdf, other]

    cs.LG cs.CL stat.ML

    The geometry of integration in text classification RNNs

    Authors: Kyle Aitken, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

    Abstract: Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing…

    Submitted 3 June, 2022; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 9+19 pages, 30 figures; v2: smaller file size

  6. arXiv:2009.11243  [pdf, other]

    cs.LG cs.NE stat.ML

    Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

    Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

    Abstract: Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer…

    Submitted 23 September, 2020; originally announced September 2020.

  7. arXiv:2004.08013  [pdf, other]

    cs.CL cs.LG stat.ML

    How recurrent networks implement contextual processing in sentiment analysis

    Authors: Niru Maheswaranathan, David Sussillo

    Abstract: Neural networks have a remarkable capacity for contextual processing--using recent or nearby inputs to modify processing of current input. For example, in natural language, contextual processing is necessary to correctly interpret negation (e.g. phrases such as "not bad"). However, our ability to understand how networks process context is limited. Here, we propose general methods for reverse engin…

    Submitted 16 April, 2020; originally announced April 2020.

  8. arXiv:2002.11887  [pdf, other]

    cs.LG stat.ML

    Using a thousand optimization tasks to learn hyperparameter search strategies

    Authors: Luke Metz, Niru Maheswaranathan, Ruoxi Sun, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

    Abstract: We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional neural networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets. As an example application of such a dataset we explore meta-l…

    Submitted 31 March, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  9. arXiv:1912.06207  [pdf, other]

    q-bio.NC cs.LG physics.bio-ph

    From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

    Authors: Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

    Abstract: Recently, deep feedforward neural networks have achieved considerable success in modeling biological sensory processing, in terms of reproducing the input-output map of sensory neurons. However, such models raise profound questions about the very nature of explanation in neuroscience. Are we simply replacing one complex system (a biological circuit) with another (a deep network), without understan…

    Submitted 12 December, 2019; originally announced December 2019.

    Journal ref: Neural Information Processing Systems (NeurIPS), 2019

  10. arXiv:1907.08549  [pdf, other]

    q-bio.NC cs.NE

    Universality and individuality in neural dynamics across large populations of recurrent networks

    Authors: Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

    Abstract: Task-based modeling with recurrent neural networks (RNNs) has emerged as a popular way to infer the computational function of different brain regions. These models are quantitatively assessed by comparing the low-dimensional neural representations of the model with the brain, for example using canonical correlation analysis (CCA). However, the nature of the detailed neurobiological inferences one…

    Submitted 4 December, 2019; v1 submitted 19 July, 2019; originally announced July 2019.

    Comments: Presented at NeurIPS 2019

  11. arXiv:1906.10720  [pdf, other]

    cs.LG stat.ML

    Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

    Authors: Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

    Abstract: Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription…

    Submitted 4 December, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: Presented at NeurIPS 2019

  12. arXiv:1906.03367  [pdf, other]

    cs.LG stat.ML

    Using learned optimizers to make models robust to input noise

    Authors: Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk

    Abstract: State-of-the-art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution. However, when models are tested on corrupted images (e.g. due to scale changes, translations, or shifts in brightness or contrast), performance degrades significantly. Here, we explore the possibility of meta-training a learned optimizer th…

    Submitted 7 June, 2019; originally announced June 2019.

  13. arXiv:1810.10180  [pdf, other]

    cs.NE stat.ML

    Understanding and correcting pathologies in the training of learned optimizers

    Authors: Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

    Abstract: Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks. Analogously, this suggests that learned optimizers may similarly outperform current hand-designed optimizers, especially for specific problems. However, learned optimizers are notoriously difficult to train and have yet to demonstrate wall-clock speedups over hand-designed optimi…

    Submitted 7 June, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

  14. arXiv:1806.10230  [pdf, other]

    cs.NE cs.LG stat.ML

    Guided evolutionary strategies: Augmenting random search with surrogate gradients

    Authors: Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

    Abstract: Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when…

    Submitted 10 June, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: Published at ICML 2019

  15. arXiv:1804.00222  [pdf, other]

    cs.LG cs.NE stat.ML

    Meta-Learning Update Rules for Unsupervised Representation Learning

    Authors: Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

    Abstract: A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose…

    Submitted 26 February, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

  16. arXiv:1711.10151  [pdf, other]

    cs.CV

    Recurrent Segmentation for Variable Computational Budgets

    Authors: Lane McIntosh, Niru Maheswaranathan, David Sussillo, Jonathon Shlens

    Abstract: State-of-the-art systems for semantic image segmentation use feed-forward pipelines with fixed computational costs. Building an image segmentation system that works across a range of computational budgets is challenging and time-intensive as new architectures must be designed and trained for every computational setting. To address this problem we develop a recurrent neural network that successivel…

    Submitted 14 March, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

  17. arXiv:1703.04813  [pdf, other]

    cs.LG cs.NE stat.ML

    Learned Optimizers that Scale and Generalize

    Authors: Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

    Abstract: Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve…

    Submitted 7 September, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

    Comments: Final ICML paper after reviewer suggestions

  18. arXiv:1503.03585  [pdf, other]

    cs.LG cond-mat.dis-nn q-bio.NC stat.ML

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Authors: Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

    Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physi…

    Submitted 18 November, 2015; v1 submitted 12 March, 2015; originally announced March 2015.