
Showing 1–49 of 49 results for author: Morcos, A

  1. arXiv:2407.00434

    cs.CL

    Brevity is the soul of wit: Pruning long files for code generation

    Authors: Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos

    Abstract: Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements. Generally, three t…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  2. arXiv:2401.04578

    cs.CV

    Effective pruning of web-scale datasets based on complexity of concept clusters

    Authors: Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

    Abstract: Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today's most effective pruning method on ImageNet clusters data samples in…

    Submitted 12 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024, code available at https://github.com/amro-kamal/effective_pruning

  3. arXiv:2312.02418

    cs.CL cs.AI cs.LG

    Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

    Authors: Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani

    Abstract: Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Previous studies demonstrated the benefit of using embedding spaces for data pruning, but they mainly focused on duplicate removal or increasing variety,…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 12 pages, 4 figures, Oral Presentation at 3rd Workshop on Efficient Natural Language and Speech Processing (ENLSP-III), NeurIPS 2023

  4. arXiv:2310.02110

    cs.CV

    Sieve: Multimodal Dataset Pruning Using Image Captioning Models

    Authors: Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari Morcos

    Abstract: Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream tasks. Using CLIPScore from a pretrained model to only train models using highly-aligned samples is one of the most successful methods for pruning. W…

    Submitted 10 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted in CVPR 2024

  5. arXiv:2308.12284

    cs.CL cs.AI cs.LG

    D4: Improving LLM Pretraining via Document De-Duplication and Diversification

    Authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

    Abstract: Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on ever-larger portions of the internet leads to consistent performance improvements, the size of these improvements diminishes with scale, and there ha…

    Submitted 23 August, 2023; originally announced August 2023.

  6. arXiv:2308.03977

    cs.CV cs.LG

    PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

    Authors: Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos

    Abstract: Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite…

    Submitted 12 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  7. arXiv:2305.17409

    cs.LG

    On the special role of class-selective neurons in early training

    Authors: Omkar Ranadive, Nikhil Thakurdesai, Ari S Morcos, Matthew Leavitt, Stéphane Deny

    Abstract: It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorating network function. But if class-selective neurons are not necessary, why do they exist? We attempt to answer this question in a series of experimen…

    Submitted 27 May, 2023; originally announced May 2023.

  8. arXiv:2304.13089

    cs.LG cs.CV eess.IV

    Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

    Authors: Shashank Shekhar, Florian Bordes, Pascal Vincent, Ari Morcos

    Abstract: Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned…

    Submitted 25 April, 2023; originally announced April 2023.

  9. arXiv:2304.13013

    cs.LG cs.CV

    Stable and low-precision training for large-scale vision-language models

    Authors: Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt

    Abstract: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the performance of bfloat16 training within 0.1 percentage points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date. Our main focus…

    Submitted 16 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023

  10. arXiv:2304.12210

    cs.LG cs.CV

    A Cookbook of Self-Supervised Learning

    Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

    Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier…

    Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  11. arXiv:2303.09540

    cs.LG cs.AI cs.CV

    SemDeDup: Data-efficient learning at web-scale through semantic deduplication

    Authors: Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

    Abstract: Progress in machine learning has been driven in large part by massive increases in data. However, large web-scale datasets such as LAION are largely uncurated beyond searches for exact duplicates, potentially leaving much redundancy. Here, we introduce SemDeDup, a method which leverages embeddings from pre-trained models to identify and remove semantic duplicates: data pairs which are semantically…

    Submitted 22 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.
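
    The core idea stated in the SemDeDup abstract, using pre-trained embeddings to find and drop semantically near-identical pairs, can be sketched as a greedy cosine-similarity filter. This is a simplified illustration under stated assumptions: the embeddings are taken as already computed, "semantically similar" is assumed to mean cosine similarity at or above a hypothetical `threshold`, and `semantic_dedup` is a made-up helper name. The paper's full procedure is truncated above and may differ, for example by grouping embeddings before comparing them so that not every pair is checked.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def semantic_dedup(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of examples to keep (hypothetical helper).

    Greedy pass: an example is dropped if its embedding has cosine
    similarity >= threshold with any example already kept. This compares
    against every kept item (O(n^2) overall), so it only suits small inputs.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Items 0 and 1 point in nearly the same direction; item 2 does not.
    data = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
    print(semantic_dedup(data))  # -> [0, 2]
```

    The threshold trades off how aggressively near-duplicates are removed against the risk of discarding genuinely distinct examples.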

  12. arXiv:2301.13261

    cs.AI cs.CV cs.LG cs.RO

    Emergence of Maps in the Memories of Blind Navigation Agents

    Authors: Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

    Abstract: Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks ac…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted to ICLR 2023

  13. arXiv:2210.13604

    cs.CV cs.LG

    The Robustness Limits of SoTA Vision Models to Natural Variation

    Authors: Mark Ibrahim, Quentin Garrido, Ari Morcos, Diane Bouchacourt

    Abstract: Recent state-of-the-art vision models introduced new architectures, learning paradigms, and larger pretraining data, leading to impressive performance on tasks such as classification. While previous generations of vision models were shown to lack robustness to factors such as pose, it's unclear the extent to which this next generation of models are more robust. To study this question, we develop a…

    Submitted 24 October, 2022; originally announced October 2022.

  14. arXiv:2210.13356

    cs.CV cs.LG

    Robust Self-Supervised Learning with Lie Groups

    Authors: Mark Ibrahim, Diane Bouchacourt, Ari Morcos

    Abstract: Deep learning has led to remarkable advances in computer vision. Even so, today's best models are brittle when presented with variations that differ even slightly from those seen during training. Minor shifts in the pose, color, or illumination of an object can lead to catastrophic misclassifications. State-of-the-art models struggle to understand how a set of variations can affect different objec…

    Submitted 24 October, 2022; originally announced October 2022.

  15. arXiv:2210.11948

    cs.LG

    lo-fi: distributed fine-tuning without communication

    Authors: Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

    Abstract: When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base…

    Submitted 12 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.
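
    The final lo-fi step described in the abstract, averaging the weights of independently fine-tuned copies once fine-tuning ends, might look like the sketch below. Weights are represented here as hypothetical dicts mapping parameter names to flat lists of floats; a real implementation would operate on framework tensors, and the per-node fine-tuning itself is not shown. The helper name `average_weights` is an illustrative assumption.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
# Real checkpoints would hold framework tensors instead.
State = Dict[str, List[float]]


def average_weights(states: List[State]) -> State:
    """Uniformly average each parameter across independently fine-tuned nodes.

    Assumes every node started from the same pre-trained checkpoint, so all
    states share parameter names and sizes; no communication happens until
    this single averaging step at the end.
    """
    if not states:
        raise ValueError("need at least one state to average")
    n = len(states)
    averaged: State = {}
    for name in states[0]:
        params = [s[name] for s in states]
        if len({len(p) for p in params}) != 1:
            raise ValueError(f"size mismatch for parameter {name!r}")
        averaged[name] = [sum(values) / n for values in zip(*params)]
    return averaged


if __name__ == "__main__":
    node_a = {"w": [1.0, 2.0], "b": [0.0]}  # weights after fine-tuning on node A
    node_b = {"w": [3.0, 4.0], "b": [1.0]}  # weights after fine-tuning on node B
    print(average_weights([node_a, node_b]))  # -> {'w': [2.0, 3.0], 'b': [0.5]}
```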

  16. arXiv:2206.14486

    cs.LG cs.AI cs.CV stat.ML

    Beyond neural scaling laws: beating power law scaling via data pruning

    Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

    Abstract: Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scal…

    Submitted 21 April, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Outstanding Paper Award @ NeurIPS 2022. Added github link to metric scores
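
    In symbols, the power-law scaling of error with dataset size named in the abstract can be written as below. The notation ($P$ for training-set size, prefactor $a$, exponent $\nu > 0$) is assumed here for illustration rather than taken from the paper. Under this form, doubling the data only multiplies the error by the fixed factor $2^{-\nu}$, which is why further gains through scaling alone become costly.

```latex
E(P) \approx a\,P^{-\nu}, \qquad \frac{E(2P)}{E(P)} = 2^{-\nu}
```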

  17. arXiv:2203.05482

    cs.LG cs.CL cs.CV

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

    Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low…

    Submitted 1 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ICML 2022. The last three authors contributed equally

  18. arXiv:2110.08133

    cs.LG cs.CV

    Trade-offs of Local SGD at Scale: An Empirical Study

    Authors: Jose Javier Gonzalez Ortiz, Jonathan Frankle, Mike Rabbat, Ari Morcos, Nicolas Ballas

    Abstract: As datasets and models become increasingly large, distributed training has become a necessary component to allow deep neural networks to train in reasonable amounts of time. However, distributed training can have substantial communication overhead that hinders its scalability. One strategy for reducing this overhead is to perform multiple unsynchronized SGD steps independently on each worker betwe…

    Submitted 15 October, 2021; originally announced October 2021.

  19. arXiv:2106.05795

    cs.LG

    Transformed CNNs: recasting pre-trained convolutional layers with self-attention

    Authors: Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

    Abstract: Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a strong computational bottleneck, especially at large spatial resolutions. In this work, we explore the idea of reducing the time spent training these layers by in…

    Submitted 10 June, 2021; originally announced June 2021.

  20. arXiv:2106.05121

    cs.CV

    Grounding inductive biases in natural images: invariance stems from variations in data

    Authors: Diane Bouchacourt, Mark Ibrahim, Ari S. Morcos

    Abstract: To perform well on unseen and potentially out-of-distribution samples, it is desirable for machine learning models to have a predictable response with respect to transformations affecting the factors of variation of the input. Here, we study the relative importance of several types of inductive biases towards such predictable behavior: the choice of data, their augmentations, and model architectur…

    Submitted 16 November, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

  21. arXiv:2104.13255

    cs.CV cs.LG

    Width Transfer: On the (In)variance of Width Optimization

    Authors: Ting-Wu Chin, Diana Marculescu, Ari S. Morcos

    Abstract: Optimizing the channel counts for different layers of a CNN has shown great promise in improving the efficiency of CNNs at test-time. However, these methods often introduce large computational overhead (e.g., an additional 2x FLOPs of standard training). Minimizing this overhead could therefore significantly speed up training. In this work, we propose width transfer, a technique that harnesses the…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Full paper accepted at CVPR Workshops 2021; a 4-page abridged version is accepted at ICLR 2021 NAS Workshop

  22. arXiv:2103.12719

    cs.CV cs.AI

    Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations

    Authors: Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos

    Abstract: Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic re…

    Submitted 12 November, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Technical Report; Additional Results

  23. arXiv:2103.10697

    cs.CV cs.LG stat.ML

    ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

    Authors: Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

    Abstract: Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external…

    Submitted 10 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

  24. arXiv:2012.15045

    cs.CL

    Reservoir Transformers

    Authors: Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

    Abstract: We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance,…

    Submitted 1 June, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: ACL 2021

  25. arXiv:2010.12016

    cs.CY cs.AI cs.CV cs.LG stat.ML

    Towards falsifiable interpretability research

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpre…

    Submitted 22 October, 2020; originally announced October 2020.

  26. arXiv:2010.07693

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Linking average- and worst-case perturbation robustness via class selectivity and dimensionality

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Representational sparsity is known to affect robustness to input perturbations in deep neural networks (DNNs), but less is known about how the semantic content of representations affects robustness. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence that class selectivit…

    Submitted 29 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2007.04440

  27. arXiv:2010.06682

    cs.CV cs.LG eess.IV

    Are all negatives created equal in contrastive instance discrimination?

    Authors: Tiffany Tianhui Cai, Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: Self-supervised learning has recently begun to rival supervised learning on computer vision tasks. Many of the recent approaches have been based on contrastive instance discrimination (CID), in which the network is trained to recognize two augmented versions of the same instance (a query and positive) while discriminating against a pool of other instances (negatives). The learned representation is…

    Submitted 25 October, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: Fixed author name error
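
    The contrastive instance discrimination setup described above (a query, its positive, and a pool of negatives) is commonly trained with an InfoNCE-style softmax cross-entropy, in which the query must pick out its positive from among the negatives. The sketch below assumes that loss form, cosine-similarity logits, and a hypothetical temperature `tau`; the exact loss and hyperparameters used in the paper are not stated in the truncated abstract.

```python
import math
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def info_nce(query: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], tau: float = 0.1) -> float:
    """InfoNCE loss for one query: -log of the softmax score of the positive.

    Vectors are L2-normalized, so each logit is cosine similarity / tau.
    Index 0 of the logits is the positive; the rest are the negatives.
    """
    q = _normalize(query)
    logits = [_dot(q, _normalize(positive)) / tau]
    logits += [_dot(q, _normalize(n)) / tau for n in negatives]
    # Numerically stable log-sum-exp over positive + negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    q = [1.0, 0.0]
    pos = [0.9, 0.1]                  # augmented view of the same instance
    negs = [[0.0, 1.0], [-1.0, 0.0]]  # other instances
    print(info_nce(q, pos, negs))     # small loss: positive is closest to q
```

    Lowering the loss pulls the query toward its positive and pushes it away from every negative in the pool, which is why the composition of that pool can matter.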

  28. arXiv:2010.02855

    cs.AI cs.LG

    CURI: A Benchmark for Productive Concept Learning Under Uncertainty

    Authors: Ramakrishna Vedantam, Arthur Szlam, Maximilian Nickel, Ari Morcos, Brenden Lake

    Abstract: Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head"). In contrast, standard classification benchmarks: 1) consider only a fixed set of category labels, 2) do not evaluate composi…

    Submitted 6 October, 2020; originally announced October 2020.

  29. arXiv:2007.11752

    cs.LG cs.CV stat.ML

    Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks

    Authors: Ting-Wu Chin, Ari S. Morcos, Diana Marculescu

    Abstract: Slimmable neural networks provide a flexible trade-off front between prediction error and computational requirement (such as the number of floating-point operations or FLOPs) with the same storage requirement as a single model. They are useful for reducing maintenance overhead for deploying models to devices with different memory constraints and are useful for optimizing the efficiency of a system…

    Submitted 30 June, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: Accepted at ECML-PKDD 2021 (Research Track), 4-page abridged versions have been accepted at non-archival venues including RealML and DMMLSys workshops at ICML'20 and DLP-KDD and AdvML workshops at KDD'20

  30. arXiv:2007.04440

    cs.LG stat.ML

    On the relationship between class selectivity, dimensionality, and robustness

    Authors: Matthew L. Leavitt, Ari S. Morcos

    Abstract: While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given re…

    Submitted 13 October, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

  31. arXiv:2005.03648

    cs.LG cs.AI stat.ML

    Plan2Vec: Unsupervised Representation Learning by Latent Plans

    Authors: Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

    Abstract: In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning. Plan2vec constructs a weighted graph on an image dataset using near-neighbor distances, and then extrapolates this local metric to a global embedding by distilling path-integral over planned path. When applied to control, plan2vec offers a way to learn goal-conditioned…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: code available at https://geyang.github.io/plan2vec

    Journal ref: Proceedings of Machine Learning Research, the 2nd Annual Conference on Learning for Dynamics and Control (2020) Volume 120, 1-12
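
    The first step in the plan2vec abstract, building a weighted graph from near-neighbor distances and extending that local metric to a global one along paths, can be illustrated with a k-nearest-neighbor graph plus shortest-path (Dijkstra) distances. This is an assumed simplification: plan2vec itself distills planned-path information into a learned embedding, which this sketch does not attempt, and the helper names and the parameter `k` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[Tuple[int, float]]]


def knn_graph(points: List[Sequence[float]], k: int) -> Graph:
    """Connect each point to its k nearest neighbors (edges made symmetric)."""
    graph: Graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i].append((j, d))
            graph[j].append((i, d))
    return graph


def shortest_paths(graph: Graph, source: int) -> Dict[int, float]:
    """Dijkstra: global distance = length of the cheapest chain of local hops."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    # A U-shaped set of points: the two ends are 2.0 apart in a straight
    # line, but 6.0 apart when travelling only through local neighbors.
    u_shape = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
    g = knn_graph(u_shape, k=1)
    print(shortest_paths(g, 0)[6])  # -> 6.0
```

    The example shows why a path-based global metric can differ sharply from raw distances: it respects the structure traced out by the data rather than cutting across gaps.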

  32. arXiv:2003.05993

    cs.CV cs.AI cs.LG

    Analyzing Visual Representations in Embodied Navigation Tasks

    Authors: Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos

    Abstract: Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often over-specialized to the target task. In this work, we present a methodology to study the underlying potential causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of…

    Submitted 12 March, 2020; originally announced March 2020.

  33. arXiv:2003.01262

    cs.LG cs.NE q-bio.NC stat.ML

    Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: The properties of individual neurons are often analyzed in order to understand the biological and artificial neural networks in which they're embedded. Class selectivity, typically defined as how different a neuron's responses are across different classes of stimuli or data samples, is commonly used for this purpose. However, it remains an open question whether it is necessary and/or sufficient for…

    Submitted 14 October, 2020; v1 submitted 2 March, 2020; originally announced March 2020.
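
    As a concrete instance of the definition above, a commonly used class selectivity index compares a unit's largest class-conditional mean activation, mu_max, with the mean of its remaining class-conditional means, mu_rest, as (mu_max - mu_rest) / (mu_max + mu_rest). The sketch below assumes that index form and non-negative (e.g. post-ReLU) activations; the paper's exact definition is not visible in the truncated abstract, and the helper name and `eps` guard are illustrative choices.

```python
from typing import Dict, List


def class_selectivity(activations_by_class: Dict[str, List[float]], eps: float = 1e-7) -> float:
    """Selectivity index (mu_max - mu_rest) / (mu_max + mu_rest) for one unit.

    activations_by_class maps each class to that unit's activations on
    samples of the class (at least two classes, each with >= 1 sample).
    With non-negative activations the index lies in [0, 1]: 0 means the
    same mean response to every class, values near 1 mean the unit responds
    almost exclusively to one class. eps guards against division by zero.
    """
    if len(activations_by_class) < 2:
        raise ValueError("need activations for at least two classes")
    means = sorted(
        (sum(v) / len(v) for v in activations_by_class.values()), reverse=True
    )
    mu_max = means[0]
    mu_rest = sum(means[1:]) / len(means[1:])
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)


if __name__ == "__main__":
    print(class_selectivity({"cat": [1.0], "dog": [1.0]}))       # -> 0.0
    print(class_selectivity({"cat": [2.0, 2.0], "dog": [0.0]}))  # close to 1
```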

  34. arXiv:2003.00152

    cs.LG cs.AI cs.NE stat.ML

    Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features. Most prominent among these is the popular feature normalization technique BatchNorm, which normalizes activations and then subsequently applies a learned affine transform. In this paper, we aim to understand the role and expressive power of affine parameters use…

    Submitted 21 March, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Published in ICLR 2021
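
    The two operations the abstract attributes to BatchNorm, normalizing activations and then applying a learned affine transform, are sketched below for a single feature over one batch in training mode. In the setting the title describes, only `gamma` and `beta` would be trained while all other weights stay at their random initialization; the optimizer loop and running statistics are omitted, and the function name and `eps` default are assumptions.

```python
import math
from typing import List


def batch_norm(x: List[float], gamma: float, beta: float, eps: float = 1e-5) -> List[float]:
    """Training-mode BatchNorm for one feature across a batch.

    Step 1 normalizes with the batch mean and (biased) variance.
    Step 2 applies the learned affine transform gamma * x_hat + beta;
    gamma and beta are the only parameters trained in the
    "BatchNorm only" setting.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [gamma * h + beta for h in x_hat]


if __name__ == "__main__":
    out = batch_norm([1.0, 2.0, 3.0], gamma=2.0, beta=0.5)
    # The output is centered at beta and has a spread of roughly |gamma|.
    print(out)
```

    Because `gamma` rescales and `beta` shifts each normalized feature, these two scalars per channel can still reshape random features substantially, which is the expressive power the paper sets out to measure.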

  35. arXiv:2002.11829

    cs.LG stat.ML

    Representation Learning Through Latent Canonicalizations

    Authors: Or Litany, Ari Morcos, Srinath Sridhar, Leonidas Guibas, Judy Hoffman

    Abstract: We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision. Many prior approaches to this problem have focused on learning "disentangled" representations so that as individual factors vary in a new domain, only a portion of the representation need be updated. In this work, we seek the generalization power of disentangled rep…

    Submitted 26 February, 2020; originally announced February 2020.

  36. arXiv:2002.10365

    cs.LG cs.NE stat.ML

    The Early Phase of Neural Network Training

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that dee…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: ICLR 2020 Camera Ready. Available on OpenReview at https://openreview.net/forum?id=Hkl1iRNFwS

  37. arXiv:2001.03554

    cs.CV cs.LG cs.NE

    Pruning Convolutional Neural Networks with Self-Supervision

    Authors: Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin

    Abstract: Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on…

    Submitted 10 January, 2020; originally announced January 2020.

  38. arXiv:1911.00357

    cs.CV cs.AI cs.LG

    DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

    Authors: Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

    Abstract: We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtua…

    Submitted 19 January, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

  39. arXiv:1906.03728

    cs.LG stat.ML

    The Generalization-Stability Tradeoff In Neural Network Pruning

    Authors: Brian R. Bartoldson, Ari S. Morcos, Adrian Barbu, Gordon Erlebacher

    Abstract: Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we…

    Submitted 22 October, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2020 conference paper

  40. arXiv:1906.02773

    stat.ML cs.LG cs.NE

    One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

    Authors: Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian

    Abstract: The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generali…

    Submitted 27 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  41. arXiv:1906.02768

    stat.ML cs.AI cs.LG cs.NE

    Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

    Authors: Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos

    Abstract: The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present rather than by helping the optimization process (Frankle & Carbin, 2019). Intriguingly, this phenomenon suggests that initialization strategies for DNNs can be improved substantially, but the lottery ticke…

    Submitted 25 February, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: ICLR 2020

  42. arXiv:1905.13405

    cs.LG stat.ML

    Luck Matters: Understanding Training Dynamics of Deep ReLU Networks

    Authors: Yuandong Tian, Tina Jiang, Qucheng Gong, Ari Morcos

    Abstract: We analyze the dynamics of training deep ReLU networks and their implications on generalization capability. Using a teacher-student setting, we discovered a novel relationship between the gradient received by hidden student nodes and the activations of teacher nodes for deep ReLU networks. With this relationship and the assumption of small overlapping teacher node activations, we prove that (1) st…

    Submitted 28 June, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

  43. arXiv:1902.00120  [pdf, other]

    cs.AI

    Learning to Make Analogies by Contrasting Abstract Relational Structure

    Authors: Felix Hill, Adam Santoro, David G. T. Barrett, Ari S. Morcos, Timothy Lillicrap

    Abstract: Analogical reasoning has been a principal focus of various waves of AI research. Analogy is particularly challenging for machines because it requires relational structures to be represented such that they can be flexibly applied across diverse domains of experience. Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data. We…

    Submitted 31 January, 2019; originally announced February 2019.

  44. arXiv:1810.13373  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG stat.ML

    Analyzing biological and artificial neural networks: challenges with opportunities for synergy?

    Authors: David G. T. Barrett, Ari S. Morcos, Jakob H. Macke

    Abstract: Deep neural networks (DNNs) transform stimuli across multiple processing stages to produce representations that can be used to solve complex tasks, such as object recognition in images. However, a full understanding of how they achieve this remains elusive. The complexity of biological neural networks substantially exceeds the complexity of DNNs, making it even more challenging to understand the r…

    Submitted 31 October, 2018; originally announced October 2018.

  45. arXiv:1807.04225  [pdf, other]

    cs.LG stat.ML

    Measuring abstract reasoning in neural networks

    Authors: David G. T. Barrett, Felix Hill, Adam Santoro, Ari S. Morcos, Timothy Lillicrap

    Abstract: Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation 'regimes' in which the training and test data differ in clearly-define…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: ICML 2018

  46. arXiv:1807.01281  [pdf, other]

    cs.LG cs.AI stat.ML

    Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

    Authors: Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. I…

    Submitted 3 July, 2018; originally announced July 2018.

  47. arXiv:1806.05759  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG cs.NE

    Insights on representational similarity in neural networks with canonical correlation

    Authors: Ari S. Morcos, Maithra Raghu, Samy Bengio

    Abstract: Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of…

    Submitted 23 October, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: NIPS 2018

  48. arXiv:1804.04438  [pdf, other]

    cs.CV cs.LG stat.ML

    Pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs

    Authors: Avraham Ruderman, Neil C. Rabinowitz, Ari S. Morcos, Daniel Zoran

    Abstract: Many of our core assumptions about how neural networks operate remain empirically untested. One common assumption is that convolutional neural networks need to be stable to small translations and deformations to solve image recognition tasks. For many years, this stability was baked into CNN architectures by incorporating interleaved pooling layers. Recently, however, interleaved pooling has large…

    Submitted 25 May, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

    Comments: NIPS 2018 submission

  49. arXiv:1803.06959  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    On the importance of single directions for generalization

    Authors: Ari S. Morcos, David G. T. Barrett, Neil C. Rabinowitz, Matthew Botvinick

    Abstract: Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some in…

    Submitted 22 May, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: ICLR 2018 conference paper; added methodological details