Showing 1–50 of 58 results for author: Achille, A

  1. arXiv:2407.08934  [pdf, other]

    cs.LG

    Compositional Structures in Neural Embedding and Interaction Decompositions

    Authors: Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto

    Abstract: We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framework aims to shed light on the emergence of structural patterns in data representations, a phenomenon widely acknowledged but arguably still lacking a solid formal…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 15 pages, 3 figures

  2. arXiv:2407.06324  [pdf, other]

    cs.LG cs.CL cs.NE

    B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory

    Authors: Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

    Abstract: We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent h…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2406.08431  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

    Authors: Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.03684  [pdf, other]

    cs.CV cs.CR

    Principles of Designing Robust Remote Face Anti-Spoofing Systems

    Authors: Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, Mani Srivastava

    Abstract: Protecting the digital identities of human faces from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor. Current approaches primarily focus on detecting spoofing attempts within individual frames to detect presentation attacks. However, the emergence of hyper-realistic generative models capable of real-time operation has heightened the risk of digitally g…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2404.19204  [pdf, other]

    cs.CV cs.AI cs.GR

    NeRF-Insert: 3D Local Editing with Multimodal Control Signals

    Authors: Benet Oriol Sabat, Alessandro Achille, Matthew Trager, Stefano Soatto

    Abstract: We propose NeRF-Insert, a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control. Unlike previous work that relied on image-to-image models, we cast scene editing as an in-painting problem, which encourages the global structure of the scene to be preserved. Moreover, while most existing methods use only textual prompts to condition edits, our fra…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2403.18920  [pdf, other]

    cs.CR cs.AI cs.CV

    CPR: Retrieval Augmented Generation for Copyright Protection

    Authors: Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto

    Abstract: Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private users' data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce risks of leaking private information contain…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  7. arXiv:2403.14003  [pdf, other]

    cs.CV cs.CL cs.LG

    Multi-Modal Hallucination Control by Visual Information Grounding

    Authors: Alessandro Favero, Luca Zancato, Matthew Trager, Siddharth Choudhary, Pramuditha Perera, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: Generative Vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as "hallucination" and show that it stems from an excessive reliance on the language prior. In particular, we show that as more tokens are generated, the reliance on the visual prompt decreas…

    Submitted 20 March, 2024; originally announced March 2024.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  8. arXiv:2402.08919  [pdf, other]

    cs.CV cs.LG

    Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding

    Authors: Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto

    Abstract: Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning. In legal doctrine however, determining the degree of similarity between works requires subjective analysis, and fact-finders (judges and juries) can demonstrate considerable variability in these subjective judgement calls. Images that are structurally similar can be deemed dissimilar, whe…

    Submitted 13 February, 2024; originally announced February 2024.

  9. arXiv:2310.18348  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Meaning Representations from Trajectories in Autoregressive Models

    Authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto

    Abstract: We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relat…

    Submitted 29 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  10. arXiv:2308.12221  [pdf, other]

    cs.LG cs.AI q-bio.NC stat.ML

    Critical Learning Periods Emerge Even in Deep Linear Networks

    Authors: Michael Kleinman, Alessandro Achille, Stefano Soatto

    Abstract: Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of…

    Submitted 24 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: ICLR 2024 (Spotlight)

  11. arXiv:2308.01937  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Training Data Protection with Compositional Diffusion Models

    Authors: Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

    Abstract: We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains and can be later composed to achieve performance comparable to a paragon model trained on all data s…

    Submitted 13 February, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  12. arXiv:2306.03727  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Towards Visual Foundational Models of Physical Scenes

    Authors: Chethan Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto

    Abstract: We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represen…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: TLDR: Physical scenes are equivalence classes of sufficient statistics, and can be inferred uniquely by any agent measuring the same finite data; We formalize and implement an approach to representation learning that overturns "naive realism" in favor of an analytical approach of Russell and Koenderink. NeRFs cannot capture the physical scenes, but combined with Diffusion Models they can

  13. arXiv:2306.00310  [pdf, other]

    cs.CV

    Prompt Algebra for Task Composition

    Authors: Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto

    Abstract: We investigate whether prompts learned independently for different tasks can be later combined through prompt algebra to obtain a model that supports composition of tasks. We consider Visual Language Models (VLM) with prompt tuning as our base classifier and formally define the notion of prompt algebra. We propose constrained prompt tuning to improve performance of the composite classifier. In the…

    Submitted 31 May, 2023; originally announced June 2023.

  14. arXiv:2304.13169  [pdf, other]

    cs.LG

    SAFE: Machine Unlearning With Shard Graphs

    Authors: Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

    Abstract: We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also known as selective forgetting or unlearning, is often conducted by partitioning a dataset into shards, training fully independent models on each, then ensembling…

    Submitted 22 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  15. arXiv:2304.07939  [pdf, other]

    cs.LG

    Leveraging sparse and shared feature activations for disentangled representation learning

    Authors: Marco Fumero, Florian Wenzel, Luca Zancato, Alessandro Achille, Emanuele Rodolà, Stefano Soatto, Bernhard Schölkopf, Francesco Locatello

    Abstract: Recovering the latent factors of variation of high dimensional data has so far focused on simple synthetic settings. Mostly building on unsupervised and weakly-supervised objectives, prior work missed out on the positive implications for representation learning on real world data. In this work, we propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common…

    Submitted 12 December, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

  16. arXiv:2304.03545  [pdf, other]

    cs.LG cs.CR

    AI Model Disgorgement: Methods and Choices

    Authors: Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto

    Abstract: Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexi…

    Submitted 7 April, 2023; originally announced April 2023.

  17. arXiv:2303.14333  [pdf, other]

    cs.CV cs.AI

    Train/Test-Time Adaptation with Retrieval

    Authors: Luca Zancato, Alessandro Achille, Tian Yu Liu, Matthew Trager, Pramuditha Perera, Stefano Soatto

    Abstract: We introduce Train/Test-Time Adaptation with Retrieval (${\rm T^3AR}$), a method to adapt models both at train and test time by means of a retrieval module and a searchable pool of external samples. Before inference, ${\rm T^3AR}$ adapts a given model to the downstream task using refined pseudo-labels and a self-supervised contrastive objective function whose noise distribution leverages retrieved…

    Submitted 24 March, 2023; originally announced March 2023.

  18. arXiv:2303.04105  [pdf, other]

    cs.LG cs.CV

    Your representations are in the network: composable and parallel adaptation for large scale models

    Authors: Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca Zancato, Benjamin Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters, trained anew and combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieve…

    Submitted 31 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  19. arXiv:2303.02131  [pdf, other]

    quant-ph cs.CC cs.LG

    Spacetime-Efficient Low-Depth Quantum State Preparation with Applications

    Authors: Kaiwen Gui, Alexander M. Dalzell, Alessandro Achille, Martin Suchara, Frederic T. Chong

    Abstract: We propose a novel deterministic method for preparing arbitrary quantum states. When our protocol is compiled into CNOT and arbitrary single-qubit gates, it prepares an $N$-dimensional state in depth $O(\log(N))$ and spacetime allocation (a metric that accounts for the fact that oftentimes some ancilla qubits need not be active for the entire circuit) $O(N)$, which are both optimal. When compiled…

    Submitted 9 February, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

    Journal ref: Quantum 8, 1257 (2024)

  20. arXiv:2303.01598  [pdf, other]

    cs.CV cs.LG

    A Meta-Learning Approach to Predicting Performance and Data Requirements

    Authors: Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

    Abstract: We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  21. arXiv:2302.14383  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

    Authors: Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto

    Abstract: We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a pre-existing vocabulary. In contrast, we seek to approximate representations from an encoder as combinations of a smaller set of vectors in the embedding space. These vectors can be see…

    Submitted 11 January, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 18 pages, 9 figures, 7 tables

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision 2023 (pp. 15395-15404)

  22. arXiv:2302.07994  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting

    Authors: Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, Giovanni Paolini, Stefano Soatto

    Abstract: We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different devices, at different times, and on different distributions or domains. Furthermore each prompt only contains information about the subset of data it was exposed…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 13 pages, 4 figures, 8 tables

  23. arXiv:2211.13108  [pdf, other]

    cs.LG

    Integral Continual Learning Along the Tangent Vector Field of Tasks

    Authors: Tian Yu Liu, Aditya Golatkar, Stefano Soatto, Alessandro Achille

    Abstract: We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally, by integrating it along the vector field of "generalist" models. The tangent plane to the specialist model acts as a generalist guide and avoids the kind of over-fitting that leads to catastrophic forgetting, while exploiting the convexity of the optimization landscape in the…

    Submitted 11 December, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

  24. arXiv:2210.04643  [pdf, other]

    cs.LG cs.AI cs.CV q-bio.NC

    Critical Learning Periods for Multisensory Integration in Deep Networks

    Authors: Michael Kleinman, Alessandro Achille, Stefano Soatto

    Abstract: We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. Interfering with the learning process during this initial stage can permanently impair the development of a skill, both in artificial and biological systems where the phenomenon is known as a critical learn…

    Submitted 14 September, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: CVPR 2023 (Highlighted Paper)

  25. arXiv:2207.12186  [pdf, other]

    cs.LG cs.AI cs.CV

    On the Learnability of Physical Concepts: Can a Neural Network Understand What's Real?

    Authors: Alessandro Achille, Stefano Soatto

    Abstract: We revisit the classic signal-to-symbol barrier in light of the remarkable ability of deep neural networks to generate realistic synthetic data. DeepFakes and spoofing highlight the feebleness of the link between physical reality and its abstract representation, whether learned by a digital computer or a biological agent. Starting from a widely applicable definition of abstract concept, we show th…

    Submitted 3 August, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

  26. arXiv:2207.00581  [pdf, other]

    cs.LG

    On Leave-One-Out Conditional Mutual Information For Generalization

    Authors: Mohamad Rida Rammal, Alessandro Achille, Aditya Golatkar, Suhas Diggavi, Stefano Soatto

    Abstract: We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not exploit the structure of the problem and may be hard to evaluate in practice, our loo-CMI bounds can be computed easily and can be interpreted in connection to…

    Submitted 1 July, 2022; originally announced July 2022.

  27. arXiv:2205.12239  [pdf, other]

    cs.LG cs.CV cs.IT

    Gacs-Korner Common Information Variational Autoencoder

    Authors: Michael Kleinman, Alessandro Achille, Stefano Soatto, Jonathan Kao

    Abstract: We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is defined by an optimization problem over a family of functions and recovers the Gács-Körner common information as a special case. Importantly, our notion can be approximated emp…

    Submitted 5 November, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2023

  28. arXiv:2203.16708  [pdf, other]

    cs.LG cs.CV

    Task Adaptive Parameter Sharing for Multi-Task Learning

    Authors: Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto

    Abstract: Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning different models for each task is performant, but incurs a substantial memory cost. To efficiently learn multiple downstream tasks we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new ta…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 Camera Ready. 15 pages, 11 figures

  29. arXiv:2203.16701  [pdf, other]

    cs.LG cs.CR stat.ML

    Towards Differential Relational Privacy and its use in Question Answering

    Authors: Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto

    Abstract: Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference…

    Submitted 30 March, 2022; originally announced March 2022.

  30. arXiv:2203.11481  [pdf, other]

    cs.CV cs.CR

    Mixed Differential Privacy in Computer Vision

    Authors: Aditya Golatkar, Alessandro Achille, Yu-Xiang Wang, Aaron Roth, Michael Kearns, Stefano Soatto

    Abstract: We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. While pre-training language models on large public datasets has enabled strong differential privacy (DP) guarantees with minor loss of accuracy, a similar practice yields punishing trade-offs in vision tasks. A few-shot or even zero-shot learning…

    Submitted 28 March, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  31. arXiv:2202.12457  [pdf, other]

    cs.LG eess.SY stat.ML

    Stacked Residuals of Dynamic Layers for Time Series Anomaly Detection

    Authors: L. Zancato, A. Achille, G. Paolini, A. Chiuso, S. Soatto

    Abstract: We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the non-linear ones. The former are mod…

    Submitted 24 February, 2022; originally announced February 2022.

  32. arXiv:2111.09785  [pdf, other]

    cs.LG

    DIVA: Dataset Derivative of a Learning Task

    Authors: Yonatan Dukler, Alessandro Achille, Giovanni Paolini, Avinash Ravichandran, Marzia Polito, Stefano Soatto

    Abstract: We present a method to compute the derivative of a learning task with respect to a dataset. A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN). The "dataset derivative" is a linear operator, computed around the trained model, that informs how perturbations of the weight of each training sample affect the valida…

    Submitted 18 November, 2021; originally announced November 2021.

  33. arXiv:2102.00084  [pdf, other]

    cs.CV cs.LG

    A linearized framework and a new benchmark for model selection for fine-tuning

    Authors: Aditya Deshpande, Alessandro Achille, Avinash Ravichandran, Hao Li, Luca Zancato, Charless Fowlkes, Rahul Bhotika, Stefano Soatto, Pietro Perona

    Abstract: Fine-tuning from a collection of models pre-trained on different domains (a "model zoo") is emerging as a technique to improve test accuracy in the low-data regime. However, model selection, i.e. how to pre-select the right model to fine-tune from a model zoo without performing any training, remains an open topic. We use a linearized framework to approximate fine-tuning, and introduce two new base…

    Submitted 29 January, 2021; originally announced February 2021.

    Comments: 14 pages

  34. arXiv:2101.11058  [pdf, other]

    cs.CV cs.LG

    Supervised Momentum Contrastive Learning for Few-Shot Classification

    Authors: Orchid Majumder, Avinash Ravichandran, Subhransu Maji, Alessandro Achille, Marzia Polito, Stefano Soatto

    Abstract: Few-shot learning aims to transfer information from one task to enable generalization on novel tasks given a few examples. This information is present both in the domain and the class labels. In this work we investigate the complementary roles of these two sources of information by combining instance-discriminative contrastive learning and supervised learning in a single framework called Supervise…

    Submitted 21 June, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: V2 version; updated with new experiments and figures

  35. arXiv:2101.06640  [pdf, other]

    cs.LG stat.ML

    Estimating informativeness of samples with Smooth Unique Information

    Authors: Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a lin…

    Submitted 28 March, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

    Comments: ICLR 2021, 22 pages

  36. arXiv:2101.05779  [pdf, other]

    cs.LG cs.CL

    Structured Prediction as Translation between Augmented Natural Languages

    Authors: Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

    Abstract: We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discri…

    Submitted 2 December, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2021

  37. arXiv:2012.13431  [pdf, other]

    cs.LG cs.AI cs.CV

    Mixed-Privacy Forgetting in Deep Networks

    Authors: Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, Stefano Soatto

    Abstract: We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of remaining information after forgetting. Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy se…

    Submitted 20 June, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  38. arXiv:2012.11140  [pdf, other]

    cs.LG cs.CV stat.ML

    LQF: Linear Quadratic Fine-Tuning

    Authors: Alessandro Achille, Aditya Golatkar, Avinash Ravichandran, Marzia Polito, Stefano Soatto

    Abstract: Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to intere…

    Submitted 21 December, 2020; originally announced December 2020.

  39. arXiv:2010.02459  [pdf, other]

    cs.LG cs.IT stat.ML

    Usable Information and Evolution of Optimal Representations During Training

    Authors: Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan C. Kao

    Abstract: We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training. We show that the implicit regularization coming from training with Stochastic Gradient Descent with a high learning-rate and small batch size plays an important role in learning minimal sufficient representations…

    Submitted 28 February, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: ICLR 2021

  40. arXiv:2008.12478  [pdf, other]

    cs.LG stat.ML

    Predicting Training Time Without Training

    Authors: Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by…

    Submitted 28 August, 2020; originally announced August 2020.

  41. arXiv:2007.11259  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial Training Reduces Information and Improves Transferability

    Authors: Matteo Terzi, Alessandro Achille, Marco Maggipinto, Gian Antonio Susto

    Abstract: Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility. The latter property may seem counter-intuitive as it is widely accepted by the community that classification models should only capture the minimal information (features) required for the task. Motivated by this discrepancy, we inve…

    Submitted 15 December, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

  42. arXiv:2006.14615  [pdf, other]

    cs.CV cs.LG

    LayoutTransformer: Layout Generation and Completion with Self-attention

    Authors: Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava

    Abstract: We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects. Most complex scenes, natural or human-designed, can be expressed as a meaningful arrangement of simpler compositional graphical primitives. Generating a new layout or extending an existing layout requires understanding the relationships between these primitives. To…

    Submitted 30 September, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: To appear at ICCV 2021

  43. arXiv:2003.02960  [pdf, other]

    cs.LG cs.CV cs.IT stat.ML

    Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for whic…

    Submitted 28 October, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  44. arXiv:2002.04162  [pdf, other]

    cs.LG cs.CV stat.ML

    Incremental Meta-Learning via Indirect Discriminant Alignment

    Authors: Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

    Abstract: The majority of modern meta-learning methods for few-shot classification tasks operate in two phases: a meta-training phase, where the meta-learner learns a generic representation by solving multiple few-shot tasks sampled from a large dataset, and a testing phase, where the meta-learner leverages its learnt internal representation for a specific few-shot task involving classes which were not seen d…

    Submitted 21 April, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  45. arXiv:1912.08990  [pdf, other]

    cs.CV cs.LG

    TextTubes for Detecting Curved Text in the Wild

    Authors: Joël Seytre, Jon Wu, Alessandro Achille

    Abstract: We present a detector for curved text in natural images. We model scene text instances as tubes around their medial axes and introduce a parametrization-invariant loss function. We train a two-stage curved text detector, and evaluate it on the curved text benchmarks CTW-1500 and Total-Text. Our approach achieves state-of-the-art results or improves upon them, notably for CTW-1500 by over 8 percent…

    Submitted 18 December, 2019; originally announced December 2019.

  46. arXiv:1911.04933  [pdf, other]

    cs.LG stat.ML

    Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gleaned by probing deep into its weights. We propose a method for "scrubbing" the weights clean of information about a particular set of training data. The method…

    Submitted 31 March, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Accepted at CVPR 2020

  47. arXiv:1908.01091  [pdf, other]

    cs.LG cs.CV stat.ML

    Toward Understanding Catastrophic Forgetting in Continual Learning

    Authors: Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto

    Abstract: We study the relationship between catastrophic forgetting and properties of task sequences. In particular, given a sequence of tasks, we would like to understand which properties of this sequence influence the error rates of continual learning algorithms trained on the sequence. To this end, we propose a new procedure that makes use of recent developments in task space modeling as well as correlat…

    Submitted 2 August, 2019; originally announced August 2019.

  48. arXiv:1905.13277  [pdf, other]

    cs.LG cs.AI stat.ML

    Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

    Authors: Aditya Golatkar, Alessandro Achille, Stefano Soatto

    Abstract: Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: We show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularizatio…

    Submitted 30 May, 2019; originally announced May 2019.

  49. arXiv:1905.12213  [pdf, other]

    cs.LG cs.AI cs.IT stat.ML

    Where is the Information in a Deep Neural Network?

    Authors: Alessandro Achille, Giovanni Paolini, Stefano Soatto

    Abstract: Whatever information a deep neural network has gleaned from training data is encoded in its weights. How this information affects the response of the network to future data remains largely an open question. Indeed, even defining and measuring information entails some subtleties, since a trained network is a deterministic map, so standard information measures can be degenerate. We measure informati…

    Submitted 21 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Report number: UCLA-TR:190005

  50. arXiv:1904.03292  [pdf, other]

    cs.LG cs.IT stat.ML

    The Information Complexity of Learning Tasks, their Structure and their Distance

    Authors: Alessandro Achille, Giovanni Paolini, Glen Mbeng, Stefano Soatto

    Abstract: We introduce an asymmetric distance in the space of learning tasks, and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for a task, and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset, and allows distinguishing learnin…

    Submitted 14 July, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

    Report number: UCLA CSD180003