Showing 1–17 of 17 results for author: Caccia, L

  1. arXiv:2405.11157  [pdf, other]

    cs.LG cs.CL

    Towards Modular LLMs by Building and Reusing a Library of LoRAs

    Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni

    Abstract: The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such a library. We benchmark existing approac…

    Submitted 17 May, 2024; originally announced May 2024.

  2. arXiv:2310.05707  [pdf, other]

    cs.CL cs.AI cs.LG

    Guiding Language Model Math Reasoning with Planning Tokens

    Authors: Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

    Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning st…

    Submitted 5 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  3. arXiv:2306.03937  [pdf, other]

    cs.LG cs.AI

    Guiding The Last Layer in Federated Learning with Pre-Trained Models

    Authors: Gwen Legate, Nicolas Bernier, Lucas Caccia, Edouard Oyallon, Eugene Belilovsky

    Abstract: Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here…

    Submitted 6 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  4. arXiv:2304.05260  [pdf, other]

    cs.LG cs.AI

    Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

    Authors: Gwen Legate, Lucas Caccia, Eugene Belilovsky

    Abstract: In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients, resulting in differing local objectives, which can lead clients to overly minimize their own local objec…

    Submitted 11 April, 2023; originally announced April 2023.

  5. arXiv:2211.10445  [pdf, other]

    cs.LG cs.AI

    Building a Subspace of Policies for Scalable Continual Learning

    Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

    Abstract: The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method tha…

    Submitted 2 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted at ICLR2023 (notable-top-25%). website: https://continual-subspace-policies-streamlit-app-gofujp.streamlit.app/ code: https://github.com/facebookresearch/salina/tree/main/salina_cl

  6. arXiv:2211.03831  [pdf, other]

    cs.AI

    Multi-Head Adapter Routing for Cross-Task Generalization

    Authors: Lucas Caccia, Edoardo Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni

    Abstract: Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists in pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] ($\texttt{Poly}$) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation.…

    Submitted 13 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2023. Code is available at https://github.com/microsoft/mttl

  7. arXiv:2203.03798   

    cs.LG cs.AI

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 25 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: This has been withdrawn as it is a new version of arXiv:2104.05025

  8. arXiv:2106.09563  [pdf, other]

    cs.LG cs.CV

    On Anytime Learning at Macroscale

    Authors: Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

    Abstract: In many practical applications of machine learning, data arrives sequentially over time in large chunks. Practitioners then have to decide how to allocate their computational budget in order to obtain the best performance at any point in time. Online learning theory for convex optimization suggests that the best strategy is to use data as soon as it arrives. However, this might not be the best stra…

    Submitted 2 August, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2022

  9. arXiv:2106.09065  [pdf, other]

    cs.CV cs.LG

    SPeCiaL: Self-Supervised Pretraining for Continual Learning

    Authors: Lucas Caccia, Joelle Pineau

    Abstract: This paper presents SPeCiaL: a method for unsupervised pretraining of representations tailored for continual learning. Our approach devises a meta-learning objective that differentiates through a sequential learning process. Specifically, we train a linear model over the representations to match different augmented views of the same image together, each view presented sequentially. The linear mode…

    Submitted 16 June, 2021; originally announced June 2021.

  10. arXiv:2106.06401  [pdf, other]

    cs.LG cs.DC

    Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

    Authors: Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

    Abstract: A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1901.08164

  11. arXiv:2104.05025  [pdf, other]

    cs.LG

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 2 May, 2022; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2022. Code available at www.github.com/pclucas14/AML

  12. arXiv:2003.05856  [pdf, other]

    cs.AI cs.LG

    Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

    Authors: Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexandre Lacoste, David Vazquez, Laurent Charlin

    Abstract: Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previo…

    Submitted 20 January, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

    Journal ref: NeurIPS 2020

  13. arXiv:1911.08019  [pdf, other]

    cs.LG cs.CV stat.ML

    Online Learned Continual Compression with Adaptive Quantization Modules

    Authors: Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

    Abstract: We introduce and study the problem of Online Continual Compression, where one attempts to simultaneously learn to compress and store a representative dataset from a non-i.i.d. data stream, while only observing each sample once. A naive application of auto-encoders in this setting encounters a major challenge: representations derived from earlier encoder states must be usable by later decoder states…

    Submitted 20 August, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

  14. arXiv:1908.04742  [pdf, other]

    cs.LG stat.ML

    Online Continual Learning with Maximally Interfered Retrieval

    Authors: Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars

    Abstract: Continual learning, the setting where a learning agent is faced with a never-ending stream of data, continues to be a great challenge for modern machine learning systems. In particular, the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been show…

    Submitted 29 October, 2019; v1 submitted 11 August, 2019; originally announced August 2019.

    Journal ref: NeurIPS 2019

  15. arXiv:1905.09562  [pdf, other]

    cs.LG stat.ML

    Recurrent Value Functions

    Authors: Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau

    Abstract: Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance, which hinders performance. In this paper, we illustrate this in a continuous control setting where state-of-the-art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a…

    Submitted 23 May, 2019; originally announced May 2019.

  16. arXiv:1812.01180  [pdf, other]

    cs.CV

    Deep Generative Modeling of LiDAR Data

    Authors: Lucas Caccia, Herke van Hoof, Aaron Courville, Joelle Pineau

    Abstract: Building models capable of generating structured output is a key challenge for AI and robotics. While generative models have been explored on many types of data, little work has been done on synthesizing lidar scans, which play a key role in robot mapping and localization. In this work, we show that one can adapt deep generative models for this task by unravelling lidar scans into a 2D point map.…

    Submitted 2 December, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: Presented at IROS 2019

  17. arXiv:1811.02549  [pdf, other]

    cs.CL cs.LG

    Language GANs Falling Short

    Authors: Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin

    Abstract: Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction inste…

    Submitted 19 February, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

    Journal ref: ICLR 2020 - Proceedings of the Seventh International Conference on Learning Representations