Showing 1–10 of 10 results for author: Ostapenko, O

  1. arXiv:2405.11157

    cs.LG cs.CL

    Towards Modular LLMs by Building and Reusing a Library of LoRAs

    Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni

    Abstract: The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approac…

    Submitted 17 May, 2024; originally announced May 2024.

  2. arXiv:2310.05707

    cs.CL cs.AI cs.LG

    Guiding Language Model Math Reasoning with Planning Tokens

    Authors: Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

    Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning st…

    Submitted 5 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  3. arXiv:2207.04543

    cs.LG cs.AI

    Challenging Common Assumptions about Catastrophic Forgetting

    Authors: Timothée Lesort, Oleksiy Ostapenko, Diganta Misra, Md Rifat Arefin, Pau Rodríguez, Laurent Charlin, Irina Rish

    Abstract: Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually compromises the performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been largely studied, and a plethora of methods have been proposed to addr…

    Submitted 15 May, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

  4. arXiv:2205.00329

    cs.LG cs.AI

    Continual Learning with Foundation Models: An Empirical Study of Latent Replay

    Authors: Oleksiy Ostapenko, Timothee Lesort, Pau Rodríguez, Md Rifat Arefin, Arthur Douillard, Irina Rish, Laurent Charlin

    Abstract: Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL i…

    Submitted 2 July, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

  5. arXiv:2111.07736

    cs.LG cs.AI

    Continual Learning via Local Module Composition

    Authors: Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin

    Abstract: Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an a…

    Submitted 15 November, 2021; originally announced November 2021.

    Journal ref: NeurIPS 2021

  6. arXiv:2108.01005

    cs.LG

    Sequoia: A Software Framework to Unify Continual Learning Research

    Authors: Fabrice Normandin, Florian Golemo, Oleksiy Ostapenko, Pau Rodriguez, Matthew D Riemer, Julio Hurtado, Khimya Khetarpal, Ryan Lindeborg, Lucas Cecchi, Timothée Lesort, Laurent Charlin, Irina Rish, Massimo Caccia

    Abstract: The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a ta…

    Submitted 5 June, 2023; v1 submitted 2 August, 2021; originally announced August 2021.

  7. arXiv:2003.05856

    cs.AI cs.LG

    Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

    Authors: Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexandre Lacoste, David Vazquez, Laurent Charlin

    Abstract: Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previo…

    Submitted 20 January, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

    Journal ref: NeurIPS 2020

  8. arXiv:1912.00200

    cs.CV cs.LG cs.NE

    Pruning at a Glance: Global Neural Pruning for Model Compression

    Authors: Abdullah Salama, Oleksiy Ostapenko, Tassilo Klein, Moin Nabi

    Abstract: Deep Learning models have become the dominant approach in several areas due to their high performance. Unfortunately, the size and hence computational requirements of operating such models can be considerably high. Therefore, this constitutes a limitation for deployment on memory and battery constrained devices such as mobile phones or embedded systems. To address these limitations, we propose a n…

    Submitted 3 December, 2019; v1 submitted 30 November, 2019; originally announced December 2019.

    Comments: Extended version of the ICASSP paper (https://ieeexplore.ieee.org/document/8683224)

  9. arXiv:1904.03137

    cs.NE cs.CV cs.LG

    Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning

    Authors: Oleksiy Ostapenko, Mihai Puscas, Tassilo Klein, Patrick Jähnichen, Moin Nabi

    Abstract: Models trained in the context of continual learning (CL) should be able to learn from a stream of data over an undefined period of time. The main challenges herein are: 1) maintaining old knowledge while simultaneously benefiting from it when learning new tasks, and 2) guaranteeing model scalability with a growing amount of data to learn from. In order to tackle these challenges, we introduce Dyna…

    Submitted 2 December, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  10. arXiv:1811.09192

    cs.CV cs.LG cs.MM

    Self Paced Adversarial Training for Multimodal Few-shot Learning

    Authors: Frederik Pahde, Oleksiy Ostapenko, Patrick Jähnichen, Tassilo Klein, Moin Nabi

    Abstract: State-of-the-art deep learning algorithms yield remarkable results in many visual recognition tasks. However, they still fail to provide satisfactory results in scarce data regimes. To a certain extent this lack of data can be compensated by multimodal information. Missing information in one modality of a single data point (e.g. an image) can be made up for in another modality (e.g. a textual desc…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: To appear at WACV 2019