
Showing 1–11 of 11 results for author: Mialon, G

  1. arXiv:2311.15930

    cs.CL cs.AI

    WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models

    Authors: Youssef Benchekroun, Megi Dervishi, Mark Ibrahim, Jean-Baptiste Gaya, Xavier Martinet, Grégoire Mialon, Thomas Scialom, Emmanuel Dupoux, Dieuwke Hupkes, Pascal Vincent

    Abstract: We propose WorldSense, a benchmark designed to assess the extent to which LLMs are consistently able to sustain tacit world models, by testing how they draw simple inferences from descriptions of simple arrangements of entities. WorldSense is a synthetic benchmark with three problem types, each with their own trivial control, which explicitly avoids bias by decorrelating the abstract structure of…

    Submitted 27 November, 2023; originally announced November 2023.

  2. arXiv:2311.12983

    cs.CL cs.AI

    GAIA: a benchmark for General AI Assistants

    Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

    Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human r…

    Submitted 21 November, 2023; originally announced November 2023.

  3. arXiv:2307.05432

    cs.LG math.NA

    Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

    Authors: Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani

    Abstract: Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations…

    Submitted 14 February, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  4. arXiv:2304.12210

    cs.LG cs.CV

    A Cookbook of Self-Supervised Learning

    Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

    Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier…

    Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  5. arXiv:2302.10692

    cs.LG

    On Inductive Biases for Machine Learning in Data Constrained Settings

    Authors: Grégoire Mialon

    Abstract: Learning with limited data is one of the biggest problems of machine learning. Current approaches to this issue consist in learning general representations from huge amounts of data before fine-tuning the model on a small dataset of interest. While such a technique, coined transfer learning, is very effective in domains such as computer vision or natural language processing, it does not yet solve com…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: PhD thesis defended on January 19th, 2022

  6. arXiv:2302.07842

    cs.CL

    Augmented Language Models: a Survey

    Authors: Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom

    Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demo…

    Submitted 15 February, 2023; originally announced February 2023.

  7. arXiv:2209.14905

    cs.LG

    Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

    Authors: Grégoire Mialon, Randall Balestriero, Yann LeCun

    Abstract: Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg combined to a MLP projector…

    Submitted 14 February, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

  8. arXiv:2106.05667

    cs.LG

    GraphiT: Encoding Graph Structure in Transformers

    Authors: Grégoire Mialon, Dexiong Chen, Margot Selosse, Julien Mairal

    Abstract: We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture is able to outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on gr…

    Submitted 10 June, 2021; originally announced June 2021.

  9. arXiv:2006.12065

    cs.LG stat.ML

    A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

    Authors: Grégoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal

    Abstract: We address the problem of learning on sets of features, motivated by the need of performing pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal…

    Submitted 9 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: ICLR 2021

  10. arXiv:1912.02566

    cs.LG stat.ML

    Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions

    Authors: Grégoire Mialon, Alexandre d'Aspremont, Julien Mairal

    Abstract: We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regressi…

    Submitted 12 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: AISTATS 2020

  11. arXiv:1810.00363

    stat.ML cs.LG

    A Kernel Perspective for Regularizing Deep Neural Networks

    Authors: Alberto Bietti, Grégoire Mialon, Dexiong Chen, Julien Mairal

    Abstract: We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm cannot be computed, it admits upper and lower approximations leading to various practical strategies. Specifically, this perspective (i) provides a common umbrella for many existing regularization principles, including spectral norm and gradient…

    Submitted 13 May, 2019; v1 submitted 30 September, 2018; originally announced October 2018.

    Comments: ICML