
Showing 1–17 of 17 results for author: Brandfonbrener, D

  1. arXiv:2407.07972  [pdf, other]

    cs.LG cs.AI

    Deconstructing What Makes a Good Optimizer for Language Models

    Authors: Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade

    Abstract: Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most effective approach. We aim to compare several optimization algorithms, including SGD, Adafactor, Adam, and Lion, in the context of autoregressive langu…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.03310  [pdf, other]

    cs.LG

    Universal Length Generalization with Turing Programs

    Authors: Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach

    Abstract: Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve length generalization, these proposals typically apply to a limited set of tasks. Building on prior scratchpad and Chain-of-Thought (CoT) techniques, we…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2406.10670  [pdf, other]

    cs.LG cs.AI cs.CL

    CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

    Authors: David Brandfonbrener, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, Sham Kakade

    Abstract: Selecting high-quality data for pre-training is crucial in shaping the downstream task performance of language models. A major challenge lies in identifying this optimal subset, a problem generally considered intractable, thus necessitating scalable and effective heuristics. In this work, we propose a data selection method, CoLoR-Filter (Conditional Loss Reduction Filtering), which leverages an em…

    Submitted 24 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  4. arXiv:2402.14688  [pdf, other]

    cs.LG

    Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

    Authors: Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

    Abstract: We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. At a high level, Q-probing sits between heavier approaches such as finetuning and lighter approaches such as few shot prompting, but can also be combined with either. The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candid…

    Submitted 2 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  5. arXiv:2402.08147  [pdf, other]

    cs.SE cs.AI cs.LG cs.LO cs.PL

    VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search

    Authors: David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin

    Abstract: Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound. In this paper, we present VerMCTS, an approach to begin to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS). This approach leverages the verifier to gain i…

    Submitted 24 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2402.01032  [pdf, other]

    cs.LG cs.AI cs.CL

    Repeat After Me: Transformers are Better than State Space Models at Copying

    Authors: Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

    Abstract: Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). In this paper we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks th…

    Submitted 3 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  7. arXiv:2305.16985  [pdf, other]

    cs.LG

    Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation

    Authors: David Brandfonbrener, Ofir Nachum, Joan Bruna

    Abstract: In recent years, domains such as natural language processing and image recognition have popularized the paradigm of using large datasets to pretrain representations that can be effectively transferred to downstream tasks. In this work we evaluate how such a paradigm should be done in imitation learning, where both pretraining and finetuning data are trajectories collected by experts interacting wi…

    Submitted 25 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  8. arXiv:2210.02343  [pdf, other]

    cs.RO cs.LG

    Visual Backtracking Teleoperation: A Data Collection Protocol for Offline Image-Based Reinforcement Learning

    Authors: David Brandfonbrener, Stephen Tu, Avi Singh, Stefan Welker, Chad Boodoo, Nikolai Matni, Jake Varley

    Abstract: We consider how to most efficiently leverage teleoperator time to collect data for learning robust image-based value functions and policies for sparse reward robotic tasks. To accomplish this goal, we modify the process of data collection to include more than just successful demonstrations of the desired task. Instead we develop a novel protocol that we call Visual Backtracking Teleoperation (VBT)…

    Submitted 5 October, 2022; originally announced October 2022.

  9. arXiv:2206.01085  [pdf, other]

    cs.LG

    Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning

    Authors: David Brandfonbrener, Remi Tachet des Combes, Romain Laroche

    Abstract: Most theoretically motivated work in the offline reinforcement learning setting requires precise uncertainty estimates. This requirement restricts the algorithms derived in that work to the tabular and linear settings where such estimates exist. In this work, we develop a novel method for incorporating scalable uncertainty estimates into an offline reinforcement learning algorithm called deep-SPIB…

    Submitted 2 June, 2022; originally announced June 2022.

  10. arXiv:2206.01079  [pdf, other]

    cs.LG

    When does return-conditioned supervised learning work for offline reinforcement learning?

    Authors: David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

    Abstract: Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous…

    Submitted 11 January, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  11. arXiv:2201.13425  [pdf, other]

    cs.LG cs.AI

    Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

    Authors: Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

    Abstract: Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first…

    Submitted 5 April, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

  12. arXiv:2112.00950  [pdf, other]

    cs.LG stat.ML

    Quantile Filtered Imitation Learning

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning. QFIL performs policy improvement by running imitation learning on a filtered version of the offline dataset. The filtering process removes $s,a$ pairs whose estimated Q values fall below a given quantile of the pushforward distribution over values induced by…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2021

  13. arXiv:2106.08909  [pdf, other]

    cs.LG stat.ML

    Offline RL Without Off-Policy Evaluation

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior policy performs surprisingly well. This one-step algorithm beats the previously reported results of iterative algorithm…

    Submitted 3 December, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Thirty-fifth Conference on Neural Information Processing Systems, 2021

  14. arXiv:2009.07368  [pdf, other]

    cs.LG cs.AI stat.ML

    Evaluating representations by the complexity of learning low-loss predictors

    Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho

    Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest, and introduce two methods, surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). In contrast to…

    Submitted 5 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

  15. arXiv:2006.15368  [pdf, other]

    cs.LG stat.ML

    Offline Contextual Bandits with Overparameterized Models

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: Recent results in supervised learning suggest that while overparameterized models have the capacity to overfit, they in fact generalize quite well. We ask whether the same phenomenon occurs for offline contextual bandits. Our results are mixed. Value-based algorithms benefit from the same generalization behavior as overparameterized supervised learning, but policy-based algorithms do not. We show…

    Submitted 16 June, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  16. arXiv:1911.00567  [pdf, ps, other]

    cs.LG stat.ML

    Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

    Authors: Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where…

    Submitted 8 September, 2023; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Minor bug fixes

  17. arXiv:1905.12185  [pdf, other]

    cs.LG math.OC stat.ML

    Geometric Insights into the Convergence of Nonlinear TD Learning

    Authors: David Brandfonbrener, Joan Bruna

    Abstract: While there are convergence guarantees for temporal difference (TD) learning when using linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected learning dynam…

    Submitted 11 February, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: ICLR 2020