Showing 1–10 of 10 results for author: Lee, J N

  1. arXiv:2401.05193  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Experiment Planning with Function Approximation

    Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

    Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data coll…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 10 pages main

  2. arXiv:2306.14892  [pdf, other]

    cs.LG cs.AI

    Supervised Pretraining Can Learn In-Context Reinforcement Learning

    Authors: Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

    Abstract: Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce…

    Submitted 26 June, 2023; originally announced June 2023.

  3. arXiv:2302.09451  [pdf, other]

    cs.LG stat.ML

    Estimating Optimal Policy Value in General Linear Contextual Bandits

    Authors: Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

    Abstract: In many bandit problems, the maximal reward achievable by a policy is often unknown in advance. We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable. We refer to this as $V^*$ estimation. It was recently shown that fast $V^*$ estimation is possible but only in disjoint linear bandits with Gaussian covariates. Whethe…

    Submitted 18 February, 2023; originally announced February 2023.

  4. arXiv:2301.13857  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning in POMDPs is Sample-Efficient with Hindsight Observability

    Authors: Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

    Abstract: POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling,…

    Submitted 3 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  5. arXiv:2211.02016  [pdf, other]

    cs.LG cs.AI

    Oracle Inequalities for Model Selection in Offline Reinforcement Learning

    Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

    Abstract: In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximat…

    Submitted 3 November, 2022; originally announced November 2022.

  6. arXiv:2112.12320  [pdf, other]

    cs.LG stat.ML

    Model Selection in Batch Policy Optimization

    Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

    Abstract: We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and $M$ model classes, learn a policy with performance that is competitive with the policy derived from the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should…

    Submitted 22 December, 2021; originally announced December 2021.

  7. arXiv:2011.09750  [pdf, ps, other]

    cs.LG stat.ML

    Online Model Selection for Reinforcement Learning with Function Approximation

    Authors: Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

    Abstract: Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would…

    Submitted 19 November, 2020; originally announced November 2020.

  8. arXiv:2007.00699  [pdf, other]

    cs.LG math.OC stat.ML

    Accelerated Message Passing for Entropy-Regularized MAP Inference

    Authors: Jonathan N. Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

    Abstract: Maximum a posteriori (MAP) inference in discrete-valued Markov random fields is a fundamental problem in machine learning that involves identifying the most likely configuration of random variables given a distribution. Due to the difficulty of this combinatorial problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms that are often interpret…

    Submitted 1 July, 2020; originally announced July 2020.

  9. arXiv:1907.01127  [pdf, other]

    cs.LG math.OC stat.ML

    Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference

    Authors: Jonathan N. Lee, Aldo Pacchiano, Michael I. Jordan

    Abstract: Maximum a posteriori (MAP) inference is a fundamental computational paradigm for statistical inference. In the setting of graphical models, MAP inference entails solving a combinatorial optimization problem to find the most likely configuration of the discrete-valued model. Linear programming (LP) relaxations in the Sherali-Adams hierarchy are widely used to attempt to solve this problem, and smoo…

    Submitted 29 February, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

  10. arXiv:1811.02184  [pdf, other]

    cs.RO cs.LG

    Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

    Authors: Jonathan N. Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg

    Abstract: On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlyi…

    Submitted 8 July, 2019; v1 submitted 6 November, 2018; originally announced November 2018.