Showing 1–18 of 18 results for author: Ma, Y J

  1. arXiv:2406.01967

    cs.RO cs.AI cs.LG

    DrEureka: Language Model Guided Sim-To-Real Transfer

    Authors: Yecheng Jason Ma, William Liang, Hung-Ju Wang, Sam Wang, Yuke Zhu, Linxi Fan, Osbert Bastani, Dinesh Jayaraman

    Abstract: Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automa…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Robotics: Science and Systems (RSS) 2024. Project website and open-source code: https://eureka-research.github.io/dr-eureka/

  2. arXiv:2404.13474

    cs.RO cs.AI cs.CV cs.LG

    Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

    Authors: Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman

    Abstract: There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose $\textbf{POCR}$, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology an…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ICRA 2024. Project website: https://sites.google.com/view/pocr

  3. arXiv:2403.12945

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  4. arXiv:2310.12931

    cs.RO cs.AI cs.LG

    Eureka: Human-Level Reward Design via Coding Large Language Models

    Authors: Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

    Abstract: Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation,…

    Submitted 30 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project website and open-source code: https://eureka-research.github.io/

  5. arXiv:2310.08864

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  6. arXiv:2310.08581

    cs.RO cs.CV

    Universal Visual Decomposer: Long-Horizon Manipulation Made Easy

    Authors: Zichen Zhang, Yunshuang Li, Osbert Bastani, Abhishek Gupta, Dinesh Jayaraman, Yecheng Jason Ma, Luca Weihs

    Abstract: Real-world robotic tasks stretch over extended horizons and encompass multiple stages. Learning long-horizon manipulation tasks, however, is a long-standing challenge, and demands decomposing the overarching task into several manageable subtasks to facilitate policy learning and generalization to unseen tasks. Prior task decomposition methods require task-specific knowledge, are computationally in…

    Submitted 12 October, 2023; originally announced October 2023.

  7. arXiv:2306.00958

    cs.RO cs.AI cs.LG

    LIV: Language-Image Representations and Rewards for Robotic Control

    Authors: Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman

    Abstract: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. Exploiting a novel connection between dual reinforcement learning and mutual information contrastive learning, the LIV objective trains a multi-modal representation that implicitly encodes a universal value function for tasks spec…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Extended version of ICML 2023 camera-ready; Project website: https://penn-pal-lab.github.io/LIV/

  8. arXiv:2305.12663

    cs.LG cs.AI

    TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

    Authors: Yecheng Jason Ma, Kausik Sivakumar, Jason Yan, Osbert Bastani, Dinesh Jayaraman

    Abstract: Standard model-based reinforcement learning (MBRL) approaches fit a transition model of the environment to all past experience, but this wastes model capacity on data that is irrelevant for policy improvement. We instead propose a new "transition occupancy matching" (TOM) objective for MBRL model learning: a model is good to the extent that the current policy experiences the same distribution of t…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: L4DC 2023; Project website: https://penn-pal-lab.github.io/TOM/

  9. arXiv:2303.18240

    cs.CV cs.AI cs.LG cs.RO

    Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

    Authors: Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of…

    Submitted 1 February, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Project website: https://eai-vc.github.io

  10. arXiv:2210.05650

    cs.LG

    Regret Bounds for Risk-Sensitive Reinforcement Learning

    Authors: O. Bastani, Y. J. Ma, E. Shen, W. Xu

    Abstract: In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel character…

    Submitted 11 October, 2022; originally announced October 2022.

  11. arXiv:2210.00030

    cs.RO cs.AI cs.CV cs.LG

    VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

    Authors: Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

    Abstract: Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations. Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how t…

    Submitted 6 March, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: ICLR 2023, Notable-Top-25% (Spotlight). Project website: https://sites.google.com/view/vip-rl

  12. arXiv:2206.03023

    cs.LG cs.AI

    How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via $f$-Advantage Regression

    Authors: Yecheng Jason Ma, Jason Yan, Dinesh Jayaraman, Osbert Bastani

    Abstract: Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose $\textbf{Go}$al-conditioned $f$-$\textbf{A}$dvantage $\textbf{R}$egression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reachi…

    Submitted 10 November, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Project website: https://jasonma2016.github.io/GoFAR/

  13. arXiv:2202.02433

    cs.LG cs.AI

    Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

    Authors: Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani

    Abstract: We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions,…

    Submitted 18 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: ICML 2022. Project website: https://sites.google.com/view/smodice/home

  14. arXiv:2112.07701

    cs.LG

    Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

    Authors: Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman

    Abstract: Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: AAAI 2022

  15. arXiv:2110.05440

    cs.RO

    Safe Human-Interactive Control via Shielding

    Authors: Jeevana Priya Inala, Yecheng Jason Ma, Osbert Bastani, Xin Zhang, Armando Solar-Lezama

    Abstract: Ensuring safety for human-interactive robotics is important due to the potential for human injury. The key challenge is defining safety in a way that accounts for the complex range of human behaviors without modeling the human as an unconstrained adversary. We propose a novel approach to ensuring safety in these settings. Our approach focuses on defining backup actions that we believe human always…

    Submitted 11 October, 2021; originally announced October 2021.

  16. arXiv:2109.06310

    cs.LG stat.ML

    State Relevance for Off-Policy Evaluation

    Authors: Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

    Abstract: Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces varianc…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021

  17. arXiv:2107.06106

    cs.LG

    Conservative Offline Distributional Reinforcement Learning

    Authors: Yecheng Jason Ma, Dinesh Jayaraman, Osbert Bastani

    Abstract: Many reinforcement learning (RL) problems in practice are offline, learning purely from observational data. A key challenge is how to ensure the learned policy is safe, which requires quantifying the risk associated with different actions. In the online setting, distributional RL algorithms do so by learning the distribution over returns (i.e., cumulative rewards) instead of the expected return; b…

    Submitted 26 October, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021

  18. arXiv:2011.15084

    cs.CV cs.LG

    Likelihood-Based Diverse Sampling for Trajectory Forecasting

    Authors: Yecheng Jason Ma, Jeevana Priya Inala, Dinesh Jayaraman, Osbert Bastani

    Abstract: Forecasting complex vehicle and pedestrian multi-modal distributions requires powerful probabilistic approaches. Normalizing flows (NF) have recently emerged as an attractive tool to model such distributions. However, a key drawback is that independent samples drawn from a flow model often do not adequately capture all the modes in the underlying distribution. We propose Likelihood-Based Diverse S…

    Submitted 14 September, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: ICCV 2021