Skip to main content

Showing 1–50 of 239 results for author: Finn, C

  1. arXiv:2407.10341  [pdf, other

    cs.RO cs.AI cs.LG

    Affordance-Guided Reinforcement Learning via Visual Prompting

    Authors: Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

    Abstract: Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as demonstrations or examples of success and failure, to learn task-specific reward functions. Recent… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 15 pages, 9 figures. Robotics: Science and Systems (RSS) 2024, Task Specification for General-Purpose Intelligent Robots & Lifelong Robot Learning Workshops

  2. arXiv:2407.08693  [pdf, other

    cs.RO cs.LG

    Robotic Control via Embodied Chain-of-Thought Reasoning

    Authors: Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

    Abstract: A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Website: https://embodied-cot.github.io

  3. arXiv:2407.07775  [pdf, other

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor… ▽ More

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  4. arXiv:2407.04842  [pdf, other

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  5. arXiv:2407.02666  [pdf, other

    cs.RO cs.AI

    Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

    Authors: Annie S. Chen, Alec M. Lessing, Andy Tang, Govind Chada, Laura Smith, Sergey Levine, Chelsea Finn

    Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unu… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 27 pages

  6. arXiv:2406.15917  [pdf, other

    cs.RO

    To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment

    Authors: Maximilian Du, Alexander Khazatsky, Tobias Gerstenberg, Chelsea Finn

    Abstract: When faced with a novel scenario, it can be hard to succeed on the first attempt. In these challenging situations, it is important to know how to retry quickly and meaningfully. Retrying behavior can emerge naturally in robots trained on diverse data, but such robot policies will typically only exhibit undirected retrying behavior and may not terminate a suboptimal approach before an unrecoverable… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.10454  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    HumanPlus: Humanoid Shadowing and Imitation from Humans

    Authors: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

    Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn a… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: project website: https://humanoid-ai.github.io/

  8. arXiv:2406.09246  [pdf, other

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  9. arXiv:2406.02900  [pdf, other

    cs.LG cs.AI cs.CL

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

    Authors: Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2405.13193  [pdf, other

    cs.LG

    Efficient Imitation Learning with Conservative World Models

    Authors: Victor Kolev, Rafael Rafailov, Kyle Hatch, Jiajun Wu, Chelsea Finn

    Abstract: We tackle the problem of policy learning from expert demonstrations without a reward function. A central challenge in this space is that these policies fail upon deployment due to issues of distributional shift, environment stochasticity, or compounding errors. Adversarial imitation learning alleviates this issue but requires additional on-policy training samples for stability, which presents a ch… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Oral presentation, L4DC 2024

  11. arXiv:2405.12213  [pdf, other

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen… ▽ More

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  12. arXiv:2405.05941  [pdf, other

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  13. arXiv:2405.02292  [pdf, other

    cs.RO cs.LG

    ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation

    Authors: ALOHA 2 Team, Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka , et al. (1 additional authors not shown)

    Abstract: Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation. We introduce ALOHA 2, an enhanced version of ALOHA that has greater performance, ergonomics, and robustness compared to the original design. To accelerate research in large-scale bim… ▽ More

    Submitted 7 February, 2024; originally announced May 2024.

    Comments: Project website: aloha-2.github.io

  14. arXiv:2404.14367  [pdf, other

    cs.LG

    Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

    Authors: Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

    Abstract: Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning. Different methods come with different implementation tradeoffs and performance differences, and existing empirical findings present different concl… ▽ More

    Submitted 2 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: International Conference on Machine Learning (ICML), 2024

  15. arXiv:2404.12358  [pdf, other

    cs.LG

    From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

    Authors: Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

    Abstract: Reinforcement Learning From Human Feedback (RLHF) has been a critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatc… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  16. arXiv:2404.10282  [pdf, other

    cs.LG cs.CV

    Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

    Authors: Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu

    Abstract: Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how… ▽ More

    Submitted 24 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: ICML 2024 camera-ready. 22 pages, 10 figures, code available at https://github.com/kylehkhsu/tripod

  17. arXiv:2403.19159  [pdf, other

    cs.CL cs.LG

    Disentangling Length from Quality in Direct Preference Optimization

    Authors: Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is know to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is often more highly rated by users, even when it is less helpful and objective. A number of approaches have been developed to control those biases in the… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  18. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  19. arXiv:2403.12910  [pdf, other

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/

  20. arXiv:2403.05110  [pdf, other

    cs.RO cs.AI cs.LG

    Efficient Data Collection for Robotic Manipulation via Compositional Generalization

    Authors: Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh

    Abstract: Data collection has become an increasingly important problem in robotic manipulation, yet there still lacks much understanding of how to effectively collect data to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors of variation (e.g., object types, table textures) during data collection, to cover a diverse range of scenar… ▽ More

    Submitted 21 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: RSS 2024

  21. arXiv:2402.19432  [pdf, other

    cs.RO

    Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation

    Authors: Jonathan Yang, Catherine Glossop, Arjun Bhorkar, Dhruv Shah, Quan Vuong, Chelsea Finn, Dorsa Sadigh, Sergey Levine

    Abstract: Recent years in robotics and imitation learning have shown remarkable progress in training large-scale foundation models by leveraging data across a multitude of embodiments. The success of such policies might lead us to wonder: just how diverse can the robots in the training set be while still facilitating positive transfer? In this work, we study this question in the context of heterogeneous emb… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 16 pages, 9 figures

    MSC Class: 68T40 ACM Class: I.2.9

  22. arXiv:2402.14789  [pdf, other

    cs.LG cs.AI

    Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

    Authors: Johnathan Xie, Yoonho Lee, Annie S. Chen, Chelsea Finn

    Abstract: Self-supervised learning excels in learning representations from large amounts of unlabeled data, demonstrating success across multiple data modalities. Yet, extending self-supervised learning to new modalities is non-trivial because the specifics of existing methods are tailored to each domain, such as domain-specific augmentations which reflect the invariances in the target task. While masked mo… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  23. arXiv:2402.12366  [pdf, other

    cs.LG cs.AI cs.CL

    A Critical Evaluation of AI Feedback for Aligning Large Language Models

    Authors: Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

    Abstract: Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models. RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model. While recent popular open-source models… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

  24. arXiv:2402.11411  [pdf, other

    cs.LG cs.CL cs.CV

    Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

    Authors: Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. This procedure is not perfect and… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  25. arXiv:2402.10893  [pdf, other

    cs.LG cs.AI cs.CL

    RLVF: Learning from Verbal Feedback without Overgeneralization

    Authors: Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn

    Abstract: The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments is high-level verbal feedback, such as "Don't use emojis when drafting emails to my boss." However, while writing high-level feedback is far simp… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 9 pages, 9 figures

  26. arXiv:2402.07872  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  27. arXiv:2402.05232  [pdf, other

    cs.LG cs.AI

    Universal Neural Functionals

    Authors: Allan Zhou, Chelsea Finn, James Harrison

    Abstract: A challenging problem in many modern machine learning tasks is to process weight-space features, i.e., to transform or extract information from the weights and gradients of a neural network. Recent works have developed promising weight-space models that are equivariant to the permutation symmetries of simple feedforward networks. However, they are not applicable to general architectures, since the… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

  28. arXiv:2402.03715  [pdf, other

    cs.LG cs.AI cs.CL

    Clarify: Improving Model Robustness With Natural Language Corrections

    Authors: Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn

    Abstract: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labele… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

  29. arXiv:2401.16013  [pdf, other

    cs.RO cs.AI

    SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

    Authors: Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine

    Abstract: In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementati… ▽ More

    Submitted 12 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: ICRA 2024

  30. arXiv:2401.12963  [pdf, other

    cs.RO cs.AI cs.CL cs.CV cs.LG

    AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

    Authors: Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao , et al. (3 additional authors not shown)

    Abstract: Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the d… ▽ More

    Submitted 1 July, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 26 pages, 9 figures, ICRA 2024 VLMNM Workshop

  31. arXiv:2401.10220  [pdf, other

    cs.LG cs.CV

    AutoFT: Learning an Objective for Robust Fine-Tuning

    Authors: Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan, Chelsea Finn

    Abstract: Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. However, fine-tuning a model on one data distribution often degrades performance under distribution shifts. Current approaches to robust fine-tuning use hand-crafted regularization techniques to constrain the fine-tuning process towards the pretrained model. Yet, it is hard to specify how to adapt… ▽ More

    Submitted 7 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 18 pages

  32. arXiv:2401.03306  [pdf, other

    cs.LG cs.AI cs.RO

    MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning

    Authors: Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn

    Abstract: We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have ach… ▽ More

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: This is an updated version of a manuscript that originally appeared at CoRL 2023. The project website is here https://sites.google.com/view/mo2o

    Journal ref: Proceedings of The 7th Conference on Robot Learning, PMLR 229:3654-3671, 2023

  33. arXiv:2401.02117  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

    Authors: Zipeng Fu, Tony Z. Zhao, Chelsea Finn

    Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Project website: https://mobile-aloha.github.io (Zipeng Fu and Tony Z. Zhao are project co-leads, Chelsea Finn is the advisor)

  34. arXiv:2312.12444  [pdf, other

    cs.CV cs.AI cs.RO

    What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?

    Authors: Kaylee Burns, Zach Witzel, Jubayer Ibn Hamid, Tianhe Yu, Chelsea Finn, Karol Hausman

    Abstract: Inspired by the success of transfer learning in computer vision, roboticists have investigated visual pre-training as a means to improve the learning efficiency and generalization ability of policies learned from pixels. To that end, past work has favored large object interaction datasets, such as first-person videos of humans completing diverse tasks, in pursuit of manipulation-relevant features.… ▽ More

    Submitted 3 November, 2023; originally announced December 2023.

    Comments: 20 pages, 12 figures

  35. arXiv:2311.08401  [pdf, other

    cs.CL cs.AI cs.LG

    Fine-tuning Language Models for Factuality

    Authors: Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn

    Abstract: The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  36. arXiv:2311.01977  [pdf, other

    cs.RO cs.AI

    RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches

    Authors: Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan Vuong, Ted Xiao

    Abstract: Generalization remains one of the most important desiderata for robust robot learning systems. While recently proposed approaches show promise in generalization to novel objects, semantic concepts, or visual distribution shifts, generalization to new tasks remains challenging. For example, a language-conditioned policy trained on pick-and-place tasks will not be able to generalize to a folding tas… ▽ More

    Submitted 6 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Evaluation videos can be found at https://rt-trajectory.github.io/

  37. arXiv:2311.01059  [pdf, other

    cs.RO cs.LG

    Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

    Authors: Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

    Abstract: To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to sele… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 19 pages, 6 figures

  38. arXiv:2310.15145  [pdf, other

    cs.RO cs.AI cs.LG

    Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

    Authors: Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn

    Abstract: The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Interne… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  39. arXiv:2310.13639  [pdf, other

    cs.LG cs.AI

    Contrastive Preference Learning: Learning from Human Feedback without RL

    Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to rewa… ▽ More

    Submitted 30 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code released at https://github.com/jhejna/cpl

  40. arXiv:2310.12962  [pdf, other

    cs.CL cs.AI cs.LG

    An Emulator for Fine-Tuning Large Language Models using Small Language Models

    Authors: Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

    Abstract: Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filte… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

  41. arXiv:2310.10639  [pdf, other

    cs.RO

    Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

    Authors: Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine

    Abstract: If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might not be present in the robot's own training data. We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controll… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 22 pages, 8 figures

  42. arXiv:2310.08864  [pdf, other

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  43. arXiv:2310.08558  [pdf, other

    cs.LG cs.AI cs.RO

    Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

    Authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn

    Abstract: It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pess… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

  44. arXiv:2310.07899  [pdf, other

    cs.AI cs.RO

    RoboCLIP: One Demonstration is Enough to Learn Robot Policies

    Authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti

    Abstract: Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VL… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

  45. arXiv:2310.00754  [pdf, other

    cs.LG cs.CL cs.CV

    Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

    Authors: Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

    Abstract: Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. However, LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To addre… ▽ More

    Submitted 16 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024

  46. arXiv:2309.10150  [pdf, other

    cs.RO cs.AI cs.LG

    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

    Authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine

    Abstract: In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizi… ▽ More

    Submitted 17 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: See website at https://qtransformer.github.io

  47. arXiv:2309.05665  [pdf, other

    cs.RO cs.AI cs.CV cs.LG

    Robot Parkour Learning

    Authors: Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao

    Abstract: Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments. Existing methods can generate either diverse but blind locomotion skills or vision-based but specialized skills by using reference animal data or complex rewards. However, autonomous parkour requires robots to learn generalizable skills that are both vision-based a… ▽ More

    Submitted 11 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: CoRL 2023 (Oral). Project website at https://robot-parkour.github.io

  48. arXiv:2308.12952  [pdf, other

    cs.RO cs.LG

    BridgeData V2: A Dataset for Robot Learning at Scale

    Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

    Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains,… ▽ More

    Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 9 pages

  49. arXiv:2307.15818  [pdf, other

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal , et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  50. arXiv:2307.14326  [pdf, other

    cs.RO cs.AI cs.LG

    Waypoint-Based Imitation Learning for Robotic Manipulation

    Authors: Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

    Abstract: While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supe… ▽ More

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: The first two authors contributed equally