
Showing 1–24 of 24 results for author: Driess, D

  1. arXiv:2403.12943

    cs.RO cs.AI

    Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

    Authors: Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi

    Abstract: While large-scale robotic systems typically rely on textual instructions for tasks, this work explores a different approach: can robots infer the task directly from observing humans? This shift necessitates the robot's ability to decode human intent and translate it into executable actions within its physical constraints and environment. We introduce Vid2Robot, a novel end-to-end video-based learn…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Robot learning: Imitation Learning, Robot Perception, Sensing & Vision, Grasping & Manipulation

  2. arXiv:2402.07872

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena…

    Submitted 12 February, 2024; originally announced February 2024.

  3. arXiv:2401.12168

    cs.CV cs.CL cs.LG cs.RO

    SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

    Authors: Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

    Abstract: Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size differences. We hyp…

    Submitted 22 January, 2024; originally announced January 2024.

  4. arXiv:2312.07843

    cs.RO

    Foundation Models in Robotics: Applications, Challenges, and the Future

    Authors: Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, Mac Schwager

    Abstract: We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities, and in some instances display an emergent ability…

    Submitted 12 December, 2023; originally announced December 2023.

  5. arXiv:2310.08864

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  6. arXiv:2307.15818

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  7. arXiv:2307.14334

    cs.CL cs.CV

    Towards Generalist Biomedical AI

    Authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, et al. (7 additional authors not shown)

    Abstract: Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench…

    Submitted 26 July, 2023; originally announced July 2023.

  8. arXiv:2307.04721

    cs.AI cs.CL cs.RO

    Large Language Models as General Pattern Machines

    Authors: Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

    Abstract: We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion profici…

    Submitted 25 October, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 21 pages, 25 figures. To appear at Conference on Robot Learning (CoRL) 2023

  9. arXiv:2303.03378

    cs.LG cs.AI cs.RO

    PaLM-E: An Embodied Multimodal Language Model

    Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

    Abstract: Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model ar…

    Submitted 6 March, 2023; originally announced March 2023.

  10. arXiv:2303.00855

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

    Authors: Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter

    Abstract: Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models. Unfortunately, applying such models to settings with embodied agents, such as robots, is challenging due to their lack of experience with the physical world, inability to parse non-language observations, and ignorance of reward…

    Submitted 11 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  11. arXiv:2210.12386

    cs.RO

    Learning Feasibility of Factored Nonlinear Programs in Robotic Manipulation Planning

    Authors: Joaquim Ortiz-Haro, Jung-Su Ha, Danny Driess, Erez Karpas, Marc Toussaint

    Abstract: A factored Nonlinear Program (Factored-NLP) explicitly models the dependencies between a set of continuous variables and nonlinear constraints, providing an expressive formulation for relevant robotics problems such as manipulation planning or simultaneous localization and mapping. When the problem is over-constrained or infeasible, a fundamental issue is to detect a minimal subset of variables an…

    Submitted 23 May, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: Submitted to ICRA 2023

  12. arXiv:2206.01634

    cs.LG cs.CV cs.RO

    Reinforcement Learning with Neural Radiance Fields

    Authors: Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

    Abstract: It is a long-standing problem to find effective representations for training reinforcement learning (RL) agents. This paper demonstrates that learning state representations with supervision from Neural Radiance Fields (NeRFs) can improve the performance of RL compared to other learned representations or even low-dimensional, hand-engineered state information. Specifically, we propose to train an e…

    Submitted 3 June, 2022; originally announced June 2022.

  13. arXiv:2205.04362

    cs.RO

    FC$^3$: Feasibility-Based Control Chain Coordination

    Authors: Jason Harris, Danny Driess, Marc Toussaint

    Abstract: Hierarchical coordination of controllers often uses symbolic state representations that fully abstract their underlying low-level controllers, treating them as "black boxes" to the symbolic action abstraction. This paper proposes a framework to realize robust behavior, which we call Feasibility-based Control Chain Coordination (FC$^3$). Our controllers expose the geometric features and constraints…

    Submitted 9 May, 2022; originally announced May 2022.

  14. arXiv:2203.05390

    cs.RO

    Sequence-of-Constraints MPC: Reactive Timing-Optimal Control of Sequential Manipulation

    Authors: Marc Toussaint, Jason Harris, Jung-Su Ha, Danny Driess, Wolfgang Hönig

    Abstract: Task and Motion Planning has made great progress in solving hard sequential manipulation problems. However, a gap between such planning formulations and control methods for reactive execution remains. In this paper we propose a model predictive control approach dedicated to robustly execute a single sequence of constraints, which corresponds to a discrete decision sequence of a TAMP plan. We decom…

    Submitted 22 September, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: IROS 2022 - Int. Conf. on Intelligent Robots and Systems

  15. arXiv:2202.11855

    cs.CV cs.LG cs.RO

    Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

    Authors: Danny Driess, Zhiao Huang, Yunzhu Li, Russ Tedrake, Marc Toussaint

    Abstract: We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks. NeRFs have become a popular choice for representing scenes due to their strong 3D prior. However, most NeRF approaches are trained on a single scene, representing the whole scene with a global model, making gen…

    Submitted 27 July, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: v3: real robot exp

  16. arXiv:2112.04812

    cs.RO

    Deep Visual Constraints: Neural Implicit Models for Manipulation Planning from Visual Input

    Authors: Jung-Su Ha, Danny Driess, Marc Toussaint

    Abstract: Manipulation planning is the problem of finding a sequence of robot configurations that involves interactions with objects in the scene, e.g., grasping and placing an object, or more general tool-use. To achieve such interactions, traditional approaches require hand-engineering of object representations and interaction constraints, which easily becomes tedious when complex objects/interactions are…

    Submitted 28 July, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: IEEE Robotics and Automation Letters (RA-L) 2022

  17. arXiv:2111.07908

    cs.AI cs.RO

    Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

    Authors: Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint

    Abstract: Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of R…

    Submitted 15 November, 2021; originally announced November 2021.

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia

  18. arXiv:2110.00792

    cs.RO cs.LG

    Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning

    Authors: Danny Driess, Jung-Su Ha, Marc Toussaint, Russ Tedrake

    Abstract: This work proposes an optimization-based manipulation planning framework where the objectives are learned functionals of signed-distance fields that represent objects in the scene. Most manipulation planning approaches rely on analytical models and carefully chosen abstractions/state-spaces to be effective. A central question is how models can be obtained from data that are not primarily accurate…

    Submitted 2 October, 2021; originally announced October 2021.

  19. Long-Horizon Multi-Robot Rearrangement Planning for Construction Assembly

    Authors: Valentin Noah Hartmann, Andreas Orthey, Danny Driess, Ozgur S. Oguz, Marc Toussaint

    Abstract: Robotic assembly planning enables architects to explicitly account for the assembly process during the design phase, and enables efficient building methods that profit from the robots' different capabilities. Previous work has addressed planning of robot assembly sequences and identifying the feasibility of architectural designs. This paper extends previous work by enabling planning with large, he…

    Submitted 7 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: 13 pages, 16 Figures, 2 Tables, 3 Algorithms

    Journal ref: IEEE Transactions on Robotics (Volume: 39, Issue: 1, February 2023)

  20. arXiv:2103.05401

    cs.RO cs.AI

    Deep 6-DoF Tracking of Unknown Objects for Reactive Grasping

    Authors: Marc Tuscher, Julian Hörz, Danny Driess, Marc Toussaint

    Abstract: Robotic manipulation of unknown objects is an important field of research. Practical applications occur in many real-world settings where robots need to interact with an unknown environment. We tackle the problem of reactive grasping by proposing a method for unknown object tracking, grasp point sampling and dynamic trajectory planning. Our object tracking method combines Siamese Networks with an…

    Submitted 25 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

  21. arXiv:2006.05398

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

    Authors: Danny Driess, Jung-Su Ha, Marc Toussaint

    Abstract: In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: Robotics: Science and Systems (R:SS) 2020

  22. Robust Task and Motion Planning for Long-Horizon Architectural Construction Planning

    Authors: Valentin N. Hartmann, Ozgur S. Oguz, Danny Driess, Marc Toussaint, Achim Menges

    Abstract: Integrating robotic systems in architectural and construction processes is of core interest to increase the efficiency of the building industry. Automated planning for such systems enables design analysis tools and facilitates faster design iteration cycles for designers and engineers. However, generic task-and-motion planning (TAMP) for long-horizon construction processes is beyond the capabiliti…

    Submitted 17 March, 2020; originally announced March 2020.

  23. arXiv:2003.04259

    cs.RO

    Probabilistic Framework for Constrained Manipulations and Task and Motion Planning under Uncertainty

    Authors: Jung-Su Ha, Danny Driess, Marc Toussaint

    Abstract: Logic-Geometric Programming (LGP) is a powerful motion and manipulation planning framework, which represents hierarchical structure using logic rules that describe discrete aspects of problems, e.g., touch, grasp, hit, or push, and solves the resulting smooth trajectory optimization. The expressive power of logic allows LGP for handling complex, large-scale sequential manipulation and tool-use pla…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: ICRA 2020

  24. arXiv:2002.12780

    cs.RO

    Describing Physics For Physical Reasoning: Force-based Sequential Manipulation Planning

    Authors: Marc Toussaint, Jung-Su Ha, Danny Driess

    Abstract: Physical reasoning is a core aspect of intelligence in animals and humans. A central question is what model should be used as a basis for reasoning. Existing work considered models ranging from intuitive physics and physical simulators to contact dynamics models used in robotic manipulation and locomotion. In this work we propose descriptions of physics which directly allow us to leverage optimiza…

    Submitted 5 July, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Intelligent Robots and Systems (IROS 2020); IEEE Robotics and Automation Letters (RA-L)