subscribe to arXiv mailings

AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Authors: Bingjie Tang, Iretiayo Akinola, Jie Xu, Bowen Wen, Ankur Handa, Karl Van Wyk, Dieter Fox, Gaurav S. Sukhatme, Fabio Ramos, Yashraj Narang

Abstract: Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world,… ▽ More Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world, along with parallelized simulation environments for policy learning, 2) a novel simulation-based approach for learning specialist (i.e., part-specific) policies and generalist (i.e., unified) assembly policies, 3) demonstrations of specialist policies that individually solve 80 assemblies with 80% or higher success rates in simulation, as well as a generalist policy that jointly solves 20 assemblies with an 80%+ success rate, and 4) zero-shot sim-to-real transfer that achieves similar (or better) performance than simulation, including on perception-initialized assembly. The key methodological takeaway is that a union of diverse algorithms from manufacturing engineering, character animation, and time-series analysis provides a generic and robust solution for a diverse range of robotic assembly problems.To our knowledge, AutoMate provides the first simulation-based framework for learning specialist and generalist policies over a wide range of assemblies, as well as the first system demonstrating zero-shot sim-to-real transfer over such a range. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.02274 [pdf, other]

DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics

Authors: Tyler Ga Wei Lum, Martin Matak, Viktor Makoviychuk, Ankur Handa, Arthur Allshire, Tucker Hermans, Nathan D. Ratliff, Karl Van Wyk

Abstract: A pivotal challenge in robotics is achieving fast, safe, and robust dexterous grasping across a diverse range of objects, an important goal within industrial applications. However, existing methods often have very limited speed, dexterity, and generality, along with limited or no hardware safety guarantees. In this work, we introduce DextrAH-G, a depth-based dexterous grasping policy trained entir… ▽ More A pivotal challenge in robotics is achieving fast, safe, and robust dexterous grasping across a diverse range of objects, an important goal within industrial applications. However, existing methods often have very limited speed, dexterity, and generality, along with limited or no hardware safety guarantees. In this work, we introduce DextrAH-G, a depth-based dexterous grasping policy trained entirely in simulation that combines reinforcement learning, geometric fabrics, and teacher-student distillation. We address key challenges in joint arm-hand policy learning, such as high-dimensional observation and action spaces, the sim2real gap, collision avoidance, and hardware constraints. DextrAH-G enables a 23 motor arm-hand robot to safely and continuously grasp and transport a large variety of objects at high speed using multi-modal inputs including depth images, allowing generalization across object geometry. Videos at https://sites.google.com/view/dextrah-g. △ Less

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.02250 [pdf, other]

Geometric Fabrics: a Safe Guiding Medium for Policy Learning

Authors: Karl Van Wyk, Ankur Handa, Viktor Makoviychuk, Yijie Guo, Arthur Allshire, Nathan D. Ratliff

Abstract: Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers… ▽ More Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straightline motion towards these action targets in task or joint space. However, straightline motion in these spaces for the most part do not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and desirable set of behaviors via artificial, second order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.03336 [pdf, other]

Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation

Authors: Asad Ali Shahid, Yashraj Narang, Vincenzo Petrone, Enrico Ferrentino, Ankur Handa, Dieter Fox, Marco Pavone, Loris Roveda

Abstract: In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks like locomotion and dexterous manipulation. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and the policy performance to hyperparameter selection, which often requires numerous tri… ▽ More In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks like locomotion and dexterous manipulation. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and the policy performance to hyperparameter selection, which often requires numerous trials of time-consuming experiments. This work introduces a Population-Based Reinforcement Learning (PBRL) approach that exploits a GPU-accelerated physics simulator to enhance the exploration capabilities of RL by concurrently training multiple policies in parallel. The PBRL framework is applied to three state-of-the-art RL algorithms - PPO, SAC, and DDPG - dynamically adjusting hyperparameters based on the performance of learning agents. The experiments are performed on four challenging tasks in Isaac Gym - Anymal Terrain, Shadow Hand, Humanoid, Franka Nut Pick - by analyzing the effect of population size and mutation mechanisms for hyperparameters. The results demonstrate that PBRL agents outperform non-evolutionary baseline agents across tasks essential for humanoid robots, such as bipedal locomotion, manipulation, and grasping in unstructured environments. The trained agents are finally deployed in the real world for the Franka Nut Pick manipulation task. To our knowledge, this is the first sim-to-real attempt for successfully deploying PBRL agents on real hardware. Code and videos of the learned policies are available on our project website (https://sites.google.com/view/pbrl). △ Less

Submitted 24 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Submitted for publication to IEEE-RAS 23rd International Conference on Humanoid Robots

arXiv:2310.17274 [pdf, other]

cuRobo: Parallelized Collision-Free Minimum-Jerk Robot Motion Generation

Authors: Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, Nathan Ratliff, Dieter Fox

Abstract: This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation pro… ▽ More This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation problems within 50ms on average, 60x faster than state-of-the-art (SOTA) trajectory optimization methods. We achieve SOTA performance by combining L-BFGS step direction estimation with a novel parallel noisy line search scheme and a particle-based optimization solver. To further aid trajectory optimization, we develop a parallel geometric planner that plans within 20ms and also introduce a collision-free IK solver that can solve over 7000 queries/s. We package our contributions into a state of the art GPU accelerated motion generation library, cuRobo and release it to enrich the robotics community. Additional details are available at https://curobo.org △ Less

Submitted 3 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: revised technical report, 62 pages, Website: https://curobo.org

arXiv:2305.17110 [pdf, other]

IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality

Authors: Bingjie Tang, Michael A. Lin, Iretiayo Akinola, Ankur Handa, Gaurav S. Sukhatme, Fabio Ramos, Dieter Fox, Yashraj Narang

Abstract: Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on as… ▽ More Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on assembly. We present IndustReal, a set of algorithms, systems, and tools that solve assembly tasks in simulation with reinforcement learning (RL) and successfully achieve policy transfer to the real world. Specifically, we propose 1) simulation-aware policy updates, 2) signed-distance-field rewards, and 3) sampling-based curricula for robotic RL agents. We use these algorithms to enable robots to solve contact-rich pick, place, and insertion tasks in simulation. We then propose 4) a policy-level action integrator to minimize error at policy deployment time. We build and demonstrate a real-world robotic assembly system that uses the trained policies and action integrator to achieve repeatable performance in the real world. Finally, we present hardware and software tools that allow other researchers to reproduce our system and results. For videos and additional details, please see http://sites.google.com/nvidia.com/industreal . △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted to Robotics: Science and Systems (RSS) 2023

arXiv:2305.16309 [pdf, other]

Imitating Task and Motion Planning with Visuomotor Transformers

Authors: Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan Salakhutdinov, Dieter Fox

Abstract: Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale dataset… ▽ More Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks, to shelf and articulated object manipulation, achieving 70 to 80% success rates. Video results and code at https://mihdalal.github.io/optimus/ △ Less

Submitted 17 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Conference on Robot Learning (CoRL) 2023. 8 pages, 5 figures, 2 tables; 11 pages appendix (10 additional figures)

arXiv:2305.12127 [pdf, other]

DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training

Authors: Aleksei Petrenko, Arthur Allshire, Gavriel State, Ankur Handa, Viktor Makoviychuk

Abstract: In this work, we propose algorithms and methods that enable learning dexterous object manipulation using simulated one- or two-armed robots equipped with multi-fingered hand end-effectors. Using a parallel GPU-accelerated physics simulator (Isaac Gym), we implement challenging tasks for these robots, including regrasping, grasp-and-throw, and object reorientation. To solve these problems we introd… ▽ More In this work, we propose algorithms and methods that enable learning dexterous object manipulation using simulated one- or two-armed robots equipped with multi-fingered hand end-effectors. Using a parallel GPU-accelerated physics simulator (Isaac Gym), we implement challenging tasks for these robots, including regrasping, grasp-and-throw, and object reorientation. To solve these problems we introduce a decentralized Population-Based Training (PBT) algorithm that allows us to massively amplify the exploration capabilities of deep reinforcement learning. We find that this method significantly outperforms regular end-to-end learning and is able to discover robust control policies in challenging tasks. Video demonstrations of learned behaviors and the code can be found at https://sites.google.com/view/dexpbt △ Less

Submitted 20 May, 2023; originally announced May 2023.

Comments: Published in RSS2023

arXiv:2210.13702 [pdf, other]

DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality

Authors: Ankur Handa, Arthur Allshire, Viktor Makoviychuk, Aleksei Petrenko, Ritvik Singh, Jingzhou Liu, Denys Makoviichuk, Karl Van Wyk, Alexander Zhurkevich, Balakumar Sundaralingam, Yashraj Narang, Jean-Francois Lafleche, Dieter Fox, Gavriel State

Abstract: Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust de… ▽ More Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups, and in our case, with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly-available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at https://dextreme.org/ △ Less

Submitted 2 January, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: 28 pages. A smaller version of this paper is accepted to ICRA 2023

arXiv:2205.03532 [pdf, other]

Factory: Fast Contact for Robotic Assembly

Authors: Yashraj Narang, Kier Storey, Iretiayo Akinola, Miles Macklin, Philipp Reist, Lukasz Wawrzyniak, Yunrong Guo, Adam Moravanszky, Gavriel State, Michelle Lu, Ankur Handa, Dieter Fox

Abstract: Robotic assembly is one of the oldest and most challenging applications of robotics. In other areas of robotics, such as perception and grasping, simulation has rapidly accelerated research progress, particularly when combined with modern deep learning. However, accurately, efficiently, and robustly simulating the range of contact-rich interactions in assembly remains a longstanding challenge. In… ▽ More Robotic assembly is one of the oldest and most challenging applications of robotics. In other areas of robotics, such as perception and grasping, simulation has rapidly accelerated research progress, particularly when combined with modern deep learning. However, accurately, efficiently, and robustly simulating the range of contact-rich interactions in assembly remains a longstanding challenge. In this work, we present Factory, a set of physics simulation methods and robot learning tools for such applications. We achieve real-time or faster simulation of a wide range of contact-rich scenes, including simultaneous simulation of 1000 nut-and-bolt interactions. We provide $60$ carefully-designed part models, 3 robotic assembly environments, and 7 robot controllers for training and testing virtual robots. Finally, we train and evaluate proof-of-concept reinforcement learning policies for nut-and-bolt assembly. We aim for Factory to open the doors to using simulation for robotic assembly, as well as many other contact-rich applications in robotics. Please see https://sites.google.com/nvidia.com/factory for supplementary content, including videos. △ Less

Submitted 6 May, 2022; originally announced May 2022.

Comments: Accepted to Robotics: Science and Systems (RSS) 2022

arXiv:2112.05129 [pdf, other]

Assistive Tele-op: Leveraging Transformers to Collect Robotic Task Demonstrations

Authors: Henry M. Clever, Ankur Handa, Hammad Mazhar, Kevin Parker, Omer Shapira, Qian Wan, Yashraj Narang, Iretiayo Akinola, Maya Cakmak, Dieter Fox

Abstract: Sharing autonomy between robots and human operators could facilitate data collection of robotic task demonstrations to continuously improve learned models. Yet, the means to communicate intent and reason about the future are disparate between humans and robots. We present Assistive Tele-op, a virtual reality (VR) system for collecting robot task demonstrations that displays an autonomous trajector… ▽ More Sharing autonomy between robots and human operators could facilitate data collection of robotic task demonstrations to continuously improve learned models. Yet, the means to communicate intent and reason about the future are disparate between humans and robots. We present Assistive Tele-op, a virtual reality (VR) system for collecting robot task demonstrations that displays an autonomous trajectory forecast to communicate the robot's intent. As the robot moves, the user can switch between autonomous and manual control when desired. This allows users to collect task demonstrations with both a high success rate and with greater ease than manual teleoperation systems. Our system is powered by transformers, which can provide a window of potential states and actions far into the future -- with almost no added computation time. A key insight is that human intent can be injected at any location within the transformer sequence if the user decides that the model-predicted actions are inappropriate. At every time step, the user can (1) do nothing and allow autonomous operation to continue while observing the robot's future plan sequence, or (2) take over and momentarily prescribe a different set of actions to nudge the model back on track. We host the videos and other supplementary material at https://sites.google.com/view/assistive-teleop. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: 9 pages, 4 figures, 1 table. NeurIPS 2021 Workshop on Robot Learning: Self-Supervised and Lifelong Learning, Virtual, Virtual

arXiv:2108.10470 [pdf, other]

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

Authors: Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, Gavriel State

Abstract: Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks o… ▽ More Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 2-3 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. We host the results and videos at \url{https://sites.google.com/view/isaacgym-nvidia} and isaac gym can be downloaded at \url{https://developer.nvidia.com/isaac-gym}. △ Less

Submitted 25 August, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: tech report on isaac-gym

arXiv:2108.09779 [pdf, other]

doi 10.1109/IROS47612.2022.9981458

Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger

Authors: Arthur Allshire, Mayank Mittal, Varun Lodaya, Viktor Makoviychuk, Denys Makoviichuk, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Ankur Handa, Animesh Garg

Abstract: We present a system for learning a challenging dexterous manipulation task involving moving a cube to an arbitrary 6-DoF pose with only 3-fingers trained with NVIDIA's IsaacGym simulator. We show empirical benefits, both in simulation and sim-to-real transfer, of using keypoints as opposed to position+quaternion representations for the object pose in 6-DoF for policy observations and in reward cal… ▽ More We present a system for learning a challenging dexterous manipulation task involving moving a cube to an arbitrary 6-DoF pose with only 3-fingers trained with NVIDIA's IsaacGym simulator. We show empirical benefits, both in simulation and sim-to-real transfer, of using keypoints as opposed to position+quaternion representations for the object pose in 6-DoF for policy observations and in reward calculation to train a model-free reinforcement learning agent. By utilizing domain randomization strategies along with the keypoint representation of the pose of the manipulated object, we achieve a high success rate of 83% on a remote TriFinger system maintained by the organizers of the Real Robot Challenge. With the aim of assisting further research in learning in-hand manipulation, we make the codebase of our system, along with trained checkpoints that come with billions of steps of experience available, at https://s2r2-ig.github.io △ Less

Submitted 20 October, 2022; v1 submitted 22 August, 2021; originally announced August 2021.

Comments: International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:2104.04631 [pdf, other]

DexYCB: A Benchmark for Capturing Hand Grasping of Objects

Authors: Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox

Abstract: We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating saf… ▽ More We introduce DexYCB, a new dataset for capturing hand grasping of objects. We first compare DexYCB with a related one through cross-dataset evaluation. We then present a thorough benchmark of state-of-the-art approaches on three relevant tasks: 2D object and keypoint detection, 6D object pose estimation, and 3D hand pose estimation. Finally, we evaluate a new robotics-relevant task: generating safe robot grasps in human-to-robot object handover. Dataset and code are available at https://dex-ycb.github.io. △ Less

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted to CVPR 2021

arXiv:2012.03806 [pdf, other]

Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop

Authors: Sebastian Höfer, Kostas Bekris, Ankur Handa, Juan Camilo Gamboa, Florian Golemo, Melissa Mozifian, Chris Atkeson, Dieter Fox, Ken Goldberg, John Leonard, C. Karen Liu, Jan Peters, Shuran Song, Peter Welinder, Martha White

Abstract: This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and System" conference. Twelve leaders of the field took competing debate positions on the definition, viability, and importance of transferring skills from simulation to the real world in the context of robotics problems. The debaters also joined… ▽ More This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and System" conference. Twelve leaders of the field took competing debate positions on the definition, viability, and importance of transferring skills from simulation to the real world in the context of robotics problems. The debaters also joined a large panel discussion, answering audience questions and outlining the future of Sim2Real in robotics. Furthermore, we invited extended abstracts to this workshop which are summarized in this report. Based on the workshop, this report concludes with directions for practitioners exploiting this technology and for researchers further exploring open problems in this area. △ Less

Submitted 7 December, 2020; originally announced December 2020.

Comments: Summary of the "2nd Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics" held in conjunction with "Robotics: Science and System 2020". Website: https://sim2real.github.io/

arXiv:2011.08985 [pdf, other]

A User's Guide to Calibrating Robotics Simulators

Authors: Bhairav Mehta, Ankur Handa, Dieter Fox, Fabio Ramos

Abstract: Simulators are a critical component of modern robotics research. Strategies for both perception and decision making can be studied in simulation first before deployed to real world systems, saving on time and costs. Despite significant progress on the development of sim-to-real algorithms, the analysis of different methods is still conducted in an ad-hoc manner, without a consistent set of tests a… ▽ More Simulators are a critical component of modern robotics research. Strategies for both perception and decision making can be studied in simulation first before deployed to real world systems, saving on time and costs. Despite significant progress on the development of sim-to-real algorithms, the analysis of different methods is still conducted in an ad-hoc manner, without a consistent set of tests and metrics for comparison. This paper fills this gap and proposes a set of benchmarks and a framework for the study of various algorithms aimed to transfer models and policies learnt in simulation to the real world. We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms. Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms. We open-source the benchmark, training data, and trained models, which can be found at https://github.com/NVlabs/sim-parameter-estimation. △ Less

Submitted 17 November, 2020; originally announced November 2020.

Comments: Accepted at Conference on Robot Learning 2020

arXiv:2006.03106 [pdf, other]

doi 10.1109/LRA.2020.2972836

Model-Based Generalization Under Parameter Uncertainty Using Path Integral Control

Authors: Ian Abraham, Ankur Handa, Nathan Ratliff, Kendall Lowrey, Todd D. Murphey, Dieter Fox

Abstract: This work addresses the problem of robot interaction in complex environments where online control and adaptation is necessary. By expanding the sample space in the free energy formulation of path integral control, we derive a natural extension to the path integral control that embeds uncertainty into action and provides robustness for model-based robot planning. Our algorithm is applied to a diver… ▽ More This work addresses the problem of robot interaction in complex environments where online control and adaptation is necessary. By expanding the sample space in the free energy formulation of path integral control, we derive a natural extension to the path integral control that embeds uncertainty into action and provides robustness for model-based robot planning. Our algorithm is applied to a diverse set of tasks using different robots and validate our results in simulation and real-world experiments. We further show that our method is capable of running in real-time without loss of performance. Videos of the experiments as well as additional implementation details can be found at https://sites.google.com/view/emppi. △ Less

Submitted 4 June, 2020; originally announced June 2020.

Journal ref: IEEE Robotics and Automation Letters ( Volume: 5 , Issue: 2 , April 2020 )

arXiv:2003.01223 [pdf, other]

A Deep learning Approach to Generate Contrast-Enhanced Computerised Tomography Angiography without the Use of Intravenous Contrast Agents

Authors: Anirudh Chandrashekar, Ashok Handa, Natesh Shivakumar, Pierfrancesco Lapolla, Vicente Grau, Regent Lee

Abstract: Contrast-enhanced computed tomography angiograms (CTAs) are widely used in cardiovascular imaging to obtain a non-invasive view of arterial structures. However, contrast agents are associated with complications at the injection site as well as renal toxicity leading to contrast-induced nephropathy (CIN) and renal failure. We hypothesised that the raw data acquired from a non-contrast CT contains s… ▽ More Contrast-enhanced computed tomography angiograms (CTAs) are widely used in cardiovascular imaging to obtain a non-invasive view of arterial structures. However, contrast agents are associated with complications at the injection site as well as renal toxicity leading to contrast-induced nephropathy (CIN) and renal failure. We hypothesised that the raw data acquired from a non-contrast CT contains sufficient information to differentiate blood and other soft tissue components. We utilised deep learning methods to define the subtleties between soft tissue components in order to simulate contrast enhanced CTAs without contrast agents. Twenty-six patients with paired non-contrast and CTA images were randomly selected from an approved clinical study. Non-contrast axial slices within the AAA from 10 patients (n = 100) were sampled for the underlying Hounsfield unit (HU) distribution at the lumen, intra-luminal thrombus and interface locations. Sampling of HUs in these regions revealed significant differences between all regions (p<0.001 for all comparisons), confirming the intrinsic differences in the radiomic signatures between these regions. To generate a large training dataset, paired axial slices from the training set (n=13) were augmented to produce a total of 23,551 2-D images. We trained a 2-D Cycle Generative Adversarial Network (cycleGAN) for this non-contrast to contrast (NC2C) transformation task. The accuracy of the cycleGAN output was assessed by comparison to the contrast image. This pipeline is able to differentiate between visually incoherent soft tissue regions in non-contrast CT images. The CTAs generated from the non-contrast images bear strong resemblance to the ground truth. Here we describe a novel application of Generative Adversarial Network for CT image processing. This is poised to disrupt clinical pathways requiring contrast enhanced CT imaging. △ Less

Submitted 2 March, 2020; originally announced March 2020.

Comments: 7 Pages, 6 Figures

arXiv:2002.12160 [pdf, other]

In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation

Authors: Jacky Liang, Ankur Handa, Karl Van Wyk, Viktor Makoviychuk, Oliver Kroemer, Dieter Fox

Abstract: Tracking the pose of an object while it is being held and manipulated by a robot hand is difficult for vision-based methods due to significant occlusions. Prior works have explored using contact feedback and particle filters to localize in-hand objects. However, they have mostly focused on the static grasp setting and not when the object is in motion, as doing so requires modeling of complex conta… ▽ More Tracking the pose of an object while it is being held and manipulated by a robot hand is difficult for vision-based methods due to significant occlusions. Prior works have explored using contact feedback and particle filters to localize in-hand objects. However, they have mostly focused on the static grasp setting and not when the object is in motion, as doing so requires modeling of complex contact dynamics. In this work, we propose using GPU-accelerated parallel robot simulations and derivative-free, sample-based optimizers to track in-hand object poses with contact feedback during manipulation. We use physics simulation as the forward model for robot-object interactions, and the algorithm jointly optimizes for the state and the parameters of the simulations, so they better match with those of the real world. Our method runs in real-time (30Hz) on a single GPU, and it achieves an average point cloud distance error of 6mm in simulation experiments and 13mm in the real-world ones. View experiment videos at https://sites.google.com/view/in-hand-object-pose-tracking/ △ Less

Submitted 5 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: Accepted to the International Conference on Robotics and Automation (ICRA) 2020

arXiv:2002.03463 [pdf]

A Deep Learning Approach to Automate High-Resolution Blood Vessel Reconstruction on Computerized Tomography Images With or Without the Use of Contrast Agent

Authors: Anirudh Chandrashekar, Ashok Handa, Natesh Shivakumar, Pierfrancesco Lapolla, Vicente Grau, Regent Lee

Abstract: Existing methods to reconstruct vascular structures from a computed tomography (CT) angiogram rely on injection of intravenous contrast to enhance the radio-density within the vessel lumen. However, pathological changes can be present in the blood lumen, vessel wall or a combination of both that prevent accurate reconstruction. In the example of aortic aneurysmal disease, a blood clot or thrombus… ▽ More Existing methods to reconstruct vascular structures from a computed tomography (CT) angiogram rely on injection of intravenous contrast to enhance the radio-density within the vessel lumen. However, pathological changes can be present in the blood lumen, vessel wall or a combination of both that prevent accurate reconstruction. In the example of aortic aneurysmal disease, a blood clot or thrombus adherent to the aortic wall within the expanding aneurysmal sac is present in 70-80% of cases. These deformations prevent the automatic extraction of vital clinically relevant information by current methods. In this study, we implemented a modified U-Net architecture with attention-gating to establish a high-throughput and automated segmentation pipeline of pathological blood vessels in CT images acquired with or without the use of a contrast agent. Twenty-six patients with paired non-contrast and contrast-enhanced CT images within the ongoing Oxford Abdominal Aortic Aneurysm (OxAAA) study were randomly selected, manually annotated and used for model training and evaluation (13/13). Data augmentation methods were implemented to diversify the training data set in a ratio of 10:1. The performance of our Attention-based U-Net in extracting both the inner lumen and the outer wall of the aortic aneurysm from CT angiograms (CTA) was compared against a generic 3-D U-Net and displayed superior results. Subsequent implementation of this network architecture within the aortic segmentation pipeline from both contrast-enhanced CTA and non-contrast CT images has allowed for accurate and efficient extraction of the entire aortic volume. This extracted volume can be used to standardize current methods of aneurysmal disease management and sets the foundation for subsequent complex geometric and morphological analysis. Furthermore, the proposed pipeline can be extended to other vascular pathologies. △ Less

Submitted 9 February, 2020; originally announced February 2020.

Comments: 18 pages, 10 figures, 7 tables

arXiv:2001.02153 [pdf, other]

Information Theoretic Model Predictive Q-Learning

Authors: Mohak Bhardwaj, Ankur Handa, Dieter Fox, Byron Boots

Abstract: Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) al… ▽ More Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real systems in a systematic manner. △ Less

Submitted 5 May, 2020; v1 submitted 30 December, 2019; originally announced January 2020.

Comments: Extended version (15 pages) of paper accepted at the 2nd Learning for Dynamics and Control (L4DC) Conference, 2020

arXiv:1910.03135 [pdf, other]

DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System

Authors: Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, Dieter Fox

Abstract: Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system… ▽ More Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system, DexPilot, was developed that allows for complete control over the full 23 DoA robotic system by merely observing the bare human hand. DexPilot enables operators to carry out a variety of complex manipulation tasks that go beyond simple pick-and-place operations. This allows for collection of high dimensional, multi-modality, state-action data that can be leveraged in the future to learn sensorimotor policies for challenging manipulation tasks. The system performance was measured through speed and reliability metrics across two human demonstrators on a variety of tasks. The videos of the experiments can be found at https://sites.google.com/view/dex-pilot. △ Less

Submitted 14 October, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: 17 pages, first version of DexPilot

arXiv:1904.03754 [pdf, other]

ContactGrasp: Functional Multi-finger Grasp Synthesis from Contact

Authors: Samarth Brahmbhatt, Ankur Handa, James Hays, Dieter Fox

Abstract: Grasping and manipulating objects is an important human skill. Since most objects are designed to be manipulated by human hands, anthropomorphic hands can enable richer human-robot interaction. Desirable grasps are not only stable, but also functional: they enable post-grasp actions with the object. However, functional grasp synthesis for high degree-of-freedom anthropomorphic hands from object sh… ▽ More Grasping and manipulating objects is an important human skill. Since most objects are designed to be manipulated by human hands, anthropomorphic hands can enable richer human-robot interaction. Desirable grasps are not only stable, but also functional: they enable post-grasp actions with the object. However, functional grasp synthesis for high degree-of-freedom anthropomorphic hands from object shape alone is challenging because of the large optimization space. We present ContactGrasp, a framework for functional grasp synthesis from object shape and contact on the object surface. Contact can be manually specified or obtained through demonstrations. Our contact representation is object-centric and allows functional grasp synthesis even for hand models different than the one used for demonstration. Using a dataset of contact demonstrations from humans grasping diverse household objects, we synthesize functional grasps for three hand models and two functional intents. The project webpage is https://contactdb.cc.gatech.edu/contactgrasp.html. △ Less

Submitted 25 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Comments: IROS 2019 camera ready version

arXiv:1811.03704 [pdf, other]

doi 10.1109/ICRA.2019.8793520

Learning Latent Space Dynamics for Tactile Servoing

Authors: Giovanni Sutanto, Nathan Ratliff, Balakumar Sundaralingam, Yevgen Chebotar, Zhe Su, Ankur Handa, Dieter Fox

Abstract: To achieve a dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive action based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing, i.e. given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will… ▽ More To achieve a dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive action based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing, i.e. given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will bring the current tactile sensing to move closer towards the target tactile sensing at the next time step. We develop a data-driven approach to acquire a dynamics model for tactile servoing by learning from demonstration. Moreover, our method represents the tactile sensing information as to lie on a surface --or a 2D manifold-- and perform a manifold learning, making it applicable to any tactile skin geometry. We evaluate our method on a contact point tracking task using a robot equipped with a tactile finger. A video demonstrating our approach can be seen in https://youtu.be/0QK0-Vx7WkI △ Less

Submitted 15 April, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: Accepted to be published at the International Conference on Robotics and Automation (ICRA) 2019. The final version for publication at ICRA 2019 is 7 pages (i.e. 6 pages of technical content (including text, figures, tables, acknowledgement, etc.) and 1 page of the Bibliography/References), while this arXiv version is 8 pages (added Appendix and some extra details)

arXiv:1810.06187 [pdf, other]

Robust Learning of Tactile Force Estimation through Robot Interaction

Authors: Balakumar Sundaralingam, Alexander Lambert, Ankur Handa, Byron Boots, Tucker Hermans, Stan Birchfield, Nathan Ratliff, Dieter Fox

Abstract: Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a mapping for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage… ▽ More Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a mapping for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage information about the sensor surface to regularize the loss function. To learn a robust tactile force model that transfers across tasks, we generate ground truth data from three different sources: (1) the BioTac rigidly mounted to a force torque~(FT) sensor, (2) a robot interacting with a ball rigidly attached to the same FT sensor, and (3) through force inference on a planar pushing task by formalizing the mechanics as a system of particles and optimizing over the object motion. A total of 140k samples were collected from the three sources. We achieve a median angular accuracy of 3.5 degrees in predicting force direction (66% improvement over the current state of the art) and a median magnitude accuracy of 0.06 N (93% improvement) on a test dataset. Additionally, we evaluate the learned force model in a force feedback grasp controller performing object lifting and gentle placement. Our results can be found on https://sites.google.com/view/tactile-force. △ Less

Submitted 5 March, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: accepted to ICRA 2019 (camera ready version)

arXiv:1810.05762 [pdf, other]

GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

Authors: Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

Abstract: Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Us… ▽ More Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks. △ Less

Submitted 24 October, 2018; v1 submitted 12 October, 2018; originally announced October 2018.

Comments: Accepted and to appear at the Conference on Robot Learning (CoRL) 2018

arXiv:1810.05687 [pdf, other]

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Authors: Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, Dieter Fox

Abstract: We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by ma… ▽ More We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt △ Less

Submitted 5 March, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

arXiv:1710.06425 [pdf, other]

Domain Randomization and Generative Models for Robotic Grasping

Authors: Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

Abstract: Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to pe… ▽ More Deep learning-based robotic grasping has made significant progress thanks to algorithmic improvements and increased data availability. However, state-of-the-art models are often trained on as few as hundreds or thousands of unique object instances, and as a result generalization can be a challenge. In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis. We generate millions of unique, unrealistic procedurally generated objects, and train a deep neural network to perform grasp planning on these objects. Since the distribution of successful grasps for a given object can be highly multimodal, we propose an autoregressive grasp planning model that maps sensor inputs of a scene to a probability distribution over possible grasps. This model allows us to sample grasps efficiently at test time (or avoid sampling entirely). We evaluate our model architecture and data generation pipeline in simulation and the real world. We find we can achieve a $>$90% success rate on previously unseen realistic objects at test time in simulation despite having only been trained on random objects. We also demonstrate an 80% success rate on real-world grasp attempts despite having only been trained on random simulated objects. △ Less

Submitted 3 April, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

Comments: 8 pages, 11 figures. Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

arXiv:1705.08260 [pdf]

Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

Authors: Menglong Ye, Edward Johns, Ankur Handa, Lin Zhang, Philip Pratt, Guang-Zhong Yang

Abstract: Robotic surgery has become a powerful tool for performing minimally invasive procedures, providing advantages in dexterity, precision, and 3D vision, over traditional surgery. One popular robotic system is the da Vinci surgical platform, which allows preoperative information to be incorporated into live procedures using Augmented Reality (AR). Scene depth estimation is a prerequisite for AR, as ac… ▽ More Robotic surgery has become a powerful tool for performing minimally invasive procedures, providing advantages in dexterity, precision, and 3D vision, over traditional surgery. One popular robotic system is the da Vinci surgical platform, which allows preoperative information to be incorporated into live procedures using Augmented Reality (AR). Scene depth estimation is a prerequisite for AR, as accurate registration requires 3D correspondences between preoperative and intraoperative organ models. In the past decade, there has been much progress on depth estimation for surgical scenes, such as using monocular or binocular laparoscopes [1,2]. More recently, advances in deep learning have enabled depth estimation via Convolutional Neural Networks (CNNs) [3], but training requires a large image dataset with ground truth depths. Inspired by [4], we propose a deep learning framework for surgical scene depth estimation using self-supervision for scalable data acquisition. Our framework consists of an autoencoder for depth prediction, and a differentiable spatial transformer for training the autoencoder on stereo image pairs without ground truth depths. Validation was conducted on stereo videos collected in robotic partial nephrectomy. △ Less

Submitted 17 May, 2017; originally announced May 2017.

Comments: A two-page short report to be presented at the Hamlyn Symposium on Medical Robotics 2017. An extension of this work is on progress

arXiv:1612.05079 [pdf, other]

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

Authors: John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison

Abstract: We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estima… ▽ More We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which previously has been limited by relatively small labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as proxy for a SLAM system. We host the dataset at http://robotvault.bitbucket.io/scenenet-rgbd.html △ Less

Submitted 30 January, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

arXiv:1609.05130 [pdf, other]

SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks

Authors: John McCormac, Ankur Handa, Andrew Davison, Stefan Leutenegger

Abstract: Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need extend beyond geometry and appearence - they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a s… ▽ More Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need extend beyond geometry and appearence - they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state of the art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondence between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple view points to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions leads to an improvement even in the 2D semantic labelling over baseline single frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame-rates of approximately 25Hz. △ Less

Submitted 28 September, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

arXiv:1607.07405 [pdf, other]

gvnn: Neural Network Library for Geometric Computer Vision

Authors: Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

Abstract: We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spi… ▽ More We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error. △ Less

Submitted 12 August, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

Comments: Submitted to ECCV Workshop on Deep Geometry

arXiv:1604.00895 [pdf, other]

HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor

Authors: Shuda Li, Ankur Handa, Yang Zhang, Andrew Calway

Abstract: We describe a new method for comparing frame appearance in a frame-to-model 3-D mapping and tracking system using an low dynamic range (LDR) RGB-D camera which is robust to brightness changes caused by auto exposure. It is based on a normalised radiance measure which is invariant to exposure changes and not only robustifies the tracking under changing lighting conditions, but also enables the foll… ▽ More We describe a new method for comparing frame appearance in a frame-to-model 3-D mapping and tracking system using an low dynamic range (LDR) RGB-D camera which is robust to brightness changes caused by auto exposure. It is based on a normalised radiance measure which is invariant to exposure changes and not only robustifies the tracking under changing lighting conditions, but also enables the following exposure compensation perform accurately to allow online building of high dynamic range (HDR) maps. The latter facilitates the frame-to-model tracking to minimise drift as well as better capturing light variation within the scene. Results from experiments with synthetic and real data demonstrate that the method provides both improved tracking and maps with far greater dynamic range of luminosity. △ Less

Submitted 4 April, 2016; originally announced April 2016.

Comments: 14 pages

arXiv:1511.07041 [pdf, other]

SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

Authors: Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

Abstract: Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction but also highlighted the need for enormous quantity of supervised data --- performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive wh… ▽ More Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction but also highlighted the need for enormous quantity of supervised data --- performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive when considering the manual labour needed to collect such data. In this work, we focus our attention on depth based semantic per-pixel labelling as a scene understanding problem and show the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models we show comparable performance to state-of-the-art RGBD systems on NYUv2 dataset despite using only depth data as input and set a benchmark on depth-based segmentation on SUN RGB-D dataset. Additionally, we offer a route to generating synthesized frame or video data, and understanding of different factors influencing performance gains. △ Less

Submitted 26 November, 2015; v1 submitted 22 November, 2015; originally announced November 2015.

arXiv:1511.06309 [pdf, other]

Spatio-temporal video autoencoder with differentiable memory

Authors: Viorica Patraucean, Ankur Handa, Roberto Cipolla

Abstract: We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal decoder a robust optical flow prediction modu… ▽ More We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal decoder a robust optical flow prediction module together with an image sampler serving as built-in feedback loop. The architecture is end-to-end differentiable. At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the current frame to generate the next frame. By minimising the reconstruction error between the predicted next frame and the corresponding ground truth next frame, we train the whole system to extract features useful for motion estimation without any supervision effort. We present one direct application of the proposed framework in weakly-supervised semantic segmentation of videos through label propagation using optical flow. △ Less

Submitted 1 September, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

Comments: The experiments section has been extended and a direct application to weakly-supervised video segmentation through label propagation has been included

arXiv:1505.07293 [pdf, other]

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

Authors: Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla

Abstract: We propose a novel deep architecture, SegNet, for semantic pixel wise image labelling. SegNet has several attractive properties; (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling which improves accuracy, and (iii) it is easy to visualise the effect of feature activati… ▽ More We propose a novel deep architecture, SegNet, for semantic pixel wise image labelling. SegNet has several attractive properties; (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling which improves accuracy, and (iii) it is easy to visualise the effect of feature activation(s) in the pixel label space at any depth. SegNet is composed of a stack of encoders followed by a corresponding decoder stack which feeds into a soft-max classification layer. The decoders help map low resolution feature maps at the output of the encoder stack to full input image size feature maps. This addresses an important drawback of recent deep learning approaches which have adopted networks designed for object categorization for pixel wise labelling. These methods lack a mechanism to map deep layer feature maps to input dimensions. They resort to ad hoc methods to upsample features, e.g. by replication. This results in noisy predictions and also restricts the number of pooling layers in order to avoid too much upsampling and thus reduces spatial context. SegNet overcomes these problems by learning to map encoder outputs to image pixel labels. We test the performance of SegNet on outdoor RGB scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models. △ Less

Submitted 27 May, 2015; originally announced May 2015.

Comments: This version was first submitted to CVPR' 15 on November 14, 2014 with paper Id 1468. A similar architecture was proposed more recently on May 17, 2015, see http://arxiv.org/pdf/1505.04366.pdf

arXiv:1505.00171 [pdf, other]

SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes

Authors: Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

Abstract: We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real world scenes without any… ▽ More We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real world scenes without any noise modelling. We present encouraging preliminary results. △ Less

Submitted 1 May, 2015; originally announced May 2015.

Showing 1–37 of 37 results for author: Handa, A