Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs, to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.

Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2307.13133 [pdf, other]

simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects

Authors: Maria Bauza, Antonia Bronars, Yifan Hou, Ian Taylor, Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 33 pages, 6 figures, 2 tables, submitted to Science Robotics

arXiv:2306.11706 [pdf, other]

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, et al. (14 additional authors not shown)

Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

Submitted 22 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Transactions on Machine Learning Research (12/2023)

arXiv:2303.07997 [pdf, other]

FingerSLAM: Closed-loop Unknown Object Localization and Reconstruction from Visuo-tactile Feedback

Authors: Jialiang Zhao, Maria Bauza, Edward H. Adelson

Abstract: In this paper, we address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects. We propose FingerSLAM, a closed-loop factor graph-based pose estimator that combines local tactile sensing at finger-tip and global vision sensing from a wrist-mount camera. FingerSLAM is constructed with two constituent pose estimators: a multi-pass refined tactile-based pose estimator that captures movements from detailed local textures, and a single-pass vision-based pose estimator that predicts from a global view of the object. We also design a loop closure mechanism that actively matches current vision and tactile images to previously stored key-frames to reduce accumulated error. FingerSLAM incorporates the two sensing modalities of tactile and vision, as well as the loop closure mechanism with a factor graph-based optimization framework. Such a framework produces an optimized pose estimation solution that is more accurate than the standalone estimators. The estimated poses are then used to reconstruct the shape of the unknown object incrementally by stitching the local point clouds recovered from tactile images. We train our system on real-world data collected with 20 objects. We demonstrate reliable visuo-tactile pose estimation and shape reconstruction through quantitative and qualitative real-world evaluations on 6 objects that are unseen during training.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Submitted and accepted to 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

arXiv:2204.11701 [pdf, other]

doi 10.1177/02783649231196925

Tac2Pose: Tactile Object Pose Estimation from the First Touch

Authors: Maria Bauza, Antonia Bronars, Alberto Rodriguez

Abstract: In this paper, we present Tac2Pose, an object-specific approach to tactile pose estimation from the first touch for known objects. Given the object geometry, we learn a tailored perception model in simulation that estimates a probability distribution over possible object poses given a tactile observation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor, we match it against the pre-computed set using an object-specific embedding learned using contrastive learning. We obtain contact shapes from the sensor with an object-agnostic calibration step that maps RGB tactile observations to binary contact shapes. This mapping, which can be reused across object and sensor instances, is the only step trained with real sensor data. This results in a perception model that localizes objects from the first real tactile observation. Importantly, it produces pose distributions and can incorporate additional pose constraints coming from other perception systems, contacts, or priors. We provide quantitative results for 20 objects. Tac2Pose provides high accuracy pose estimations from distinctive tactile observations while regressing meaningful pose distributions to account for those contact shapes that could result from different object poses. We also test Tac2Pose on object models reconstructed from a 3D scanner, to evaluate the robustness to uncertainty in the object model. Finally, we demonstrate the advantages of Tac2Pose compared with three baseline methods for tactile pose estimation: directly regressing the object pose with a neural network, matching an observed contact to a set of possible contacts using a standard classification neural network, and direct pixel comparison of an observed contact with a set of possible contacts. Website: http://mcube.mit.edu/research/tac2pose.html

Submitted 14 September, 2023; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: Submitted to IJRR, 22 pages + Appendix, 11 figures

arXiv:2012.05205 [pdf, other]

Tactile Object Pose Estimation from the First Touch with Geometric Contact Rendering

Authors: Maria Bauza, Eric Valls, Bryan Lim, Theo Sechopoulos, Alberto Rodriguez

Abstract: In this paper, we present an approach to tactile pose estimation from the first touch for known objects. First, we create an object-agnostic map from real tactile observations to contact shapes. Next, for a new object with known geometry, we learn a tailored perception model completely in simulation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor output, we match it against the pre-computed set using the object-specific embedding learned purely in simulation using contrastive learning. This results in a perception model that can localize objects from a single tactile observation. It also allows reasoning over pose distributions and including additional pose constraints coming from other perception systems or multiple contacts. We provide quantitative results for four objects. Our approach provides high accuracy pose estimations from distinctive tactile observations while regressing pose distributions to account for those contact shapes that could result from different object poses. We further extend and test our approach in multi-contact scenarios where several tactile sensors are simultaneously in contact with the object. Website: http://mcube.mit.edu/research/tactile_loc_first_touch.html

Submitted 9 December, 2020; originally announced December 2020.

Comments: CORL 2020, 5 figures + 2 in appendix Video: https://youtu.be/2ygtSJTmo08

arXiv:2011.07044 [pdf, other]

Tactile SLAM: Real-time inference of shape and pose from planar pushing

Authors: Sudharshan Suresh, Maria Bauza, Kuan-Ting Yu, Joshua G. Mangelson, Alberto Rodriguez, Michael Kaess

Abstract: Tactile perception is central to robot manipulation in unstructured environments. However, it requires contact, and a mature implementation must infer object models while also accounting for the motion induced by the interaction. In this work, we present a method to estimate both object shape and pose in real-time from a stream of tactile measurements. This is applied towards tactile exploration of an unknown object by planar pushing. We consider this as an online SLAM problem with a nonparametric shape representation. Our formulation of tactile inference alternates between Gaussian process implicit surface regression and pose estimation on a factor graph. Through a combination of local Gaussian processes and fixed-lag smoothing, we infer object shape and pose in real-time. We evaluate our system across different objects in both simulated and real-world planar pushing tasks.

Submitted 26 March, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: Camera-ready version to be presented at the 2021 IEEE International Conference on Robotics and Automation (ICRA 2021). For associated video file, see https://youtu.be/wdyagx5MM40

arXiv:2009.10623 [pdf, other]

Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time

Authors: Ferran Alet, Maria Bauza, Kenji Kawaguchi, Nurullah Giray Kuru, Tomas Lozano-Perez, Leslie Pack Kaelbling

Abstract: From CNNs to attention mechanisms, encoding inductive biases into neural networks has been a fruitful source of improvement in machine learning. Adding auxiliary losses to the main objective function is a general way of encoding biases that can help networks learn better representations. However, since auxiliary losses are minimized only on training data, they suffer from the same generalization gap as regular task losses. Moreover, by adding a term to the loss function, the model optimizes a different objective than the one we care about. In this work we address both problems: first, we take inspiration from transductive learning and note that after receiving an input but before making a prediction, we can fine-tune our networks on any unsupervised loss. We call this process tailoring, because we customize the model to each input to ensure our prediction satisfies the inductive bias. Second, we formulate meta-tailoring, a nested optimization similar to that in meta-learning, and train our models to perform well on the task objective after adapting them using an unsupervised loss. The advantages of tailoring and meta-tailoring are discussed theoretically and demonstrated empirically on a diverse set of examples.

Submitted 6 September, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: NeurIPS 2020 workshops on Interpretable Inductive Biases and Meta-learning

arXiv:1911.05071 [pdf, other]

Experience-Embedded Visual Foresight

Authors: Lin Yen-Chen, Maria Bauza, Phillip Isola

Abstract: Visual foresight gives an agent a window into the future, which it can use to anticipate events before they happen and plan strategic behavior. Although impressive results have been achieved on video prediction in constrained settings, these models fail to generalize when confronted with unfamiliar real-world objects. In this paper, we tackle the generalization problem via fast adaptation, where we train a prediction model to quickly adapt to the observed visual dynamics of a novel object. Our method, Experience-embedded Visual Foresight (EVF), jointly learns a fast adaptation module, which encodes observed trajectories of the new object into a vector embedding, and a visual prediction model, which conditions on this embedding to generate physically plausible predictions. For evaluation, we compare our method against baselines on video prediction and benchmark its utility on two real-world control tasks. We show that our method is able to quickly adapt to new visual dynamics and achieves lower error than the baselines when manipulating novel objects.

Submitted 17 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: CoRL 2019. Project website: http://yenchenlin.me/evf/

arXiv:1911.03112 [pdf, other]

Accurate Vision-based Manipulation through Contact Reasoning

Authors: Alina Kloss, Maria Bauza, Jiajun Wu, Joshua B. Tenenbaum, Alberto Rodriguez, Jeannette Bohg

Abstract: Planning contact interactions is one of the core challenges of many robotic tasks. Optimizing contact locations while taking dynamics into account is computationally costly and, in environments that are only partially observable, executing contact-based tasks often suffers from low accuracy. We present an approach that addresses these two challenges for the problem of vision-based manipulation. First, we propose to disentangle contact from motion optimization. Thereby, we improve planning efficiency by focusing computation on promising contact locations. Second, we use a hybrid approach for perception and state estimation that combines neural networks with a physically meaningful state representation. In simulation and real-world experiments on the task of planar pushing, we show that our method is more efficient and achieves a higher manipulation accuracy than previous vision-based approaches.

Submitted 17 April, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: accepted at ICRA 2020

arXiv:1910.00618 [pdf, other]

Omnipush: accurate, diverse, real-world dataset of pushing dynamics with RGB-D video

Authors: Maria Bauza, Ferran Alet, Yen-Chen Lin, Tomas Lozano-Perez, Leslie P. Kaelbling, Phillip Isola, Alberto Rodriguez

Abstract: Pushing is a fundamental robotic skill. Existing work has shown how to exploit models of pushing to achieve a variety of tasks, including grasping under uncertainty, in-hand manipulation and clearing clutter. Such models, however, are approximate, which limits their applicability. Learning-based methods can reason directly from raw sensory data with accuracy, and have the potential to generalize to a wider diversity of scenarios. However, developing and testing such methods requires rich-enough datasets. In this paper we introduce Omnipush, a dataset with high variety of planar pushing behavior. In particular, we provide 250 pushes for each of 250 objects, all recorded with RGB-D and a high precision tracking system. The objects are constructed so as to systematically explore key factors that affect pushing -- the shape of the object and its mass distribution -- which have not been broadly explored in previous datasets, and which allow the study of generalization in model learning. Omnipush includes a benchmark for meta-learning dynamic models, which requires algorithms that make good predictions and estimate their own uncertainty. We also provide an RGB video prediction benchmark and propose other relevant tasks that are suited to this dataset. Data and code are available at https://web.mit.edu/mcube/omnipush-dataset/.

Submitted 19 August, 2021; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: IROS 2019, 8 pages, 7 figures

arXiv:1904.10944 [pdf, other]

Tactile Mapping and Localization from High-Resolution Tactile Imprints

Authors: Maria Bauza, Oleguer Canal, Alberto Rodriguez

Abstract: This work studies the problem of shape reconstruction and object localization using a vision-based tactile sensor, GelSlim. The main contributions are the recovery of local shapes from contact, an approach to reconstruct the tactile shape of objects from tactile imprints, and an accurate method for object localization of previously reconstructed objects. The algorithms can be applied to a large variety of 3D objects and provide accurate tactile feedback for in-hand manipulation. Results show that by exploiting the dense tactile information we can reconstruct the shape of objects with high accuracy and do on-line object identification and localization, opening the door to reactive manipulation guided by tactile sensing. We provide videos and supplemental information on the project's website: http://web.mit.edu/mcube/research/tactile_localization.html

Submitted 11 July, 2019; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: ICRA 2019, 7 pages, 7 figures. Website: http://web.mit.edu/mcube/research/tactile_localization.html Video: https://youtu.be/uMkspjmDbqs

arXiv:1904.09019 [pdf, other]

Graph Element Networks: adaptive, structured computation and memory

Authors: Ferran Alet, Adarsh K. Jeewajee, Maria Bauza, Alberto Rodriguez, Tomas Lozano-Perez, Leslie Pack Kaelbling

Abstract: We explore the use of graph neural networks (GNNs) to model spatial processes in which there is no a priori graphical structure. Similar to finite element analysis, we assign nodes of a GNN to spatial locations and use a computational process defined on the graph to model the relationship between an initial function defined over a space and a resulting function in the same space. We use GNNs as a computational substrate, and show that the locations of the nodes in space as well as their connectivity can be optimized to focus on the most complex parts of the space. Moreover, this representational strategy allows the learned input-output relationship to generalize over the size of the underlying space and run the same model at different levels of precision, trading computation for accuracy. We demonstrate this method on a traditional PDE problem, a physical prediction problem from robotics, and learning to predict scene images from novel viewpoints.

Submitted 17 November, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: Accepted to ICML 2019

arXiv:1904.06580 [pdf, other]

Combining Physical Simulators and Object-Based Networks for Control

Authors: Anurag Ajay, Maria Bauza, Jiajun Wu, Nima Fazeli, Joshua B. Tenenbaum, Alberto Rodriguez, Leslie P. Kaelbling

Abstract: Physics engines play an important role in robot planning and control; however, many real-world control problems involve complex contact dynamics that cannot be characterized analytically. Most physics engines therefore employ approximations that lead to a loss in precision. In this paper, we propose a hybrid dynamics model, simulator-augmented interaction networks (SAIN), combining a physics engine with an object-based neural network for dynamics modeling. Compared with existing models that are purely analytical or purely data-driven, our hybrid model captures the dynamics of interacting objects in a more accurate and data-efficient manner. Experiments both in simulation and on a real robot suggest that it also leads to better performance when used in complex control tasks. Finally, we show that our model generalizes to novel environments with varying object shapes and materials.

Submitted 13 April, 2019; originally announced April 2019.

Comments: ICRA 2019; Project page: http://sain.csail.mit.edu

arXiv:1812.07768 [pdf, other]

Modular meta-learning in abstract graph networks for combinatorial generalization

Authors: Ferran Alet, Maria Bauza, Alberto Rodriguez, Tomas Lozano-Perez, Leslie P. Kaelbling

Abstract: Modular meta-learning is a new framework that generalizes to unseen datasets by combining a small set of neural modules in different ways. In this work we propose abstract graph networks: using graphs as abstractions of a system's subparts without a fixed assignment of nodes to system subparts, for which we would need supervision. We combine this idea with modular meta-learning to get a flexible framework with combinatorial generalization to new tasks built in. We then use it to model the pushing of arbitrarily shaped objects from little or no training data.

Submitted 19 December, 2018; originally announced December 2018.

Comments: Presented at NeurIPS meta-learning workshop 2018

arXiv:1808.03246 [pdf, other]

Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing

Authors: Anurag Ajay, Jiajun Wu, Nima Fazeli, Maria Bauza, Leslie P. Kaelbling, Joshua B. Tenenbaum, Alberto Rodriguez

Abstract: An efficient, generalizable physical simulator with universal uncertainty estimates has wide applications in robot state estimation, planning, and control. In this paper, we build such a simulator for two scenarios, planar pushing and ball bouncing, by augmenting an analytical rigid-body simulator with a neural network that learns to model uncertainty as residuals. Combining symbolic, deterministic simulators with learnable, stochastic neural nets provides us with expressiveness, efficiency, and generalizability simultaneously. Our model outperforms both purely analytical and purely learned simulators consistently on real, standard benchmarks. Compared with methods that model uncertainty using Gaussian processes, our model runs much faster, generalizes better to new object shapes, and is able to characterize the complex distribution of object trajectories.

Submitted 9 August, 2018; originally announced August 2018.

Comments: IROS 2018

arXiv:1807.09904

A Data-Efficient Approach to Precise and Controlled Pushing

Authors: Maria Bauza, Francois R. Hogan, Alberto Rodriguez

Abstract: Decades of research in control theory have shown that simple controllers, when provided with timely feedback, can control complex systems. Pushing is an example of a complex mechanical system that is difficult to model accurately due to unknown system parameters such as coefficients of friction and pressure distributions. In this paper, we explore the data-complexity required for controlling, rather than modeling, such a system. Results show that a model-based control approach, where the dynamical model is learned from data, is capable of performing complex pushing trajectories with a minimal amount of training data (10 data points). The dynamics of pushing interactions are modeled using a Gaussian process (GP) and are leveraged within a model predictive control approach that linearizes the GP and imposes actuator and task constraints for a planar manipulation task.

Submitted 9 October, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

Comments: Maria Bauza and Francois R. Hogan contributed equally to this work. 10 pages, 5 figures

Journal ref: CoRL 2018

arXiv:1803.01940

Tactile Regrasp: Grasp Adjustments via Simulated Tactile Transformations

Authors: Francois R. Hogan, Maria Bauza, Oleguer Canal, Elliott Donlon, Alberto Rodriguez

Abstract: This paper presents a novel regrasp control policy that makes use of tactile sensing to plan local grasp adjustments. Our approach determines regrasp actions by virtually searching for local transformations of tactile measurements that improve the quality of the grasp. First, we construct a tactile-based grasp quality metric using a deep convolutional neural network trained on over 2800 grasps. The quality of each grasp, a continuous value between 0 and 1, is determined experimentally by measuring its resistance to external perturbations. Second, we simulate the tactile imprints associated with robot motions relative to the initial grasp by performing rigid-body transformations of the given tactile measurements. The newly generated tactile imprints are evaluated with the learned grasp quality network and the regrasp action is chosen to maximize the grasp quality. Results show that the grasp quality network can predict the outcome of grasps with an average accuracy of 85% on known objects and 75% on a cross validation set of 12 objects. The regrasp control policy improves the success rate of grasp actions by an average relative increase of 70% on a test set of 8 objects.

Submitted 9 October, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: Francois R. Hogan and Maria Bauza contributed equally to this work. 8 pages, 7 figures

Journal ref: IROS 2018

arXiv:1710.01330

Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

Authors: Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois R. Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Chavan Dafle, Rachel Holladay, Isabella Morona, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez

Abstract: This paper presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses a category-agnostic affordance prediction algorithm to select and execute among four different grasping primitive behaviors. It then recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional training data. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT-Princeton Team system that took 1st place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu

Submitted 30 May, 2020; v1 submitted 3 October, 2017; originally announced October 2017.

Comments: Project webpage: http://arc.cs.princeton.edu Summary video: https://youtu.be/6fG7zwGfIkI

arXiv:1709.08120

GP-SUM. Gaussian Processes Filtering of non-Gaussian Beliefs

Authors: Maria Bauza, Alberto Rodriguez

Abstract: This work studies the problem of stochastic dynamic filtering and state propagation with complex beliefs. The main contribution is GP-SUM, a filtering algorithm tailored to dynamic systems and observation models expressed as Gaussian Processes (GP), and to states represented as a weighted sum of Gaussians. The key attribute of GP-SUM is that it does not rely on linearizations of the dynamic or observation models, or on unimodal Gaussian approximations of the belief, hence enables tracking complex state distributions. The algorithm can be seen as a combination of a sampling-based filter with a probabilistic Bayes filter. On the one hand, GP-SUM operates by sampling the state distribution and propagating each sample through the dynamic system and observation models. On the other hand, it achieves effective sampling and accurate probabilistic propagation by relying on the GP form of the system, and the sum-of-Gaussian form of the belief. We show that GP-SUM outperforms several GP-Bayes and Particle Filters on a standard benchmark. We also demonstrate its use in a pushing task, predicting with experimental accuracy the naturally occurring non-Gaussian distributions.

Submitted 30 January, 2019; v1 submitted 23 September, 2017; originally announced September 2017.

Comments: WAFR 2018, 16 pages, 7 figures

arXiv:1704.03033

A probabilistic data-driven model for planar pushing

Authors: Maria Bauza, Alberto Rodriguez

Abstract: This paper presents a data-driven approach to model planar pushing interaction to predict both the most likely outcome of a push and its expected variability. The learned models rely on a variation of Gaussian processes with input-dependent noise called Variational Heteroscedastic Gaussian processes (VHGP) that capture the mean and variance of a stochastic function. We show that we can learn accurate models that outperform analytical models after less than 100 samples and saturate in performance with less than 1000 samples. We validate the results against a collected dataset of repeated trajectories, and use the learned models to study questions such as the nature of the variability in pushing, and the validity of the quasi-static assumption.

Submitted 23 September, 2017; v1 submitted 10 April, 2017; originally announced April 2017.

Comments: 8 pages, 11 figures, ICRA 2017

arXiv:1604.04038

More than a Million Ways to Be Pushed: A High-Fidelity Experimental Dataset of Planar Pushing

Authors: Kuan-Ting Yu, Maria Bauza, Nima Fazeli, Alberto Rodriguez

Abstract: Pushing is a motion primitive useful to handle objects that are too large, too heavy, or too cluttered to be grasped. It is at the core of much of robotic manipulation, in particular when physical interaction is involved. It seems reasonable then to wish for robots to understand how pushed objects move. In reality, however, robots often rely on approximations which yield models that are computable, but also restricted and inaccurate. Just how close are those models? How reasonable are the assumptions they are based on? To help answer these questions, and to get a better experimental understanding of pushing, we present a comprehensive and high-fidelity dataset of planar pushing experiments. The dataset contains timestamped poses of a circular pusher and a pushed object, as well as forces at the interaction. We vary the push interaction in 6 dimensions: surface material, shape of the pushed object, contact position, pushing direction, pushing speed, and pushing acceleration. An industrial robot automates the data capturing along precisely controlled position-velocity-acceleration trajectories of the pusher, which give dense samples of positions and forces of uniform quality. We finish the paper by characterizing the variability of friction, and evaluating the most common assumptions and simplifications made by models of frictional pushing in robotics.

Submitted 3 August, 2016; v1 submitted 14 April, 2016; originally announced April 2016.

Comments: 8 pages, 10 figures

Journal ref: IROS 2016

Showing 1–22 of 22 results for author: Bauza, M