-
Robot Instance Segmentation with Few Annotations for Grasping
Authors:
Moshe Kimhi,
David Vainshtein,
Chaim Baskin,
Dotan Di Castro
Abstract:
The ability of robots to manipulate objects relies heavily on their aptitude for visual perception. In domains characterized by cluttered scenes and high object variability, most methods call for vast labeled datasets, laboriously hand-annotated, with the aim of training capable models. Once deployed, the challenge of generalizing to unfamiliar objects implies that the model must evolve alongside its domain. To address this, we propose a novel framework that combines Semi-Supervised Learning (SSL) with Learning Through Interaction (LTI), allowing a model to learn by observing scene alterations and leverage visual consistency despite temporal gaps without requiring curated data of interaction sequences. As a result, our approach exploits partially annotated data through self-supervision and incorporates temporal context using pseudo-sequences generated from unlabeled still images. We validate our method on two common benchmarks, ARMBench mix-object-tote and OCID, where it achieves state-of-the-art performance. Notably, on ARMBench, we attain an $\text{AP}_{50}$ of $86.37$, almost a $20\%$ improvement over existing work, and obtain remarkable results in scenarios with extremely low annotation, achieving an $\text{AP}_{50}$ score of $84.89$ with just $1 \%$ of annotated data compared to $72$ presented in ARMBench on the fully annotated counterpart.
Submitted 1 July, 2024;
originally announced July 2024.
-
Towards Natural Language-Driven Assembly Using Foundation Models
Authors:
Omkar Joglekar,
Tal Lancewicki,
Shir Kozlovsky,
Vladimir Tchuiev,
Zohar Feldman,
Dotan Di Castro
Abstract:
Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models that enable robotic control. The main objective of these methods is to develop a generalist policy that can control robots with various embodiments. However, in industrial robotic applications such as automated assembly and disassembly, some tasks, such as insertion, demand greater accuracy and involve intricate factors like contact engagement, friction handling, and refined motor skills. Implementing these skills using a generalist policy is challenging because these policies might integrate further sensory data, including force or torque measurements, for enhanced precision. In our method, we present a global control policy based on LLMs that can transfer the control policy to a finite set of skills that are specifically trained to perform high-precision tasks through dynamic context switching. The integration of LLMs into this framework underscores their significance in not only interpreting and processing language inputs but also in enriching the control mechanisms for diverse and intricate robotic operations.
Submitted 23 June, 2024;
originally announced June 2024.
-
Radar Spectra-Language Model for Automotive Scene Parsing
Authors:
Mariia Pushkareva,
Yuri Feldman,
Csaba Domokos,
Kilian Rambach,
Dotan Di Castro
Abstract:
Radar sensors are low cost, long-range, and weather-resilient. Therefore, they are widely used for driver assistance functions, and are expected to be crucial for the success of autonomous driving in the future. In many perception tasks only pre-processed radar point clouds are considered. In contrast, radar spectra are a raw form of radar measurements and contain more information than radar point clouds. However, radar spectra are rather difficult to interpret. In this work, we aim to explore the semantic information contained in spectra in the context of automated driving, thereby moving towards better interpretability of radar spectra. To this end, we create a radar spectra-language model, allowing us to query radar spectra measurements for the presence of scene elements using free text. We overcome the scarcity of radar spectra data by matching the embedding space of an existing vision-language model (VLM). Finally, we explore the benefit of the learned representation for scene parsing, and obtain improvements in free space segmentation and object detection merely by injecting the spectra embedding into a baseline model.
Submitted 4 June, 2024;
originally announced June 2024.
-
ISCUTE: Instance Segmentation of Cables Using Text Embedding
Authors:
Shir Kozlovsky,
Omkar Joglekar,
Dotan Di Castro
Abstract:
In the field of robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when it comes to perceiving Deformable Linear Objects (DLOs) like wires, cables, and flexible tubes. This challenge arises primarily from the lack of distinct attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of CLIPSeg model with the zero-shot generalization capabilities of Segment Anything Model (SAM). We show that our method exceeds SOTA performance on DLO instance segmentation, achieving a mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.
Submitted 27 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Generative Modeling of Graphs via Joint Diffusion of Node and Edge Attributes
Authors:
Nimrod Berman,
Eitan Kosman,
Dotan Di Castro,
Omri Azencot
Abstract:
Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their limited efficacy as they do not properly model the interplay among graph components. To address this, we propose a joint score-based model of nodes and edges for graph generation that considers all graph components. Our approach offers two key novelties: (i) node and edge attributes are combined in an attention module that generates samples based on the two ingredients; and (ii) node, edge and adjacency information are mutually dependent during the graph diffusion process. We evaluate our method on challenging benchmarks involving real-world and synthetic datasets in which edge features are crucial. Additionally, we introduce a new synthetic dataset that incorporates edge values. Furthermore, we propose a novel application that greatly benefits from the method due to its nature: the generation of traffic scenes represented as graphs. Our method outperforms other graph generation methods, demonstrating a significant advantage in edge-related measures.
Submitted 6 February, 2024;
originally announced February 2024.
-
An Axiomatic Approach to Model-Agnostic Concept Explanations
Authors:
Zhili Feng,
Michal Moshkovitz,
Dotan Di Castro,
J. Zico Kolter
Abstract:
Concept explanation is a popular approach for examining how human-interpretable concepts impact the predictions of a model. However, most existing methods for concept explanations are tailored to specific models. To address this issue, this paper focuses on model-agnostic measures. Specifically, we propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity. We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings. Experimentally, we demonstrate the utility of the new method by applying it in different scenarios: for model selection, optimizer selection, and model improvement using a kind of prompt editing for zero-shot vision language models.
Submitted 12 January, 2024;
originally announced January 2024.
-
Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills
Authors:
Ben-ya Halevy,
Yehudit Aperstein,
Dotan Di Castro
Abstract:
Reinforcement Learning has received wide interest due to its success in competitive games. Yet, its adoption in everyday applications is limited (e.g. industrial, home, healthcare, etc.). In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework is comprised of three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
Submitted 23 June, 2023;
originally announced June 2023.
-
CONFIDE: Contextual Finite Differences Modelling of PDEs
Authors:
Ori Linial,
Orly Avner,
Dotan Di Castro
Abstract:
We introduce a method for inferring an explicit PDE from a data sample generated by previously unseen dynamics, based on a learned context. The training phase integrates knowledge of the form of the equation with a differential scheme, while the inference phase yields a PDE that fits the data sample and enables both signal prediction and data explanation. We include results of extensive experimentation, comparing our method to SOTA approaches, together with ablation studies that examine different flavors of our solution.
Submitted 7 June, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
AG2U -- Autonomous Grading Under Uncertainties
Authors:
Yakov Miron,
Yuval Goldfracht,
Chana Ross,
Dotan Di Castro,
Itzik Klein
Abstract:
Surface grading, the process of leveling an uneven area containing pre-dumped sand piles, is an important task in the construction site pipeline. This labour-intensive process is often carried out by a dozer, a key machinery tool at any construction site. Current attempts to automate surface grading assume perfect localization. However, in real-world scenarios, this assumption fails, as agents are presented with imperfect perception, which leads to degraded performance. In this work, we address the problem of autonomous grading under uncertainties. First, we implement a simulation and a scaled real-world prototype environment to enable rapid policy exploration and evaluation in this setting. Second, we formalize the problem as a partially observable Markov decision process and train an agent capable of handling such uncertainties. We show, through rigorous experiments, that an agent trained under perfect localization will suffer degraded performance when presented with localization uncertainties. However, an agent trained using our method will develop a more robust policy for addressing such errors and, consequently, exhibit a better grading performance.
Submitted 4 August, 2022;
originally announced August 2022.
-
GraphVid: It Only Takes a Few Nodes to Understand a Video
Authors:
Eitan Kosman,
Dotan Di Castro
Abstract:
We propose a concise representation of videos that encodes perceptually meaningful features into graphs. With this representation, we aim to leverage the large amount of redundancy in videos and save computation. First, we construct superpixel-based graph representations of videos by considering superpixels as graph nodes and creating spatial and temporal connections between adjacent superpixels. Then, we leverage Graph Convolutional Networks to process this representation and predict the desired output. As a result, we are able to train models with far fewer parameters, which translates into short training periods and a reduction in computation resource requirements. A comprehensive experimental study on the publicly available datasets Kinetics-400 and Charades shows that the proposed method is highly cost-effective and uses limited commodity hardware during training and inference. It reduces the computational requirements 10-fold while achieving results that are comparable to state-of-the-art methods. We believe that the proposed approach is a promising direction that could open the door to solving video understanding more efficiently and enable more resource-limited users to thrive in this research field.
Submitted 20 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Towards Autonomous Grading In The Real World
Authors:
Yakov Miron,
Chana Ross,
Yuval Goldfracht,
Chen Tessler,
Dotan Di Castro
Abstract:
In this work, we aim to tackle the problem of autonomous grading, where a dozer is required to flatten an uneven area. In addition, we explore methods for bridging the gap between a simulated environment and real scenarios. We design both a realistic physical simulation and a scaled real prototype environment mimicking the real dozer dynamics and sensory information. We establish heuristics and learning strategies in order to solve the problem. Through extensive experimentation, we show that although heuristics are capable of tackling the problem in a clean and noise-free simulated environment, they fail catastrophically when facing real world scenarios. As the heuristics are capable of successfully solving the task in the simulated environment, we show they can be leveraged to guide a learning agent which can generalize and solve the task both in simulation and in a scaled prototype environment.
Submitted 25 July, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
InsertionNet 2.0: Minimal Contact Multi-Step Insertion Using Multimodal Multiview Sensory Input
Authors:
Oren Spector,
Vladimir Tchuiev,
Dotan Di Castro
Abstract:
We address the problem of devising the means for a robot to rapidly and safely learn insertion skills with just a few human interventions and without hand-crafted rewards or demonstrations. Our InsertionNet version 2.0 provides an improved technique to robustly cope with a wide range of use-cases featuring different shapes, colors, initial poses, etc. In particular, we present a regression-based method based on multimodal input from stereo perception and force, augmented with contrastive learning for the efficient learning of valuable features. In addition, we introduce a one-shot learning technique for insertion, which relies on a relation network scheme to better exploit the collected data and to support multi-step insertion tasks. Our method improves on the results obtained with the original InsertionNet, achieving an almost perfect score (above 97.5$\%$ on 200 trials) in 16 real-life insertion tasks while minimizing the execution time and contact during insertion. We further demonstrate our method's ability to tackle a real-life 3-step insertion task and perfectly solve an unseen insertion task without learning.
Submitted 2 March, 2022;
originally announced March 2022.
-
AGPNet -- Autonomous Grading Policy Network
Authors:
Chana Ross,
Yakov Miron,
Yuval Goldfracht,
Dotan Di Castro
Abstract:
In this work, we establish heuristics and learning strategies for the autonomous control of a dozer grading an uneven area studded with sand piles. We formalize the problem as a Markov Decision Process, design a simulation which demonstrates agent-environment interactions and finally compare our simulator to a real dozer prototype. We use methods from reinforcement learning, behavior cloning and contrastive learning to train a hybrid policy. Our trained agent, AGPNet, reaches human-level performance and outperforms current state-of-the-art machine learning methods for the autonomous grading task. In addition, our agent is capable of generalizing from random scenarios to unseen real world problems.
Submitted 20 December, 2021;
originally announced December 2021.
-
LSP : Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs
Authors:
Eitan Kosman,
Joel Oren,
Dotan Di Castro
Abstract:
Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to over-fit if not regularized properly. Surprisingly, recent works show that large graphs often involve many redundant components that can be removed without compromising the performance too much. This includes node or edge removals during inference through GNNs layers or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing. We aim to sparsify a graph so that similar local environments of the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify the application of pruning based on local graph properties, we exemplify the advantage of applying pruning based on locality properties over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant amount of edges from large graphs without compromising the performance, accompanied by a considerable acceleration.
Submitted 10 November, 2021;
originally announced November 2021.
-
A Hybrid Approach for Learning to Shift and Grasp with Elaborate Motion Primitives
Authors:
Zohar Feldman,
Hanna Ziesche,
Ngo Anh Vien,
Dotan Di Castro
Abstract:
Many possible fields of application of robots in real world settings hinge on the ability of robots to grasp objects. As a result, robot grasping has been an active field of research for many years. With our publication we contribute to the endeavor of enabling robots to grasp, with a particular focus on bin picking applications. Bin picking is especially challenging due to the often cluttered and unstructured arrangement of objects and the often limited graspability of objects by simple top down grasps. To tackle these challenges, we propose a fully self-supervised reinforcement learning approach based on a hybrid discrete-continuous adaptation of soft actor-critic (SAC). We employ parametrized motion primitives for pushing and grasping movements in order to enable a flexibly adaptable behavior to the difficult setups we consider. Furthermore, we use data augmentation to increase sample efficiency. We demonstrate our proposed method on challenging picking scenarios in which planar grasp learning or action discretization methods would face a lot of difficulties.
Submitted 2 November, 2021;
originally announced November 2021.
-
Sim and Real: Better Together
Authors:
Shirli Di Castro Shashua,
Dotan Di Castro,
Shie Mannor
Abstract:
Simulation is used extensively in autonomous systems, particularly in robotic manipulation. By far, the most common approach is to train a controller in simulation, and then use it as an initial starting point for the real system. We demonstrate how to learn simultaneously from both simulation and interaction with the real environment. We propose an algorithm for balancing the large number of samples from the high throughput but less accurate simulation and the low-throughput, high-fidelity and costly samples from the real environment. We achieve that by maintaining a replay buffer for each environment the agent interacts with. We analyze such multi-environment interaction theoretically, and provide convergence properties, through a novel theoretical replay buffer analysis. We demonstrate the efficacy of our method on a sim-to-real environment.
Submitted 5 October, 2021; v1 submitted 1 October, 2021;
originally announced October 2021.
-
BIDCD -- Bosch Industrial Depth Completion Dataset
Authors:
Adam Botach,
Yuri Feldman,
Yakov Miron,
Yoel Shapiro,
Dotan Di Castro
Abstract:
We introduce BIDCD -- the Bosch Industrial Depth Completion Dataset. BIDCD is a new RGBD dataset of metallic industrial objects, collected with a depth camera mounted on a robotic manipulator. The main purpose of this dataset is to facilitate the training of domain-specific depth completion models, to be used in logistics and manufacturing tasks. We trained a State-of-the-Art depth completion model on this dataset, and report the results, setting an initial benchmark. Further, we propose to use this dataset for learning synthetic-to-depth-camera domain adaptation. Modifying synthetic RGBD data to mimic characteristics of real-world depth acquisition could potentially enhance training on synthetic data. For this end, we trained a Generative Adversarial Network (GAN) on a synthetic industrial dataset and our real-world data. Finally, to address geometric distortions in the generated images, we introduce an auxiliary loss that promotes preservation of the original shape. The BIDCD data is publicly available at https://zenodo.org/communities/bidcd.
Submitted 4 October, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time Series Forecasting
Authors:
Eitan Kosman,
Dotan Di Castro
Abstract:
Autonomous driving gained huge traction in recent years, due to its potential to change the way we commute. Much effort has been put into trying to estimate the state of a vehicle. Meanwhile, learning to forecast the state of a vehicle ahead introduces new capabilities, such as predicting dangerous situations. Moreover, forecasting brings new supervision opportunities by learning to predict a richer context, expressed by multiple horizons. Intuitively, a video stream originating from a front-facing camera is necessary because it encodes information about the upcoming road. Besides, historical traces of the vehicle's states give more context. In this paper, we tackle multi-horizon forecasting of vehicle states by fusing the two modalities. We design and experiment with 3 end-to-end architectures that exploit 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering angle traces. To demonstrate the effectiveness of our method, we perform extensive experiments on two publicly available real-world datasets, Comma2k19 and the Udacity challenge. We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation. We examine the contribution of vision features, and find that a model fed with vision features achieves an error that is 56.6% and 66.9% of the error of a model that doesn't use those features, on the Udacity and Comma2k19 datasets respectively.
Submitted 26 September, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
-
InsertionNet -- A Scalable Solution for Insertion
Authors:
Oren Spector,
Dotan Di Castro
Abstract:
Complicated assembly processes can be described as a sequence of two main activities: grasping and insertion. While general grasping solutions are common in industry, insertion is still only applicable to small subsets of problems, mainly ones involving simple shapes in fixed locations and in which the variations are not taken into consideration. Recently, RL approaches with prior knowledge (e.g., LfD or residual policy) have been adopted. However, these approaches might be problematic in contact-rich tasks since interaction might endanger the robot and its equipment. In this paper, we tackled this challenge by formulating the problem as a regression problem. By combining visual and force inputs, we demonstrate that our method can scale to 16 different insertion tasks in less than 10 minutes. The resulting policies are robust to changes in the socket position, orientation or peg color, as well as to small differences in peg shape. Finally, we demonstrate an end-to-end solution for 2 complex assembly tasks with multi-insertion objectives when the assembly board is randomly placed on a table.
Submitted 29 April, 2021;
originally announced April 2021.
-
SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems
Authors:
Joel Oren,
Chana Ross,
Maksym Lefarov,
Felix Richter,
Ayal Taitler,
Zohar Feldman,
Christian Daniel,
Dotan Di Castro
Abstract:
We study combinatorial problems with real-world applications such as machine scheduling, routing, and assignment. We propose a method that combines Reinforcement Learning (RL) and planning. This method can be applied equally to the offline variant of the combinatorial problem and to the online variant, in which the problem components (e.g., jobs in scheduling problems) are not known in advance, but rather arrive during the decision-making process. Our solution is quite generic, scalable, and leverages distributional knowledge of the problem parameters. We frame the solution process as an MDP, and take a Deep Q-Learning approach wherein states are represented as graphs, thereby allowing our trained policies to deal with arbitrary changes in a principled manner. Though learned policies work well in expectation, small deviations can have substantial negative effects in combinatorial settings. We mitigate these drawbacks by employing our graph-convolutional policies as non-optimal heuristics in a compatible search algorithm, Monte Carlo Tree Search, to significantly improve overall performance. We demonstrate our method on two problems: Machine Scheduling and Capacitated Vehicle Routing. We show that our method outperforms custom-tailored mathematical solvers, state-of-the-art learning-based algorithms, and common heuristics, both in computation time and performance.
Submitted 18 May, 2021; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Depth Completion with RGB Prior
Authors:
Yuri Feldman,
Yoel Shapiro,
Dotan Di Castro
Abstract:
Depth cameras are a prominent perception system for robotics, especially when operating in natural unstructured environments. Industrial applications, however, typically involve reflective objects under harsh lighting conditions, a challenging scenario for depth cameras, as it induces numerous reflections and deflections, leading to loss of robustness and deteriorated accuracy. Here, we developed a deep model to correct the depth channel in RGBD images, aiming to restore the depth information to the required accuracy. To train the model, we created a novel industrial dataset that we now present to the public. The data was collected with low-end depth cameras and the ground truth depth was generated by multi-view fusion.
Submitted 18 August, 2020;
originally announced August 2020.
-
Practical Risk Measures in Reinforcement Learning
Authors:
Dotan Di Castro,
Joel Oren,
Shie Mannor
Abstract:
Practical application of Reinforcement Learning (RL) often involves risk considerations. We study a generalized approximation scheme for risk measures, based on Monte-Carlo simulations, where the risk measures need not necessarily be \emph{coherent}. We demonstrate that, even in simple problems, measures such as the variance of the reward-to-go do not capture the risk in a satisfactory manner. In addition, we show how a risk measure can be derived from the model's realizations. We propose a neural architecture for estimating the risk and suggest the risk critic architecture that can be used to optimize a policy under general risk measures. We conclude our work with experiments that demonstrate the efficacy of our approach.
Submitted 22 August, 2019;
originally announced August 2019.
-
One-Shot Session Recommendation Systems with Combinatorial Items
Authors:
Yahel David,
Dotan Di Castro,
Zohar Karnin
Abstract:
In recent years, content recommendation systems in large websites (or \emph{content providers}) have received increased attention. While the type of content varies, e.g., movies, articles, music, advertisements, etc., the high-level problem remains the same: based on the knowledge obtained so far about the user, recommend the most desired content. In this paper we present a method to handle the well-known user-cold-start problem in recommendation systems. In this scenario, a recommendation system encounters a new user and the objective is to present items as relevant as possible with the hope of keeping the user's session as long as possible. We formulate an optimization problem aimed at maximizing the length of this initial session, as this is believed to be the key to having the user come back and perhaps register to the system. In particular, our model captures the fact that a single round with low-quality recommendations is likely to terminate the session. In such a case, we do not proceed to the next round as the user leaves the system, possibly never to be seen again. We denote this phenomenon a \emph{One-Shot Session}. Our optimization problem is formulated as an MDP where the action space is of a combinatorial nature, as we recommend multiple items in each round. This huge action space presents a computational challenge, making the straightforward solution intractable. We analyze the structure of the MDP to prove monotone and submodular-like properties that allow a computationally efficient solution via a method denoted by \emph{Greedy Value Iteration} (G-VI).
Submitted 5 July, 2016;
originally announced July 2016.
-
Contextual Markov Decision Processes
Authors:
Assaf Hallak,
Dotan Di Castro,
Shie Mannor
Abstract:
We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine customer characteristics, and to optimize the interaction between them. Our work focuses on one basic scenario: finite horizon with a small known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the ground for future research.
Submitted 8 February, 2015;
originally announced February 2015.
-
Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes
Authors:
Aviv Tamar,
Dotan Di Castro,
Shie Mannor
Abstract:
In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose both TD(0) and LSTD(lambda) variants with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.
Submitted 1 January, 2013;
originally announced January 2013.
-
Policy Gradients with Variance Related Risk Criteria
Authors:
Dotan Di Castro,
Aviv Tamar,
Shie Mannor
Abstract:
Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria. Our starting point is a new formula for the variance of the cost-to-go in episodic tasks. Using this formula we develop policy gradient algorithms for criteria that involve both the expected cost and the variance of the cost. We prove the convergence of these algorithms to local minima and demonstrate their applicability in a portfolio planning problem.
Submitted 27 June, 2012;
originally announced June 2012.
-
Bandits with an Edge
Authors:
Dotan Di Castro,
Claudio Gentile,
Shie Mannor
Abstract:
We consider a bandit problem over a graph where the rewards are not directly observed. Instead, the decision maker can compare two nodes and receive (stochastic) information pertaining to the difference in their value. The graph structure describes the set of possible comparisons. Consequently, comparing two nodes that are relatively far apart requires estimating the difference between every pair of nodes on the path between them. We analyze this problem from the perspective of sample complexity: How many queries are needed to find an approximately optimal node with probability more than $1-δ$ in the PAC setup? We show that the topology of the graph plays a crucial role in defining the sample complexity: graphs with a low diameter have a much better sample complexity.
Submitted 11 September, 2011;
originally announced September 2011.
-
A Maximal Large Deviation Inequality for Sub-Gaussian Variables
Authors:
Dotan Di Castro,
Claudio Gentile,
Shie Mannor
Abstract:
In this short note we prove a maximal concentration lemma for sub-Gaussian random variables stating that for independent sub-Gaussian random variables we have \[P\left(\max_{1\le i\le N}S_{i}>ε\right) \le\exp\left(-\frac{1}{N^2}\sum_{i=1}^{N}\frac{ε^{2}}{2σ_{i}^{2}}\right), \] where $S_i$ is the sum of $i$ zero mean independent sub-Gaussian random variables and $σ_i$ is the variance of the $i$th random variable.
Submitted 25 July, 2011; v1 submitted 12 May, 2011;
originally announced May 2011.
-
Adaptive Bases for Reinforcement Learning
Authors:
Dotan Di Castro,
Shie Mannor
Abstract:
We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.
Submitted 2 May, 2010;
originally announced May 2010.
-
A Convergent Online Single Time Scale Actor Critic Algorithm
Authors:
D. Di Castro,
R. Meir
Abstract:
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local maximum of the average reward. Linear function approximation is used by the critic in order to estimate the value function, and the temporal difference signal, which is passed from the critic to the actor. The main distinguishing feature of the present convergence proof is that both the actor and the critic operate on a similar time scale, while in most current convergence proofs they are required to have very different time scales in order to converge. Moreover, the same temporal difference signal is used to update the parameters of both the actor and the critic. A limitation of the proposed approach, compared to results available for two time scale convergence, is that convergence is guaranteed only to a neighborhood of an optimal value, rather than to an optimal value itself. The single time scale and identical temporal difference signal used by the actor and the critic may provide a step towards constructing more biologically realistic models of reinforcement learning in the brain.
Submitted 16 September, 2009;
originally announced September 2009.