-
Generating Piano Practice Policy with a Gaussian Process
Authors:
Alexandra Moringen,
Elad Vromen,
Helge Ritter,
Jason Friedman
Abstract:
A typical process of learning to play a piece on a piano consists of a progression through a series of practice units that focus on individual dimensions of the skill, the so-called practice modes. Practice modes in learning to play music comprise a particularly large set of possibilities, such as hand coordination, posture, articulation, ability to read a music score, correct timing or pitch, etc. Self-guided practice is known to be suboptimal, and a model that schedules optimal practice to maximize a learner's progress still does not exist. Because we each learn differently and there are many choices for possible piano practice tasks and methods, the set of practice modes should be dynamically adapted to the human learner, a process typically guided by a teacher. However, having a human teacher guide individual practice is not always feasible since it is time-consuming, expensive, and often unavailable. In this work, we present a modeling framework to guide the human learner through the learning process by choosing the practice modes generated by a policy model. To this end, we present a computational architecture building on a Gaussian process that incorporates 1) the learner state, 2) a policy that selects a suitable practice mode, 3) performance evaluation, and 4) expert knowledge. The proposed policy model is trained to approximate the expert-learner interaction during a practice session. In our future work, we will test different Bayesian optimization techniques, e.g., different acquisition functions, and evaluate their effect on the learning progress.
Submitted 7 June, 2024;
originally announced June 2024.
-
Video Diffusion Models: A Survey
Authors:
Andrew Melnik,
Michal Ljubljanac,
Cong Lu,
Qi Yan,
Weiming Ren,
Helge Ritter
Abstract:
Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends. The survey concludes with an overview of remaining challenges and an outlook on the future of the field. Website: https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
Submitted 6 May, 2024;
originally announced May 2024.
-
Lane Segmentation Refinement with Diffusion Models
Authors:
Antonio Ruiz,
Andrew Melnik,
Dong Wang,
Helge Ritter
Abstract:
The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning. Previously, He et al. (2022) explored the extraction of the lane-level graph from aerial imagery utilizing a segmentation-based approach. However, segmentation networks struggle to achieve perfect segmentation masks, resulting in inaccurate lane graph extraction. We explore additional enhancements to refine this segmentation-based approach and extend it with a diffusion probabilistic model (DPM) component. This combination further improves the GEO F1 and TOPO F1 scores, which are crucial indicators of the quality of a lane graph, in the undirected graph in non-intersection areas. We conduct experiments on a publicly available dataset, demonstrating that our method outperforms the previous approach, particularly in enhancing the connectivity of such a graph, as measured by the TOPO F1 score. Moreover, we perform ablation studies on the individual components of our method to understand their contribution and evaluate their effectiveness.
Submitted 1 May, 2024;
originally announced May 2024.
-
Benchmarks for Physical Reasoning AI
Authors:
Andrew Melnik,
Robin Schiewer,
Moritz Lange,
Andrei Muresanu,
Mozhgan Saeidi,
Animesh Garg,
Helge Ritter
Abstract:
Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent with a measurable skill level for various physical reasoning concepts. This gives an advantage to such an ensemble of benchmarks over other holistic benchmarks that aim to simulate the real world by intertwining its complexity and many concepts. We group the presented set of physical reasoning benchmarks into subcategories so that more narrow generalist AI agents can be tested first on these groups.
Submitted 17 December, 2023;
originally announced December 2023.
-
Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers
Authors:
Luca Lach,
Séverin Lemaignan,
Francesco Ferro,
Helge Ritter,
Robert Haschke
Abstract:
We present a holistic grasping controller, combining free-space position control and in-contact force control for reliable grasping given uncertain object pose estimates. Employing tactile fingertip sensors, undesired object displacement during grasping is minimized by pausing the finger closing motion for individual joints on first contact until force-closure is established. While holding an object, the controller is compliant with external forces to avoid high internal object forces and prevent object damage. Gravity as an external force is explicitly considered and compensated for, thus preventing gravity-induced object drift. We evaluate the controller in two experiments on the TIAGo robot and its parallel-jaw gripper, proving the effectiveness of the approach for robust grasping and minimizing object displacement. In a series of ablation studies, we demonstrate the utility of the individual controller components.
Submitted 13 November, 2023;
originally announced November 2023.
-
Towards Transferring Tactile-based Continuous Force Control Policies from Simulation to Robot
Authors:
Luca Lach,
Robert Haschke,
Davide Tateo,
Jan Peters,
Helge Ritter,
Júlia Borràs,
Carme Torras
Abstract:
The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks. An important line of research in this regard is that of grasp force control, which aims to manipulate objects safely by limiting the amount of force exerted on the object. While prior works have either hand-modeled their force controllers, employed model-based approaches, or have not shown sim-to-real transfer, we propose a model-free deep reinforcement learning approach trained in simulation and then transferred to the robot without further fine-tuning. We therefore present a simulation environment that produces realistic normal forces, which we use to train continuous force control policies. An evaluation in which we compare against a baseline and perform an ablation study shows that our approach outperforms the hand-modeled baseline and that our proposed inductive bias and domain randomization facilitate sim-to-real transfer. Code, models, and supplementary videos are available on https://sites.google.com/view/rl-force-ctrl
Submitted 13 November, 2023;
originally announced November 2023.
-
Shape complexity estimation using VAE
Authors:
Markus Rothgaenger,
Andrew Melnik,
Helge Ritter
Abstract:
In this paper, we compare methods for estimating the complexity of two-dimensional shapes and introduce a method that exploits the reconstruction loss of Variational Autoencoders with different sizes of latent vectors. Although the complexity of a shape is not a well-defined attribute, different aspects of it can be estimated. We demonstrate that our method captures some aspects of shape complexity. Code and training details will be publicly available.
Submitted 5 April, 2023;
originally announced April 2023.
-
Stroke-based Rendering: From Heuristics to Deep Learning
Authors:
Florian Nolte,
Andrew Melnik,
Helge Ritter
Abstract:
In the last few years, artistic image-making with deep learning models has gained a considerable amount of traction. A large number of these models operate directly in the pixel space and generate raster images. This is, however, not how most humans produce artworks; they typically plan a sequence of shapes and strokes to draw. Recent developments in deep learning methods help to bridge the gap between stroke-based paintings and pixel photo generation. With this survey, we aim to provide a structured introduction and understanding of common challenges and approaches in stroke-based rendering algorithms. These algorithms range from simple rule-based heuristics to stroke optimization and deep reinforcement agents, trained to paint images with differentiable vector graphics and neural rendering.
Submitted 30 December, 2022;
originally announced February 2023.
-
Face Generation and Editing with StyleGAN: A Survey
Authors:
Andrew Melnik,
Maksim Miasayedzenkau,
Dzianis Makarovets,
Dzianis Pirshtuk,
Eren Akbulut,
Dennis Holzmann,
Tarek Renusch,
Gustav Reichert,
Helge Ritter
Abstract:
Our goal with this survey is to provide an overview of the state-of-the-art deep learning methods for face generation and editing using StyleGAN. The survey covers the evolution of StyleGAN, from PGGAN to StyleGAN3, and explores relevant topics such as suitable metrics for training, different latent representations, GAN inversion to latent spaces of StyleGAN, face image editing, cross-domain face stylization, face restoration, and even Deepfake applications. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
Submitted 27 September, 2023; v1 submitted 18 December, 2022;
originally announced December 2022.
-
Black-box Coreset Variational Inference
Authors:
Dionysis Manousakas,
Hippolyt Ritter,
Theofanis Karaletsos
Abstract:
Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks. Existing variational coreset constructions rely on either selecting subsets of the observed datapoints, or jointly performing approximate inference and optimizing pseudodata in the observed space akin to inducing points methods in Gaussian Processes. So far, both approaches are limited by complexities in evaluating their objectives for general purpose models, and require generating samples from a typically intractable posterior over the coreset throughout inference and testing. In this work, we present a black-box variational inference framework for coresets that overcomes these constraints and enables principled application of variational coresets to intractable models, such as Bayesian neural networks. We apply our techniques to supervised learning problems, and compare them with existing approaches in the literature for data summarization and inference.
Submitted 15 January, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Training set cleansing of backdoor poisoning by self-supervised representation learning
Authors:
H. Wang,
S. Karami,
O. Dia,
H. Ritter,
E. Emamjomeh-Zadeh,
J. Chen,
Z. Xiang,
D. J. Miller,
G. Kesidis
Abstract:
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers, wherein the training dataset is poisoned with a small number of samples that each possess the backdoor pattern (usually a pattern that is either imperceptible or innocuous) and which are mislabeled to the attacker's target class. When trained on a backdoor-poisoned dataset, a DNN behaves normally on most benign test samples but makes incorrect predictions to the target class when the test sample has the backdoor pattern incorporated (i.e., contains a backdoor trigger). Here we focus on image classification tasks and show that supervised training may build a stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. By contrast, self-supervised representation learning ignores the labels of samples and learns a feature embedding based on images' semantic content. We thus propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class. Using a feature embedding found by self-supervised representation learning, a data cleansing method, which combines sample filtering and re-labeling, is developed. Experiments on CIFAR-10 benchmark datasets show that our method achieves state-of-the-art performance in mitigating backdoor attacks.
Submitted 14 March, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Placing by Touching: An empirical study on the importance of tactile sensing for precise object placing
Authors:
Luca Lach,
Niklas Funk,
Robert Haschke,
Severin Lemaignan,
Helge Joachim Ritter,
Jan Peters,
Georgia Chalvatzaki
Abstract:
This work deals with a practical everyday problem: stable object placement on flat surfaces starting from unknown initial poses. Common object-placing approaches require either complete scene specifications or extrinsic sensor measurements, e.g., cameras, that occasionally suffer from occlusions. We propose a novel approach for stable object placing that combines tactile feedback and proprioceptive sensing. We devise a neural architecture that estimates a rotation matrix, resulting in a corrective gripper movement that aligns the object with the placing surface for the subsequent object manipulation. We compare models with different sensing modalities, such as force-torque and an external motion capture system, in real-world object placing tasks with different objects. The experimental evaluation of our placing policies with a set of unseen everyday objects reveals significant generalization of our proposed pipeline, suggesting that tactile sensing plays a vital role in the intrinsic understanding of robotic dexterous object manipulation. Code, models, and supplementary videos are available at https://sites.google.com/view/placing-by-touching.
Submitted 27 November, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Solving Learn-to-Race Autonomous Racing Challenge by Planning in Latent Space
Authors:
Shivansh Beohar,
Fabian Heinrich,
Rahul Kala,
Helge Ritter,
Andrew Melnik
Abstract:
The Learn-to-Race Autonomous Racing Virtual Challenge, hosted on the www<dot>aicrowd<dot>com platform, consisted of two tracks: Single and Multi Camera. Our UniTeam team was among the final winners in the Single Camera track. The agent is required to pass the previously unknown F1-style track in the minimum time with the least amount of off-road driving violations. In our approach, we used the U-Net architecture for road segmentation, a variational autoencoder for encoding a road binary mask, and a nearest-neighbor search strategy that selects the best action for a given state. Our agent achieved an average speed of 105 km/h on stage 1 (known track) and 73 km/h on stage 2 (unknown track) without any off-road driving violations. Here we present our solution and results.
Submitted 5 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
YOLO -- You only look 10647 times
Authors:
Christian Limberg,
Andrew Melnik,
Augustin Harter,
Helge Ritter
Abstract:
In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region proposal based models, and ResNet-like image classification models. In addition, we created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
Submitted 21 January, 2022; v1 submitted 16 January, 2022;
originally announced January 2022.
-
Transfer Learning with Jukebox for Music Source Separation
Authors:
W. Zai El Amri,
O. Tautz,
H. Ritter,
A. Melnik
Abstract:
In this work, we demonstrate how a publicly available, pre-trained Jukebox model can be adapted for the problem of audio source separation from a single mixed audio channel. Our neural network architecture, which uses transfer learning, is quick to train, and the results demonstrate performance comparable to other state-of-the-art approaches that require far more compute resources, training data, and time. We provide an open-source code implementation of our architecture (https://github.com/wzaielamri/unmix).
Submitted 21 September, 2022; v1 submitted 28 November, 2021;
originally announced November 2021.
-
TyXe: Pyro-based Bayesian neural nets for Pytorch
Authors:
Hippolyt Ritter,
Theofanis Karaletsos
Abstract:
We introduce TyXe, a Bayesian neural network library built on top of Pytorch and Pyro. Our leading design principle is to cleanly separate architecture, prior, inference and likelihood specification, allowing for a flexible workflow where users can quickly iterate over combinations of these components. In contrast to existing packages TyXe does not implement any layer classes, and instead relies on architectures defined in generic Pytorch code. TyXe then provides modular choices for canonical priors, variational guides, inference techniques, and layer selections for a Bayesian treatment of the specified architecture. Sampling tricks for variance reduction, such as local reparameterization or flipout, are implemented as effect handlers, which can be applied independently of other specifications. We showcase the ease of use of TyXe to explore Bayesian versions of popular models from various libraries: toy regression with a pure Pytorch neural network; large-scale image classification with torchvision ResNets; graph neural networks based on DGL; and Neural Radiance Fields built on top of Pytorch3D. Finally, we provide convenient abstractions for variational continual learning. In all cases the change from a deterministic to a Bayesian neural network comes with minimal modifications to existing code, offering a broad range of researchers and practitioners alike practical access to uncertainty estimation techniques. The library is available at https://github.com/TyXe-BDL/TyXe.
Submitted 1 October, 2021;
originally announced October 2021.
-
Critic Guided Segmentation of Rewarding Objects in First-Person Views
Authors:
Andrew Melnik,
Augustin Harter,
Christian Limberg,
Krishan Rana,
Niko Suenderhauf,
Helge Ritter
Abstract:
This work discusses a learning approach to mask rewarding objects in images using sparse reward signals from an imitation learning dataset. For that, we train an Hourglass network using only feedback from a critic model. The Hourglass network learns to produce a mask to decrease the critic's score of a high score image and increase the critic's score of a low score image by swapping the masked areas between these two images. We trained the model on an imitation learning dataset from the NeurIPS 2020 MineRL Competition Track, where our model learned to mask rewarding objects in a complex interactive 3D environment with a sparse reward signal. This approach was part of the 1st place winning solution in this competition. Video demonstration and code: https://rebrand.ly/critic-guided-segmentation
Submitted 20 July, 2021;
originally announced July 2021.
-
Optimizing piano practice with a utility-based scaffold
Authors:
Alexandra Moringen,
Sören Rüttgers,
Luisa Zintgraf,
Jason Friedman,
Helge Ritter
Abstract:
A typical part of learning to play the piano is the progression through a series of practice units that focus on individual dimensions of the skill, such as hand coordination, correct posture, or correct timing. Ideally, the focus on a particular practice method should be chosen to maximize the learner's progress in learning to play the piano. Because we each learn differently, and because there are many choices for possible piano practice tasks and methods, the set of practice tasks should be dynamically adapted to the human learner. However, having a human teacher guide individual practice is not always feasible since it is time-consuming, expensive, and not always available. Instead, we suggest optimizing in the space of practice methods, the so-called practice modes. The proposed optimization process takes into account the skills of the individual learner and their history of learning. In this work, we present a modeling framework to guide the human learner through the learning process by choosing practice modes that have the highest expected utility (i.e., improvement in piano playing skill). To this end, we propose a human learner utility model based on a Gaussian process, and exemplify the model training and its application for practice scaffolding on an example of simulated human learners.
Submitted 21 June, 2021;
originally announced June 2021.
-
Towards robust and domain agnostic reinforcement learning competitions
Authors:
William Hebgen Guss,
Stephanie Milani,
Nicholay Topin,
Brandon Houghton,
Sharada Mohanty,
Andrew Melnik,
Augustin Harter,
Benoit Buschmaas,
Bjarne Jaster,
Christoph Berganski,
Dennis Heitkamp,
Marko Henning,
Helge Ritter,
Chengjie Wu,
Xiaotian Hao,
Yiming Lu,
Hangyu Mao,
Yihuan Mao,
Chao Wang,
Michal Opanowicz,
Anssi Kanervisto,
Yanick Schraner,
Christian Scheller,
Xiren Zhou,
Lu Liu
, et al. (4 additional authors not shown)
Abstract:
Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to be reproducible. In this paper, we present a new framework of competition design that promotes the development of algorithms that overcome these barriers. We propose four central mechanisms for achieving this end: submission retraining, domain randomization, desemantization through domain obfuscation, and the limitation of competition compute and environment-sample budget. To demonstrate the efficacy of this design, we proposed, organized, and ran the MineRL 2020 Competition on Sample-Efficient Reinforcement Learning. In this work, we describe the organizational outcomes of the competition and show that the resulting participant submissions are reproducible, non-specific to the competition environment, and sample/resource efficient, despite the difficult competition task.
Submitted 7 June, 2021;
originally announced June 2021.
-
Sparse Uncertainty Representation in Deep Learning with Inducing Weights
Authors:
Hippolyt Ritter,
Martin Kukla,
Cheng Zhang,
Yingzhen Li
Abstract:
Bayesian neural networks and deep ensembles represent two modern paradigms of uncertainty quantification in deep learning. Yet these approaches struggle to scale mainly due to memory inefficiency issues, since they require parameter storage several times higher than their deterministic counterparts. To address this, we augment the weight matrix of each layer with a small number of inducing weights, thereby projecting the uncertainty quantification into such low dimensional spaces. We further extend Matheron's conditional Gaussian sampling rule to enable fast weight sampling, which enables our inference method to maintain reasonable run-time as compared with ensembles. Importantly, our approach achieves competitive performance to the state-of-the-art in prediction and uncertainty estimation tasks with fully connected neural networks and ResNets, while reducing the parameter size to $\leq 24.3\%$ of that of a $single$ neural network.
Submitted 22 November, 2021; v1 submitted 30 May, 2021;
originally announced May 2021.
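The abstract above builds on Matheron's rule for conditional Gaussian sampling. As a rough illustration only, the following toy sketch shows the classical rule for two scalar, zero-mean, jointly Gaussian variables: a joint sample (a, b) is turned into a sample of a conditioned on b = beta by a linear correction. It is not the authors' extension to weight matrices and inducing weights; the function names, the default variances, and the covariance value are assumptions chosen for the example.

```python
import random


def matheron_conditional_sample(a, b, beta, cov_ab, var_b):
    """Matheron's rule (scalar case): map a joint sample (a, b)
    to a sample of a conditioned on b == beta."""
    return a + (cov_ab / var_b) * (beta - b)


def sample_joint(rng, var_a, var_b, cov_ab):
    """Draw (a, b) from a zero-mean bivariate Gaussian using a
    hand-written 2x2 Cholesky factor of the covariance of (b, a)."""
    l11 = var_b ** 0.5
    l21 = cov_ab / l11
    l22 = (var_a - l21 ** 2) ** 0.5
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b = l11 * z1
    a = l21 * z1 + l22 * z2
    return a, b


def conditional_samples(n, beta, var_a=2.0, var_b=1.0, cov_ab=0.8, seed=0):
    """Generate n samples of a | b == beta via Matheron's rule."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a, b = sample_joint(rng, var_a, var_b, cov_ab)
        out.append(matheron_conditional_sample(a, b, beta, cov_ab, var_b))
    return out


if __name__ == "__main__":
    xs = conditional_samples(20000, beta=1.5)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    # Analytic conditional: mean = cov_ab / var_b * beta,
    # variance = var_a - cov_ab**2 / var_b.
    print(mean, var)
```

The appeal of the rule is that no conditional covariance has to be factorized: one joint draw plus a linear correction is enough, which is presumably what makes it attractive for fast weight sampling.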
-
Solving Physics Puzzles by Reasoning about Paths
Authors:
Augustin Harter,
Andrew Melnik,
Gaurav Kumar,
Dhruv Agarwal,
Animesh Garg,
Helge Ritter
Abstract:
We propose a new deep learning model for goal-driven tasks that require intuitive physical reasoning and intervention in the scene to achieve a desired end goal. Its modular structure is motivated by hypothesizing a sequence of intuitive steps that humans apply when trying to solve such a task. The model first predicts the path the target object would follow without intervention and the path the target object should follow in order to solve the task. Next, it predicts the desired path of the action object and generates the placement of the action object. All components of the model are trained jointly in a supervised way; each component receives its own learning signal but learning signals are also backpropagated through the entire architecture. To evaluate the model we use PHYRE - a benchmark test for goal-driven physical reasoning in 2D mechanics puzzles.
Submitted 14 November, 2020;
originally announced November 2020.
-
Addressing Catastrophic Forgetting in Few-Shot Problems
Authors:
Pauching Yap,
Hippolyt Ritter,
David Barber
Abstract:
Neural networks are known to suffer from catastrophic forgetting when trained on sequential datasets. While there have been numerous attempts to solve this problem in large-scale supervised classification, little has been done to overcome catastrophic forgetting in few-shot classification problems. We demonstrate that the popular gradient-based model-agnostic meta-learning algorithm (MAML) indeed suffers from catastrophic forgetting and introduce a Bayesian online meta-learning framework that tackles this problem. Our framework utilises Bayesian online learning and meta-learning along with Laplace approximation and variational inference to overcome catastrophic forgetting in few-shot classification problems. The experimental evaluations demonstrate that our framework can effectively achieve this goal in comparison with various baselines. As an additional utility, we also demonstrate empirically that our framework is capable of meta-learning on sequentially arriving few-shot tasks from a stationary task distribution.
Submitted 21 June, 2021; v1 submitted 30 April, 2020;
originally announced May 2020.
-
From Crystallized Adaptivity to Fluid Adaptivity in Deep Reinforcement Learning -- Insights from Biological Systems on Adaptive Flexibility
Authors:
Malte Schilling,
Helge Ritter,
Frank W. Ohl
Abstract:
Recent developments in machine-learning algorithms have led to impressive performance increases in many traditional application scenarios of artificial intelligence research. In the area of deep reinforcement learning, deep learning functional architectures are combined with incremental learning schemes for sequential tasks that include interaction-based, but often delayed feedback. Despite their impressive successes, modern machine-learning approaches, including deep reinforcement learning, still perform weakly when compared to flexibly adaptive biological systems in certain naturally occurring scenarios. Such scenarios include transfers to environments different than the ones in which the training took place or environments that dynamically change, both of which are often mastered by biological systems through a capability that we here term "fluid adaptivity" to contrast it with the much slower adaptivity ("crystallized adaptivity") of the prior learning from which the behavior emerged. In this article, we derive and discuss research strategies, based on analyses of fluid adaptivity in biological systems and its neuronal modeling, that might aid in equipping future artificially intelligent systems with capabilities of fluid adaptivity more similar to those seen in some biologically intelligent systems. A key component of this research strategy is the dynamization of the problem space itself and the implementation of this dynamization by suitably designed flexibly interacting modules.
Submitted 13 August, 2019;
originally announced August 2019.
-
Learning efficient haptic shape exploration with a rigid tactile sensor array
Authors:
Sascha Fleer,
Alexandra Moringen,
Roberta L. Klatzky,
Helge Ritter
Abstract:
Haptic exploration is a key skill for both robots and humans to discriminate and handle unknown objects or to recognize familiar objects. Its active nature is evident in humans who from early on reliably acquire sophisticated sensory-motor capabilities for active exploratory touch and directed manual exploration that associates surfaces and object properties with their spatial locations. This is in stark contrast to robotics. In this field, the relative lack of good real-world interaction models - along with very restricted sensors and a scarcity of suitable training data to leverage machine learning methods - has so far rendered haptic exploration a largely underdeveloped skill. In the present work, we connect recent advances in recurrent models of visual attention with previous insights about the organisation of human haptic search behavior, exploratory procedures and haptic glances for a novel architecture that learns a generative model of haptic exploration in a simulated three-dimensional environment. The proposed algorithm simultaneously optimizes main perception-action loop components: feature extraction, integration of features over time, and the control strategy, while continuously acquiring data online. We perform a multi-module neural network training, including a feature extractor and a recurrent neural network module aiding pose control for storing and combining sequential sensory data. The resulting haptic meta-controller for the rigid $16 \times 16$ tactile sensor array moving in a physics-driven simulation environment, called the Haptic Attention Model, performs a sequence of haptic glances, and outputs corresponding force measurements. The resulting method has been successfully tested with four different objects. It achieved results close to $100 \%$ while performing object contour exploration that has been optimized for its own sensor morphology.
Submitted 26 January, 2020; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Gaussian Mean Field Regularizes by Limiting Learned Information
Authors:
Julius Kunze,
Louis Kirsch,
Hippolyt Ritter,
David Barber
Abstract:
Variational inference with a factorized Gaussian posterior estimate is a widely used approach for learning parameters and hidden variables. Empirically, a regularizing effect can be observed that is poorly understood. In this work, we show how mean field inference improves generalization by limiting mutual information between learned parameters and the data through noise. We quantify a maximum capacity when the posterior variance is either fixed or learned and connect it to generalization error, even when the KL-divergence in the objective is rescaled. Our experiments demonstrate that bounding information between parameters and data effectively regularizes neural networks on both supervised and unsupervised tasks.
Submitted 12 February, 2019;
originally announced February 2019.
-
Modularization of End-to-End Learning: Case Study in Arcade Games
Authors:
Andrew Melnik,
Sascha Fleer,
Malte Schilling,
Helge Ritter
Abstract:
Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalisation capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized on mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes game time) when a proper decomposition of an environment or a task is provided. However, automatization of such decomposition remains a challenging problem. This case study shows how a model of a causal structure underlying an environment or a task can benefit learning time and generalization capability of the agent, and argues in favor of exploiting modular structure in contrast to using pure end-to-end learning approaches.
Submitted 27 January, 2019;
originally announced January 2019.
-
Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
Authors:
Hippolyt Ritter,
Aleksandar Botev,
David Barber
Abstract:
We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.
Submitted 20 May, 2018;
originally announced May 2018.
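The "quadratic penalty on changes to the weights" mentioned in the abstract above can be written out roughly as follows. This is a sketch in notation assumed here ($\theta$ for the weights, $\mathcal{D}_{t+1}$ for the data of task $t+1$, $\theta^{*}_{t}$ for the mode found after task $t$, $\Lambda_{t}$ for the precision of the Gaussian approximation, $H_{t+1}$ for the Hessian of the negative log-likelihood at the new mode); it omits the Kronecker factored curvature approximation and any hyperparameters the paper may introduce.

```latex
% Recursive Gaussian (Laplace) approximation of the posterior after task t:
%   p(\theta \mid \mathcal{D}_{1:t}) \approx \mathcal{N}(\theta^{*}_{t}, \Lambda_{t}^{-1})
% Training on task t+1 then maximizes the new log-likelihood
% minus a quadratic penalty centered at the previous mode:
\theta^{*}_{t+1}
  = \arg\max_{\theta} \;
    \log p(\mathcal{D}_{t+1} \mid \theta)
    - \tfrac{1}{2} (\theta - \theta^{*}_{t})^{\top} \Lambda_{t} (\theta - \theta^{*}_{t})
% The precision accumulates the curvature of each task at its mode:
\Lambda_{t+1} = \Lambda_{t} + H_{t+1},
\qquad
H_{t+1} = - \nabla^{2}_{\theta} \log p(\mathcal{D}_{t+1} \mid \theta) \Big|_{\theta = \theta^{*}_{t+1}}
```

Read this way, weights that earlier tasks constrained strongly (large curvature) are expensive to move, while weakly constrained directions remain free for new tasks.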
-
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
Authors:
Łukasz Kidziński,
Sharada Prasanna Mohanty,
Carmichael Ong,
Zhewei Huang,
Shuchang Zhou,
Anton Pechenko,
Adam Stelmaszczyk,
Piotr Jarosik,
Mikhail Pavlov,
Sergey Kolesnikov,
Sergey Plis,
Zhibo Chen,
Zhizheng Zhang,
Jiale Chen,
Jun Shi,
Zhuobin Zheng,
Chun Yuan,
Zhihui Lin,
Henryk Michalewski,
Piotr Miłoś,
Błażej Osiński,
Andrew Melnik,
Malte Schilling,
Helge Ritter,
Sean Carroll
, et al. (4 additional authors not shown)
Abstract:
In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.
Submitted 1 April, 2018;
originally announced April 2018.
-
The Cyborg Astrobiologist: Testing a Novelty-Detection Algorithm on Two Mobile Exploration Systems at Rivas Vaciamadrid in Spain and at the Mars Desert Research Station in Utah
Authors:
P. C. McGuire,
C. Gross,
L. Wendt,
A. Bonnici,
V. Souza-Egipsy,
J. Ormo,
E. Diaz-Martinez,
B. H. Foing,
R. Bose,
S. Walter,
M. Oesker,
J. Ontrup,
R. Haschke,
H. Ritter
Abstract:
(ABRIDGED) In previous work, two platforms have been developed for testing computer-vision algorithms for robotic planetary exploration (McGuire et al. 2004b,2005; Bartolo et al. 2007). The wearable-computer platform has been tested at geological and astrobiological field sites in Spain (Rivas Vaciamadrid and Riba de Santiuste), and the phone-camera has been tested at a geological field site in Malta. In this work, we (i) apply a Hopfield neural-network algorithm for novelty detection based upon color, (ii) integrate a field-capable digital microscope on the wearable computer platform, (iii) test this novelty detection with the digital microscope at Rivas Vaciamadrid, (iv) develop a Bluetooth communication mode for the phone-camera platform, in order to allow access to a mobile processing computer at the field sites, and (v) test the novelty detection on the Bluetooth-enabled phone-camera connected to a netbook computer at the Mars Desert Research Station in Utah. This systems engineering and field testing have together allowed us to develop a real-time computer-vision system that is capable, for example, of identifying lichens as novel within a series of images acquired in semi-arid desert environments. We acquired sequences of images of geologic outcrops in Utah and Spain consisting of various rock types and colors to test this algorithm. The algorithm robustly recognized previously-observed units by their color, while requiring only a single image or a few images to learn colors as familiar, demonstrating its fast learning capability.
Submitted 28 October, 2009;
originally announced October 2009.
-
The Cyborg Astrobiologist: Porting from a wearable computer to the Astrobiology Phone-cam
Authors:
Alexandra Bartolo,
Patrick C. McGuire,
Kenneth P. Camilleri,
Christopher Spiteri,
Jonathan C. Borg,
Philip J. Farrugia,
Jens Ormo,
Javier Gomez-Elvira,
Jose Antonio Rodriguez-Manfredi,
Enrique Diaz-Martinez,
Helge Ritter,
Robert Haschke,
Markus Oesker,
Joerg Ontrup
Abstract:
We have used a simple camera phone to significantly improve an `exploration system' for astrobiology and geology. This camera phone will make it much easier to develop and test computer-vision algorithms for future planetary exploration. We envision that the `Astrobiology Phone-cam' exploration system can be fruitfully used in other problem domains as well.
Submitted 5 July, 2007;
originally announced July 2007.
-
Field geology with a wearable computer: 1st results of the Cyborg Astrobiologist System
Authors:
Patrick C. McGuire,
Javier Gomez-Elvira,
Jose Antonio Rodriguez-Manfredi,
Eduardo Sebastian-Martinez,
Jens Ormo,
Enrique Diaz-Martinez,
Markus Oesker,
Robert Haschke,
Joerg Ontrup,
Helge Ritter
Abstract:
We present results from the first geological field tests of the `Cyborg Astrobiologist', which is a wearable computer and video camcorder system that we are using to test and train a computer-vision system towards having some of the autonomous decision-making capabilities of a field-geologist. The Cyborg Astrobiologist platform has thus far been used for testing and development of these algorithms and systems: robotic acquisition of quasi-mosaics of images, real-time image segmentation, and real-time determination of interesting points in the image mosaics. This work is more of a test of the whole system, rather than of any one part of the system. However, beyond the concept of the system itself, the uncommon map (despite its simplicity) is the main innovative part of the system. The uncommon map helps to determine interest-points in a context-free manner. Overall, the hardware and software systems function reliably, and the computer-vision algorithms are adequate for the first field tests. In addition to the proof-of-concept aspect of these field tests, the main result of these field tests is the enumeration of those issues that we can improve in the future, including: dealing with structural shadow and microtexture, and also, controlling the camera's zoom lens in an intelligent manner. Nonetheless, despite these and other technical inadequacies, this Cyborg Astrobiologist system, consisting of a camera-equipped wearable-computer and its computer-vision algorithms, has demonstrated its ability to find genuinely interesting points in real-time in the geological scenery, and then to gather more information about these interest points in an automated manner. We use these capabilities for autonomous guidance towards geological points-of-interest.
Submitted 24 June, 2005;
originally announced June 2005.
-
Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks
Authors:
P. C. McGuire,
J. Fritsch,
J. J. Steil,
F. Roethling,
G. A. Fink,
S. Wachsmuth,
G. Sagerer,
H. Ritter
Abstract:
A major challenge for the realization of intelligent robots is to supply them with cognitive abilities in order to allow ordinary users to program them easily and intuitively. One way of such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and be able to use and integrate spoken instructions, visual perceptions, and non-verbal cues like gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction. The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects.
Submitted 24 May, 2005;
originally announced May 2005.
-
The Cyborg Astrobiologist: Scouting Red Beds for Uncommon Features with Geological Significance
Authors:
Patrick C. McGuire,
Enrique Diaz-Martinez,
Jens Ormo,
Javier Gomez-Elvira,
Jose A. Rodriguez-Manfredi,
Eduardo Sebastian-Martinez,
Helge Ritter,
Robert Haschke,
Markus Oesker,
Joerg Ontrup
Abstract:
The `Cyborg Astrobiologist' (CA) has undergone a second geological field trial, at a red sandstone site in northern Guadalajara, Spain, near Riba de Santiuste. The Cyborg Astrobiologist is a wearable computer and video camera system that has demonstrated a capability to find uncommon interest points in geological imagery in real-time in the field. The first (of three) geological structures that we studied was an outcrop of nearly homogeneous sandstone, which exhibits oxidized-iron impurities in red and an absence of these iron impurities in white. The white areas in these ``red beds'' have turned white because the iron has been removed by chemical reduction, perhaps by a biological agent. The computer vision system found in one instance several (iron-free) white spots to be uncommon and therefore interesting, as well as several small and dark nodules. The second geological structure contained white, textured mineral deposits on the surface of the sandstone, which were found by the CA to be interesting. The third geological structure was a 50 cm thick paleosol layer, with fossilized root structures of some plants, which were found by the CA to be interesting. A quasi-blind comparison of the Cyborg Astrobiologist's interest points for these images with the interest points determined afterwards by a human geologist shows that the Cyborg Astrobiologist concurred with the human geologist 68% of the time (true positive rate), with a 32% false positive rate and a 32% false negative rate.
(abstract has been abridged).
Submitted 23 May, 2005;
originally announced May 2005.
-
The Cyborg Astrobiologist: First Field Experience
Authors:
Patrick C. McGuire,
Jens Ormo,
Enrique Diaz-Martinez,
Jose Antonio Rodriguez-Manfredi,
Javier Gomez-Elvira,
Helge Ritter,
Markus Oesker,
Joerg Ontrup
Abstract:
We present results from the first geological field tests of the `Cyborg Astrobiologist', which is a wearable computer and video camcorder system that we are using to test and train a computer-vision system towards having some of the autonomous decision-making capabilities of a field-geologist and field-astrobiologist. The Cyborg Astrobiologist platform has thus far been used for testing and development of these algorithms and systems: robotic acquisition of quasi-mosaics of images, real-time image segmentation, and real-time determination of interesting points in the image mosaics. The hardware and software systems function reliably, and the computer-vision algorithms are adequate for the first field tests. In addition to the proof-of-concept aspect of these field tests, the main result of these field tests is the enumeration of those issues that we can improve in the future, including: first, detection and accounting for shadows caused by 3D jagged edges in the outcrop; second, reincorporation of more sophisticated texture-analysis algorithms into the system; third, creation of hardware and software capabilities to control the camera's zoom lens in an intelligent manner; and fourth, development of algorithms for interpretation of complex geological scenery. Nonetheless, despite these technical inadequacies, this Cyborg Astrobiologist system, consisting of a camera-equipped wearable-computer and its computer-vision algorithms, has demonstrated its ability to find genuinely interesting points in real-time in the geological scenery, and then to gather more information about these interest points in an automated manner.
Submitted 27 October, 2004;
originally announced October 2004.
-
Neural Architectures for Robot Intelligence
Authors:
H. Ritter,
J. J. Steil,
C. Noelker,
F. Roethling,
P. C. McGuire
Abstract:
We argue that the direct experimental approaches to elucidate the architecture of higher brains may benefit from insights gained from exploring the possibilities and limits of artificial control architectures for robot systems. We present some of our recent work that has been motivated by that view and that is centered around the study of various aspects of hand actions since these are intimately linked with many higher cognitive abilities. As examples, we report on the development of a modular system for the recognition of continuous hand postures based on neural nets, the use of vision and tactile sensing for guiding prehensile movements of a multifingered hand, and the recognition and use of hand gestures for robot teaching.
Regarding the issue of learning, we propose to view real-world learning from the perspective of data mining and to focus more strongly on the imitation of observed actions instead of purely reinforcement-based exploration. As a concrete example of such an effort we report on the status of an ongoing project in our lab in which a robot equipped with an attention system with a neurally inspired architecture is taught actions by using hand gestures in conjunction with speech commands. We point out some of the lessons learnt from this system, and discuss how systems of this kind can contribute to the study of issues at the junction between natural and artificial cognitive systems.
Submitted 18 October, 2004;
originally announced October 2004.
-
Field Geology with a Wearable Computer: First Results of the Cyborg Astrobiologist System
Authors:
Patrick C. McGuire,
Javier Gomez-Elvira,
Jose Antonio Rodriguez-Manfredi,
Eduardo Sebastian-Martinez,
Jens Ormo,
Enrique Diaz-Martinez,
Helge Ritter,
Markus Oesker,
Robert Haschke,
Joerg Ontrup
Abstract:
We present results from the first geological field tests of the `Cyborg Astrobiologist', which is a wearable computer and video camcorder system that we are using to test and train a computer-vision system towards having some of the autonomous decision-making capabilities of a field-geologist. The Cyborg Astrobiologist platform has thus far been used for testing and development of these algorithms and systems: robotic acquisition of quasi-mosaics of images, real-time image segmentation, and real-time determination of interesting points in the image mosaics. The hardware and software systems function reliably, and the computer-vision algorithms are adequate for the first field tests. In addition to the proof-of-concept aspect of these field tests, the main result of these field tests is the enumeration of those issues that we can improve in the future, including: dealing with structural shadow and microtexture, and also, controlling the camera's zoom lens in an intelligent manner. Nonetheless, despite these and other technical inadequacies, this Cyborg Astrobiologist system, consisting of a camera-equipped wearable-computer and its computer-vision algorithms, has demonstrated its ability to find genuinely interesting points in real-time in the geological scenery, and then to gather more information about these interest points in an automated manner.
Submitted 15 September, 2004;
originally announced September 2004.
-
Cyborg Systems as Platforms for Computer-Vision Algorithm-Development for Astrobiology
Authors:
Patrick C. McGuire,
J. A. Rodriguez-Manfredi,
E. Sebastian-Martinez,
J. Gomez-Elvira,
E. Diaz-Martinez,
J. Ormo,
K. Neuffer,
A. Giaquinta,
F. Camps-Martinez,
A. Lepinette-Malvitte,
J. Perez-Mercader,
H. Ritter,
M. Oesker,
J. Ontrup,
J. Walter
Abstract:
Employing the allegorical imagery from the film "The Matrix", we motivate and discuss our `Cyborg Astrobiologist' research program. In this research program, we are using a wearable computer and video camcorder in order to test and train a computer-vision system to be a field-geologist and field-astrobiologist.
Submitted 9 May, 2004; v1 submitted 2 January, 2004;
originally announced January 2004.