subscribe to arXiv mailings

arXiv:2407.01705 [pdf]

Optimized Learning for X-Ray Image Classification for Multi-Class Disease Diagnoses with Accelerated Computing Strategies

Authors: Sebastian A. Cruz Romero, Ivanelyz Rivera de Jesus, Dariana J. Troche Quinones, Wilson Rivera Gallego

Abstract: X-ray image-based disease diagnosis lies in ensuring the precision of identifying afflictions within the sample, a task fraught with challenges stemming from the occurrence of false positives and false negatives. False positives introduce the risk of erroneously identifying non-existent conditions, leading to misdiagnosis and a decline in patient care quality. Conversely, false negatives pose the… ▽ More X-ray image-based disease diagnosis lies in ensuring the precision of identifying afflictions within the sample, a task fraught with challenges stemming from the occurrence of false positives and false negatives. False positives introduce the risk of erroneously identifying non-existent conditions, leading to misdiagnosis and a decline in patient care quality. Conversely, false negatives pose the threat of overlooking genuine abnormalities, potentially causing delays in treatment and interventions, thereby resulting in adverse patient outcomes. The urgency to overcome these challenges compels ongoing efforts to elevate the precision and reliability of X-ray image analysis algorithms within the computational framework. This study introduces modified pre-trained ResNet models tailored for multi-class disease diagnosis of X-ray images, incorporating advanced optimization strategies to reduce the execution runtime of training and inference tasks. The primary objective is to achieve tangible performance improvements through accelerated implementations of PyTorch, CUDA, Mixed- Precision Training, and Learning Rate Scheduler. While outcomes demonstrate substantial improvements in execution runtimes between normal training and CUDA-accelerated training, negligible differences emerge between various training optimization modalities. This research marks a significant advancement in optimizing computational approaches to reduce training execution time for larger models. Additionally, we explore the potential of effective parallel data processing using MPI4Py for the distribution of gradient descent optimization across multiple nodes and leverage multiprocessing to expedite data preprocessing for larger datasets. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: High Performance Computing course final term paper

arXiv:2406.12505 [pdf, other]

Demonstrating Agile Flight from Pixels without State Estimation

Authors: Ismail Geles, Leonard Bauersfeld, Angel Romero, Jiaxu Xing, Davide Scaramuzza

Abstract: Quadrotors are among the most agile flying robots. Despite recent advances in learning-based control and computer vision, autonomous drones still rely on explicit state estimation. On the other hand, human pilots only rely on a first-person-view video stream from the drone onboard camera to push the platform to its limits and fly robustly in unseen environments. To the best of our knowledge, we pr… ▽ More Quadrotors are among the most agile flying robots. Despite recent advances in learning-based control and computer vision, autonomous drones still rely on explicit state estimation. On the other hand, human pilots only rely on a first-person-view video stream from the drone onboard camera to push the platform to its limits and fly robustly in unseen environments. To the best of our knowledge, we present the first vision-based quadrotor system that autonomously navigates through a sequence of gates at high speeds while directly mapping pixels to control commands. Like professional drone-racing pilots, our system does not use explicit state estimation and leverages the same control commands humans use (collective thrust and body rates). We demonstrate agile flight at speeds up to 40km/h with accelerations up to 2g. This is achieved by training vision-based policies with reinforcement learning (RL). The training is facilitated using an asymmetric actor-critic with access to privileged information. To overcome the computational complexity during image-based RL training, we use the inner edges of the gates as a sensor abstraction. This simple yet robust, task-relevant representation can be simulated during training without rendering images. During deployment, a Swin-transformer-based gate detector is used. Our approach enables autonomous agile flight with standard, off-the-shelf hardware. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Journal ref: Robotics: Science and Systems (RSS), 2024

arXiv:2404.15029 [pdf, other]

Explainable LightGBM Approach for Predicting Myocardial Infarction Mortality

Authors: Ana Letícia Garcez Vicente, Roseval Donisete Malaquias Junior, Roseli A. F. Romero

Abstract: Myocardial Infarction is a main cause of mortality globally, and accurate risk prediction is crucial for improving patient outcomes. Machine Learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and missing values, posing challenges for feature selection and imputation methods. In this artic… ▽ More Myocardial Infarction is a main cause of mortality globally, and accurate risk prediction is crucial for improving patient outcomes. Machine Learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and missing values, posing challenges for feature selection and imputation methods. In this article, we investigate the impact of the data preprocessing task and compare three ensembles boosted tree methods to predict the risk of mortality in patients with myocardial infarction. Further, we use the Tree Shapley Additive Explanations method to identify relationships among all the features for the performed predictions, leveraging the entirety of the available data in the analysis. Notably, our approach achieved a superior performance when compared to other existing machine learning approaches, with an F1-score of 91,2% and an accuracy of 91,8% for LightGBM without data preprocessing. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: This article has been accepted at the 2023 International Conference on Computational Science and Computational Intelligence (CSCI 23)

arXiv:2403.17551 [pdf, other]

MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints

Authors: Maria Krinner, Angel Romero, Leonard Bauersfeld, Melanie Zeilinger, Andrea Carron, Davide Scaramuzza

Abstract: Quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a promising model-based approach for time optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function… ▽ More Quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a promising model-based approach for time optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function, creating a multi objective optimization that continuously trades off between maximizing progress and tracking the path accurately. This paper introduces three key components that enhance the state-of-the-art MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a track constraint and terminal set. The track constraint is designed as a spatial constraint which prevents gate collisions while allowing for time optimization only in the cost function. Second, we augment the existing first principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state-of-the-art global Bayesian Optimization algorithm, to tune the hyperparameters of the MPCC controller given a sparse reward based on lap time minimization. The proposed approach achieves similar lap times to the best-performing RL policy and outperforms the best model-based controller while satisfying constraints. In both simulation and real world, our approach consistently prevents gate crashes with 100% success rate, while pushing the quadrotor to its physical limits reaching speeds of more than 80km/h. △ Less

Submitted 14 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 12 pages, 6 figures

Journal ref: Robotics: Science and Systems (RSS), 2024

arXiv:2403.12203 [pdf, other]

Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight

Authors: Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza

Abstract: We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges regarding sample efficiency and computation… ▽ More We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges regarding sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL demonstrates efficiency in learning from visual demonstrations but is limited by the quality of those demonstrations and faces issues like covariate shift. To overcome these limitations, we propose a novel training framework combining RL and IL's advantages. Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning. Our experiments in both simulated and real-world environments demonstrate that our approach achieves superior performance and robustness than IL or RL alone in navigating a quadrotor through a racing course using only visual information without explicit state estimation. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.09177 [pdf, other]

Cellular-enabled Collaborative Robots Planning and Operations for Search-and-Rescue Scenarios

Authors: Arnau Romero, Carmen Delgado, Lanfranco Zanzi, Raúl Suárez, Xavier Costa-Pérez

Abstract: Mission-critical operations, particularly in the context of Search-and-Rescue (SAR) and emergency response situations, demand optimal performance and efficiency from every component involved to maximize the success probability of such operations. In these settings, cellular-enabled collaborative robotic systems have emerged as invaluable assets, assisting first responders in several tasks, ranging… ▽ More Mission-critical operations, particularly in the context of Search-and-Rescue (SAR) and emergency response situations, demand optimal performance and efficiency from every component involved to maximize the success probability of such operations. In these settings, cellular-enabled collaborative robotic systems have emerged as invaluable assets, assisting first responders in several tasks, ranging from victim localization to hazardous area exploration. However, a critical limitation in the deployment of cellular-enabled collaborative robots in SAR missions is their energy budget, primarily supplied by batteries, which directly impacts their task execution and mobility. This paper tackles this problem, and proposes a search-and-rescue framework for cellular-enabled collaborative robots use cases that, taking as input the area size to be explored, the robots fleet size, their energy profile, exploration rate required and target response time, finds the minimum number of robots able to meet the SAR mission goals and the path they should follow to explore the area. Our results, i) show that first responders can rely on a SAR cellular-enabled robotics framework when planning mission-critical operations to take informed decisions with limited resources, and, ii) illustrate the number of robots versus explored area and response time trade-off depending on the type of robot: wheeled vs quadruped. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.02514 [pdf, other]

Purpose for Open-Ended Learning Robots: A Computational Taxonomy, Definition, and Operationalisation

Authors: Gianluca Baldassarre, Richard J. Duro, Emilio Cartoni, Mehdi Khamassi, Alejandro Romero, Vieri Giuliano Santucci

Abstract: Autonomous open-ended learning (OEL) robots are able to cumulatively acquire new skills and knowledge through direct interaction with the environment, for example relying on the guidance of intrinsic motivations and self-generated goals. OEL robots have a high relevance for applications as they can use the autonomously acquired knowledge to accomplish tasks relevant for their human users. OEL robo… ▽ More Autonomous open-ended learning (OEL) robots are able to cumulatively acquire new skills and knowledge through direct interaction with the environment, for example relying on the guidance of intrinsic motivations and self-generated goals. OEL robots have a high relevance for applications as they can use the autonomously acquired knowledge to accomplish tasks relevant for their human users. OEL robots, however, encounter an important limitation: this may lead to the acquisition of knowledge that is not so much relevant to accomplish the users' tasks. This work analyses a possible solution to this problem that pivots on the novel concept of `purpose'. Purposes indicate what the designers and/or users want from the robot. The robot should use internal representations of purposes, called here `desires', to focus its open-ended exploration towards the acquisition of knowledge relevant to accomplish them. This work contributes to develop a computational framework on purpose in two ways. First, it formalises a framework on purpose based on a three-level motivational hierarchy involving: (a) the purposes; (b) the desires, which are domain independent; (c) specific domain dependent state-goals. Second, the work highlights key challenges highlighted by the framework such as: the `purpose-desire alignment problem', the `purpose-goal grounding problem', and the `arbitration between desires'. Overall, the approach enables OEL robots to learn in an autonomous way but also to focus on acquiring goals and skills that meet the purposes of the designers and users. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 15 pages, 6 figures

arXiv:2311.17068 [pdf, other]

Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling

Authors: Takiah Ebbs-Picken, David A. Romero, Carlos M. Da Silva, Cristina H. Amon

Abstract: Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a no… ▽ More Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT models. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples velocity and temperature models. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational model of the cold plate is developed and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. The FEM results are transformed and scaled from unstructured to structured, image-like meshes to create training and test datasets. The DeepEDH methodology's performance is examined in relation to data scaling, training dataset size, and network depth. Our performance analysis covers the impact of the novel architecture, separate field models, output geometry masks, multi-stage temperature models, and optimizations of the hyperparameters and architecture. Furthermore, we quantify the influence of the CHT thermal boundary condition on surrogate model performance, highlighting improved temperature model performance with higher heat fluxes. Compared to other deep learning neural network surrogate models, such as U-Net and DenseED, the proposed DeepEDH methodology for CHT models exhibits up to a 65% enhancement in the coefficient of determination ($R^{2}$). △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2310.10943 [pdf, other]

doi 10.1126/scirobotics.adg1462

Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning

Authors: Yunlong Song, Angel Romero, Matthias Mueller, Vladlen Koltun, Davide Scaramuzza

Abstract: A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contri… ▽ More A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control. △ Less

Submitted 18 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Journal ref: Science Robotics, 2023

arXiv:2309.12784 [pdf, other]

Learning to Walk and Fly with Adversarial Motion Priors

Authors: Giuseppe L'Erario, Drew Hanover, Angel Romero, Yunlong Song, Gabriele Nava, Paolo Maria Viceconte, Daniele Pucci, Davide Scaramuzza

Abstract: Robot multimodal locomotion encompasses the ability to transition between walking and flying, representing a significant challenge in robotics. This work presents an approach that enables automatic smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task witho… ▽ More Robot multimodal locomotion encompasses the ability to transition between walking and flying, representing a significant challenge in robotics. This work presents an approach that enables automatic smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task without the need for complex reward functions. The robot learns walking patterns from human-like gaits and aerial locomotion patterns from motions obtained using trajectory optimization. Through this process, the robot adapts the locomotion scheme based on environmental feedback using reinforcement learning, with the spontaneous emergence of mode-switching behavior. The results highlight the potential for achieving multimodal locomotion in aerial humanoid robotics through automatic control of walking and flying modes, paving the way for applications in diverse domains such as search and rescue, surveillance, and exploration missions. This research contributes to advancing the capabilities of aerial humanoid robots in terms of versatile locomotion in various environments. △ Less

Submitted 29 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 8 pages, 8 figures, submitted to IROS 2024

arXiv:2307.06100 [pdf, other]

doi 10.1126/scirobotics.abl6259

Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight

Authors: Philipp Foehn, Elia Kaufmann, Angel Romero, Robert Penicka, Sihao Sun, Leonard Bauersfeld, Thomas Laengle, Giovanni Cioffi, Yunlong Song, Antonio Loquercio, Davide Scaramuzza

Abstract: Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is… ▽ More Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open-source and open-hardware and supports both model-based and neural-network--based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, GPU-accelerated compute hardware for real-time perception and neural-network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural-network--based controllers. Our demonstrators include trajectory tracking at up to 5g and 70 km/h in a motion-capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Finally, we demonstrate its use for hardware-in-the-loop simulation in virtual-reality environments. Thanks to its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: 14 pages, 5 figures, 2 tables

Journal ref: Science Robotics Vol. 7, Issue 67, 2022

arXiv:2306.09852 [pdf, other]

Actor-Critic Model Predictive Control

Authors: Angel Romero, Yunlong Song, Davide Scaramuzza

Abstract: An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive C… ▽ More An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out of distribution behaviour. △ Less

Submitted 12 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 6 pages, 5 figures

Journal ref: IEEE Conference on Robotics and Automation (ICRA 2024)

arXiv:2305.07128 [pdf, other]

doi 10.1364/OL.492911

Pixel-wise rational model for structured light system

Authors: Raúl Vargas, Lenny A. Romero, Song Zhang, Andres G. Marrugo

Abstract: This Letter presents a novel structured light system model that effectively considers local lens distortion by pixel-wise rational functions. We leverage the stereo method for initial calibration and then estimate the rational model for each pixel. Our proposed model can achieve high measurement accuracy within and outside the calibration volume, demonstrating its robustness and accuracy. This Letter presents a novel structured light system model that effectively considers local lens distortion by pixel-wise rational functions. We leverage the stereo method for initial calibration and then estimate the rational model for each pixel. Our proposed model can achieve high measurement accuracy within and outside the calibration volume, demonstrating its robustness and accuracy. △ Less

Submitted 11 May, 2023; originally announced May 2023.

Comments: 4 pages, 5 figures

Journal ref: Optics Letters, Vol. 48, No. 10, 2023

arXiv:2303.10239 [pdf, other]

Topic Modeling in Density Functional Theory on Citations of Condensed Matter Electronic Structure Packages

Authors: Marie Dumaz, Camila Romero-Bohorquez, Donald Adjeroh, Aldo H. Romero

Abstract: With an increasing number of new scientific papers being released, it becomes harder for researchers to be aware of recent articles in their field of study. Accurately classifying papers is a first step in the direction of personalized catering and easy access to research of interest. The field of Density Functional Theory (DFT) in particular is a good example of a methodology used in very differe… ▽ More With an increasing number of new scientific papers being released, it becomes harder for researchers to be aware of recent articles in their field of study. Accurately classifying papers is a first step in the direction of personalized catering and easy access to research of interest. The field of Density Functional Theory (DFT) in particular is a good example of a methodology used in very different studies, and interconnected disciplines, which has a very strong community publishing many research articles. We devise a new unsupervised method for classifying publications, based on topic modeling, and use a DFT-related selection of documents as a use case. We first create topics from word analysis and clustering of the abstracts from the publications, then attribute each publication/paper to a topic based on word similarity. We then make interesting observations by analyzing connections between the topics and publishers, journals, country or year of publication. The proposed approach is general, and can be applied to analyze publication and citation trends in other areas of study, beyond the field of Density Function Theory. △ Less

Submitted 16 February, 2023; originally announced March 2023.

arXiv:2301.01755 [pdf, other]

doi 10.1109/TRO.2024.3400838

Autonomous Drone Racing: A Survey

Authors: Drew Hanover, Antonio Loquercio, Leonard Bauersfeld, Angel Romero, Robert Penicka, Yunlong Song, Giovanni Cioffi, Elia Kaufmann, Davide Scaramuzza

Abstract: Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground which in turn increases productivi… ▽ More Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground which in turn increases productivity and further strengthens their use case. One proxy for developing algorithms used in high-speed navigation is the task of autonomous drone racing, where researchers program drones to fly through a sequence of gates and avoid obstacles as quickly as possible using onboard sensors and limited computational power. Speeds and accelerations exceed over 80 kph and 4 g respectively, raising significant challenges across perception, planning, control, and state estimation. To achieve maximum performance, systems require real-time algorithms that are robust to motion blur, high dynamic range, model uncertainties, aerodynamic disturbances, and often unpredictable opponents. This survey covers the progression of autonomous drone racing across model-based and learning-based approaches. We provide an overview of the field, its evolution over the years, and conclude with the biggest challenges and open questions to be faced in the future. △ Less

Submitted 8 July, 2024; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 26 pages

Journal ref: IEEE Transactions on Robotics (T-RO), Vol. 40, 2024

arXiv:2211.07467 [pdf, other]

doi 10.1371/journal.pone.0287611

Cracking Double-Blind Review: Authorship Attribution with Deep Learning

Authors: Leonard Bauersfeld, Angel Romero, Manasi Muglikar, Davide Scaramuzza

Abstract: Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based, neural-network architecture that only uses… ▽ More Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based, neural-network architecture that only uses the text content and the author names in the bibliography to attribute an anonymous manuscript to an author. To train and evaluate our method, we created the largest authorship identification dataset to date. It leverages all research papers publicly available on arXiv amounting to over 2 million manuscripts. In arXiv-subsets with up to 2,000 different authors, our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly. We present a scaling analysis to highlight the applicability of the proposed method to even larger datasets when sufficient compute capabilities are more widely available to the academic community. Furthermore, we analyze the attribution accuracy in settings where the goal is to identify all authors of an anonymous manuscript. Thanks to our method, we are not only able to predict the author of an anonymous work, but we also provide empirical evidence of the key aspects that make a paper attributable. We have open-sourced the necessary tools to reproduce our experiments. △ Less

Submitted 3 July, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: 13 pages + 3 pages references

Journal ref: PLOS ONE 18(6): e0287611 (2023)

arXiv:2210.11087 [pdf, other]

Weighted Maximum Likelihood for Controller Tuning

Authors: Angel Romero, Shreedhar Govil, Gonca Yilmaz, Yunlong Song, Davide Scaramuzza

Abstract: Recently, Model Predictive Contouring Control (MPCC) has arisen as the state-of-the-art approach for model-based agile flight. MPCC benefits from great flexibility in trading-off between progress maximization and path following at runtime without relying on globally optimized trajectories. However, finding the optimal set of tuning parameters for MPCC is challenging because (i) the full quadrotor… ▽ More Recently, Model Predictive Contouring Control (MPCC) has arisen as the state-of-the-art approach for model-based agile flight. MPCC benefits from great flexibility in trading-off between progress maximization and path following at runtime without relying on globally optimized trajectories. However, finding the optimal set of tuning parameters for MPCC is challenging because (i) the full quadrotor dynamics are non-linear, (ii) the cost function is highly non-convex, and (iii) of the high dimensionality of the hyperparameter space. This paper leverages a probabilistic Policy Search method - Weighted Maximum Likelihood (WML)- to automatically learn the optimal objective for MPCC. WML is sample-efficient due to its closed-form solution for updating the learning parameters. Additionally, the data efficiency provided by the use of a model-based approach allows us to directly train in a high-fidelity simulator, which in turn makes our approach able to transfer zero-shot to the real world. We validate our approach in the real world, where we show that our method outperforms both the previous manually tuned controller and the state-of-the-art auto-tuning baseline reaching speeds of 75 km/h. △ Less

Submitted 2 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 8 pages

Journal ref: IEEE Conference on Robotics and Automation (ICRA 2023)

arXiv:2210.07102 [pdf, other]

Corneal endothelium assessment in specular microscopy images with Fuchs' dystrophy via deep regression of signed distance maps

Authors: Juan S. Sierra, Jesus Pineda, Daniela Rueda, Alejandro Tello, Angelica M. Prada, Virgilio Galvis, Giovanni Volpe, Maria S. Millan, Lenny A. Romero, Andres G. Marrugo

Abstract: Specular microscopy assessment of the human corneal endothelium (CE) in Fuchs' dystrophy is challenging due to the presence of dark image regions called guttae. This paper proposes a UNet-based segmentation approach that requires minimal post-processing and achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy. We cast the segmentation proble… ▽ More Specular microscopy assessment of the human corneal endothelium (CE) in Fuchs' dystrophy is challenging due to the presence of dark image regions called guttae. This paper proposes a UNet-based segmentation approach that requires minimal post-processing and achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy. We cast the segmentation problem as a regression task of the cell and gutta signed distance maps instead of a pixel-level classification task as typically done with UNets. Compared to the conventional UNet classification approach, the distance-map regression approach converges faster in clinically relevant parameters. It also produces morphometric parameters that agree with the manually-segmented ground-truth data, namely the average cell density difference of -41.9 cells/mm2 (95% confidence interval (CI) [-306.2, 222.5]) and the average difference of mean cell area of 14.8 um2 (95% CI [-41.9, 71.5]). These results suggest a promising alternative for CE assessment. △ Less

Submitted 29 November, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2205.07562 [pdf, other]

Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

Authors: Alejandro Romero, Gianluca Baldassarre, Richard J. Duro, Vieri Giuliano Santucci

Abstract: Autonomous open-ended learning is a relevant approach in machine learning and robotics, allowing the design of artificial agents able to acquire goals and motor skills without the necessity of user assigned tasks. A crucial issue for this approach is to develop strategies to ensure that agents can maximise their competence on as many tasks as possible in the shortest possible time. Intrinsic motiv… ▽ More Autonomous open-ended learning is a relevant approach in machine learning and robotics, allowing the design of artificial agents able to acquire goals and motor skills without the necessity of user assigned tasks. A crucial issue for this approach is to develop strategies to ensure that agents can maximise their competence on as many tasks as possible in the shortest possible time. Intrinsic motivations have proven to generate a task-agnostic signal to properly allocate the training time amongst goals. While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent from each other, only few of them studied the autonomous acquisition of interdependent tasks, and even fewer tackled scenarios where goals involve non-stationary interdependencies. Building on previous works, we tackle these crucial issues at the level of decision making (i.e., building strategies to properly select between goals), and we propose a hierarchical architecture that treating sub-tasks selection as a Markov Decision Process is able to properly learn interdependent skills on the basis of intrinsically generated motivations. In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture (that of goal selection). Then we introduce H-GRAIL, a new system that extends the previous one by adding a new learning layer to store the autonomously acquired sequences of tasks to be able to modify them in case the interdependencies are non-stationary. All systems are tested in a real robotic scenario, with a Baxter robot performing multiple interdependent reaching tasks. △ Less

Submitted 16 May, 2022; originally announced May 2022.

Comments: Submitted and accepted to "The Multi-disciplinary Conference on Reinforcement Learning and Decision Making" RLDM 2022

arXiv:2205.03256 [pdf, other]

doi 10.1109/TNSM.2023.3281976

OROS: Online Operation and Orchestration of Collaborative Robots using 5G

Authors: Arnau Romero, Carmen Delgado, Lanfranco Zanzi, Xi Li, Xavier Costa-Pérez

Abstract: The 5G mobile networks extend the capability for supporting collaborative robot operations in outdoor scenarios. However, the restricted battery life of robots still poses a major obstacle to their effective implementation and utilization in real scenarios. One of the most challenging situations is the execution of mission-critical tasks that require the use of various onboard sensors to perform s… ▽ More The 5G mobile networks extend the capability for supporting collaborative robot operations in outdoor scenarios. However, the restricted battery life of robots still poses a major obstacle to their effective implementation and utilization in real scenarios. One of the most challenging situations is the execution of mission-critical tasks that require the use of various onboard sensors to perform simultaneous localization and mapping (SLAM) of unexplored environments. Given the time-sensitive nature of these tasks, completing them in the shortest possible time is of the highest importance. In this paper, we analyze the benefits of 5G-enabled collaborative robots by enhancing the intelligence of the robot operation through joint orchestration of Robot Operating System (ROS) and 5G resources for energysaving goals, addressing the problem from both offline and online manners. We propose OROS, a novel orchestration approach that minimizes mission-critical task completion times as well as overall energy consumption of 5G-connected robots by jointly optimizing robotic navigation and sensing together with infrastructure resources. We validate our 5G-enabled collaborative framework by means of Matlab/Simulink, ROS software and Gazebo simulator. Our results show an improvement between 3.65% and 11.98% in exploration task by exploiting 5G orchestration features for battery savings when using 3 robots. △ Less

Submitted 1 February, 2024; v1 submitted 6 May, 2022; originally announced May 2022.

Journal ref: IEEE Transactions on Network and Service Management 2023

arXiv:2203.09839 [pdf, other]

doi 10.1109/LRA.2022.3185772

Time-Optimal Online Replanning for Agile Quadrotor Flight

Authors: Angel Romero, Robert Penicka, Davide Scaramuzza

Abstract: In this paper, we tackle the problem of flying a quadrotor using time-optimal control policies that can be replanned online when the environment changes or when encountering unknown disturbances. This problem is challenging as the time-optimal trajectories that consider the full quadrotor dynamics are computationally expensive to generate (order of minutes or even hours). We introduce a sampling-b… ▽ More In this paper, we tackle the problem of flying a quadrotor using time-optimal control policies that can be replanned online when the environment changes or when encountering unknown disturbances. This problem is challenging as the time-optimal trajectories that consider the full quadrotor dynamics are computationally expensive to generate (order of minutes or even hours). We introduce a sampling-based method for efficient generation of time-optimal paths of a point-mass model. These paths are then tracked using a Model Predictive Contouring Control approach that considers the full quadrotor dynamics and the single rotor thrust limits. Our combined approach is able to run in real-time, being the first time-optimal method that is able to adapt to changes on-the-fly. We showcase our approach's adaption capabilities by flying a quadrotor at more than 60 km/h in a racing track where gates are moving. Additionally, we show that our online replanning approach can cope with strong disturbances caused by winds of up to 68 km/h. △ Less

Submitted 21 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 8 pages, 10 figures

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7730-7737, July 2022

arXiv:2202.06434 [pdf, other]

doi 10.1109/LRA.2022.3145514

Perception-Aware Perching on Powerlines with Multirotors

Authors: Julio L. Paneque, Jose Ramiro Martínez de Dios, Aníbal Ollero. Drew Hanover, Sihao Sun, Ángel Romero, Davide Scaramuzza

Abstract: Multirotor aerial robots are becoming widely used for the inspection of powerlines. To enable continuous, robust inspection without human intervention, the robots must be able to perch on the powerlines to recharge their batteries. Highly versatile perching capabilities are necessary to adapt to the variety of configurations and constraints that are present in real powerline systems. This paper pr… ▽ More Multirotor aerial robots are becoming widely used for the inspection of powerlines. To enable continuous, robust inspection without human intervention, the robots must be able to perch on the powerlines to recharge their batteries. Highly versatile perching capabilities are necessary to adapt to the variety of configurations and constraints that are present in real powerline systems. This paper presents a novel perching trajectory generation framework that computes perception-aware, collision-free, and dynamically-feasible maneuvers to guide the robot to the desired final state. Trajectory generation is achieved via solving a Nonlinear Programming problem using the Primal-Dual Interior Point method. The problem considers the full dynamic model of the robot down to its single rotor thrusts and minimizes the final pose and velocity errors while avoiding collisions and maximizing the visibility of the powerline during the maneuver. The generated maneuvers consider both the perching and the posterior recovery trajectories. The framework adopts costs and constraints defined by efficient mathematical representations of powerlines, enabling online onboard execution in resource-constrained hardware. The method is validated on-board an agile quadrotor conducting powerline inspection and various perching maneuvers with final pitch values of up to 180 degrees. The developed code is available online at: https://github.com/grvcPerception/pa_powerline_perching △ Less

Submitted 13 February, 2022; originally announced February 2022.

Comments: IEEE Robotics and Automation Letters (2022)

arXiv:2201.09865 [pdf, other]

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

Authors: Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool

Abstract: Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of sem… ▽ More Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions. Github Repository: git.io/RePaint △ Less

Submitted 31 August, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: We missed out on other diffusion models that work on inpainting. We corrected that and apologize for this mistake

arXiv:2109.06479 [pdf, other]

doi 10.1109/LRA.2022.3154047

Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy

Authors: Xu Liu, Guilherme V. Nardari, Fernando Cladera Ojeda, Yuezhan Tao, Alex Zhou, Thomas Donnelly, Chao Qu, Steven W. Chen, Roseli A. F. Romero, Camillo J. Taylor, Vijay Kumar

Abstract: Semantic maps represent the environment using a set of semantically meaningful objects. This representation is storage-efficient, less ambiguous, and more informative, thus facilitating large-scale autonomy and the acquisition of actionable information in highly unstructured, GPS-denied environments. In this letter, we propose an integrated system that can perform large-scale autonomous flights an… ▽ More Semantic maps represent the environment using a set of semantically meaningful objects. This representation is storage-efficient, less ambiguous, and more informative, thus facilitating large-scale autonomy and the acquisition of actionable information in highly unstructured, GPS-denied environments. In this letter, we propose an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments. We detect and model tree trunks and ground planes from LiDAR data, which are associated across scans and used to constrain robot poses as well as tree trunk models. The autonomous navigation module utilizes a multi-level planning and mapping framework and computes dynamically feasible trajectories that lead the UAV to build a semantic map of the user-defined region of interest in a computationally and storage efficient manner. A drift-compensation mechanism is designed to minimize the odometry drift using semantic SLAM outputs in real time, while maintaining planner optimality and controller stability. This leads the UAV to execute its mission accurately and safely at scale. △ Less

Submitted 15 August, 2023; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: Xu Liu and Guilherme V. Nardari contributed equally to this work

Journal ref: IEEE Robotics and Automation Letters ( Volume: 7, Issue: 2, April 2022)

arXiv:2109.01365 [pdf, other]

A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight

Authors: Sihao Sun, Angel Romero, Philipp Foehn, Elia Kaufmann, Davide Scaramuzza

Abstract: Accurate trajectory tracking control for quadrotors is essential for safe navigation in cluttered environments. However, this is challenging in agile flights due to nonlinear dynamics, complex aerodynamic effects, and actuation constraints. In this article, we empirically compare two state-of-the-art control frameworks: the nonlinear-model-predictive controller (NMPC) and the differential-flatness… ▽ More Accurate trajectory tracking control for quadrotors is essential for safe navigation in cluttered environments. However, this is challenging in agile flights due to nonlinear dynamics, complex aerodynamic effects, and actuation constraints. In this article, we empirically compare two state-of-the-art control frameworks: the nonlinear-model-predictive controller (NMPC) and the differential-flatness-based controller (DFBC), by tracking a wide variety of agile trajectories at speeds up to 20 m/s (i.e.,72 km/h). The comparisons are performed in both simulation and real-world environments to systematically evaluate both methods from the aspect of tracking accuracy, robustness, and computational efficiency. We show the superiority of NMPC in tracking dynamically infeasible trajectories, at the cost of higher computation time and risk of numerical convergence issues. For both methods, we also quantitatively study the effect of adding an inner-loop controller using the incremental nonlinear dynamic inversion (INDI) method, and the effect of adding an aerodynamic drag model. Our real-world experiments, performed in one of the world's largest motion capture systems, demonstrate more than 78% tracking error reduction of both NMPC and DFBC, indicating the necessity of using an inner-loop controller and aerodynamic drag model for agile trajectory tracking. △ Less

Submitted 4 January, 2024; v1 submitted 3 September, 2021; originally announced September 2021.

Journal ref: The paper has been published in the IEEE Transactions on Robotics (T-RO), 2022

arXiv:2108.13205 [pdf, other]

Model Predictive Contouring Control for Time-Optimal Quadrotor Flight

Authors: Angel Romero, Sihao Sun, Philipp Foehn, Davide Scaramuzza

Abstract: We tackle the problem of flying time-optimal trajectories through multiple waypoints with quadrotors. State-of-the-art solutions split the problem into a planning task - where a global, time-optimal trajectory is generated - and a control task - where this trajectory is accurately tracked. However, at the current state, generating a time-optimal trajectory that considers the full quadrotor model r… ▽ More We tackle the problem of flying time-optimal trajectories through multiple waypoints with quadrotors. State-of-the-art solutions split the problem into a planning task - where a global, time-optimal trajectory is generated - and a control task - where this trajectory is accurately tracked. However, at the current state, generating a time-optimal trajectory that considers the full quadrotor model requires solving a difficult time allocation problem via optimization, which is computationally demanding (in the order of minutes or even hours). This is detrimental for replanning in presence of disturbances. We overcome this issue by solving the time allocation problem and the control problem concurrently via Model Predictive Contouring Control (MPCC). Our MPCC optimally selects the future states of the platform at runtime, while maximizing the progress along the reference path and minimizing the distance to it. We show that, even when tracking simplified trajectories, the proposed MPCC results in a path that approaches the true time-optimal one, and which can be generated in real-time. We validate our approach in the real world, where we show that our method outperforms both the current state-of-the-art and a world-class human pilot in terms of lap time achieving speeds of up to 60 km/h. △ Less

Submitted 4 May, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 17 pages, 16 figures. Video: https://www.youtube.com/watch?v=mHDQcckqdg4 This paper has been accepted for publication in the IEEE Transactions on Robotics (T-RO), 2022

arXiv:2108.11505 [pdf, other]

Generalized Real-World Super-Resolution through Adversarial Robustness

Authors: Angela Castillo, María Escobar, Juan C. Pérez, Andrés Romero, Radu Timofte, Luc Van Gool, Pablo Arbeláez

Abstract: Real-world Super-Resolution (SR) has been traditionally tackled by first learning a specific degradation model that resembles the noise and corruption artifacts in low-resolution imagery. Thus, current methods lack generalization and lose their accuracy when tested on unseen types of corruption. In contrast to the traditional proposal, we present Robust Super-Resolution (RSR), a method that levera… ▽ More Real-world Super-Resolution (SR) has been traditionally tackled by first learning a specific degradation model that resembles the noise and corruption artifacts in low-resolution imagery. Thus, current methods lack generalization and lose their accuracy when tested on unseen types of corruption. In contrast to the traditional proposal, we present Robust Super-Resolution (RSR), a method that leverages the generalization capability of adversarial attacks to tackle real-world SR. Our novel framework poses a paradigm shift in the development of real-world SR methods. Instead of learning a dataset-specific degradation, we employ adversarial attacks to create difficult examples that target the model's weaknesses. Afterward, we use these adversarial examples during training to improve our model's capacity to process noisy inputs. We perform extensive experimentation on synthetic and real-world images and empirically demonstrate that our RSR method generalizes well across datasets without re-training for specific noise priors. By using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks. △ Less

Submitted 25 August, 2021; originally announced August 2021.

Comments: ICCV Workshops, 2021

arXiv:2108.04537 [pdf, other]

doi 10.1126/scirobotics.abh1221

Time-Optimal Planning for Quadrotor Waypoint Flight

Authors: Philipp Foehn, Angel Romero, Davide Scaramuzza

Abstract: Quadrotors are among the most agile flying robots. However, planning time-optimal trajectories at the actuation limit through multiple waypoints remains an open problem. This is crucial for applications such as inspection, delivery, search and rescue, and drone racing. Early works used polynomial trajectory formulations, which do not exploit the full actuator potential because of their inherent sm… ▽ More Quadrotors are among the most agile flying robots. However, planning time-optimal trajectories at the actuation limit through multiple waypoints remains an open problem. This is crucial for applications such as inspection, delivery, search and rescue, and drone racing. Early works used polynomial trajectory formulations, which do not exploit the full actuator potential because of their inherent smoothness. Recent works resorted to numerical optimization but require waypoints to be allocated as costs or constraints at specific discrete times. However, this time allocation is a priori unknown and renders previous works incapable of producing truly time-optimal trajectories. To generate truly time-optimal trajectories, we propose a solution to the time allocation problem while exploiting the full quadrotor's actuator potential. We achieve this by introducing a formulation of progress along the trajectory, which enables the simultaneous optimization of the time allocation and the trajectory itself. We compare our method against related approaches and validate it in real-world flights in one of the world's largest motion-capture systems, where we outperform human expert drone pilots in a drone-racing task. △ Less

Submitted 1 October, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: Narrated video footage available at https://youtu.be/ZPI8U1uSJUs. Code available at https://github.com/uzh-rpg/rpg_time_optimal. arXiv admin note: text overlap with arXiv:2007.06255

Journal ref: Published in Science Robotics, 21 Jul 2021, Vol. 6, Issue 56

arXiv:2107.12540 [pdf, other]

doi 10.1007/s11571-022-09886-z

A Neurorobotics Approach to Behaviour Selection based on Human Activity Recognition

Authors: Caetano M. Ranieri, Renan C. Moioli, Patricia A. Vargas, Roseli A. F. Romero

Abstract: Behaviour selection has been an active research topic for robotics, in particular in the field of human-robot interaction. For a robot to interact effectively and autonomously with humans, the coupling between techniques for human activity recognition, based on sensing information, and robot behaviour selection, based on decision-making mechanisms, is of paramount importance. However, most approac… ▽ More Behaviour selection has been an active research topic for robotics, in particular in the field of human-robot interaction. For a robot to interact effectively and autonomously with humans, the coupling between techniques for human activity recognition, based on sensing information, and robot behaviour selection, based on decision-making mechanisms, is of paramount importance. However, most approaches to date consist of deterministic associations between the recognised activities and the robot behaviours, neglecting the uncertainty inherent to sequential predictions in real-time applications. In this paper, we address this gap by presenting a neurorobotics approach based on computational models that resemble neurophysiological aspects of living beings. This neurorobotics approach was compared to a non-bioinspired, heuristics-based approach. To evaluate both approaches, a robot simulation is developed, in which a mobile robot has to accomplish tasks according to the activity being performed by the inhabitant of an intelligent home. The outcomes of each approach were evaluated according to the number of correct outcomes provided by the robot. Results revealed that the neurorobotics approach is advantageous, especially considering the computational models based on more complex animals. △ Less

Submitted 27 September, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in Cognitive Neurodynamics (2022), and is available online at https://doi.org/10.1007/s11571-022-09886-z

arXiv:2107.12536 [pdf, other]

doi 10.1109/ACCESS.2021.3108682

A Data-Driven Biophysical Computational Model of Parkinson's Disease based on Marmoset Monkeys

Authors: Caetano M. Ranieri, Jhielson M. Pimentel, Marcelo R. Romano, Leonardo A. Elias, Roseli A. F. Romero, Michael A. Lones, Mariana F. P. Araujo, Patricia A. Vargas, Renan C. Moioli

Abstract: In this work we propose a new biophysical computational model of brain regions relevant to Parkinson's Disease based on local field potential data collected from the brain of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder, linked to the death of dopaminergic neurons at the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex… ▽ More In this work we propose a new biophysical computational model of brain regions relevant to Parkinson's Disease based on local field potential data collected from the brain of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder, linked to the death of dopaminergic neurons at the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex neuronal circuit of the brain. Although there are multiple mechanisms underlying the disease, a complete description of those mechanisms and molecular pathogenesis are still missing, and there is still no cure. To address this gap, computational models that resemble neurobiological aspects found in animal models have been proposed. In our model, we performed a data-driven approach in which a set of biologically constrained parameters is optimised using differential evolution. Evolved models successfully resembled single-neuron mean firing rates and spectral signatures of local field potentials from healthy and parkinsonian marmoset brain data. As far as we are concerned, this is the first computational model of Parkinson's Disease based on simultaneous electrophysiological recordings from seven brain regions of Marmoset monkeys. Results show that the proposed model could facilitate the investigation of the mechanisms of PD and support the development of techniques that can indicate new therapies. It could also be applied to other computational neuroscience problems in which biological data could be used to fit multi-scale models of brain circuits. △ Less

Submitted 1 September, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

Journal ref: IEEE Access, 2021

arXiv:2107.10495 [pdf]

Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims

Authors: Roland Albert A. Romero, Mariefel Nicole Y. Deypalan, Suchit Mehrotra, John Titus Jungao, Natalie E. Sheils, Elisabetta Manduchi, Jason H. Moore

Abstract: We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated… ▽ More We ascertain and compare the performances of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset using historical administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics. The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications. Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features types. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance. Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: 22 pages, 8 figures, 7 tables

arXiv:2107.09584 [pdf, other]

Active 3D Shape Reconstruction from Vision and Touch

Authors: Edward J. Smith, David Meger, Luis Pineda, Roberto Calandra, Jitendra Malik, Adriana Romero, Michal Drozdzal

Abstract: Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. Inactive touch sensing for 3D reconstruction,… ▽ More Humans build 3D understandings of the world through active object exploration, using jointly their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. Inactive touch sensing for 3D reconstruction, the goal is to actively select the tactile readings that maximize the improvement in shape reconstruction accuracy. However, the development of deep learning-based active touch models is largely limited by the lack of frameworks for shape exploration. In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2)a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration. Our framework enables the development of the first fully data-driven solutions to active touch on top of learned models for object understanding. Our experiments show the benefits of such solutions in the task of 3D shape understanding where our models consistently outperform natural baselines. We provide our framework as a tool to foster future research in this direction. △ Less

Submitted 26 October, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Journal ref: Published at Neurips 2021

arXiv:2105.14355 [pdf, other]

doi 10.1117/1.OE.60.5.054106

Three-dimensional multimodal medical imaging system based on free-hand ultrasound and structured light

Authors: Jhacson Meza, Sonia H. Contreras-Ortiz, Lenny A. Romero, Andres G. Marrugo

Abstract: We propose a three-dimensional (3D) multimodal medical imaging system that combines freehand ultrasound and structured light 3D reconstruction in a single coordinate system without requiring registration. To the best of our knowledge, these techniques have not been combined before as a multimodal imaging technique. The system complements the internal 3D information acquired with ultrasound, with t… ▽ More We propose a three-dimensional (3D) multimodal medical imaging system that combines freehand ultrasound and structured light 3D reconstruction in a single coordinate system without requiring registration. To the best of our knowledge, these techniques have not been combined before as a multimodal imaging technique. The system complements the internal 3D information acquired with ultrasound, with the external surface measured with the structure light technique. Moreover, the ultrasound probe's optical tracking for pose estimation was implemented based on a convolutional neural network. Experimental results show the system's high accuracy and reproducibility, as well as its potential for preoperative and intraoperative applications. The experimental multimodal error, or the distance from two surfaces obtained with different modalities, was 0.12 mm. The code is available as a Github repository. △ Less

Submitted 29 May, 2021; originally announced May 2021.

Journal ref: Optical Engineering 60(5), 054106 (2021)

arXiv:2105.08826 [pdf, other]

Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report

Authors: Andrey Ignatov, Andres Romero, Heewon Kim, Radu Timofte, Chiu Man Ho, Zibo Meng, Kyoung Mu Lee, Yuxiang Chen, Yutong Wang, Zeyu Long, Chenhao Wang, Yifei Chen, Boshen Xu, Shuhang Gu, Lixin Duan, Wen Li, Wang Bofei, Zhang Diankai, Zheng Chengjian, Liu Shaoli, Gao Si, Zhang Xiaofeng, Lu Kaidi, Xu Tianyu, Zheng Hui , et al. (6 additional authors not shown)

Abstract: Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where… ▽ More Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based video super-resolution solutions that can achieve a real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to do an efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high fidelity results. A detailed description of all models developed in the challenge is provided in this paper. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.07825. substantial text overlap with arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.08630

arXiv:2105.00368 [pdf, other]

MarkerPose: Robust Real-time Planar Target Tracking for Accurate Stereo Pose Estimation

Authors: Jhacson Meza, Lenny A. Romero, Andres G. Marrugo

Abstract: Despite the attention marker-less pose estimation has attracted in recent years, marker-based approaches still provide unbeatable accuracy under controlled environmental conditions. Thus, they are used in many fields such as robotics or biomedical applications but are primarily implemented through classical approaches, which require lots of heuristics and parameter tuning for reliable performance… ▽ More Despite the attention marker-less pose estimation has attracted in recent years, marker-based approaches still provide unbeatable accuracy under controlled environmental conditions. Thus, they are used in many fields such as robotics or biomedical applications but are primarily implemented through classical approaches, which require lots of heuristics and parameter tuning for reliable performance under different environments. In this work, we propose MarkerPose, a robust, real-time pose estimation system based on a planar target of three circles and a stereo vision system. MarkerPose is meant for high-accuracy pose estimation applications. Our method consists of two deep neural networks for marker point detection. A SuperPoint-like network for pixel-level accuracy keypoint localization and classification, and we introduce EllipSegNet, a lightweight ellipse segmentation network for sub-pixel-level accuracy keypoint detection. The marker's pose is estimated through stereo triangulation. The target point detection is robust to low lighting and motion blur conditions. We compared MarkerPose with a detection method based on classical computer vision techniques using a robotic arm for validation. The results show our method provides better accuracy than the classical technique. Finally, we demonstrate the suitability of MarkerPose in a 3D freehand ultrasound system, which is an application where highly accurate pose estimation is required. Code is available in Python and C++ at https://github.com/jhacsonmeza/MarkerPose. △ Less

Submitted 29 May, 2021; v1 submitted 1 May, 2021; originally announced May 2021.

Comments: Accepted at CVPR 2021 LXCV Workshop

arXiv:2011.05680 [pdf, other]

Zero-Pair Image to Image Translation using Domain Conditional Normalization

Authors: Samarth Shukla, Andrés Romero, Luc Van Gool, Radu Timofte

Abstract: In this paper, we propose an approach based on domain conditional normalization (DCN) for zero-pair image-to-image translation, i.e., translating between two domains which have no paired training data available but each have paired training data with a third domain. We employ a single generator which has an encoder-decoder structure and analyze different implementations of domain conditional norma… ▽ More In this paper, we propose an approach based on domain conditional normalization (DCN) for zero-pair image-to-image translation, i.e., translating between two domains which have no paired training data available but each have paired training data with a third domain. We employ a single generator which has an encoder-decoder structure and analyze different implementations of domain conditional normalization to obtain the desired target domain output. The validation benchmark uses RGB-depth pairs and RGB-semantic pairs for training and compares performance for the depth-semantic translation task. The proposed approaches improve in qualitative and quantitative terms over the compared methods, while using much fewer parameters. Code available at https://github.com/samarthshukla/dcn △ Less

Submitted 11 November, 2020; originally announced November 2020.

Comments: Paper accepted for publication at WACV 2021

arXiv:2010.11619 [pdf, other]

Self-Supervised Shadow Removal

Authors: Florin-Alexandru Vasluianu, Andres Romero, Luc Van Gool, Radu Timofte

Abstract: Shadow removal is an important computer vision task aiming at the detection and successful removal of the shadow produced by an occluded light source and a photo-realistic restoration of the image contents. Decades of re-search produced a multitude of hand-crafted restoration techniques and, more recently, learned solutions from shad-owed and shadow-free training image pairs. In this work,we propo… ▽ More Shadow removal is an important computer vision task aiming at the detection and successful removal of the shadow produced by an occluded light source and a photo-realistic restoration of the image contents. Decades of re-search produced a multitude of hand-crafted restoration techniques and, more recently, learned solutions from shad-owed and shadow-free training image pairs. In this work,we propose an unsupervised single image shadow removal solution via self-supervised learning by using a conditioned mask. In contrast to existing literature, we do not require paired shadowed and shadow-free images, instead we rely on self-supervision and jointly learn deep models to remove and add shadows to images. We validate our approach on the recently introduced ISTD and USR datasets. We largely improve quantitatively and qualitatively over the compared methods and set a new state-of-the-art performance in single image shadow removal. △ Less

Submitted 22 October, 2020; originally announced October 2020.

Comments: 10 pages, 4 figures, 6 tables

ACM Class: I.2.10

arXiv:2010.03026 [pdf, other]

Place Recognition in Forests with Urquhart Tessellations

Authors: Guilherme V. Nardari, Avraham Cohen, Steven W. Chen, Xu Liu, Vaibhav Arcot, Roseli A. F. Romero, Vijay Kumar

Abstract: In this letter, we present a novel descriptor based on Urquhart tessellations derived from the position of trees in a forest. We propose a framework that uses these descriptors to detect previously seen observations and landmark correspondences, even with partial overlap and noise. We run loop closure detection experiments in simulation and real-world data map-merging from different flights of an… ▽ More In this letter, we present a novel descriptor based on Urquhart tessellations derived from the position of trees in a forest. We propose a framework that uses these descriptors to detect previously seen observations and landmark correspondences, even with partial overlap and noise. We run loop closure detection experiments in simulation and real-world data map-merging from different flights of an Unmanned Aerial Vehicle (UAV) in a pine tree forest and show that our method outperforms state-of-the-art approaches in accuracy and robustness. △ Less

Submitted 16 November, 2020; v1 submitted 23 September, 2020; originally announced October 2020.

Comments: 9 pages, 6 Figures

arXiv:2010.02315 [pdf, other]

SMILE: Semantically-guided Multi-attribute Image and Layout Editing

Authors: Andrés Romero, Luc Van Gool, Radu Timofte

Abstract: Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). Exploring the disentangled attribute space within a transformation is a very challenging task due to the multiple and mutually-inclusive nature of the facial images, where different labels (eyeglasses, hats, hair, identity, etc.) can co-exist at the same time. Several works a… ▽ More Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). Exploring the disentangled attribute space within a transformation is a very challenging task due to the multiple and mutually-inclusive nature of the facial images, where different labels (eyeglasses, hats, hair, identity, etc.) can co-exist at the same time. Several works address this issue either by exploiting the modality of each domain/attribute using a conditional random vector noise, or extracting the modality from an exemplary image. However, existing methods cannot handle both random and reference transformations for multiple attributes, which limits the generality of the solutions. In this paper, we successfully exploit a multimodal representation that handles all attributes, be it guided by random noise or exemplar images, while only using the underlying domain information of the target domain. We present extensive qualitative and quantitative results for facial datasets and several different attributes that show the superiority of our method. Additionally, our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space, and it can be easily extended to head-swapping and face-reenactment applications without being trained on videos. △ Less

Submitted 5 October, 2020; originally announced October 2020.

arXiv:2010.01110 [pdf, other]

AIM 2020 Challenge on Image Extreme Inpainting

Authors: Evangelos Ntavelis, Andrés Romero, Siavash Bigdeli, Radu Timofte

Abstract: This paper reviews the AIM 2020 challenge on extreme image inpainting. This report focuses on proposed solutions and results for two different tracks on extreme image inpainting: classical image inpainting and semantically guided image inpainting. The goal of track 1 is to inpaint considerably large part of the image using no supervision but the context. Similarly, the goal of track 2 is to inpain… ▽ More This paper reviews the AIM 2020 challenge on extreme image inpainting. This report focuses on proposed solutions and results for two different tracks on extreme image inpainting: classical image inpainting and semantically guided image inpainting. The goal of track 1 is to inpaint considerably large part of the image using no supervision but the context. Similarly, the goal of track 2 is to inpaint the image by having access to the entire semantic segmentation map of the image to inpaint. The challenge had 88 and 74 participants, respectively. 11 and 6 teams competed in the final phase of the challenge, respectively. This report gauges current solutions and set a benchmark for future extreme image inpainting methods. △ Less

Submitted 2 October, 2020; originally announced October 2020.

arXiv:2007.13483 [pdf, other]

Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

Authors: Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

Abstract: Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created the following report in an attempt to isolate emerging topics and recurring themes that have been presented throughout the event. Deep learning can still be a complex mix of art and engineering despite its tremendous success in recent years.… ▽ More Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created the following report in an attempt to isolate emerging topics and recurring themes that have been presented throughout the event. Deep learning can still be a complex mix of art and engineering despite its tremendous success in recent years. The workshop aimed at gathering people across the board to address seemingly contrasting challenges in the problems they are working on. As part of the call for the workshop, particular attention has been given to the interdependence of architecture, data, and optimization that gives rise to an enormous landscape of design and performance intricacies that are not well-understood. This year, our goal was to emphasize the following directions in our community: (i) identify obstacles in the way to better models and algorithms; (ii) identify the general trends from which we would like to build scientific and potentially theoretical understanding; and (iii) the rigorous design of scientific experiments and experimental protocols whose purpose is to resolve and pinpoint the origin of mysteries while ensuring reproducibility and robustness of conclusions. In the event, these topics emerged and were broadly discussed, matching our expectations and paving the way for new studies in these directions. While we acknowledge that the text is naturally biased as it comes through our lens, here we present an attempt to do a fair job of highlighting the outcome of the workshop. △ Less

Submitted 29 July, 2020; v1 submitted 25 June, 2020; originally announced July 2020.

Comments: Report of NeurIPS 2019 workshop SEDL

arXiv:2007.10469 [pdf, other]

doi 10.1007/978-3-030-59713-9_3

Active MR k-space Sampling with Reinforcement Learning

Authors: Luis Pineda, Sumana Basu, Adriana Romero, Roberto Calandra, Michal Drozdzal

Abstract: Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition. The majority of existing work have focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction mod… ▽ More Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition. The majority of existing work have focused on designing better reconstruction models given a pre-determined acquisition trajectory, ignoring the question of trajectory optimization. In this paper, we focus on learning acquisition trajectories given a fixed image reconstruction model. We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it. Experiments on a large scale public MRI dataset of knees show that our proposed models significantly outperform the state-of-the-art in active MRI acquisition, over a large range of acceleration factors. △ Less

Submitted 7 October, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: Presented at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2020

Journal ref: LNCS vol. 12262 (2020) 23-33

arXiv:2007.03778 [pdf, other]

3D Shape Reconstruction from Vision and Touch

Authors: Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal

Abstract: When a toddler is presented a new toy, their instinctual behaviour is to pick it upand inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with. At any instance here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary… ▽ More When a toddler is presented a new toy, their instinctual behaviour is to pick it upand inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with. At any instance here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to multi-modal shape understanding which encourages a similar fusion vision and touch information.To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. Our results show that (1) leveraging both vision and touch signals consistently improves single-modality baselines; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) there construction quality increases with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood. △ Less

Submitted 2 November, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted at Neurips 2020

arXiv:2004.06502 [pdf, other]

Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning

Authors: Kangning Liu, Shuhang Gu, Andres Romero, Radu Timofte

Abstract: Existing unsupervised video-to-video translation methods fail to produce translated videos which are frame-wise realistic, semantic information preserving and video-level consistent. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes the style and the content, uses the specialized encoder-decoder structure and propagates the inter-frame infor… ▽ More Existing unsupervised video-to-video translation methods fail to produce translated videos which are frame-wise realistic, semantic information preserving and video-level consistent. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes the style and the content, uses the specialized encoder-decoder structure and propagates the inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables us to achieve style consistent video translation results as well as provides us with a good interface for modality flexible translation. In addition, by changing the input frames and style codes incorporated in our translation, we propose a video interpolation loss, which captures temporal information within the sequence to train our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporal consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods. More details can be found on our project website: https://uvit.netlify.com △ Less

Submitted 14 April, 2020; originally announced April 2020.

arXiv:2004.04977 [pdf, other]

doi 10.1007/978-3-030-58542-6_24

SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

Authors: Evangelos Ntavelis, Andrés Romero, Iason Kastanis, Luc Van Gool, Radu Timofte

Abstract: Recent advances in image generation gave rise to powerful tools for semantic image editing. However, existing approaches can either operate on a single image or require an abundance of additional information. They are not capable of handling the complete set of editing operations, that is addition, manipulation or removal of semantic concepts. To address these limitations, we propose SESAME, a nov… ▽ More Recent advances in image generation gave rise to powerful tools for semantic image editing. However, existing approaches can either operate on a single image or require an abundance of additional information. They are not capable of handling the complete set of editing operations, that is addition, manipulation or removal of semantic concepts. To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating or Erasing objects. In our setup, the user provides the semantic labels of the areas to be edited and the generator synthesizes the corresponding pixels. In contrast to previous methods that employ a discriminator that trivially concatenates semantics and image as an input, the SESAME discriminator is composed of two input streams that independently process the image and its semantics, using the latter to manipulate the results of the former. We evaluate our model on a diverse set of datasets and report state-of-the-art performance on two tasks: (a) image manipulation and (b) image generation conditioned on semantic labels. △ Less

Submitted 8 October, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

arXiv:2004.04433 [pdf, other]

DeepSEE: Deep Disentangled Semantic Explorative Extreme Super-Resolution

Authors: Marcel C. Bühler, Andrés Romero, Radu Timofte

Abstract: Super-resolution (SR) is by definition ill-posed. There are infinitely many plausible high-resolution variants for a given low-resolution natural image. Most of the current literature aims at a single deterministic solution of either high reconstruction fidelity or photo-realistic perceptual quality. In this work, we propose an explorative facial super-resolution framework, DeepSEE, for Deep disen… ▽ More Super-resolution (SR) is by definition ill-posed. There are infinitely many plausible high-resolution variants for a given low-resolution natural image. Most of the current literature aims at a single deterministic solution of either high reconstruction fidelity or photo-realistic perceptual quality. In this work, we propose an explorative facial super-resolution framework, DeepSEE, for Deep disentangled Semantic Explorative Extreme super-resolution. To the best of our knowledge, DeepSEE is the first method to leverage semantic maps for explorative super-resolution. In particular, it provides control of the semantic regions, their disentangled appearance and it allows a broad range of image manipulations. We validate DeepSEE on faces, for up to 32x magnification and exploration of the space of super-resolution. Our code and models are available at: https://mcbuehler.github.io/DeepSEE/ △ Less

Submitted 2 October, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: 19 pages. Supplementary material is available on the project page. Accepted for oral presentation at the 15th Asian Conference on Computer Vision (ACCV) 2020

arXiv:2003.04168 [pdf, other]

doi 10.1364/AO.383602

Hybrid calibration procedure for fringe projection profilometry based on stereo-vision and polynomial fitting

Authors: Raul Vargas, Andres G. Marrugo, Song Zhang, Lenny A. Romero

Abstract: The key to accurate 3D shape measurement in Fringe Projection Profilometry (FPP) is the proper calibration of the measurement system. Current calibration techniques rely on phase-coordinate mapping (PCM) or back-projection stereo-vision (SV) methods. PCM methods are cumbersome to implement as they require precise positioning of the calibration target relative to the FPP system but produce highly a… ▽ More The key to accurate 3D shape measurement in Fringe Projection Profilometry (FPP) is the proper calibration of the measurement system. Current calibration techniques rely on phase-coordinate mapping (PCM) or back-projection stereo-vision (SV) methods. PCM methods are cumbersome to implement as they require precise positioning of the calibration target relative to the FPP system but produce highly accurate measurements within the calibration volume. SV methods generally do not achieve the same accuracy level. However, the calibration is more flexible in that the calibration target can be arbitrarily positioned. In this work, we propose a hybrid calibration method that leverages the SV calibration approach using a PCM method to achieve higher accuracy. The method has the flexibility of SV methods, is robust to lens distortions, and has a simple relation between the recovered phase and the metric coordinates. Experimental results show that the proposed Hybrid method outperforms the SV method in terms of accuracy and reconstruction time due to its low computational complexity. △ Less

Submitted 9 March, 2020; originally announced March 2020.

Comments: Accepted for publication in Applied Optics Vol. 59 No. 13, 2020

ACM Class: I.4.5; I.4.7

arXiv:2001.09698 [pdf]

doi 10.1371/journal.pone.0243437

The side effect profile of Clozapine in real world data of three large mental hospitals

Authors: Ehtesham Iqbal, Risha Govind, Alvin Romero, Olubanke Dzahini, Matthew Broadbent, Robert Stewart, Tanya Smith, Chi-Hun Kim, Nomi Werbeloff, Richard Dobson, Zina Ibrahim

Abstract: Objective: Mining the data contained within Electronic Health Records (EHRs) can potentially generate a greater understanding of medication effects in the real world, complementing what we know from Randomised control trials (RCTs). We Propose a text mining approach to detect adverse events and medication episodes from the clinical text to enhance our understanding of adverse effects related to Cl… ▽ More Objective: Mining the data contained within Electronic Health Records (EHRs) can potentially generate a greater understanding of medication effects in the real world, complementing what we know from Randomised control trials (RCTs). We Propose a text mining approach to detect adverse events and medication episodes from the clinical text to enhance our understanding of adverse effects related to Clozapine, the most effective antipsychotic drug for the management of treatment-resistant schizophrenia, but underutilised due to concerns over its side effects. Material and Methods: We used data from de-identified EHRs of three mental health trusts in the UK (>50 million documents, over 500,000 patients, 2835 of which were prescribed Clozapine). We explored the prevalence of 33 adverse effects by age, gender, ethnicity, smoking status and admission type three months before and after the patients started Clozapine treatment. We compared the prevalence of adverse effects with those reported in the Side Effects Resource (SIDER) where possible. Results: Sedation, fatigue, agitation, dizziness, hypersalivation, weight gain, tachycardia, headache, constipation and confusion were amongst the highest recorded Clozapine adverse effect in the three months following the start of treatment. Higher percentages of all adverse effects were found in the first month of Clozapine therapy. Using a significance level of (p< 0.05) out chi-square tests show a significant association between most of the ADRs in smoking status and hospital admissions and some in gender and age groups. Further, the data was combined from three trusts, and chi-square tests were applied to estimate the average effect of ADRs in each monthly interval. Conclusion: A better understanding of how the drug works in the real world can complement clinical trials and precision medicine. △ Less

Submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.08311 [pdf, other]

Learning to adapt class-specific features across domains for semantic segmentation

Authors: Mikel Menta, Adriana Romero, Joost van de Weijer

Abstract: Recent advances in unsupervised domain adaptation have shown the effectiveness of adversarial training to adapt features across domains, endowing neural networks with the capability of being tested on a target domain without requiring any training annotations in this domain. The great majority of existing domain adaptation models rely on image translation networks, which often contain a huge amoun… ▽ More Recent advances in unsupervised domain adaptation have shown the effectiveness of adversarial training to adapt features across domains, endowing neural networks with the capability of being tested on a target domain without requiring any training annotations in this domain. The great majority of existing domain adaptation models rely on image translation networks, which often contain a huge amount of domain-specific parameters. Additionally, the feature adaptation step often happens globally, at a coarse level, hindering its applicability to tasks such as semantic segmentation, where details are of crucial importance to provide sharp results. In this thesis, we present a novel architecture, which learns to adapt features across domains by taking into account per class information. To that aim, we design a conditional pixel-wise discriminator network, whose output is conditioned on the segmentation masks. Moreover, following recent advances in image translation, we adopt the recently introduced StarGAN architecture as image translation backbone, since it is able to perform translations across multiple domains by means of a single generator network. Preliminary results on a segmentation task designed to assess the effectiveness of the proposed approach highlight the potential of the model, improving upon strong baselines and alternative designs. △ Less

Submitted 22 January, 2020; originally announced January 2020.

Comments: Master thesis dissertation for the Master in Computer Vision (Barcelona). 11 pages main article and 3 pages appendices. Code available

arXiv:1912.12726 [pdf, other]

SLOAM: Semantic Lidar Odometry and Mapping for Forest Inventory

Authors: Steven W. Chen, Guilherme V. Nardari, Elijah S. Lee, Chao Qu, Xu Liu, Roseli A. F. Romero, Vijay Kumar

Abstract: This paper describes an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping. Accurate mapping of this type of environment is challenging since the ground and the trees are surrounded by leaves, thorns and vines, and the sensor typically experiences extreme motion. We propose a semantic feature based pose optimization that simultaneously re… ▽ More This paper describes an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping. Accurate mapping of this type of environment is challenging since the ground and the trees are surrounded by leaves, thorns and vines, and the sensor typically experiences extreme motion. We propose a semantic feature based pose optimization that simultaneously refines the tree models while estimating the robot pose. The pipeline utilizes a custom virtual reality tool for labeling 3D scans that is used to train a semantic segmentation network. The masked point cloud is used to compute a trellis graph that identifies individual instances and extracts relevant features that are used by the SLAM module. We show that traditional lidar and image based methods fail in the forest environment on both Unmanned Aerial Vehicle (UAV) and hand-carry systems, while our method is more robust, scalable, and automatically generates tree diameter estimations. △ Less

Submitted 29 December, 2019; originally announced December 2019.

Comments: 8 pages, 5 figures, IEEE Robotics and Automation Letters

Showing 1–50 of 86 results for author: Romero, A