arXiv:1812.11315 [pdf, other]

On Infusing Reachability-Based Safety Assurance within Probabilistic Planning Frameworks for Human-Robot Vehicle Interactions

Authors: Karen Leung, Edward Schmerling, Mo Chen, John Talbot, J. Christian Gerdes, Marco Pavone

Abstract: Action anticipation, intent prediction, and proactive behavior are all desirable characteristics for autonomous driving policies in interactive scenarios. Paramount, however, is ensuring safety on the road; a key challenge in doing so is accounting for uncertainty in human driver actions without unduly impacting planner performance. This paper introduces a minimally-interventional safety controller operating within an autonomous vehicle control stack with the role of ensuring collision-free interaction with an externally controlled (e.g., human-driven) counterpart. We leverage reachability analysis to construct a real-time (100 Hz) controller that serves the dual role of (1) tracking an input trajectory from a higher-level planning algorithm using model predictive control, and (2) assuring safety through maintaining the availability of a collision-free escape maneuver as a persistent constraint regardless of the future actions the other car takes. A full-scale steer-by-wire platform is used to conduct traffic weaving experiments wherein the two cars, initially side-by-side, must swap lanes in a limited amount of time and distance, emulating cars merging onto/off of a highway. We demonstrate that, with our control stack, the autonomous vehicle is able to avoid collision even when the other car defies the planner's expectations and takes dangerous actions, either carelessly or with the intent to collide, and otherwise deviates minimally from the planned trajectory to the extent required to maintain safety.

Submitted 29 December, 2018; originally announced December 2018.

Comments: Presented at the International Symposium on Experimental Robotics, Buenos Aires, Argentina, 2018

arXiv:1811.06590 [pdf, other]

Reduced Order Model Predictive Control For Setpoint Tracking

Authors: Joseph Lorenzetti, Benoit Landry, Sumeet Singh, Marco Pavone

Abstract: Despite the success of model predictive control (MPC), its application to high-dimensional systems, such as flexible structures and coupled fluid/rigid-body systems, remains a largely open challenge due to excessive computational complexity. A promising solution approach is to leverage reduced order models for designing the model predictive controller. In this paper we present a reduced order MPC scheme that enables setpoint tracking while robustly guaranteeing constraint satisfaction for linear, discrete, time-invariant systems. Setpoint tracking is enabled by designing the MPC cost function to account for the steady-state error between the full and reduced order models. Robust constraint satisfaction is accomplished by solving (offline) a set of linear programs to provide bounds on the errors due to bounded disturbances, state estimation, and model approximation. The approach is validated on a synthetic system as well as a high-dimensional linear model of a flexible rod, obtained using finite element methods.

Submitted 2 May, 2019; v1 submitted 15 November, 2018; originally announced November 2018.

arXiv:1810.05993 [pdf, other]

The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs

Authors: Boris Ivanovic, Marco Pavone

Abstract: Developing safe human-robot interaction systems is a necessary step towards the widespread integration of autonomous agents in society. A key component of such systems is the ability to reason about the many potential futures (e.g. trajectories) of other agents in the scene. Towards this end, we present the Trajectron, a graph-structured model that predicts many potential future trajectories of multiple agents simultaneously in both highly dynamic and multimodal scenarios (i.e. where the number of agents in the scene is time-varying and there are many possible highly-distinct futures for each agent). It combines tools from recurrent sequence modeling and variational deep generative modeling to produce a distribution of future trajectories for each agent in a scene. We demonstrate the performance of our model on several datasets, obtaining state-of-the-art results on standard trajectory prediction metrics as well as introducing a new metric for comparing models that output distributions.

Submitted 23 August, 2019; v1 submitted 14 October, 2018; originally announced October 2018.

Comments: IEEE/CVF International Conference on Computer Vision (ICCV) 2019 -- 10 pages, 10 figures, 2 tables

arXiv:1808.04468 [pdf, ps, other]

Risk-Sensitive Generative Adversarial Imitation Learning

Authors: Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone

Abstract: We study risk-sensitive imitation learning where the agent's goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t. Jensen-Shannon (JS) divergence and Wasserstein distance, and develop risk-sensitive generative adversarial imitation learning algorithms based on these optimization problems. We evaluate the performance of our algorithms and compare them with GAIL and the risk-averse imitation learning (RAIL) algorithms in two MuJoCo and two OpenAI classical control tasks.

Submitted 23 December, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

arXiv:1808.00649 [pdf, other]

Robust Tracking with Model Mismatch for Fast and Safe Planning: an SOS Optimization Approach

Authors: Sumeet Singh, Mo Chen, Sylvia L. Herbert, Claire J. Tomlin, Marco Pavone

Abstract: In the pursuit of real-time motion planning, a commonly adopted practice is to compute a trajectory by running a planning algorithm on a simplified, low-dimensional dynamical model, and then employ a feedback tracking controller that tracks such a trajectory by accounting for the full, high-dimensional system dynamics. While this strategy of planning with model mismatch generally yields fast computation times, there are no guarantees of dynamic feasibility, which hampers application to safety-critical systems. Building upon recent work that addressed this problem through the lens of Hamilton-Jacobi (HJ) reachability, we devise an algorithmic framework whereby one computes, offline, for a pair of "planner" (i.e., low-dimensional) and "tracking" (i.e., high-dimensional) models, a feedback tracking controller and associated tracking bound. This bound is then used as a safety margin when generating motion plans via the low-dimensional model. Specifically, we harness the computational tool of sum-of-squares (SOS) programming to design a bilinear optimization algorithm for the computation of the feedback tracking controller and associated tracking bound. The algorithm is demonstrated via numerical experiments, with an emphasis on investigating the trade-off between the increased computational scalability afforded by SOS and its intrinsic conservativeness. Collectively, our results enable scaling the appealing strategy of planning with model mismatch to systems that are beyond the reach of HJ analysis, while maintaining safety guarantees.

Submitted 28 July, 2019; v1 submitted 1 August, 2018; originally announced August 2018.

Comments: Presented at WAFR 2018; final version v2 -- fixed typos

arXiv:1808.00113 [pdf, other]

Learning Stabilizable Dynamical Systems via Control Contraction Metrics

Authors: Sumeet Singh, Vikas Sindhwani, Jean-Jacques E. Slotine, Marco Pavone

Abstract: We propose a novel framework for learning stabilizable nonlinear dynamical systems for continuous control tasks in robotics. The key idea is to develop a new control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, which guarantees that the learned system can be accompanied by a robust controller capable of stabilizing any open-loop trajectory that the system may generate. By leveraging tools from contraction theory, statistical learning, and convex optimization, we provide a general and tractable semi-supervised algorithm to learn stabilizable dynamics, which can be applied to complex underactuated systems. We validated the proposed algorithm on a simulated planar quadrotor system and observed notably improved trajectory generation and tracking performance with the control-theoretic regularized model over models learned using traditional regression techniques, especially when using a small number of demonstration examples. The results presented illustrate the need to infuse standard model-based reinforcement learning algorithms with concepts drawn from nonlinear control theory for improved reliability.

Submitted 10 November, 2018; v1 submitted 31 July, 2018; originally announced August 2018.

Comments: To appear at WAFR 2018. v2: re-structured Sections 3 & 4 to improve clarity; expanded discussion on limitations & future work in Section 5; added details on training & validation, significantly expanded experiments

arXiv:1807.11553 [pdf, other]

Reach-Avoid Problems via Sum-of-Squares Optimization and Dynamic Programming

Authors: Benoit Landry, Mo Chen, Scott Hemley, Marco Pavone

Abstract: Reach-avoid problems involve driving a system to a set of desirable configurations while keeping it away from undesirable ones. Providing mathematical guarantees for such scenarios is challenging but has numerous potential practical applications. Due to the challenges, analysis of reach-avoid problems involves making trade-offs between generality of system dynamics, generality of problem setups, optimality of solutions, and computational complexity. In this paper, we combine sum-of-squares optimization and dynamic programming to address the reach-avoid problem, and provide a conservative solution that maintains reaching and avoidance guarantees. Our method is applicable to polynomial system dynamics and to general problem setups, and is more computationally scalable than previous related methods. Through a numerical example involving two single integrators, we validate our proposed theory and compare our method to Hamilton-Jacobi reachability. Having validated our theory, we demonstrate the computational scalability of our method by computing the reach-avoid set of a system involving two kinematic cars.

Submitted 30 July, 2018; originally announced July 2018.

Comments: International Conference on Intelligent Robots & Systems (IROS), 2018

arXiv:1807.10366 [pdf, other]

Robot Motion Planning in Learned Latent Spaces

Authors: Brian Ichter, Marco Pavone

Abstract: This paper presents Latent Sampling-based Motion Planning (L-SBMP), a methodology towards computing motion plans for complex robotic systems by learning a plannable latent representation. Recent works in control of robotic systems have effectively leveraged local, low-dimensional embeddings of high-dimensional dynamics. In this paper we combine these recent advances with techniques from sampling-based motion planning (SBMP) in order to design a methodology capable of planning for high-dimensional robotic systems beyond the reach of traditional approaches (e.g., humanoids, or even systems where planning occurs in the visual space). Specifically, the learned latent space is constructed through an autoencoding network, a dynamics network, and a collision checking network, which mirror the three main algorithmic primitives of SBMP, namely state sampling, local steering, and collision checking. Notably, these networks can be trained through only raw data of the system's states and actions along with a supervising collision checker. Building upon these networks, an RRT-based algorithm is used to plan motions directly in the latent space - we refer to this exploration algorithm as Learned Latent RRT (L2RRT). This algorithm globally explores the latent space and is capable of generalizing to new environments. The overall methodology is demonstrated on two planning problems, namely a visual planning problem, whereby planning happens in the visual (pixel) space, and a humanoid robot planning problem.

Submitted 6 November, 2018; v1 submitted 26 July, 2018; originally announced July 2018.

arXiv:1807.08912 [pdf, other]

Meta-Learning Priors for Efficient Online Bayesian Regression

Authors: James Harrison, Apoorva Sharma, Marco Pavone

Abstract: Gaussian Process (GP) regression has seen widespread use in robotics due to its generality, simplicity of use, and the utility of Bayesian predictions. The predominant implementation of GP regression is a nonparametric kernel-based approach, as it enables fitting of arbitrary nonlinear functions. However, this approach suffers from two main drawbacks: (1) it is computationally inefficient, as computation scales poorly with the number of samples; and (2) it can be data inefficient, as encoding prior knowledge that can aid the model through the choice of kernel and associated hyperparameters is often challenging and unintuitive. In this work, we propose ALPaCA, an algorithm for efficient Bayesian regression which addresses these issues. ALPaCA uses a dataset of sample functions to learn a domain-specific, finite-dimensional feature encoding, as well as a prior over the associated weights, such that Bayesian linear regression in this feature space yields accurate online predictions of the posterior predictive density. These features are neural networks, which are trained via a meta-learning (or "learning-to-learn") approach. ALPaCA extracts all prior information directly from the dataset, rather than restricting prior information to the choice of kernel hyperparameters. Furthermore, by operating in the weight space, it substantially reduces sample complexity. We investigate the performance of ALPaCA on two simple regression problems, two simulated robotic systems, and on a lane-change driving task performed by humans. We find our approach outperforms kernel-based GP regression, as well as state-of-the-art meta-learning approaches, thereby providing a promising plug-in tool for many regression tasks in robotics where scalability and data-efficiency are important.

Submitted 30 October, 2018; v1 submitted 24 July, 2018; originally announced July 2018.

Comments: Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018

arXiv:1806.06161 [pdf, other]

BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

Authors: Boris Ivanovic, James Harrison, Apoorva Sharma, Mo Chen, Marco Pavone

Abstract: Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.

Submitted 16 September, 2018; v1 submitted 15 June, 2018; originally announced June 2018.

arXiv:1804.11278 [pdf, other]

On the Interaction between Autonomous Mobility-on-Demand and Public Transportation Systems

Authors: Mauro Salazar, Federico Rossi, Maximilian Schiffer, Christopher H. Onder, Marco Pavone

Abstract: In this paper we study models and coordination policies for intermodal Autonomous Mobility-on-Demand (AMoD), wherein a fleet of self-driving vehicles provides on-demand mobility jointly with public transit. Specifically, we first present a network flow model for intermodal AMoD, where we capture the coupling between AMoD and public transit and the goal is to maximize social welfare. Second, leveraging such a model, we design a pricing and tolling scheme that achieves the social optimum under the assumption of a perfect market with selfish agents. Finally, we present a real-world case study for New York City. Our results show that the coordination between AMoD fleets and public transit can yield significant benefits compared to an AMoD system operating in isolation.

Submitted 5 September, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

Comments: 9 pages, 8 figures, ITSC 2018

arXiv:1804.11074 [pdf, other]

Stochastic Model Predictive Control for Autonomous Mobility on Demand

Authors: Matthew Tsao, Ramon Iglesias, Marco Pavone

Abstract: This paper presents a stochastic, model predictive control (MPC) algorithm that leverages short-term probabilistic forecasts for dispatching and rebalancing Autonomous Mobility-on-Demand systems (AMoD, i.e. fleets of self-driving vehicles). We first present the core stochastic optimization problem in terms of a time-expanded network flow model. Then, to improve its tractability, we present two key relaxations. First, we replace the original stochastic problem with a Sample Average Approximation (SAA), and characterize the performance guarantees. Second, we split the controller into two parts so that the task of assigning vehicles to the outstanding customers is handled separately from that of rebalancing. This enables the problem to be solved as two totally unimodular linear programs, and thus to scale easily to large problem sizes. Finally, we test the proposed algorithm in two scenarios based on real data and show that it outperforms prior state-of-the-art algorithms. In particular, in a simulation using customer data from DiDi Chuxing, the algorithm presented here exhibits a 62.3 percent reduction in customer waiting time compared to state-of-the-art non-stochastic algorithms.

Submitted 4 May, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

Comments: Submitting to the IEEE International Conference on Intelligent Transportation Systems 2018
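
Illustrative note: the Sample Average Approximation (SAA) step named in the abstract above replaces an expectation over random demand with an average over drawn samples, then optimizes that average. The sketch below shows the idea on a toy one-dimensional problem (choosing how many vehicles to pre-position). It is not the paper's time-expanded network flow model; the function names, the cost parameters, and the demand samples are all hypothetical.

```python
def saa_objective(decision, demand_samples, unit_cost=1.0, shortage_penalty=3.0):
    """Sample-average cost of one decision (toy cost model, an assumption)."""
    total = 0.0
    for demand in demand_samples:
        # Unserved demand is penalized; surplus vehicles only pay the unit cost.
        shortage = max(demand - decision, 0)
        total += unit_cost * decision + shortage_penalty * shortage
    return total / len(demand_samples)


def saa_solve(candidates, demand_samples):
    """Pick the candidate decision with the lowest sample-average cost."""
    best = min(candidates, key=lambda x: saa_objective(x, demand_samples))
    return best, saa_objective(best, demand_samples)


# Hypothetical demand samples standing in for a probabilistic forecast.
samples = [2, 4, 6, 8]
best, cost = saa_solve(range(0, 11), samples)
print(best, cost)  # prints: 6 7.5
```

With more samples the sample-average objective approaches the true expected cost, which is why SAA is a common way to make stochastic programs tractable.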

arXiv:1804.05804 [pdf, other]

Safe Motion Planning in Unknown Environments: Optimality Benchmarks and Tractable Policies

Authors: Lucas Janson, Tommy Hu, Marco Pavone

Abstract: This paper addresses the problem of planning a safe (i.e., collision-free) trajectory from an initial state to a goal region when the obstacle space is a priori unknown and is incrementally revealed online, e.g., through line-of-sight perception. Despite its ubiquitous nature, this formulation of motion planning has received relatively little theoretical investigation, as opposed to the setup where the environment is assumed known. A fundamental challenge is that, unlike motion planning with known obstacles, it is not even clear what an optimal policy to strive for is. Our contribution is threefold. First, we present a notion of optimality for safe planning in unknown environments in the spirit of comparative (as opposed to competitive) analysis, with the goal of obtaining a benchmark that is, at least conceptually, attainable. Second, by leveraging this theoretical benchmark, we derive a pseudo-optimal class of policies that can seamlessly incorporate any amount of prior or learned information while still guaranteeing the robot never collides. Finally, we demonstrate the practicality of our algorithmic approach in numerical experiments using a range of environment types and dynamics, including a comparison with a state-of-the-art method. A key aspect of our framework is that it automatically and implicitly weighs exploration versus exploitation in a way that is optimal with respect to the information available.

Submitted 16 April, 2018; originally announced April 2018.

arXiv:1803.05464 [pdf, ps, other]

Review of Multi-Agent Algorithms for Collective Behavior: a Structural Taxonomy

Authors: Federico Rossi, Saptarshi Bandyopadhyay, Michael Wolf, Marco Pavone

Abstract: In this paper, we review multi-agent collective behavior algorithms in the literature and classify them according to their underlying mathematical structure. For each mathematical technique, we identify the multi-agent coordination tasks it can be applied to, and we analyze its scalability, bandwidth use, and demonstrated maturity. We highlight how versatile techniques such as artificial potential functions can be used for applications ranging from low-level position control to high-level coordination and task allocation, we discuss possible reasons for the slow adoption of complex distributed coordination algorithms in the field, and we highlight areas for further research and development.

Submitted 14 March, 2018; originally announced March 2018.

Comments: Six pages, one table. To be presented at NAASS 2018

arXiv:1803.02015 [pdf, other]

Generative Modeling of Multimodal Multi-Human Behavior

Authors: Boris Ivanovic, Edward Schmerling, Karen Leung, Marco Pavone

Abstract: This work presents a methodology for modeling and predicting human behavior in settings with N humans interacting in highly multimodal scenarios (i.e. where there are many possible highly-distinct futures). A motivating example includes robots interacting with humans in crowded environments, such as self-driving cars operating alongside human-driven vehicles or human-robot collaborative bin packing in a warehouse. Our approach to modeling human behavior in such uncertain environments is to represent humans in the scene as nodes in a graphical model, with edges encoding relationships between them. For each human, we learn a multimodal probability distribution over future actions from a dataset of multi-human interactions. Learning such distributions is made possible by recent advances in the theory of conditional variational autoencoders and deep learning approximations of probabilistic graphical models. Specifically, we learn action distributions conditioned on interaction history, neighboring human behavior, and candidate future agent behavior in order to take into account response dynamics. We demonstrate the performance of such a modeling approach in modeling basketball player trajectories, a highly multimodal, multi-human scenario which serves as a proxy for many robotic applications.

Submitted 26 July, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018 -- 8 pages, 5 figures

arXiv:1711.10055 [pdf, other]

Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

Authors: Sumeet Singh, Jonathan Lacotte, Anirudha Majumdar, Marco Pavone

Abstract: The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.

Submitted 22 March, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

Comments: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability

arXiv:1710.11040 [pdf, other]

How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics

Authors: Anirudha Majumdar, Marco Pavone

Abstract: Endowing robots with the capability of assessing risk and making risk-aware decisions is widely considered a key step toward ensuring safety for robots operating under uncertainty. But, how should a robot quantify risk? A natural and common approach is to consider the framework whereby costs are assigned to stochastic outcomes - an assignment captured by a cost random variable. Quantifying risk then corresponds to evaluating a risk metric, i.e., a mapping from the cost random variable to a real number. Yet, the question of what constitutes a "good" risk metric has received little attention within the robotics community. The goal of this paper is to explore and partially address this question by advocating axioms that risk metrics in robotics applications should satisfy in order to be employed as rational assessments of risk. We discuss general representation theorems that precisely characterize the class of metrics that satisfy these axioms (referred to as distortion risk metrics), and provide instantiations that can be used in applications. We further discuss pitfalls of commonly used risk metrics in robotics, and discuss additional properties that one must consider in sequential decision making tasks. Our hope is that the ideas presented here will lead to a foundational framework for quantifying risk (and hence safety) in robotics applications.

Submitted 1 November, 2017; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: Extended version of paper published in International Symposium on Robotics Research (ISRR) 2017
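
Illustrative note (not part of the listing): the abstract above describes a risk metric as a mapping from a cost random variable to a real number. As a rough sketch of one such mapping, the snippet below estimates Conditional Value-at-Risk (CVaR), a standard distortion risk metric, from equally likely cost samples. The function name `sample_cvar`, the sample values, and the simple tail-averaging estimator are assumptions made for illustration; they are not taken from the paper.

```python
import math


def sample_cvar(costs, alpha):
    """Average of the worst (1 - alpha) fraction of equally likely cost samples.

    Hypothetical helper for illustration: alpha = 0 recovers the plain mean
    (risk-neutral), and alpha close to 1 approaches the worst observed cost.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)
    # Number of tail samples to average. The small epsilon guards against
    # floating-point round-up, and at least one sample is always kept.
    k = max(1, math.ceil((1.0 - alpha) * n - 1e-9))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


costs = [1.0, 2.0, 3.0, 10.0]     # made-up cost outcomes
print(sample_cvar(costs, 0.0))    # mean of all four samples
print(sample_cvar(costs, 0.5))    # mean of the two worst samples
print(sample_cvar(costs, 0.75))   # the single worst sample
```

Varying `alpha` sweeps between the expectation and the worst case, which is one simple way to picture a spectrum of risk preferences; the estimator is exact only when `(1 - alpha) * n` is an integer.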

arXiv:1710.09483 [pdf, other]

Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction

Authors: Edward Schmerling, Karen Leung, Wolf Vollprecht, Marco Pavone

Abstract: This paper presents a method for constructing human-robot interaction policies in settings where multimodality, i.e., the possibility of multiple highly distinct futures, plays a critical role in decision making. We are motivated in this work by the example of traffic weaving, e.g., at highway on-ramps/off-ramps, where entering and exiting cars must swap lanes in a short distance---a challenging negotiation even for experienced drivers due to the inherent multimodal uncertainty of who will pass whom. Our approach is to learn multimodal probability distributions over future human actions from a dataset of human-human exemplars and perform real-time robot policy construction in the resulting environment model through massively parallel sampling of human responses to candidate robot action sequences. Direct learning of these distributions is made possible by recent advances in the theory of conditional variational autoencoders (CVAEs), whereby we learn action distributions simultaneously conditioned on the present interaction history, as well as candidate future robot actions in order to take into account response dynamics. We demonstrate the efficacy of this approach with a human-in-the-loop simulation of a traffic weaving scenario.

Submitted 25 October, 2017; originally announced October 2017.

arXiv:1709.07032 [pdf, other]

Data-Driven Model Predictive Control of Autonomous Mobility-on-Demand Systems

Authors: Ramon Iglesias, Federico Rossi, Kevin Wang, David Hallac, Jure Leskovec, Marco Pavone

Abstract: The goal of this paper is to present an end-to-end, data-driven framework to control Autonomous Mobility-on-Demand systems (AMoD, i.e., fleets of self-driving vehicles). We first model the AMoD system using a time-expanded network, and present a formulation that computes the optimal rebalancing strategy (i.e., preemptive repositioning) and the minimum feasible fleet size for a given travel demand. Then, we adapt this formulation to devise a Model Predictive Control (MPC) algorithm that leverages short-term demand forecasts based on historical data to compute rebalancing strategies. We test the end-to-end performance of this controller with a state-of-the-art LSTM neural network to predict customer demand and real customer data from DiDi Chuxing: we show that this approach scales very well for large systems (indeed, the computational complexity of the MPC algorithm does not depend on the number of customers and of vehicles in the system) and outperforms state-of-the-art rebalancing strategies by reducing the mean customer wait time by up to 89.6%.

Submitted 20 September, 2017; originally announced September 2017.

Comments: Submitted to the International Conference on Robotics and Automation 2018

arXiv:1709.05448 [pdf, other]

Learning Sampling Distributions for Robot Motion Planning

Authors: Brian Ichter, James Harrison, Marco Pavone

Abstract: A defining feature of sampling-based motion planning is the reliance on an implicit representation of the state space, which is enabled by a set of probing samples. Traditionally, these samples are drawn either probabilistically or deterministically to uniformly cover the state space. Yet, the motion of many robotic systems is often restricted to "small" regions of the state space, due to, for example, differential constraints or collision-avoidance constraints. To accelerate the planning process, it is thus desirable to devise non-uniform sampling strategies that favor sampling in those regions where an optimal solution might lie. This paper proposes a methodology for non-uniform sampling, whereby a sampling distribution is learned from demonstrations, and then used to bias sampling. The sampling distribution is computed through a conditional variational autoencoder, allowing sample generation from the latent space conditioned on the specific planning problem. This methodology is general, can be used in combination with any sampling-based planner, and can effectively exploit the underlying structure of a planning problem while maintaining the theoretical guarantees of sampling-based approaches. Specifically, on several planning problems, the proposed methodology is shown to effectively learn representations for the relevant regions of the state space, resulting in an order of magnitude improvement in terms of success rate and convergence to the optimal cost.

Submitted 11 March, 2019; v1 submitted 15 September, 2017; originally announced September 2017.

Comments: International Conference on Robotics and Automation (ICRA), 2018

arXiv:1709.04906 [pdf, other]

On the interaction between Autonomous Mobility-on-Demand systems and the power network: models and coordination algorithms

Authors: Federico Rossi, Ramon Iglesias, Mahnoosh Alizadeh, Marco Pavone

Abstract: We study the interaction between a fleet of electric, self-driving vehicles servicing on-demand transportation requests (referred to as Autonomous Mobility-on-Demand, or AMoD, system) and the electric power network. We propose a model that captures the coupling between the two systems stemming from the vehicles' charging requirements and captures time-varying customer demand and power generation costs, road congestion, battery depreciation, and power transmission and distribution constraints. We then leverage the model to jointly optimize the operation of both systems. We devise an algorithmic procedure to losslessly reduce the problem size by bundling customer requests, allowing it to be efficiently solved by off-the-shelf linear programming solvers. Next, we show that the socially optimal solution to the joint problem can be enforced as a general equilibrium, and we provide a dual decomposition algorithm that allows self-interested agents to compute the market clearing prices without sharing private information. We assess the performance of the model by studying a hypothetical AMoD system in Dallas-Fort Worth and its impact on the Texas power network. Lack of coordination between the AMoD system and the power network can cause a 4.4% increase in the price of electricity in Dallas-Fort Worth; conversely, coordination between the AMoD system and the power network could reduce electricity expenditure compared to the case where no cars are present (despite the increased demand for electricity) and yield savings of up to $147M/year. Finally, we provide a receding-horizon implementation and assess its performance with agent-based simulations. Collectively, the results of this paper provide a first-of-a-kind characterization of the interaction between electric-powered AMoD systems and the power network, and shed additional light on the economic and societal value of AMoD.

Submitted 8 June, 2019; v1 submitted 14 September, 2017; originally announced September 2017.

Comments: Extended version of the paper presented at Robotics: Science and Systems XIV and accepted by TCNS. In Version 4, the body of the paper is largely rewritten for clarity and consistency, and new numerical simulations are presented. All source code is available (MIT) at https://dx.doi.org/10.5281/zenodo.3241651

arXiv:1707.04674 [pdf, other]

ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

Authors: James Harrison, Animesh Garg, Boris Ivanovic, Yuke Zhu, Silvio Savarese, Li Fei-Fei, Marco Pavone

Abstract: Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators; however, this gives rise to the problem of policy transfer from simulated to the physical system. Model mismatch due to systematic parameter shift and unmodelled dynamics error may cause sub-optimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (ADAPT) algorithm that achieves provably safe and robust, dynamically-feasible zero-shot transfer of RL-policies to new domains with dynamics error. ADAPT combines the strengths of offline policy learning in a black-box source simulator with online tube-based MPC to attenuate bounded model mismatch between the source and target dynamics. ADAPT allows online transfer of policy, trained solely in a simulation offline, to a family of unknown targets without fine-tuning. We also formally show that (i) ADAPT guarantees state and control safety through state-action tubes under the assumption of Lipschitz continuity of the divergence in dynamics and, (ii) ADAPT results in a bounded loss of reward accumulation relative to a policy trained and evaluated in the source environment. We evaluate ADAPT on 2 continuous, non-holonomic simulated dynamical systems with 4 different disturbance models, and find that ADAPT performs between 50% and 300% better on mean reward accrual than direct policy transfer.

Submitted 8 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

Comments: International Symposium on Robotics Research (ISRR), 2017

arXiv:1705.02408 [pdf, other]

Perception-Aware Motion Planning via Multiobjective Search on GPUs

Authors: Brian Ichter, Benoit Landry, Edward Schmerling, Marco Pavone

Abstract: In this paper we describe a framework towards computing well-localized, robust motion plans through the perception-aware motion planning problem, whereby we seek a low-cost motion plan subject to a separate constraint on perception localization quality. To solve this problem we introduce the Multiobjective Perception-Aware Planning (MPAP) algorithm which explores the state space via a multiobjective search, considering both cost and a perception heuristic. This framework can accommodate a large range of heuristics, allowing those that capture the history dependence of localization drift and represent complex modern perception methods. We present two such heuristics, one derived from a simplified model of robot perception and a second learned from ground-truth sensor error, which we show to be capable of predicting the performance of a state-of-the-art perception system. The solution trajectory from this heuristic-based search is then certified via Monte Carlo methods to be well-localized and robust. The additional computational burden of perception-aware planning is offset by GPU massive parallelization. Through numerical experiments the algorithm is shown to find well-localized, robust solutions in about a second. Finally, we demonstrate MPAP on a quadrotor flying perception-aware and perception-agnostic plans using Google Tango for localization, finding the quadrotor safely executes the perception-aware plan every time, while crashing in over 20% of the perception-agnostic runs due to loss of localization.

Submitted 6 December, 2017; v1 submitted 5 May, 2017; originally announced May 2017.

arXiv:1705.02403 [pdf, other]

Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs

Authors: Brian Ichter, Edward Schmerling, Marco Pavone

Abstract: This paper presents a novel approach, named the Group Marching Tree (GMT*) algorithm, to planning on GPUs at rates amenable to application within control loops, allowing planning in real-world settings via repeated computation of near-optimal plans. GMT*, like the Fast Marching Tree (FMT) algorithm, explores the state space with a "lazy" dynamic programming recursion on a set of samples to grow a tree of near-optimal paths. GMT*, however, alters the approach of FMT with approximate dynamic programming by expanding, in parallel, the group of all active samples with cost below an increasing threshold, rather than only the minimum cost sample. This group approximation enables low-level parallelism over the sample set and removes the need for sequential data structures, while the "lazy" collision checking limits thread divergence---all contributing to a very efficient GPU implementation. While this approach incurs some suboptimality, we prove that GMT* remains asymptotically optimal up to a constant multiplicative factor. We show solutions for complex planning problems under differential constraints can be found in ~10 ms on a desktop GPU and ~30 ms on an embedded GPU, representing a significant speed up over the state of the art, with only small losses in performance. Finally, we present a scenario demonstrating the efficacy of planning within the control loop (~100 Hz) towards operating in dynamic, uncertain settings.

Submitted 5 May, 2017; originally announced May 2017.

arXiv:1703.01029 [pdf, other]

A Framework for Time-Consistent, Risk-Sensitive Model Predictive Control: Theory and Algorithms

Authors: Sumeet Singh, Yin-Lam Chow, Anirudha Majumdar, Marco Pavone

Abstract: In this paper we present a framework for risk-sensitive model predictive control (MPC) of linear systems affected by stochastic multiplicative uncertainty. Our key innovation is to consider a time-consistent, dynamic risk evaluation of the cumulative cost as the objective function to be minimized. This framework is axiomatically justified in terms of time-consistency of risk assessments, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk preferences from risk-neutral (i.e., expectation) to worst case. Within this framework, we propose and analyze an online risk-sensitive MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk measures, we cast the computation of the MPC control law as a convex optimization problem amenable to real-time implementation. Simulation results are presented and discussed.

Submitted 25 April, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: Submitted to IEEE Transactions on Automatic Control. arXiv admin note: text overlap with arXiv:1511.06981; v2: clarified exposition, reduced review of dynamic risk theory, updated simulations with computation time

arXiv:1612.03232 [pdf, other]

The Team Surviving Orienteers Problem: Routing Robots in Uncertain Environments with Survival Constraints

Authors: Stefan Jorgensen, Robert H. Chen, Mark B. Milam, Marco Pavone

Abstract: In this paper we study the following multi-robot coordination problem: given a graph, where each edge is weighted by the probability of surviving while traversing it, find a set of paths for $K$ robots that maximizes the expected number of nodes collectively visited, subject to constraints on the probability that each robot survives to its destination. We call this problem the Team Surviving Orienteers (TSO) problem. The TSO problem is motivated by scenarios where a team of robots must traverse a dangerous, uncertain environment, such as aid delivery in disaster or war zones. We present the TSO problem formally along with several variants, which represent "survivability-aware" counterparts for a wide range of multi-robot coordination problems such as vehicle routing, patrolling, and informative path planning. We propose an approximate greedy approach for selecting paths, and prove that the value of its output is bounded within a factor $1-e^{-p_s/λ}$ of the optimum where $p_s$ is the per-robot survival probability threshold, and $1/λ\le 1$ is the approximation factor of an oracle routine for the well-known orienteering problem. Our approach has linear time complexity in the team size and polynomial complexity in the graph size. Using numerical simulations, we verify that our approach is close to the optimum in practice and that it scales to problems with hundreds of nodes and tens of robots.

Submitted 9 December, 2016; originally announced December 2016.

Comments: 8 pages, 6 figures. Submitted to the IEEE International Conference on Robotic Computing, 2017
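
Illustrative note (not part of the listing): the abstract above states a guarantee factor of $1-e^{-p_s/λ}$ for the greedy approach. The short sketch below simply evaluates that expression for a few made-up parameter values to show how the bound behaves; the function name `tso_greedy_factor` and the chosen inputs are assumptions for illustration, not values from the paper.

```python
import math


def tso_greedy_factor(p_s, lam):
    """Evaluate 1 - exp(-p_s / lam), the factor quoted in the abstract.

    p_s is the per-robot survival probability threshold, and 1/lam <= 1 is
    the approximation factor of the orienteering oracle, so lam >= 1.
    """
    if not 0.0 < p_s <= 1.0:
        raise ValueError("p_s must lie in (0, 1]")
    if lam < 1.0:
        raise ValueError("lam must be at least 1")
    return 1.0 - math.exp(-p_s / lam)


# Made-up parameter pairs: (survival threshold, oracle parameter lam).
for p_s, lam in [(1.0, 1.0), (0.9, 1.0), (0.9, 2.0)]:
    print(p_s, lam, tso_greedy_factor(p_s, lam))
```

The factor grows with the survival threshold `p_s` and shrinks as the oracle gets weaker (larger `lam`); with an exact oracle and `p_s = 1` it equals $1 - 1/e$.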

arXiv:1609.05399 [pdf, other]

Evaluating Trajectory Collision Probability through Adaptive Importance Sampling for Safe Motion Planning

Authors: Edward Schmerling, Marco Pavone

Abstract: This paper presents a tool for addressing a key component in many algorithms for planning robot trajectories under uncertainty: evaluation of the safety of a robot whose actions are governed by a closed-loop feedback policy near a nominal planned trajectory. We describe an adaptive importance sampling Monte Carlo framework that enables the evaluation of a given control policy for satisfaction of a probabilistic collision avoidance constraint which also provides an associated certificate of accuracy (in the form of a confidence interval). In particular this adaptive technique is well-suited to addressing the complexities of rigid-body collision checking applied to non-linear robot dynamics. As a Monte Carlo method it is amenable to parallelization for computational tractability, and is generally applicable to a wide gamut of simulatable systems, including alternative noise models. Numerical experiments demonstrating the effectiveness of the adaptive importance sampling procedure are presented and discussed.

Submitted 1 June, 2017; v1 submitted 17 September, 2016; originally announced September 2016.

arXiv:1609.02546

Congestion-Aware Randomized Routing in Autonomous Mobility-on-Demand Systems

Authors: Federico Rossi, Rick Zhang, Marco Pavone

Abstract: In this paper we study the routing and rebalancing problem for a fleet of autonomous vehicles providing on-demand transportation within a congested urban road network (that is, a road network where traffic speed depends on vehicle density). We show that the congestion-free routing and rebalancing problem is NP-hard and provide a randomized algorithm which finds a low-congestion solution to the routing and rebalancing problem that approximately minimizes the number of vehicles on the road in polynomial time. We provide theoretical bounds on the probability of violating the congestion constraints; we also characterize the expected number of vehicles required by the solution with a commonly-used empirical congestion model and provide a bound on the approximation factor of the algorithm. Numerical experiments on a realistic road network with real-world customer demands show that our algorithm introduces very small amounts of congestion. The performance of our algorithm in terms of travel times and required number of vehicles is very close to (and sometimes better than) the optimal congestion-free solution.

Submitted 15 September, 2016; v1 submitted 8 September, 2016; originally announced September 2016.

Comments: This paper has been withdrawn by the authors due to an error in the proofs of Theorem 3.4 (bound on the probability of violating the congestion constraints) and Lemma 3.5 (approximation factor of the algorithm)

arXiv:1607.06886 [pdf, other]

Real-Time Stochastic Kinodynamic Motion Planning via Multiobjective Search on GPUs

Authors: Brian Ichter, Edward Schmerling, Ali-akbar Agha-mohammadi, Marco Pavone

Abstract: In this paper we present the PUMP (Parallel Uncertainty-aware Multiobjective Planning) algorithm for addressing the stochastic kinodynamic motion planning problem, whereby one seeks a low-cost, dynamically-feasible motion plan subject to a constraint on collision probability (CP). To ensure exhaustive evaluation of candidate motion plans (as needed to trade off the competing objectives of performance and safety), PUMP incrementally builds the Pareto front of the problem, accounting for the optimization objective and an approximation of CP. This is performed by a massively parallel multiobjective search, here implemented with a focus on GPUs. Upon termination of the exploration phase, PUMP searches the Pareto set of motion plans to identify the lowest cost solution that is certified to satisfy the CP constraint (according to an asymptotically exact estimator). We introduce a novel particle-based CP approximation scheme, designed for efficient GPU implementation, which accounts for dependencies over the history of a trajectory execution. We present numerical experiments for quadrotor planning wherein PUMP identifies solutions in ~100 ms, evaluating over one hundred thousand partial plans through the course of its exploration phase. The results show that this multiobjective search achieves a lower motion plan cost, for the same CP constraint, compared to a safety buffer-based search heuristic and repeated RRT trials.

Submitted 23 February, 2017; v1 submitted 22 July, 2016; originally announced July 2016.

arXiv:1607.04357 [pdf, other]

A BCMP Network Approach to Modeling and Controlling Autonomous Mobility-on-Demand Systems

Authors: Ramon Iglesias, Federico Rossi, Rick Zhang, Marco Pavone

Abstract: In this paper we present a queueing network approach to the problem of routing and rebalancing a fleet of self-driving vehicles providing on-demand mobility within a capacitated road network. We refer to such systems as autonomous mobility-on-demand systems, or AMoD. We first cast an AMoD system into a closed, multi-class BCMP queueing network model. Second, we present analysis tools that allow the characterization of performance metrics for a given routing policy, in terms, e.g., of vehicle availabilities, and first and second order moments of vehicle throughput. Third, we propose a scalable method for the synthesis of routing policies, with performance guarantees in the limit of large fleet sizes. Finally, we validate our theoretical results on a case study of New York City. Collectively, this paper provides a unifying framework for the analysis and control of AMoD systems, which subsumes earlier Jackson and network flow models, provides a quite large set of modeling options (e.g., the inclusion of road capacities and general travel time distributions), and allows the analysis of second and higher-order moments for the performance metrics.

Submitted 26 March, 2017; v1 submitted 14 July, 2016; originally announced July 2016.

Comments: 18 pages, 3 figures. In preparation for conference submission. In version 2, clarity is improved and some typos are removed with no changes to the technical content of the paper

arXiv:1607.01478 [pdf, other]

Mixed Strategy for Constrained Stochastic Optimal Control

Authors: Masahiro Ono, Mahmoud El Chamie, Marco Pavone, Behcet Acikmese

Abstract: Choosing control inputs randomly can result in a reduced expected cost in optimal control problems with stochastic constraints, such as stochastic model predictive control (SMPC). We consider a controller with initial randomization, meaning that the controller randomly chooses from K+1 control sequences at the beginning (called K-randomization). It is known that, for a finite-state, finite-action Markov Decision Process (MDP) with K constraints, K-randomization is sufficient to achieve the minimum cost. We found that the same result holds for stochastic optimal control problems with continuous state and action spaces. Furthermore, we show the randomization of control input can result in reduced cost when the optimization problem is nonconvex, and the cost reduction is equal to the duality gap. We then provide the necessary and sufficient conditions for the optimality of a randomized solution, and develop an efficient solution method based on dual optimization. Furthermore, in a special case with K=1 such as a joint chance-constrained problem, the dual optimization can be solved even more efficiently by root finding. Finally, we test the theories and demonstrate the solution method on multiple practical problems ranging from path planning to the planning of entry, descent, and landing (EDL) for future Mars missions.

Submitted 6 July, 2016; originally announced July 2016.

Comments: 11 pages, 9 figures. Preliminary version of a working journal paper
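
The effect described in the abstract of arXiv:1607.01478 (randomizing over K+1 control sequences can lower the expected cost under K constraints) can be illustrated on a toy problem. The sketch below is not the paper's dual-optimization method: it assumes a small finite set of candidate control sequences, each summarized by a hypothetical (cost, risk) pair, a single expected-risk constraint (K = 1), and a brute-force search over pairs. The function name and all numbers are made up for illustration.

```python
from itertools import combinations


def best_randomized_choice(options, risk_limit):
    """Pick the cheapest pure or two-point randomized choice.

    options: list of (cost, risk) pairs, one per candidate control sequence
             (hypothetical numbers, not taken from the paper).
    risk_limit: upper bound on the expected risk (a single constraint, K = 1).
    Returns (expected_cost, [(option_index, probability), ...]) or None
    if no feasible choice exists.
    """
    best = None

    # Deterministic choices: the option itself must satisfy the constraint.
    for i, (cost, risk) in enumerate(options):
        if risk <= risk_limit and (best is None or cost < best[0]):
            best = (cost, [(i, 1.0)])

    # Randomizing between K + 1 = 2 options: the expected cost is linear in
    # the mixing probability, so the best strict mixture of a pair makes the
    # expected-risk constraint tight.
    for i, j in combinations(range(len(options)), 2):
        (cost_i, risk_i), (cost_j, risk_j) = options[i], options[j]
        if risk_i == risk_j:
            continue
        p = (risk_limit - risk_j) / (risk_i - risk_j)  # probability of option i
        if 0.0 < p < 1.0:
            cost = p * cost_i + (1.0 - p) * cost_j
            if best is None or cost < best[0]:
                best = (cost, [(i, p), (j, 1.0 - p)])
    return best


if __name__ == "__main__":
    candidates = [(10.0, 0.0), (2.0, 0.2), (7.0, 0.08)]
    print(best_randomized_choice(candidates, risk_limit=0.1))
```

With these made-up numbers the cheapest deterministic choice that respects the risk limit costs 7, while mixing the safest and the cheapest option with equal probability meets the limit in expectation at a lower expected cost. The gap between the two is the kind of saving the abstract attributes to nonconvexity.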

arXiv:1603.00939 [pdf, other]

doi 10.15607/RSS.2016.XII.032

Routing Autonomous Vehicles in Congested Transportation Networks: Structural Properties and Coordination Algorithms

Authors: Rick Zhang, Federico Rossi, Marco Pavone

Abstract: This paper considers the problem of routing and rebalancing a shared fleet of autonomous (i.e., self-driving) vehicles providing on-demand mobility within a capacitated transportation network, where congestion might disrupt throughput. We model the problem within a network flow framework and show that under relatively mild assumptions the rebalancing vehicles, if properly coordinated, do not lead to an increase in congestion (in stark contrast to common belief). From an algorithmic standpoint, such theoretical insight suggests that the problem of routing customers and rebalancing vehicles can be decoupled, which leads to a computationally-efficient routing and rebalancing algorithm for the autonomous vehicles. Numerical experiments and case studies corroborate our theoretical insights and show that the proposed algorithm outperforms state-of-the-art point-to-point methods by avoiding excess congestion on the road. Collectively, this paper provides a rigorous approach to the problem of congestion-aware, system-wide coordination of autonomously driving vehicles, and to the characterization of the sustainability of such robotic systems.

Submitted 29 July, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

Comments: 11 pages, 3 figures. Presented at Robotics: Science and Systems (RSS) 2016. Version 2 is the extended version of the final submission included in the conference proceedings. The title of the initial submission was modified in deference to RSS's double-blind submission process: in this version, the title matches the published paper

arXiv:1602.05130 [pdf, other]

Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk

Authors: Stefano Carpin, Yin-Lam Chow, Marco Pavone

Abstract: In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.

Submitted 16 February, 2016; originally announced February 2016.

arXiv:1602.04762 [pdf, other]

Optimized and Trusted Collision Avoidance for Unmanned Aerial Vehicles using Approximate Dynamic Programming (Technical Report)

Authors: Zachary N. Sunberg, Mykel J. Kochenderfer, Marco Pavone

Abstract: Safely integrating unmanned aerial vehicles into civil airspace is contingent upon development of a trustworthy collision avoidance system. This paper proposes an approach whereby a parameterized resolution logic that is considered trusted for a given range of its parameters is adaptively tuned online. Specifically, to address the potential conservatism of the resolution logic with static parameters, we present a dynamic programming approach for adapting the parameters dynamically based on the encounter state. We compute the adaptation policy offline using a simulation-based approximate dynamic programming method that accommodates the high dimensionality of the problem. Numerical experiments show that this approach improves safety and operational performance compared to the baseline resolution logic, while retaining trustworthiness.

Submitted 18 February, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

Comments: An abbreviated version was submitted to ICRA 2016

arXiv:1601.00042 [pdf, other]

doi 10.2514/1.G001913

Fast, Safe, and Propellant-Efficient Spacecraft Planning under Clohessy-Wiltshire-Hill Dynamics

Authors: Joseph A. Starek, Edward Schmerling, Gabriel D. Maher, Brent W. Barbee, Marco Pavone

Abstract: This paper presents a sampling-based motion planning algorithm for real-time and propellant-optimized autonomous spacecraft trajectory generation in near-circular orbits. Specifically, this paper leverages recent algorithmic advances in the field of robot motion planning to the problem of impulsively-actuated, propellant-optimized rendezvous and proximity operations under the Clohessy-Wiltshire-Hill (CWH) dynamics model. The approach calls upon a modified version of the Fast Marching Tree (FMT*) algorithm to grow a set of feasible trajectories over a deterministic, low-dispersion set of sample points covering the free state space. To enforce safety, the tree is only grown over the subset of actively-safe samples, from which there exists a feasible one-burn collision avoidance maneuver that can safely circularize the spacecraft orbit along its coasting arc under a given set of potential thruster failures. Key features of the proposed algorithm include: (i) theoretical guarantees in terms of trajectory safety and performance, (ii) amenability to real-time implementation, and (iii) generality, in the sense that a large class of constraints can be handled directly. As a result, the proposed algorithm offers the potential for widespread application, ranging from on-orbit satellite servicing to orbital debris removal and autonomous inspection missions.

Submitted 31 December, 2015; originally announced January 2016.

Comments: Submitted to the AIAA Journal of Guidance, Control, and Dynamics (JGCD) special issue entitled "Computational Guidance and Control". This submission is the journal version corresponding to the conference manuscript "Real-Time, Propellant-Efficient Spacecraft Planning under Clohessy-Wiltshire-Hill Dynamics" accepted to the 2016 IEEE Aerospace Conference in Big Sky, MT, USA

arXiv:1512.01629 [pdf, ps, other]

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

Authors: Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone

Abstract: In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account \emph{risk}, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.

Submitted 6 April, 2017; v1 submitted 5 December, 2015; originally announced December 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1406.3339
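
The CVaR constraint mentioned in the abstract of arXiv:1512.01629 can be made concrete with a small estimator. The sketch below computes the empirical CVaR of a batch of sampled cumulative costs through the Rockafellar-Uryasev variational formula, CVaR_alpha(Z) = min over nu of nu + E[(Z - nu)^+] / (1 - alpha). It assumes the convention that alpha is a confidence level, so the result averages roughly the worst (1 - alpha) fraction of costs; the paper may parameterize the level differently. The function name and sample data are hypothetical.

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR of sampled costs at confidence level alpha in [0, 1).

    Uses the Rockafellar-Uryasev formula
        CVaR_alpha(Z) = min_nu  nu + E[(Z - nu)^+] / (1 - alpha).
    For an empirical distribution the objective is convex and piecewise
    linear in nu with kinks at the samples, so it is enough to evaluate it
    at the sample values.
    """
    if not costs:
        raise ValueError("need at least one cost sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)

    def objective(nu):
        excess = sum(max(c - nu, 0.0) for c in costs)
        return nu + excess / ((1.0 - alpha) * n)

    return min(objective(nu) for nu in costs)


if __name__ == "__main__":
    sampled_costs = [float(c) for c in range(1, 11)]  # made-up cost samples
    print(empirical_cvar(sampled_costs, alpha=0.8))
    print(empirical_cvar(sampled_costs, alpha=0.0))   # reduces to the mean
```

Setting alpha to 0 recovers the ordinary expected cost, which is a quick sanity check that the risk-neutral case is a special case of the risk-sensitive one.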

arXiv:1511.06982 [pdf, other]

Trading Safety Versus Performance: Rapid Deployment of Robotic Swarms with Robust Performance Constraints

Authors: Yin-Lam Chow, Marco Pavone, Brian M. Sadler, Stefano Carpin

Abstract: In this paper we consider a stochastic deployment problem, where a robotic swarm is tasked with the objective of positioning at least one robot at each of a set of pre-assigned targets while meeting a temporal deadline. Travel times and failure rates are stochastic but related, inasmuch as failure rates increase with speed. To maximize chances of success while meeting the deadline, a control strategy has therefore to balance safety and performance. Our approach is to cast the problem within the theory of constrained Markov Decision Processes, whereby we seek to compute policies that maximize the probability of successful deployment while ensuring that the expected duration of the task is bounded by a given deadline. To account for uncertainties in the problem parameters, we consider a robust formulation and we propose efficient solution algorithms, which are of independent interest. Numerical experiments confirming our theoretical results are presented and discussed.

Submitted 22 November, 2015; originally announced November 2015.

arXiv:1511.06981 [pdf, other]

A Framework for Time-Consistent, Risk-Averse Model Predictive Control: Theory and Algorithms

Authors: Yin-Lam Chow, Marco Pavone

Abstract: In this paper we present a framework for risk-averse model predictive control (MPC) of linear systems affected by multiplicative uncertainty. Our key innovation is to consider time-consistent, dynamic risk metrics as objective functions to be minimized. This framework is axiomatically justified in terms of time-consistency of risk preferences, is amenable to dynamic optimization, and is unifying in the sense that it captures a full range of risk assessments from risk-neutral to worst case. Within this framework, we propose and analyze an online risk-averse MPC algorithm that is provably stabilizing. Furthermore, by exploiting the dual representation of time-consistent, dynamic risk metrics, we cast the computation of the MPC control law as a convex optimization problem amenable to implementation on embedded systems. Simulation results are presented and discussed.

Submitted 22 November, 2015; originally announced November 2015.

arXiv:1511.06980 [pdf, other]

Stochastic Optimal Control With Dynamic, Time-Consistent Risk Constraints

Authors: Yin-Lam Chow, Marco Pavone

Abstract: In this paper we present a dynamic programming approach to stochastic optimal control problems with dynamic, time-consistent risk constraints. Constrained stochastic optimal control problems, which naturally arise when one has to consider multiple objectives, have been extensively investigated in the past 20 years; however, in most formulations, the constraints are formulated as either risk-neutral (i.e., by considering an expected cost), or by applying static, single-period risk metrics with limited attention to "time-consistency" (i.e., to whether such metrics ensure rational consistency of risk preferences across multiple periods). Recently, significant strides have been made in the development of a rigorous theory of dynamic, \emph{time-consistent} risk metrics for multi-period (risk-sensitive) decision processes; however, their integration within constrained stochastic optimal control problems has received little attention. The goal of this paper is to bridge this gap. First, we formulate the stochastic optimal control problem with dynamic, time-consistent risk constraints and we characterize the tail subproblems (which requires the addition of a Markovian structure to the risk metrics). Second, we develop a dynamic programming approach for its solution, which allows one to compute the optimal costs by value iteration. Finally, we discuss both theoretical and practical features of our approach, such as generalizations, construction of optimal control policies, and computational aspects. A simple, two-state example is given to illustrate the problem setup and the solution approach.

Submitted 22 November, 2015; originally announced November 2015.

Comments: arXiv admin note: text overlap with arXiv:1501.02024, arXiv:1503.07461

arXiv:1511.02547 [pdf, other]

doi 10.1109/ICRA.2015.7139355

Decentralized Algorithms for 3D Symmetric Formations in Robotic Networks: a Contraction Theory Approach

Authors: Sumeet Singh, Edward Schmerling, Marco Pavone

Abstract: This paper presents decentralized algorithms for formation control of multiple robots in three dimensions. Specifically, we leverage the mathematical properties of cyclic pursuit along with results from contraction and partial contraction theory to design decentralized control algorithms that ensure global convergence to symmetric formations. We first consider regular polygon formations as a base case, and then extend the results to Johnson solid and other polygonal mesh formations. The algorithms are further augmented to allow control over formation size and avoid collisions with other robots in the formation. The robustness properties of the algorithms are assessed in the presence of bounded additive disturbances and their effect on the quality of the formation is quantified. Finally, we present a general methodology for embedding the control laws on complex dynamical systems, in this case, quadcopters, and validate this approach via simulations and experiments on a fleet of quadcopters.

Submitted 8 November, 2015; originally announced November 2015.

Comments: Submitted to IEEE Transactions on Robotics

arXiv:1509.08932 [pdf, ps, other]

Two Phase $Q-$learning for Bidding-based Vehicle Sharing

Authors: Yinlam Chow, Jia Yuan Yu, Marco Pavone

Abstract: We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars, and quality of service, by pricing rentals appropriately. We propose a bidding approach that is inspired by auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to rent or not. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a \emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.

Submitted 19 October, 2015; v1 submitted 29 September, 2015; originally announced September 2015.

Comments: Submitted to AISTATS 2016

arXiv:1509.03985 [pdf, ps, other]

doi 10.1109/ICRA.2016.7487272

Model Predictive Control of Autonomous Mobility-on-Demand Systems

Authors: Rick Zhang, Federico Rossi, Marco Pavone

Abstract: In this paper we present a model predictive control (MPC) approach to optimize vehicle scheduling and routing in an autonomous mobility-on-demand (AMoD) system. In AMoD systems, robotic, self-driving vehicles transport customers within an urban environment and are coordinated to optimize service throughout the entire network. Specifically, we first propose a novel discrete-time model of an AMoD system and we show that this formulation allows the easy integration of a number of real-world constraints, e.g., electric vehicle charging constraints. Second, leveraging our model, we design a model predictive control algorithm for the optimal coordination of an AMoD system and prove its stability in the sense of Lyapunov. At each optimization step, the vehicle scheduling and routing problem is solved as a mixed integer linear program (MILP) where the decision variables are binary variables representing whether a vehicle will 1) wait at a station, 2) service a customer, or 3) rebalance to another station. Finally, by using real-world data, we show that the MPC algorithm can be run in real-time for moderately-sized systems and outperforms previous control strategies for AMoD systems.

Submitted 15 February, 2016; v1 submitted 14 September, 2015; originally announced September 2015.

Comments: Extended version of ICRA16 paper, with full proofs of the theorems

arXiv:1507.07602 [pdf, other]

doi 10.1109/IROS.2015.7353652

An Asymptotically-Optimal Sampling-Based Algorithm for Bi-directional Motion Planning

Authors: Joseph A. Starek, Javier V. Gomez, Edward Schmerling, Lucas Janson, Luis Moreno, Marco Pavone

Abstract: Bi-directional search is a widely used strategy to increase the success and convergence rates of sampling-based motion planning algorithms. Yet, few results are available that merge both bi-directional search and asymptotic optimality into existing optimal planners, such as PRM*, RRT*, and FMT*. The objective of this paper is to fill this gap. Specifically, this paper presents a bi-directional, sampling-based, asymptotically-optimal algorithm named Bi-directional FMT* (BFMT*) that extends the Fast Marching Tree (FMT*) algorithm to bi-directional search while preserving its key properties, chiefly lazy search and asymptotic optimality through convergence in probability. BFMT* performs a two-source, lazy dynamic programming recursion over a set of randomly-drawn samples, correspondingly generating two search trees: one in cost-to-come space from the initial configuration and another in cost-to-go space from the goal configuration. Numerical experiments illustrate the advantages of BFMT* over its unidirectional counterpart, as well as a number of other state-of-the-art planners.

Submitted 27 July, 2015; originally announced July 2015.

Comments: Accepted to the 2015 IEEE Intelligent Robotics and Systems Conference in Hamburg, Germany. This submission represents the long version of the conference manuscript, with additional proof details (Section IV) regarding the asymptotic optimality of the BFMT* algorithm

arXiv:1506.02188 [pdf, other]

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

Authors: Yinlam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Abstract: In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such a problem as a CVaR MDP. Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget. This result, which is of independent interest, motivates CVaR MDPs as a unifying framework for risk-sensitive and robust decision making. Our second contribution is to present an approximate value-iteration algorithm for CVaR MDPs and analyze its convergence rate. To our knowledge, this is the first solution algorithm for CVaR MDPs that enjoys error guarantees. Finally, we present results from numerical experiments that corroborate our theoretical findings and show the practicality of our approach.

Submitted 6 June, 2015; originally announced June 2015.

Comments: Submitted to NIPS 15

arXiv:1506.01085 [pdf, other]

A Convex Optimization Approach to Smooth Trajectories for Motion Planning with Car-Like Robots

Authors: Zhijie Zhu, Edward Schmerling, Marco Pavone

Abstract: In the recent past, several sampling-based algorithms have been proposed to compute trajectories that are collision-free and dynamically-feasible. However, the outputs of such algorithms are notoriously jagged. In this paper, by focusing on robots with car-like dynamics, we present a fast and simple heuristic algorithm, named Convex Elastic Smoothing (CES) algorithm, for trajectory smoothing and speed optimization. The CES algorithm is inspired by earlier work on elastic band planning and iteratively performs shape and speed optimization. The key feature of the algorithm is that both optimization problems can be solved via convex programming, making CES particularly fast. A range of numerical experiments show that the CES algorithm returns high-quality solutions in a matter of a few hundred milliseconds and hence appears amenable to a real-time implementation.

Submitted 26 October, 2015; v1 submitted 2 June, 2015; originally announced June 2015.

arXiv:1505.00023 [pdf, other]

Deterministic Sampling-Based Motion Planning: Optimality, Complexity, and Performance

Authors: Lucas Janson, Brian Ichter, Marco Pavone

Abstract: Probabilistic sampling-based algorithms, such as the probabilistic roadmap (PRM) and the rapidly-exploring random tree (RRT) algorithms, represent one of the most successful approaches to robotic motion planning, due to their strong theoretical properties (in terms of probabilistic completeness or even asymptotic optimality) and remarkable practical performance. Such algorithms are probabilistic in that they compute a path by connecting independently and identically distributed random points in the configuration space. Their randomization aspect, however, makes several tasks challenging, including certification for safety-critical applications and use of offline computation to improve real-time execution. Hence, an important open question is whether similar (or better) theoretical guarantees and practical performance could be obtained by considering deterministic, as opposed to random, sampling sequences. The objective of this paper is to provide a rigorous answer to this question. Specifically, we first show that PRM, for a certain selection of tuning parameters and deterministic low-dispersion sampling sequences, is deterministically asymptotically optimal. Second, we characterize the convergence rate, and we find that the factor of sub-optimality can be very explicitly upper-bounded in terms of the l2-dispersion of the sampling sequence and the connection radius of PRM. Third, we show that an asymptotically optimal version of PRM exists with computational and space complexity arbitrarily close to O(n) (the theoretical lower bound), where n is the number of points in the sequence. This is in stark contrast to the O(n log n) complexity results for existing asymptotically-optimal probabilistic planners. Finally, through numerical experiments, we show that planning with deterministic low-dispersion sampling generally provides superior performance in terms of path cost and success rate.

Submitted 3 May, 2016; v1 submitted 30 April, 2015; originally announced May 2015.

arXiv:1504.08053 [pdf, other]

Monte Carlo Motion Planning for Robot Trajectory Optimization Under Uncertainty

Authors: Lucas Janson, Edward Schmerling, Marco Pavone

Abstract: This article presents a novel approach, named MCMP (Monte Carlo Motion Planning), to the problem of motion planning under uncertainty, i.e., to the problem of computing a low-cost path that fulfills probabilistic collision avoidance constraints. MCMP estimates the collision probability (CP) of a given path by sampling via Monte Carlo the execution of a reference tracking controller (in this paper we consider LQG). The key algorithmic contribution of this paper is the design of statistical variance-reduction techniques, namely control variates and importance sampling, to make such a sampling procedure amenable to real-time implementation. MCMP applies this CP estimation procedure to motion planning by iteratively (i) computing an (approximately) optimal path for the deterministic version of the problem (here, using the FMT* algorithm), (ii) computing the CP of this path, and (iii) inflating or deflating the obstacles by a common factor depending on whether the CP is higher or lower than a target value. The advantages of MCMP are threefold: (i) asymptotic correctness of CP estimation, as opposed to most current approximations, which, as shown in this paper, can be off by large multiples and hinder the computation of feasible plans; (ii) speed and parallelizability, and (iii) generality, i.e., the approach is applicable to virtually any planning problem provided that a path tracking controller and a notion of distance to obstacles in the configuration space are available. Numerical results illustrate the correctness (in terms of feasibility), efficiency (in terms of path cost), and computational speed of MCMP.

Submitted 28 May, 2015; v1 submitted 29 April, 2015; originally announced April 2015.
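
The outer loop in the abstract of arXiv:1504.08053 (plan, estimate the collision probability, then inflate or deflate the obstacles) can be sketched on a deliberately simplified one-dimensional stand-in. Everything below is an assumption for illustration rather than the paper's method. The "planner" is a stub that leaves a clearance equal to the inflation margin. A zero-mean Gaussian tracking error replaces the simulated LQG controller. The estimate is plain Monte Carlo without control variates or importance sampling. Bisection on the margin is swapped in for the paper's inflate/deflate update.

```python
import random


def estimate_collision_probability(margin, errors):
    """Plain Monte Carlo estimate: the fraction of sampled tracking errors
    that exceed the clearance the stub planner left around the obstacle."""
    return sum(1 for e in errors if e > margin) / len(errors)


def tune_inflation(target_cp, sigma=1.0, n_samples=20000, seed=0, iters=40):
    """Bisect on the obstacle inflation margin until the estimated collision
    probability sits just at or below target_cp.

    Returns (margin, estimated_cp). All parameters are hypothetical.
    """
    rng = random.Random(seed)
    # Common random numbers: the same error samples are reused for every
    # candidate margin, so the estimate is monotone in the margin.
    errors = [rng.gauss(0.0, sigma) for _ in range(n_samples)]

    lo, hi = 0.0, 6.0 * sigma  # lo: too risky, hi: overly conservative
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if estimate_collision_probability(mid, errors) > target_cp:
            lo = mid  # CP above target: inflate the obstacle further
        else:
            hi = mid  # CP at or below target: the obstacle can be deflated
    return hi, estimate_collision_probability(hi, errors)


if __name__ == "__main__":
    margin, cp = tune_inflation(target_cp=0.01)
    print(margin, cp)
```

Reusing one set of samples across iterations keeps the loop deterministic for a fixed seed; in a real planner each iteration would also re-plan around the resized obstacles before re-estimating the collision probability.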

arXiv:1503.07461 [pdf, other]

A Time Consistent Formulation of Risk Constrained Stochastic Optimal Control

Authors: Yinlam Chow, Marco Pavone

Abstract: Time-consistency is an essential requirement in risk sensitive optimal control problems to make rational decisions. An optimization problem is time consistent if its solution policy does not depend on the time sequence of solving the optimization problem. On the other hand, a dynamic risk measure is time consistent if an outcome being considered less risky in the future implies that this outcome is also less risky at the current stage. In this paper, we study time-consistency of risk constrained problems where the risk metric is time consistent. From the Bellman optimality condition in [1], we establish an analytical "risk-to-go" that results in a time consistent optimal policy. Finally we demonstrate the effectiveness of the analytical solution by solving Haviv's counter-example [2] in time inconsistent planning.

Submitted 25 March, 2015; originally announced March 2015.

arXiv:1501.02024 [pdf, ps, other]

A Uniform-grid Discretization Algorithm for Stochastic Control with Risk Constraints

Authors: Yin-Lam Chow, Marco Pavone

Abstract: In this paper, we present a discretization algorithm for the finite horizon risk constrained dynamic programming algorithm in [Chow_Pavone_13]. Although, from a theoretical standpoint, Bellman's recursion provides a systematic way to find optimal value functions and generate optimal history dependent policies, there is a serious computational issue. Even if the state space and action space of this constrained stochastic optimal control problem are finite, the spaces of risk threshold and the feasible risk update are closed bounded subsets of the real numbers. This prohibits any direct application of unconstrained finite state iterative methods in dynamic programming found in [Bertsekas_05]. In order to approximate Bellman's operator derived in [Chow_Pavone_13], we discretize the continuous action spaces and formulate a finite space approximation for the exact dynamic programming algorithm. We will also prove that the approximation error of optimal value functions is bounded linearly by the step size of discretization. Finally, details for implementations and possible modifications are discussed.

Submitted 8 January, 2015; originally announced January 2015.

arXiv:1410.0956 [pdf, other]

Distributed consensus with mixed time/communication bandwidth performance metrics

Authors: Federico Rossi, Marco Pavone

Abstract: In this paper we study the inherent trade-off between time and communication complexity for the distributed consensus problem. In our model, communication complexity is measured as the maximum data throughput (in bits per second) sent through the network at a given instant. Such a notion of communication complexity, referred to as bandwidth complexity, is related to the frequency bandwidth a designer should collectively allocate to the agents if they were to communicate via a wireless channel, which represents an important constraint for dense robotic networks. We prove a lower bound on the bandwidth complexity of the consensus problem and provide a consensus algorithm that is bandwidth-optimal for a wide class of consensus functions. We then propose a distributed algorithm that can trade communication complexity versus time complexity as a function of a tunable parameter, which can be adjusted by a system designer as a function of the properties of the wireless communication channel. We rigorously characterize the tunable algorithm's worst-case bandwidth complexity and show that it compares favorably with the bandwidth complexity of well-known consensus algorithms.

Submitted 3 October, 2014; originally announced October 2014.

Comments: Draft, submitted to Allerton 2014

Showing 201–250 of 262 results for author: Pavone, M