-
Repeatable and Reliable Efforts of Accelerated Risk Assessment
Authors:
Linda Capito,
Guillermo A. Castillo,
Bowen Weng
Abstract:
Risk assessment of a robot in controlled environments, such as laboratories and proving grounds, is a common means to assess, certify, validate, verify, and characterize the robots' safety performance before, during, and even after their commercialization in the real-world. A standard testing program that acquires the risk estimate is expected to be (i) repeatable, such that it obtains similar ris…
▽ More
Risk assessment of a robot in controlled environments, such as laboratories and proving grounds, is a common means to assess, certify, validate, verify, and characterize the robots' safety performance before, during, and even after their commercialization in the real-world. A standard testing program that acquires the risk estimate is expected to be (i) repeatable, such that it obtains similar risk assessments of the same testing subject among multiple trials or attempts with the similar testing effort by different stakeholders, and (ii) reliable against a variety of testing subjects produced by different vendors and manufacturers. Both repeatability and reliability are fundamental and crucial for a testing algorithm's validity, fairness, and practical feasibility, especially for standardization. However, these properties are rarely satisfied or ensured, especially as the subject robots become more complex, uncertain, and varied. This issue was present in traditional risk assessments through Monte-Carlo sampling, and remains a bottleneck for the recent accelerated risk assessment methods, primarily those using importance sampling. This study aims to enhance existing accelerated testing frameworks by proposing a new algorithm that provably integrates repeatability and reliability with the already established formality and efficiency. It also features demonstrations assessing the risk of instability from frontal impacts, initiated by push-over disturbances on a controlled inverted pendulum and a 7-DoF planar bipedal robot Rabbit managed by various control algorithms.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Authors:
Benjue Weng
Abstract:
With the surge of ChatGPT,the use of large models has significantly increased,rapidly rising to prominence across the industry and sweeping across the internet. This article is a comprehensive review of fine-tuning methods for large models. This paper investigates the latest technological advancements and the application of advanced methods in aspects such as task-adaptive fine-tuning,domain-adapt…
▽ More
With the surge of ChatGPT,the use of large models has significantly increased,rapidly rising to prominence across the industry and sweeping across the internet. This article is a comprehensive review of fine-tuning methods for large models. This paper investigates the latest technological advancements and the application of advanced methods in aspects such as task-adaptive fine-tuning,domain-adaptive fine-tuning,few-shot learning,knowledge distillation,multi-task learning,parameter-efficient fine-tuning,and dynamic fine-tuning.
△ Less
Submitted 13 April, 2024;
originally announced April 2024.
-
Data-Driven Latent Space Representation for Robust Bipedal Locomotion Learning
Authors:
Guillermo A. Castillo,
Bowen Weng,
Wei Zhang,
Ayonga Hereid
Abstract:
This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is the…
▽ More
This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is then used as states for training a robust RL-based gait policy, eliminating the need for heuristic state selections or the use of template models for gait planning. The results demonstrate that the learned latent variables are disentangled and directly correspond to different gaits or speeds, such as moving forward, backward, or walking in place. Compared to traditional template model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and demonstrates good generalization capabilities to unseen scenarios.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Template Model Inspired Task Space Learning for Robust Bipedal Locomotion
Authors:
Guillermo A. Castillo,
Bowen Weng,
Shunpeng Yang,
Wei Zhang,
Ayonga Hereid
Abstract:
This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Different from traditional end-to-end learning approaches, our HL policy takes insights from the angular momentu…
▽ More
This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Different from traditional end-to-end learning approaches, our HL policy takes insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task space outputs that shape the walking gait of the robot. The HL policy is agnostic to the task space LL controller, which increases the flexibility of the design and generalization of the framework to other bipedal robots. This hierarchical design results in a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested in three different robots, Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and is able to effectively track a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Towards Standardized Disturbance Rejection Testing of Legged Robot Locomotion with Linear Impactor: A Preliminary Study, Observations, and Implications
Authors:
Bowen Weng,
Guillermo A. Castillo,
Yun-Seok Kang,
Ayonga Hereid
Abstract:
Dynamic locomotion in legged robots is close to industrial collaboration, but a lack of standardized testing obstructs commercialization. The issues are not merely political, theoretical, or algorithmic but also physical, indicating limited studies and comprehension regarding standard testing infrastructure and equipment. For decades, the approaches we have been testing legged robots were rarely s…
▽ More
Dynamic locomotion in legged robots is close to industrial collaboration, but a lack of standardized testing obstructs commercialization. The issues are not merely political, theoretical, or algorithmic but also physical, indicating limited studies and comprehension regarding standard testing infrastructure and equipment. For decades, the approaches we have been testing legged robots were rarely standardizable with hand-pushing, foot-kicking, rope-dragging, stick-poking, and ball-swinging. This paper aims to bridge the gap by proposing the use of the linear impactor, a well-established tool in other standardized testing disciplines, to serve as an adaptive, repeatable, and fair disturbance rejection testing equipment for legged robots. A pneumatic linear impactor is also adopted for the case study involving the humanoid robot Digit. Three locomotion controllers are examined, including a commercial one, using a walking-in-place task against frontal impacts. The statistically best controller was able to withstand the impact momentum (26.376 kg$\cdot$m/s) on par with a reported average effective momentum from straight punches by Olympic boxers (26.506 kg$\cdot$m/s). Moreover, the case study highlights other anti-intuitive observations, demonstrations, and implications that, to the best of the authors' knowledge, are first-of-its-kind revealed in real-world testing of legged robots.
△ Less
Submitted 29 January, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
A Diversity Analysis of Safety Metrics Comparing Vehicle Performance in the Lead-Vehicle Interaction Regime
Authors:
Harnarayan Singh,
Bowen Weng,
Sughosh J. Rao,
Devin Elsasser
Abstract:
Vehicle performance metrics analyze data sets consisting of subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metr…
▽ More
Vehicle performance metrics analyze data sets consisting of subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metrics for vehicle safety performance assessment and justification. This issue gets further amplified with the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
On the Adversarial Scenario-based Safety Testing of Robots: the Comparability and Optimal Aggressiveness
Authors:
Bowen Weng,
Guillermo A. Castillo,
Wei Zhang,
Ayonga Hereid
Abstract:
This paper studies the class of scenario-based safety testing algorithms in the black-box safety testing configuration. For algorithms sharing the same state-action set coverage with different sampling distributions, it is commonly believed that prioritizing the exploration of high-risk state-actions leads to a better sampling efficiency. Our proposal disputes the above intuition by introducing an…
▽ More
This paper studies the class of scenario-based safety testing algorithms in the black-box safety testing configuration. For algorithms sharing the same state-action set coverage with different sampling distributions, it is commonly believed that prioritizing the exploration of high-risk state-actions leads to a better sampling efficiency. Our proposal disputes the above intuition by introducing an impossibility theorem that provably shows all safety testing algorithms of the aforementioned difference perform equally well with the same expected sampling efficiency. Moreover, for testing algorithms covering different sets of state-actions, the sampling efficiency criterion is no longer applicable as different algorithms do not necessarily converge to the same termination condition. We then propose a testing aggressiveness definition based on the almost safe set concept along with an unbiased and efficient algorithm that compares the aggressiveness between testing algorithms. Empirical observations from the safety testing of bipedal locomotion controllers and vehicle decision-making modules are also presented to support the proposed theoretical implications and methodologies.
△ Less
Submitted 3 April, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots
Authors:
Bowen Weng,
Guillermo A. Castillo,
Wei Zhang,
Ayonga Hereid
Abstract:
The dynamic response of the legged robot locomotion is non-Lipschitz and can be stochastic due to environmental uncertainties. To test, validate, and characterize the safety performance of legged robots, existing solutions on observed and inferred risk can be incomplete and sampling inefficient. Some formal verification methods suffer from the model precision and other surrogate assumptions. In th…
▽ More
The dynamic response of the legged robot locomotion is non-Lipschitz and can be stochastic due to environmental uncertainties. To test, validate, and characterize the safety performance of legged robots, existing solutions on observed and inferred risk can be incomplete and sampling inefficient. Some formal verification methods suffer from the model precision and other surrogate assumptions. In this paper, we propose a scenario sampling based testing framework that characterizes the overall safety performance of a legged robot by specifying (i) where (in terms of a set of states) the robot is potentially safe, and (ii) how safe the robot is within the specified set. The framework can also help certify the commercial deployment of the legged robot in real-world environment along with human and compare safety performance among legged robots with different mechanical structures and dynamic properties. The proposed framework is further deployed to evaluate a group of state-of-the-art legged robot locomotion controllers from various model-based, deep neural network involved, and reinforcement learning based methods in the literature. Among a series of intended work domains of the studied legged robots (e.g. tracking speed on sloped surface, with abrupt changes on demanded velocity, and against adversarial push-over disturbances), we show that the method can adequately capture the overall safety characterization and the subtle performance insights. Many of the observed safety outcomes, to the best of our knowledge, have never been reported by the existing work in the legged robot literature.
△ Less
Submitted 16 April, 2022;
originally announced April 2022.
-
A Formal Safety Characterization of Advanced Driver Assist Systems in the Car-Following Regime with Scenario-Sampling
Authors:
Bowen Weng,
Minghao Zhu,
Keith Redmill
Abstract:
The capability to follow a lead-vehicle and avoid rear-end collisions is one of the most important functionalities for human drivers and various Advanced Driver Assist Systems (ADAS). Existing safety performance justification of the car-following systems either relies on simple concrete scenarios with biased surrogate metrics or requires a significantly long driving distance for risk observation a…
▽ More
The capability to follow a lead-vehicle and avoid rear-end collisions is one of the most important functionalities for human drivers and various Advanced Driver Assist Systems (ADAS). Existing safety performance justification of the car-following systems either relies on simple concrete scenarios with biased surrogate metrics or requires a significantly long driving distance for risk observation and inference. In this paper, we propose a guaranteed unbiased and sampling efficient scenario-based safety evaluation framework inspired by the previous work on $εδ$-almost safe set quantification. The proposal characterizes the complete safety performance of the test subject in the car-following regime. The performance of the proposed method is also demonstrated in challenging cases including some widely adopted car-following decision-making modules and the commercially available Openpilot driving stack by CommaAI.
△ Less
Submitted 23 May, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Meta-Auto-Decoder for Solving Parametric Partial Differential Equations
Authors:
Xiang Huang,
Zhanhong Ye,
Hongsheng Liu,
Beiji Shi,
Zidong Wang,
Kang Yang,
Yang Li,
Bingya Weng,
Min Wang,
Haotian Chu,
Fan Yu,
Bei Hua,
Lei Chen,
Bin Dong
Abstract:
Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods such as the Deep Galerkin Method (D…
▽ More
Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs) aim to approximate the solution of the PDEs. They are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of parameters of the PDE. Another category of methods such as Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet) try to approximate the solution mapping directly. Being fast with only one forward inference for each PDE parameter without retraining, they often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables the pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogenous) PDE parameters as latent vectors. The proposed method MAD can be interpreted by manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that the MAD method exhibits faster convergence speed without losing accuracy than other deep learning-based methods. The project page with code is available: https://gitee.com/mindspore/mindscience/tree/master/MindElec/.
△ Less
Submitted 18 November, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
A Finite-Sampling, Operational Domain Specific, and Provably Unbiased Connected and Automated Vehicle Safety Metric
Authors:
Bowen Weng,
Linda Capito,
Umit Ozguner,
Keith Redmill
Abstract:
A connected and automated vehicle safety metric determines the performance of a subject vehicle (SV) by analyzing the data involving the interactions among the SV and other dynamic road users and environmental features. When the data set contains only a finite set of samples collected from the naturalistic mixed-traffic driving environment, a metric is expected to generalize the safety assessment…
▽ More
A connected and automated vehicle safety metric determines the performance of a subject vehicle (SV) by analyzing the data involving the interactions among the SV and other dynamic road users and environmental features. When the data set contains only a finite set of samples collected from the naturalistic mixed-traffic driving environment, a metric is expected to generalize the safety assessment outcome from the observed finite samples to the unobserved cases by specifying in what domain the SV is expected to be safe and how safe the SV is, statistically, in that domain. However, to the best of our knowledge, none of the existing safety metrics are able to justify the above properties with an operational domain specific, guaranteed complete, and provably unbiased safety evaluation outcome. In this paper, we propose a novel safety metric that involves the $α$-shape and the $ε$-almost robustly forward invariant set to characterize the SV's almost safe operable domain and the probability for the SV to remain inside the safe domain indefinitely, respectively. The empirical performance of the proposed method is demonstrated in several different operational design domains through a series of cases covering a variety of fidelity levels (real-world and simulators), driving environments (highway, urban, and intersections), road users (car, truck, and pedestrian), and SV driving behaviors (human driver and self driving algorithms).
△ Less
Submitted 2 February, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Solving Partial Differential Equations with Point Source Based on Physics-Informed Neural Networks
Authors:
Xiang Huang,
Hongsheng Liu,
Beiji Shi,
Zidong Wang,
Kang Yang,
Yang Li,
Bingya Weng,
Min Wang,
Haotian Chu,
Jing Zhou,
Fan Yu,
Bei Hua,
Lei Chen,
Bin Dong
Abstract:
In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) emerges to be a promising method for solving both forward and inverse PDE problems. PDEs with a point source that is expressed as a Dirac delta function in the governing equations are mathematical models of many physical processes. However…
▽ More
In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) emerges to be a promising method for solving both forward and inverse PDE problems. PDEs with a point source that is expressed as a Dirac delta function in the governing equations are mathematical models of many physical processes. However, they cannot be solved directly by conventional PINNs method due to the singularity brought by the Dirac delta function. We propose a universal solution to tackle this problem with three novel techniques. Firstly the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity; secondly a lower bound constrained uncertainty weighting algorithm is proposed to balance the PINNs losses between point source area and other areas; and thirdly a multi-scale deep neural network with periodic activation function is used to improve the accuracy and convergence speed of the PINNs method. We evaluate the proposed method with three representative PDEs, and the experimental results show that our method outperforms existing deep learning-based methods with respect to the accuracy, the efficiency and the versatility.
△ Less
Submitted 2 November, 2021;
originally announced November 2021.
-
A Formal Characterization of Black-Box System Safety Performance with Scenario Sampling
Authors:
Bowen Weng,
Linda Capito,
Umit Ozguner,
Keith Redmill
Abstract:
A typical scenario-based evaluation framework seeks to characterize a black-box system's safety performance (e.g., failure rate) through repeatedly sampling initialization configurations (scenario sampling) and executing a certain test policy for scenario propagation (scenario testing) with the black-box system involved as the test subject. In this letter, we first present a novel safety evaluatio…
▽ More
A typical scenario-based evaluation framework seeks to characterize a black-box system's safety performance (e.g., failure rate) through repeatedly sampling initialization configurations (scenario sampling) and executing a certain test policy for scenario propagation (scenario testing) with the black-box system involved as the test subject. In this letter, we first present a novel safety evaluation criterion that seeks to characterize the actual operational domain within which the test subject would remain safe indefinitely with high probability. By formulating the black-box testing scenario as a dynamic system, we show that the presented problem is equivalent to finding a certain "almost" robustly forward invariant set for the given system. Second, for an arbitrary scenario testing strategy, we propose a scenario sampling algorithm that is provably asymptotically optimal in obtaining the safe invariant set with arbitrarily high accuracy. Moreover, as one considers different testing strategies (e.g., biased sampling of safety-critical cases), we show that the proposed algorithm still converges to the unbiased approximation of the safety characterization outcome if the scenario testing satisfies a certain condition. Finally, the effectiveness of the presented scenario sampling algorithms and various theoretical properties are demonstrated in a case study of the safety evaluation of a control barrier function-based mobile robot collision avoidance system.
△ Less
Submitted 5 October, 2021;
originally announced October 2021.
-
Towards Guaranteed Safety Assurance of Automated Driving Systems with Scenario Sampling: An Invariant Set Perspective (Extended Version)
Authors:
Bowen Weng,
Linda Capito,
Umit Ozguner,
Keith Redmill
Abstract:
How many scenarios are sufficient to validate the safe Operational Design Domain (ODD) of an Automated Driving System (ADS) equipped vehicle? Is a more significant number of sampled scenarios guaranteeing a more accurate safety assessment of the ADS? Despite the various empirical success of ADS safety evaluation with scenario sampling in practice, some of the fundamental properties are largely unk…
▽ More
How many scenarios are sufficient to validate the safe Operational Design Domain (ODD) of an Automated Driving System (ADS) equipped vehicle? Is a more significant number of sampled scenarios guaranteeing a more accurate safety assessment of the ADS? Despite the various empirical success of ADS safety evaluation with scenario sampling in practice, some of the fundamental properties are largely unknown. This paper seeks to remedy this gap by formulating and tackling the scenario sampling safety assurance problem from a set invariance perspective. First, a novel conceptual equivalence is drawn between the scenario sampling safety assurance problem and the data-driven robustly controlled forward invariant set validation and quantification problem. This paper then provides a series of resolution complete and probabilistic complete solutions with finite-sampling analyses for the safety validation problem that authenticates a given ODD. On the other hand, the quantification problem escalates the validation challenge and starts looking for a safe sub-domain of a particular property. This inspires various algorithms that are provably probabilistic incomplete, probabilistic complete but sub-optimal, and asymptotically optimal. Finally, the proposed asymptotically optimal scenario sampling safety quantification algorithm is also empirically demonstrated through simulation experiments.
△ Less
Submitted 29 September, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Robust Feedback Motion Policy Design Using Reinforcement Learning on a 3D Digit Bipedal Robot
Authors:
Guillermo A. Castillo,
Bowen Weng,
Wei Zhang,
Ayonga Hereid
Abstract:
In this paper, a hierarchical and robust framework for learning bipedal locomotion is presented and successfully implemented on the 3D biped robot Digit built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with a reduced-dimension state and a…
▽ More
In this paper, a hierarchical and robust framework for learning bipedal locomotion is presented and successfully implemented on the 3D biped robot Digit built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with a reduced-dimension state and action spaces of the policy, significantly simplifying the design and reducing the sampling efficiency of the learning method. The inclusion of feedback regulation into the framework improves the robustness of the learned walking gait and ensures the success of the sim-to-real transfer of the proposed controller with minimal tuning. We specifically present a learning pipeline that considers hardware-feasible initial poses of the robot within the learning process to ensure the initial state of the learning is replicated as close as possible to the initial state of the robot in hardware experiments. Finally, we demonstrate the feasibility of our method by successfully transferring the learned policy in simulation to the Digit robot hardware, realizing sustained walking gaits under external force disturbances and challenging terrains not included during the training process. To the best of our knowledge, this is the first time a learning-based policy is transferred successfully to the Digit robot in hardware experiments without using dynamic randomization or curriculum learning.
△ Less
Submitted 28 March, 2021;
originally announced March 2021.
-
A Person Re-identification Data Augmentation Method with Adversarial Defense Effect
Authors:
Yunpeng Gong,
Zhiyong Zeng,
Liwen Chen,
Yifan Luo,
Bin Weng,
Feng Ye
Abstract:
The security of the Person Re-identification(ReID) model plays a decisive role in the application of ReID. However, deep neural networks have been shown to be vulnerable, and adding undetectable adversarial perturbations to clean images can trick deep neural networks that perform well in clean images. We propose a ReID multi-modal data augmentation method with adversarial defense effect: 1) Graysc…
▽ More
The security of the Person Re-identification(ReID) model plays a decisive role in the application of ReID. However, deep neural networks have been shown to be vulnerable, and adding undetectable adversarial perturbations to clean images can trick deep neural networks that perform well in clean images. We propose a ReID multi-modal data augmentation method with adversarial defense effect: 1) Grayscale Patch Replacement, it consists of Local Grayscale Patch Replacement(LGPR) and Global Grayscale Patch Replacement(GGPR). This method can not only improve the accuracy of the model, but also help the model defend against adversarial examples; 2) Multi-Modal Defense, it integrates three homogeneous modal images of visible, grayscale and sketch, and further strengthens the defense ability of the model. These methods fuse different modalities of homogeneous images to enrich the input sample variety, the variaty of samples will reduce the over-fitting of the ReID model to color variations and make the adversarial space of the dataset that the attack method can find difficult to align, thus the accuracy of model is improved, and the attack effect is greatly reduced. The more modal homogeneous images are fused, the stronger the defense capabilities is . The proposed method performs well on multiple datasets, and successfully defends the attack of MS-SSIM proposed by CVPR2020 against ReID [10], and increases the accuracy by 467 times(0.2% to 93.3%).The code is available at https://github.com/finger-monkey/ReID_Adversarial_Defense.
△ Less
Submitted 7 April, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network
Authors:
Xing Wang,
Yijun Wang,
Bin Weng,
Aleksandr Vinel
Abstract:
We have proposed to develop a global hybrid deep learning framework to predict the daily prices in the stock market. With representation learning, we derived an embedding called Stock2Vec, which gives us insight for the relationship among different stocks, while the temporal convolutional layers are used for automatically capturing effective temporal patterns both within and across series. Evaluat…
▽ More
We have proposed to develop a global hybrid deep learning framework to predict the daily prices in the stock market. With representation learning, we derived an embedding called Stock2Vec, which gives us insight for the relationship among different stocks, while the temporal convolutional layers are used for automatically capturing effective temporal patterns both within and across series. Evaluated on S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmarked models.
△ Less
Submitted 29 September, 2020;
originally announced October 2020.
-
A Modeled Approach for Online Adversarial Test of Operational Vehicle Safety (extended version)
Authors:
Linda Capito,
Bowen Weng,
Umit Ozguner,
Keith Redmill
Abstract:
The scenario-based testing of operational vehicle safety presents a set of principal other vehicle (POV) trajectories that seek to force the subject vehicle (SV) into a certain safety-critical situation. Current scenarios are mostly (i) statistics-driven: inspired by human driver crash data, (ii) deterministic: POV trajectories are pre-determined and are independent of SV responses, and (iii) over…
▽ More
The scenario-based testing of operational vehicle safety presents a set of principal other vehicle (POV) trajectories that seek to force the subject vehicle (SV) into a certain safety-critical situation. Current scenarios are mostly (i) statistics-driven: inspired by human driver crash data, (ii) deterministic: POV trajectories are pre-determined and are independent of SV responses, and (iii) overly simplified: defined over a finite set of actions performed at the abstracted motion planning level. Such scenario-based testing (i) lacks severity guarantees, (ii) has predefined maneuvers making it easy for an SV with intelligent driving policies to game the test, and (iii) is inefficient in producing safety-critical instances with limited and expensive testing effort. We propose a model-driven online feedback control policy for multiple POVs which propagates efficient adversarial trajectories while respecting traffic rules and other concerns formulated as an admissible state-action space. The approach is formulated in an anchor-template hierarchy structure, with the template model planning inducing a theoretical SV capturing guarantee under standard assumptions. The planned adversarial trajectory is then tracked by a lower-level controller applied to the full-system or the anchor model. The effectiveness of the methodology is illustrated through various simulated examples with the SV controlled by either parameterized self-driving policies or human drivers.
△ Less
Submitted 20 May, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Velocity Regulation of 3D Bipedal Walking Robots with Uncertain Dynamics Through Adaptive Neural Network Controller
Authors:
Guillermo A. Castillo,
Bowen Weng,
Terrence C. Stewart,
Wei Zhang,
Ayonga Hereid
Abstract:
This paper presents a neural-network based adaptive feedback control structure to regulate the velocity of 3D bipedal robots under dynamics uncertainties. Existing Hybrid Zero Dynamics (HZD)-based controllers regulate velocity through the implementation of heuristic regulators that do not consider model and environmental uncertainties, which may significantly affect the tracking performance of the…
▽ More
This paper presents a neural-network based adaptive feedback control structure to regulate the velocity of 3D bipedal robots under dynamics uncertainties. Existing Hybrid Zero Dynamics (HZD)-based controllers regulate velocity through the implementation of heuristic regulators that do not consider model and environmental uncertainties, which may significantly affect the tracking performance of the controllers. In this paper, we address the uncertainties in the robot dynamics from the perspective of the reduced dimensional representation of virtual constraints and propose the integration of an adaptive neural network-based controller to regulate the robot velocity in the presence of model parameter uncertainties. The proposed approach yields improved tracking performance under dynamics uncertainties. The shallow adaptive neural network used in this paper does not require training a priori and has the potential to be implemented on the real-time robotic controller. A comparative simulation study of a 3D Cassie robot is presented to illustrate the performance of the proposed approach under various scenarios.
△ Less
Submitted 1 August, 2020;
originally announced August 2020.
-
Momentum Q-learning with Finite-Sample Convergence Guarantee
Authors:
Bowen Weng,
Huaqing Xiong,
Lin Zhao,
Yingbin Liang,
Wei Zhang
Abstract:
Existing studies indicate that momentum ideas in conventional optimization can be used to improve the performance of Q-learning algorithms. However, the finite-sample analysis for momentum-based Q-learning algorithms is only available for the tabular case without function approximations. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee. Specifically,…
▽ More
Existing studies indicate that momentum ideas in conventional optimization can be used to improve the performance of Q-learning algorithms. However, the finite-sample analysis for momentum-based Q-learning algorithms is only available for the tabular case without function approximations. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee. Specifically, we propose the MomentumQ algorithm, which integrates the Nesterov's and Polyak's momentum schemes, and generalizes the existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling. In particular, we characterize the finite-sample convergence rate which is provably faster than the vanilla Q-learning. This is the first finite-sample analysis for momentum-based Q-learning algorithms with function approximations. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than the SpeedyQ \citep{azar2011speedy} when choosing a special family of step sizes. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
△ Less
Submitted 30 July, 2020;
originally announced July 2020.
-
Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent
Authors:
Bowen Weng,
Huaqing Xiong,
Yingbin Liang,
Wei Zhang
Abstract:
Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Despite the Adaptive Moment Estimation (Adam) has been commonly used for practical Q-learning algorithms, there has not been any convergence guarantee provided for Q-learning with such type of updates. In this paper, we first characterize the convergence rate for Q-AMSGrad, wh…
▽ More
Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Despite the Adaptive Moment Estimation (Adam) has been commonly used for practical Q-learning algorithms, there has not been any convergence guarantee provided for Q-learning with such type of updates. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update (a commonly adopted alternative of Adam for theoretical analysis). To further improve the performance, we propose to incorporate the momentum restart scheme to Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
△ Less
Submitted 14 July, 2020;
originally announced July 2020.
-
Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems
Authors:
Bowen Weng,
Sughosh J. Rao,
Eeshan Deosthale,
Scott Schnelle,
Frank Barickman
Abstract:
Vehicles with Automated Driving Systems (ADS) operate in a high-dimensional continuous system with multi-agent interactions. This continuous system features various types of traffic agents (non-homogeneous) governed by continuous-motion ordinary differential equations (differential-drive). Each agent makes decisions independently that may lead to conflicts with the subject vehicle (SV), as well as…
▽ More
Vehicles with Automated Driving Systems (ADS) operate in a high-dimensional continuous system with multi-agent interactions. This continuous system features various types of traffic agents (non-homogeneous) governed by continuous-motion ordinary differential equations (differential-drive). Each agent makes decisions independently that may lead to conflicts with the subject vehicle (SV), as well as other participants (non-cooperative). A typical vehicle safety evaluation procedure that uses various safety-critical scenarios and observes resultant collisions (or near collisions), is not sufficient enough to evaluate the performance of the ADS in terms of operational safety status maintenance. In this paper, we introduce a Model Predictive Instantaneous Safety Metric (MPrISM), which determines the safety status of the SV, considering the worst-case safety scenario for a given traffic snapshot. The method then analyzes the SV's closeness to a potential collision within a certain evaluation time period. The described metric induces theoretical guarantees of safety in terms of the time to collision under standard assumptions. Through formulating the solution as a series of minimax quadratic optimization problems of a specific structure, the method is tractable for real-time safety evaluation applications. Its capabilities are demonstrated with synthesized examples and cases derived from real-world tests.
△ Less
Submitted 20 May, 2020;
originally announced May 2020.
-
Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement Learning
Authors:
Hao Li,
Bowen Weng,
Abhishek Gupta,
Jia Pan,
Wei Zhang
Abstract:
Finding feasible and collision-free paths for multiple nonlinear agents is challenging in the decentralized scenarios due to limited available information of other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of ne…
▽ More
Finding feasible and collision-free paths for multiple nonlinear agents is challenging in the decentralized scenarios due to limited available information of other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two agents collision avoidance problem using reinforcement learning (RL). When extending the trained policy to a multi-agent problem, safety is ensured by introducing the optimal reciprocal collision avoidance (ORCA) as linear constraints and the overall collision avoidance action could be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on the direct control of agent velocities. In sharp contrasts, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed with various challenging scenarios, indicating a competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.
△ Less
Submitted 2 March, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms
Authors:
Kaiyi Ji,
Zhe Wang,
Bowen Weng,
Yi Zhou,
Wei Zhang,
Yingbin Liang
Abstract:
Variance-reduced algorithms, although achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply prescribed batch-size adaption rule or exploit the information along optimization path via additiona…
▽ More
Variance-reduced algorithms, although achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply prescribed batch-size adaption rule or exploit the information along optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.
△ Less
Submitted 26 July, 2020; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Hybrid Zero Dynamics Inspired Feedback Control Policy Design for 3D Bipedal Locomotion using Reinforcement Learning
Authors:
Guillermo A. Castillo,
Bowen Weng,
Wei Zhang,
Ayonga Hereid
Abstract:
This paper presents a novel model-free reinforcement learning (RL) framework to design feedback control policies for 3D bipedal walking. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint trajectories. Different from these studies, we propose a novel policy structure that appropriately incorporates physical insights gained from the h…
▽ More
This paper presents a novel model-free reinforcement learning (RL) framework to design feedback control policies for 3D bipedal walking. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint trajectories. Different from these studies, we propose a novel policy structure that appropriately incorporates physical insights gained from the hybrid nature of the walking dynamics and the well-established hybrid zero dynamics approach for 3D bipedal walking. As a result, the overall RL framework has several key advantages, including lightweight network structure, short training time, and less dependence on prior knowledge. We demonstrate the effectiveness of the proposed method on Cassie, a challenging 3D bipedal robot. The proposed solution produces stable limit walking cycles that can track various walking speed in different directions. Surprisingly, without specifically trained with disturbances to achieve robustness, it also performs robustly against various adversarial forces applied to the torso towards both the forward and the backward directions.
△ Less
Submitted 3 October, 2019;
originally announced October 2019.
-
Accelerated Target Updates for Q-learning
Authors:
Bowen Weng,
Huaqing Xiong,
Wei Zhang
Abstract:
This paper studies accelerations in Q-learning algorithms. We propose an accelerated target update scheme by incorporating the historical iterates of Q functions. The idea is conceptually inspired by the momentum-based accelerated methods in the optimization theory. Conditions under which the proposed accelerated algorithms converge are established. The algorithms are validated using commonly adop…
▽ More
This paper studies accelerations in Q-learning algorithms. We propose an accelerated target update scheme by incorporating the historical iterates of Q functions. The idea is conceptually inspired by the momentum-based accelerated methods in the optimization theory. Conditions under which the proposed accelerated algorithms converge are established. The algorithms are validated using commonly adopted testing problems in reinforcement learning, including the FrozenLake grid world game, two discrete-time LQR problems from the Deepmind Control Suite, and the Atari 2600 games. Simulation results show that the proposed accelerated algorithms can improve the convergence performance compared with the vanilla Q-learning algorithm.
△ Less
Submitted 11 May, 2019; v1 submitted 7 May, 2019;
originally announced May 2019.
-
Reinforcement Learning Meets Hybrid Zero Dynamics: A Case Study for RABBIT
Authors:
Guillermo A. Castillo,
Bowen Weng,
Ayonga Hereid,
Wei Zhang
Abstract:
The design of feedback controllers for bipedal robots is challenging due to the hybrid nature of its dynamics and the complexity imposed by high-dimensional bipedal models. In this paper, we present a novel approach for the design of feedback controllers using Reinforcement Learning (RL) and Hybrid Zero Dynamics (HZD). Existing RL approaches for bipedal walking are inefficient as they do not consi…
▽ More
The design of feedback controllers for bipedal robots is challenging due to the hybrid nature of its dynamics and the complexity imposed by high-dimensional bipedal models. In this paper, we present a novel approach for the design of feedback controllers using Reinforcement Learning (RL) and Hybrid Zero Dynamics (HZD). Existing RL approaches for bipedal walking are inefficient as they do not consider the underlying physics, often requires substantial training, and the resulting controller may not be applicable to real robots. HZD is a powerful tool for bipedal control with local stability guarantees of the walking limit cycles. In this paper, we propose a non traditional RL structure that embeds the HZD framework into the policy learning. More specifically, we propose to use RL to find a control policy that maps from the robot's reduced order states to a set of parameters that define the desired trajectories for the robot's joints through the virtual constraints. Then, these trajectories are tracked using an adaptive PD controller. The method results in a stable and robust control policy that is able to track variable speed within a continuous interval. Robustness of the policy is evaluated by applying external forces to the torso of the robot. The proposed RL framework is implemented and demonstrated in OpenAI Gym with the MuJoCo physics engine based on the well-known RABBIT robot model.
△ Less
Submitted 3 October, 2018;
originally announced October 2018.