arXiv:2405.20013 [pdf, other]

Repeatable and Reliable Efforts of Accelerated Risk Assessment

Authors: Linda Capito, Guillermo A. Castillo, Bowen Weng

Abstract: Risk assessment of a robot in controlled environments, such as laboratories and proving grounds, is a common means to assess, certify, validate, verify, and characterize the robots' safety performance before, during, and even after their commercialization in the real world. A standard testing program that acquires the risk estimate is expected to be (i) repeatable, such that it obtains similar risk assessments of the same testing subject among multiple trials or attempts with similar testing effort by different stakeholders, and (ii) reliable against a variety of testing subjects produced by different vendors and manufacturers. Both repeatability and reliability are fundamental and crucial for a testing algorithm's validity, fairness, and practical feasibility, especially for standardization. However, these properties are rarely satisfied or ensured, especially as the subject robots become more complex, uncertain, and varied. This issue was present in traditional risk assessments through Monte-Carlo sampling, and remains a bottleneck for the recent accelerated risk assessment methods, primarily those using importance sampling. This study aims to enhance existing accelerated testing frameworks by proposing a new algorithm that provably integrates repeatability and reliability with the already established formality and efficiency. It also features demonstrations assessing the risk of instability from frontal impacts, initiated by push-over disturbances on a controlled inverted pendulum and a 7-DoF planar bipedal robot Rabbit managed by various control algorithms.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.09022 [pdf, other]

Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies

Authors: Benjue Weng

Abstract: With the surge of ChatGPT, the use of large models has significantly increased, rapidly rising to prominence across the industry and sweeping across the internet. This article is a comprehensive review of fine-tuning methods for large models. This paper investigates the latest technological advancements and the application of advanced methods in aspects such as task-adaptive fine-tuning, domain-adaptive fine-tuning, few-shot learning, knowledge distillation, multi-task learning, parameter-efficient fine-tuning, and dynamic fine-tuning.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2309.15740 [pdf, other]

Data-Driven Latent Space Representation for Robust Bipedal Locomotion Learning

Authors: Guillermo A. Castillo, Bowen Weng, Wei Zhang, Ayonga Hereid

Abstract: This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is then used as states for training a robust RL-based gait policy, eliminating the need for heuristic state selections or the use of template models for gait planning. The results demonstrate that the learned latent variables are disentangled and directly correspond to different gaits or speeds, such as moving forward, backward, or walking in place. Compared to traditional template model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and demonstrates good generalization capabilities to unseen scenarios.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Supplemental video: https://youtu.be/SUIkrigsrao

arXiv:2309.15442 [pdf, other]

Template Model Inspired Task Space Learning for Robust Bipedal Locomotion

Authors: Guillermo A. Castillo, Bowen Weng, Shunpeng Yang, Wei Zhang, Ayonga Hereid

Abstract: This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Different from traditional end-to-end learning approaches, our HL policy takes insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task space outputs that shape the walking gait of the robot. The HL policy is agnostic to the task space LL controller, which increases the flexibility of the design and generalization of the framework to other bipedal robots. This hierarchical design results in a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested on three different robots: Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and is able to effectively track a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted at 2023 International Conference on Intelligent Robots and Systems (IROS). Supplemental Video: https://youtu.be/YTjMgGka4Ig

arXiv:2308.14636 [pdf, other]

Towards Standardized Disturbance Rejection Testing of Legged Robot Locomotion with Linear Impactor: A Preliminary Study, Observations, and Implications

Authors: Bowen Weng, Guillermo A. Castillo, Yun-Seok Kang, Ayonga Hereid

Abstract: Dynamic locomotion in legged robots is close to industrial collaboration, but a lack of standardized testing obstructs commercialization. The issues are not merely political, theoretical, or algorithmic but also physical, indicating limited studies and comprehension regarding standard testing infrastructure and equipment. For decades, the approaches used to test legged robots, such as hand-pushing, foot-kicking, rope-dragging, stick-poking, and ball-swinging, were rarely standardizable. This paper aims to bridge the gap by proposing the use of the linear impactor, a well-established tool in other standardized testing disciplines, to serve as adaptive, repeatable, and fair disturbance rejection testing equipment for legged robots. A pneumatic linear impactor is also adopted for the case study involving the humanoid robot Digit. Three locomotion controllers are examined, including a commercial one, using a walking-in-place task against frontal impacts. The statistically best controller was able to withstand the impact momentum (26.376 kg$\cdot$m/s) on par with a reported average effective momentum from straight punches by Olympic boxers (26.506 kg$\cdot$m/s). Moreover, the case study highlights other anti-intuitive observations, demonstrations, and implications that, to the best of the authors' knowledge, are the first of their kind revealed in real-world testing of legged robots.

Submitted 29 January, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: A modified version of this preprint has been accepted at IEEE International Conference on Robotics and Automation (ICRA) 2024

arXiv:2306.14657 [pdf, other]

A Diversity Analysis of Safety Metrics Comparing Vehicle Performance in the Lead-Vehicle Interaction Regime

Authors: Harnarayan Singh, Bowen Weng, Sughosh J. Rao, Devin Elsasser

Abstract: Vehicle performance metrics analyze data sets consisting of a subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metrics for vehicle safety performance assessment and justification. This issue gets further amplified with the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from a normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.

Submitted 26 June, 2023; originally announced June 2023.

Comments: A modified manuscript of this preprint has been accepted to be published as a regular paper at IEEE Transactions on Intelligent Transportation Systems

arXiv:2209.09879 [pdf, other]

doi 10.1109/TRO.2023.3267020

On the Adversarial Scenario-based Safety Testing of Robots: the Comparability and Optimal Aggressiveness

Authors: Bowen Weng, Guillermo A. Castillo, Wei Zhang, Ayonga Hereid

Abstract: This paper studies the class of scenario-based safety testing algorithms in the black-box safety testing configuration. For algorithms sharing the same state-action set coverage with different sampling distributions, it is commonly believed that prioritizing the exploration of high-risk state-actions leads to a better sampling efficiency. Our proposal disputes the above intuition by introducing an impossibility theorem that provably shows all safety testing algorithms of the aforementioned difference perform equally well with the same expected sampling efficiency. Moreover, for testing algorithms covering different sets of state-actions, the sampling efficiency criterion is no longer applicable as different algorithms do not necessarily converge to the same termination condition. We then propose a testing aggressiveness definition based on the almost safe set concept along with an unbiased and efficient algorithm that compares the aggressiveness between testing algorithms. Empirical observations from the safety testing of bipedal locomotion controllers and vehicle decision-making modules are also presented to support the proposed theoretical implications and methodologies.

Submitted 3 April, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

Journal ref: IEEE Transactions on Robotics, 2023

arXiv:2204.07846 [pdf, other]

doi 10.1109/IROS47612.2022.9981359

On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots

Authors: Bowen Weng, Guillermo A. Castillo, Wei Zhang, Ayonga Hereid

Abstract: The dynamic response of the legged robot locomotion is non-Lipschitz and can be stochastic due to environmental uncertainties. To test, validate, and characterize the safety performance of legged robots, existing solutions on observed and inferred risk can be incomplete and sampling inefficient. Some formal verification methods suffer from the model precision and other surrogate assumptions. In this paper, we propose a scenario sampling based testing framework that characterizes the overall safety performance of a legged robot by specifying (i) where (in terms of a set of states) the robot is potentially safe, and (ii) how safe the robot is within the specified set. The framework can also help certify the commercial deployment of the legged robot in real-world environments alongside humans and compare safety performance among legged robots with different mechanical structures and dynamic properties. The proposed framework is further deployed to evaluate a group of state-of-the-art legged robot locomotion controllers from various model-based, deep neural network involved, and reinforcement learning based methods in the literature. Among a series of intended work domains of the studied legged robots (e.g., tracking speed on sloped surface, with abrupt changes on demanded velocity, and against adversarial push-over disturbances), we show that the method can adequately capture the overall safety characterization and the subtle performance insights. Many of the observed safety outcomes, to the best of our knowledge, have never been reported by the existing work in the legged robot literature.

Submitted 16 April, 2022; originally announced April 2022.

Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2202.08935 [pdf, other]

A Formal Safety Characterization of Advanced Driver Assist Systems in the Car-Following Regime with Scenario-Sampling

Authors: Bowen Weng, Minghao Zhu, Keith Redmill

Abstract: The capability to follow a lead-vehicle and avoid rear-end collisions is one of the most important functionalities for human drivers and various Advanced Driver Assist Systems (ADAS). Existing safety performance justification of the car-following systems either relies on simple concrete scenarios with biased surrogate metrics or requires a significantly long driving distance for risk observation and inference. In this paper, we propose a guaranteed unbiased and sampling efficient scenario-based safety evaluation framework inspired by the previous work on $εδ$-almost safe set quantification. The proposal characterizes the complete safety performance of the test subject in the car-following regime. The performance of the proposed method is also demonstrated in challenging cases including some widely adopted car-following decision-making modules and the commercially available Openpilot driving stack by CommaAI.

Submitted 23 May, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2111.08823 [pdf, other]

Meta-Auto-Decoder for Solving Parametric Partial Differential Equations

Authors: Xiang Huang, Zhanhong Ye, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Bingya Weng, Min Wang, Haotian Chu, Fan Yu, Bei Hua, Lei Chen, Bin Dong

Abstract: Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs) aim to approximate the solution of the PDEs. They are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of parameters of the PDE. Another category of methods such as Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet) try to approximate the solution mapping directly. Being fast with only one forward inference for each PDE parameter without retraining, they often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables the pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogeneous) PDE parameters as latent vectors. The proposed method MAD can be interpreted by manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that the MAD method exhibits faster convergence speed without losing accuracy than other deep learning-based methods. The project page with code is available at https://gitee.com/mindspore/mindscience/tree/master/MindElec/.

Submitted 18 November, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

arXiv:2111.07769 [pdf, other]

doi 10.1109/TITS.2022.3164358

A Finite-Sampling, Operational Domain Specific, and Provably Unbiased Connected and Automated Vehicle Safety Metric

Authors: Bowen Weng, Linda Capito, Umit Ozguner, Keith Redmill

Abstract: A connected and automated vehicle safety metric determines the performance of a subject vehicle (SV) by analyzing the data involving the interactions among the SV and other dynamic road users and environmental features. When the data set contains only a finite set of samples collected from the naturalistic mixed-traffic driving environment, a metric is expected to generalize the safety assessment outcome from the observed finite samples to the unobserved cases by specifying in what domain the SV is expected to be safe and how safe the SV is, statistically, in that domain. However, to the best of our knowledge, none of the existing safety metrics are able to justify the above properties with an operational domain specific, guaranteed complete, and provably unbiased safety evaluation outcome. In this paper, we propose a novel safety metric that involves the $α$-shape and the $ε$-almost robustly forward invariant set to characterize the SV's almost safe operable domain and the probability for the SV to remain inside the safe domain indefinitely, respectively. The empirical performance of the proposed method is demonstrated in several different operational design domains through a series of cases covering a variety of fidelity levels (real-world and simulators), driving environments (highway, urban, and intersections), road users (car, truck, and pedestrian), and SV driving behaviors (human driver and self driving algorithms).

Submitted 2 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

arXiv:2111.01394 [pdf, other]

Solving Partial Differential Equations with Point Source Based on Physics-Informed Neural Networks

Authors: Xiang Huang, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Bingya Weng, Min Wang, Haotian Chu, Jing Zhou, Fan Yu, Bei Hua, Lei Chen, Bin Dong

Abstract: In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which physics-informed neural networks (PINNs) have emerged as a promising method for solving both forward and inverse PDE problems. PDEs with a point source that is expressed as a Dirac delta function in the governing equations are mathematical models of many physical processes. However, they cannot be solved directly by the conventional PINNs method due to the singularity brought by the Dirac delta function. We propose a universal solution to tackle this problem with three novel techniques. Firstly, the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity; secondly, a lower bound constrained uncertainty weighting algorithm is proposed to balance the PINNs losses between the point source area and other areas; and thirdly, a multi-scale deep neural network with periodic activation function is used to improve the accuracy and convergence speed of the PINNs method. We evaluate the proposed method with three representative PDEs, and the experimental results show that our method outperforms existing deep learning-based methods with respect to accuracy, efficiency, and versatility.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.02331 [pdf, other]

doi 10.1109/LRA.2021.3122517

A Formal Characterization of Black-Box System Safety Performance with Scenario Sampling

Authors: Bowen Weng, Linda Capito, Umit Ozguner, Keith Redmill

Abstract: A typical scenario-based evaluation framework seeks to characterize a black-box system's safety performance (e.g., failure rate) through repeatedly sampling initialization configurations (scenario sampling) and executing a certain test policy for scenario propagation (scenario testing) with the black-box system involved as the test subject. In this letter, we first present a novel safety evaluation criterion that seeks to characterize the actual operational domain within which the test subject would remain safe indefinitely with high probability. By formulating the black-box testing scenario as a dynamic system, we show that the presented problem is equivalent to finding a certain "almost" robustly forward invariant set for the given system. Second, for an arbitrary scenario testing strategy, we propose a scenario sampling algorithm that is provably asymptotically optimal in obtaining the safe invariant set with arbitrarily high accuracy. Moreover, as one considers different testing strategies (e.g., biased sampling of safety-critical cases), we show that the proposed algorithm still converges to the unbiased approximation of the safety characterization outcome if the scenario testing satisfies a certain condition. Finally, the effectiveness of the presented scenario sampling algorithms and various theoretical properties are demonstrated in a case study of the safety evaluation of a control barrier function-based mobile robot collision avoidance system.

Submitted 5 October, 2021; originally announced October 2021.

Comments: A shorter version of this manuscript has been accepted to be published at IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 199-206, Jan. 2022

arXiv:2104.09595 [pdf, other]

doi 10.1109/TIV.2021.3117049

Towards Guaranteed Safety Assurance of Automated Driving Systems with Scenario Sampling: An Invariant Set Perspective (Extended Version)

Authors: Bowen Weng, Linda Capito, Umit Ozguner, Keith Redmill

Abstract: How many scenarios are sufficient to validate the safe Operational Design Domain (ODD) of an Automated Driving System (ADS) equipped vehicle? Does a larger number of sampled scenarios guarantee a more accurate safety assessment of the ADS? Despite the various empirical successes of ADS safety evaluation with scenario sampling in practice, some of the fundamental properties are largely unknown. This paper seeks to remedy this gap by formulating and tackling the scenario sampling safety assurance problem from a set invariance perspective. First, a novel conceptual equivalence is drawn between the scenario sampling safety assurance problem and the data-driven robustly controlled forward invariant set validation and quantification problem. This paper then provides a series of resolution complete and probabilistic complete solutions with finite-sampling analyses for the safety validation problem that authenticates a given ODD. On the other hand, the quantification problem escalates the validation challenge and starts looking for a safe sub-domain of a particular property. This inspires various algorithms that are provably probabilistic incomplete, probabilistic complete but sub-optimal, and asymptotically optimal. Finally, the proposed asymptotically optimal scenario sampling safety quantification algorithm is also empirically demonstrated through simulation experiments.

Submitted 29 September, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: A shorter version of this manuscript has been accepted by the IEEE Transactions on Intelligent Vehicles

arXiv:2103.15309 [pdf, other]

Robust Feedback Motion Policy Design Using Reinforcement Learning on a 3D Digit Bipedal Robot

Authors: Guillermo A. Castillo, Bowen Weng, Wei Zhang, Ayonga Hereid

Abstract: In this paper, a hierarchical and robust framework for learning bipedal locomotion is presented and successfully implemented on the 3D biped robot Digit built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with reduced-dimension state and action spaces of the policy, significantly simplifying the design and improving the sampling efficiency of the learning method. The inclusion of feedback regulation into the framework improves the robustness of the learned walking gait and ensures the success of the sim-to-real transfer of the proposed controller with minimal tuning. We specifically present a learning pipeline that considers hardware-feasible initial poses of the robot within the learning process to ensure the initial state of the learning is replicated as close as possible to the initial state of the robot in hardware experiments. Finally, we demonstrate the feasibility of our method by successfully transferring the learned policy in simulation to the Digit robot hardware, realizing sustained walking gaits under external force disturbances and challenging terrains not included during the training process. To the best of our knowledge, this is the first time a learning-based policy is transferred successfully to the Digit robot in hardware experiments without using dynamic randomization or curriculum learning.

Submitted 28 March, 2021; originally announced March 2021.

Comments: Supplemental video: https://www.youtube.com/watch?v=j8KbW-a9dbw

arXiv:2101.08783 [pdf]

A Person Re-identification Data Augmentation Method with Adversarial Defense Effect

Authors: Yunpeng Gong, Zhiyong Zeng, Liwen Chen, Yifan Luo, Bin Weng, Feng Ye

Abstract: The security of the Person Re-identification (ReID) model plays a decisive role in the application of ReID. However, deep neural networks have been shown to be vulnerable, and adding undetectable adversarial perturbations to clean images can trick deep neural networks that perform well on clean images. We propose a ReID multi-modal data augmentation method with an adversarial defense effect: 1) Grayscale Patch Replacement, which consists of Local Grayscale Patch Replacement (LGPR) and Global Grayscale Patch Replacement (GGPR). This method not only improves the accuracy of the model, but also helps the model defend against adversarial examples; 2) Multi-Modal Defense, which integrates three homogeneous modal images (visible, grayscale, and sketch) and further strengthens the defense ability of the model. These methods fuse different modalities of homogeneous images to enrich the input sample variety. The variety of samples reduces the over-fitting of the ReID model to color variations and makes the adversarial space of the dataset that the attack method can find difficult to align; thus the accuracy of the model is improved, and the attack effect is greatly reduced. The more modal homogeneous images are fused, the stronger the defense capability is. The proposed method performs well on multiple datasets, successfully defends against the MS-SSIM attack on ReID proposed at CVPR 2020 [10], and increases the accuracy by 467 times (0.2% to 93.3%). The code is available at https://github.com/finger-monkey/ReID_Adversarial_Defense.

Submitted 7 April, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:2101.08533

arXiv:2010.01197 [pdf, other]

Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

Authors: Xing Wang, Yijun Wang, Bin Weng, Aleksandr Vinel

Abstract: We have proposed to develop a global hybrid deep learning framework to predict the daily prices in the stock market. With representation learning, we derived an embedding called Stock2Vec, which gives us insight into the relationships among different stocks, while the temporal convolutional layers are used for automatically capturing effective temporal patterns both within and across series. Evaluated on the S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmarked models.

Submitted 29 September, 2020; originally announced October 2020.

arXiv:2009.12222 [pdf, other]

A Modeled Approach for Online Adversarial Test of Operational Vehicle Safety (extended version)

Authors: Linda Capito, Bowen Weng, Umit Ozguner, Keith Redmill

Abstract: The scenario-based testing of operational vehicle safety presents a set of principal other vehicle (POV) trajectories that seek to force the subject vehicle (SV) into a certain safety-critical situation. Current scenarios are mostly (i) statistics-driven: inspired by human driver crash data, (ii) deterministic: POV trajectories are pre-determined and are independent of SV responses, and (iii) overly simplified: defined over a finite set of actions performed at the abstracted motion planning level. Such scenario-based testing (i) lacks severity guarantees, (ii) has predefined maneuvers making it easy for an SV with intelligent driving policies to game the test, and (iii) is inefficient in producing safety-critical instances with limited and expensive testing effort. We propose a model-driven online feedback control policy for multiple POVs which propagates efficient adversarial trajectories while respecting traffic rules and other concerns formulated as an admissible state-action space. The approach is formulated in an anchor-template hierarchy structure, with the template model planning inducing a theoretical SV capturing guarantee under standard assumptions. The planned adversarial trajectory is then tracked by a lower-level controller applied to the full-system or the anchor model. The effectiveness of the methodology is illustrated through various simulated examples with the SV controlled by either parameterized self-driving policies or human drivers.

Submitted 20 May, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: This document is the extended version of our paper accepted to the 2021 IEEE American Control Conference

arXiv:2008.00376 [pdf, other]

Velocity Regulation of 3D Bipedal Walking Robots with Uncertain Dynamics Through Adaptive Neural Network Controller

Authors: Guillermo A. Castillo, Bowen Weng, Terrence C. Stewart, Wei Zhang, Ayonga Hereid

Abstract: This paper presents a neural-network based adaptive feedback control structure to regulate the velocity of 3D bipedal robots under dynamics uncertainties. Existing Hybrid Zero Dynamics (HZD)-based controllers regulate velocity through the implementation of heuristic regulators that do not consider model and environmental uncertainties, which may significantly affect the tracking performance of the controllers. In this paper, we address the uncertainties in the robot dynamics from the perspective of the reduced dimensional representation of virtual constraints and propose the integration of an adaptive neural network-based controller to regulate the robot velocity in the presence of model parameter uncertainties. The proposed approach yields improved tracking performance under dynamics uncertainties. The shallow adaptive neural network used in this paper does not require training a priori and has the potential to be implemented on the real-time robotic controller. A comparative simulation study of a 3D Cassie robot is presented to illustrate the performance of the proposed approach under various scenarios.

Submitted 1 August, 2020; originally announced August 2020.

Comments: Accepted at 2020 International Conference on Intelligent Robots and Systems (IROS 2020). Supplemental Video: https://youtu.be/DAHk9-GFS0k

arXiv:2007.15418 [pdf, other]

Momentum Q-learning with Finite-Sample Convergence Guarantee

Authors: Bowen Weng, Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

Abstract: Existing studies indicate that momentum ideas in conventional optimization can be used to improve the performance of Q-learning algorithms. However, the finite-sample analysis for momentum-based Q-learning algorithms is only available for the tabular case without function approximations. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes, and generalizes the existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling. In particular, we characterize the finite-sample convergence rate, which is provably faster than that of vanilla Q-learning. This is the first finite-sample analysis for momentum-based Q-learning algorithms with function approximations. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than that of SpeedyQ (Azar et al., 2011) when choosing a special family of step sizes. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.

Submitted 30 July, 2020; originally announced July 2020.

arXiv:2007.07422 [pdf, other]

doi 10.24963/ijcai.2020/422

Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent

Authors: Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang

Abstract: Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Although Adaptive Moment Estimation (Adam) has been commonly used for practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve the performance, we propose to incorporate the momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.

Submitted 14 July, 2020; originally announced July 2020.

Comments: This paper extends the work presented at the 2020 International Joint Conferences on Artificial Intelligence with supplementary materials

Journal ref: Proceedings of the Twenty-Ninth International Joint Conference IJCAI20 (2020) 3051-3057

arXiv:2005.09999 [pdf, other]

Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems

Authors: Bowen Weng, Sughosh J. Rao, Eeshan Deosthale, Scott Schnelle, Frank Barickman

Abstract: Vehicles with Automated Driving Systems (ADS) operate in a high-dimensional continuous system with multi-agent interactions. This continuous system features various types of traffic agents (non-homogeneous) governed by continuous-motion ordinary differential equations (differential-drive). Each agent makes decisions independently that may lead to conflicts with the subject vehicle (SV), as well as other participants (non-cooperative). A typical vehicle safety evaluation procedure that uses various safety-critical scenarios and observes resultant collisions (or near collisions) is not sufficient to evaluate the performance of the ADS in terms of operational safety status maintenance. In this paper, we introduce a Model Predictive Instantaneous Safety Metric (MPrISM), which determines the safety status of the SV, considering the worst-case safety scenario for a given traffic snapshot. The method then analyzes the SV's closeness to a potential collision within a certain evaluation time period. The described metric induces theoretical guarantees of safety in terms of the time to collision under standard assumptions. Through formulating the solution as a series of minimax quadratic optimization problems of a specific structure, the method is tractable for real-time safety evaluation applications. Its capabilities are demonstrated with synthesized examples and cases derived from real-world tests.

Submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted at IEEE Intelligent Vehicles Symposium (IV), 2020

arXiv:1910.10887 [pdf, other]

Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement Learning

Authors: Hao Li, Bowen Weng, Abhishek Gupta, Jia Pan, Wei Zhang

Abstract: Finding feasible and collision-free paths for multiple nonlinear agents is challenging in decentralized scenarios due to limited available information about other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two-agent collision avoidance problem using reinforcement learning (RL). When extending the trained policy to a multi-agent problem, safety is ensured by introducing the optimal reciprocal collision avoidance (ORCA) as linear constraints, and the overall collision avoidance action can be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on the direct control of agent velocities. In sharp contrast, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed with various challenging scenarios, indicating a competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.

Submitted 2 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

arXiv:1910.09670 [pdf, other]

History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

Authors: Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin Liang

Abstract: Variance-reduced algorithms, although they achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to accelerate such algorithms. However, existing schemes either apply a prescribed batch-size adaptation rule or exploit the information along the optimization path via additional backtracking and condition verification steps. In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along the optimization path by adapting the batch size via history stochastic gradients. We further theoretically show that such a scheme substantially reduces the overall complexity for popular variance-reduced algorithms SVRG and SARAH/SPIDER for both conventional nonconvex optimization and reinforcement learning problems. To this end, we develop a new convergence analysis framework to handle the dependence of the batch size on history stochastic gradients. Extensive experiments validate the effectiveness of the proposed batch-size adaptation scheme.

Submitted 26 July, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: 46 pages, 23 figures; Published in ICML 2020

arXiv:1910.01748 [pdf, other]

Hybrid Zero Dynamics Inspired Feedback Control Policy Design for 3D Bipedal Locomotion using Reinforcement Learning

Authors: Guillermo A. Castillo, Bowen Weng, Wei Zhang, Ayonga Hereid

Abstract: This paper presents a novel model-free reinforcement learning (RL) framework to design feedback control policies for 3D bipedal walking. Existing RL algorithms are often trained in an end-to-end manner or rely on prior knowledge of some reference joint trajectories. Different from these studies, we propose a novel policy structure that appropriately incorporates physical insights gained from the hybrid nature of the walking dynamics and the well-established hybrid zero dynamics approach for 3D bipedal walking. As a result, the overall RL framework has several key advantages, including lightweight network structure, short training time, and less dependence on prior knowledge. We demonstrate the effectiveness of the proposed method on Cassie, a challenging 3D bipedal robot. The proposed solution produces stable limit walking cycles that can track various walking speeds in different directions. Surprisingly, without being specifically trained with disturbances to achieve robustness, it also performs robustly against various adversarial forces applied to the torso in both the forward and the backward directions.

Submitted 3 October, 2019; originally announced October 2019.

Comments: Supplemental video: https://youtu.be/GOT6bnxqwuU

arXiv:1905.02841

Accelerated Target Updates for Q-learning

Authors: Bowen Weng, Huaqing Xiong, Wei Zhang

Abstract: This paper studies accelerations in Q-learning algorithms. We propose an accelerated target update scheme by incorporating the historical iterates of Q functions. The idea is conceptually inspired by the momentum-based accelerated methods in optimization theory. Conditions under which the proposed accelerated algorithms converge are established. The algorithms are validated using commonly adopted testing problems in reinforcement learning, including the FrozenLake grid world game, two discrete-time LQR problems from the DeepMind Control Suite, and the Atari 2600 games. Simulation results show that the proposed accelerated algorithms can improve the convergence performance compared with the vanilla Q-learning algorithm.

Submitted 11 May, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: We need further adjustment of some parts of the paper

arXiv:1810.01977 [pdf, other]

Reinforcement Learning Meets Hybrid Zero Dynamics: A Case Study for RABBIT

Authors: Guillermo A. Castillo, Bowen Weng, Ayonga Hereid, Wei Zhang

Abstract: The design of feedback controllers for bipedal robots is challenging due to the hybrid nature of their dynamics and the complexity imposed by high-dimensional bipedal models. In this paper, we present a novel approach for the design of feedback controllers using Reinforcement Learning (RL) and Hybrid Zero Dynamics (HZD). Existing RL approaches for bipedal walking are inefficient as they do not consider the underlying physics, often require substantial training, and the resulting controller may not be applicable to real robots. HZD is a powerful tool for bipedal control with local stability guarantees of the walking limit cycles. In this paper, we propose a non-traditional RL structure that embeds the HZD framework into the policy learning. More specifically, we propose to use RL to find a control policy that maps from the robot's reduced order states to a set of parameters that define the desired trajectories for the robot's joints through the virtual constraints. Then, these trajectories are tracked using an adaptive PD controller. The method results in a stable and robust control policy that is able to track variable speed within a continuous interval. Robustness of the policy is evaluated by applying external forces to the torso of the robot. The proposed RL framework is implemented and demonstrated in OpenAI Gym with the MuJoCo physics engine based on the well-known RABBIT robot model.

Submitted 3 October, 2018; originally announced October 2018.

Comments: Supplemental video: https://www.youtube.com/watch?v=dhHMfnl7YlU

Showing 1–27 of 27 results for author: Weng, B