100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems

Authors: Chaopeng Zhang, Wenshuo Wang, Zhaokun Chen, Junqiang Xi

Abstract: Effective driving style analysis is critical to developing human-centered intelligent driving systems that consider drivers' preferences. However, the approaches and conclusions of most related studies are diverse and inconsistent because no unified datasets tagged with driving styles exist as a reliable benchmark. The absence of explicit driving style labels makes verifying different approaches and algorithms difficult. This paper provides a new benchmark by constructing a natural dataset of Driving Style (100-DrivingStyle) tagged with the subjective evaluation of 100 drivers' driving styles. In this dataset, the subjective quantification of each driver's driving style is from themselves and an expert according to the Likert-scale questionnaire. The testing routes are selected to cover various driving scenarios, including highways, urban, highway ramps, and signalized traffic. The collected driving data consists of lateral and longitudinal manipulation information, including steering angle, steering speed, lateral acceleration, throttle position, throttle rate, brake pressure, etc. This dataset is the first to provide detailed manipulation data with driving-style tags, and we demonstrate its benchmark function using six classifiers. The 100-DrivingStyle dataset is available via https://github.com/chaopengzhang/100-DrivingStyle-Dataset

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.20775 [pdf, other]

Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

Authors: Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan

Abstract: Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks: mismatched malicious attack (2M-attack) and optimized mismatched malicious attack (O2M-attack). Using our own constructed voluminous 3MAD dataset, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and novel attack methods, including white-box attacks on LLaVA-Med and transfer attacks on four other state-of-the-art models, indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. For further research and replication, anonymous access to our code is available at https://github.com/dirtycomputer/O2M_attack. Warning: Medical large model jailbreaking may generate content that includes unverified diagnoses and treatment recommendations. Always consult professional medical advice.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.19514 [pdf, other]

doi 10.1145/3656420

Wavefront Threading Enables Effective High-Level Synthesis

Authors: Blake Pelton, Adam Sapek, Ken Eguro, Daniel Lo, Alessandro Forin, Matt Humphrey, Jinwen Xi, David Cox, Rajas Karandikar, Johannes de Fine Licht, Evgeny Babin, Adrian Caulfield, Doug Burger

Abstract: Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware designs. This paper describes Kanagawa, a language that takes a new approach to combine the programmer productivity benefits of traditional High-Level Synthesis (HLS) approaches with the expressibility and hardware efficiency of Register-Transfer Level (RTL) design. The language's concise syntax, matched with a hardware design-friendly execution model, permits a relatively simple toolchain to map high-level code into efficient hardware implementations.

Submitted 10 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted to PLDI'24

arXiv:2405.05945 [pdf, other]

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT. Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.

Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

arXiv:2404.01595 [pdf, other]

Propensity Score Alignment of Unpaired Multimodal Data

Authors: Johnny Xi, Jason Hartford

Abstract: Multimodal representation learning techniques typically rely on paired samples to learn common representations, but paired samples are challenging to collect in fields such as biology where measurement devices often destroy the samples. This paper presents an approach to address the challenge of aligning unpaired samples across disparate modalities in multimodal representation learning. We draw an analogy between potential outcomes in causal inference and potential views in multimodal observations, which allows us to use Rubin's framework to estimate a common space in which to match samples. Our approach assumes we collect samples that are experimentally perturbed by treatments, and uses this to estimate a propensity score from each modality, which encapsulates all shared information between a latent state and treatment and can be used to define a distance between samples. We experiment with two alignment techniques that leverage this distance -- shared nearest neighbours (SNN) and optimal transport (OT) matching -- and find that OT matching results in significant improvements over state-of-the-art alignment approaches in both a synthetic multi-modal setting and in real-world data from NeurIPS Multimodal Single-Cell Integration Challenge.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2402.09742 [pdf, other]

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

Authors: Zhihao Fan, Jialong Tang, Wei Chen, Siyuan Wang, Zhongyu Wei, Jun Xi, Fei Huang, Jingren Zhou

Abstract: Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce AI Hospital, a multi-agent framework simulating dynamic medical interactions between Doctor as player and NPCs including Patient, Examiner, and Chief Physician. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs' clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at https://github.com/LibertFan/AI_Hospital.

Submitted 27 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: https://github.com/LibertFan/AI_Hospital

arXiv:2401.15196 [pdf, other]

Regularized Q-Learning with Linear Function Approximation

Authors: Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Abstract: Several successful reinforcement learning algorithms make use of regularization to promote multi-modal policies that exhibit enhanced exploration and robustness. With functional approximation, the convergence properties of some of these algorithms (e.g. soft Q-learning) are not well understood. In this paper, we consider a single-loop algorithm for minimizing the projected Bellman error with finite time convergence guarantees in the case of linear function approximation. The algorithm operates on two scales: a slower scale for updating the target network of the state-action values, and a faster scale for approximating the Bellman backups in the subspace of the span of basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
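
Note: the regularization mentioned in this abstract can be made concrete with a small sketch of the entropy-regularized (soft) state value and the Boltzmann policy commonly associated with soft Q-learning. This is not the paper's single-loop algorithm; the function names and the temperature parameter `tau` are illustrative assumptions.

```python
import math

def soft_value(q_values, tau=1.0):
    """Entropy-regularized state value: tau * log(sum(exp(q / tau)))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def soft_policy(q_values, tau=1.0):
    """Boltzmann (softmax) policy induced by the Q-values at temperature tau."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]
```

As `tau` shrinks, `soft_value` approaches the ordinary maximum over actions and the policy concentrates on the greedy action; larger `tau` spreads probability across actions.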

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2310.15057 [pdf, other]

Shareable Driving Style Learning and Analysis with a Hierarchical Latent Model

Authors: Chaopeng Zhang, Wenshuo Wang, Zhaokun Chen, Jian Zhang, Lijun Sun, Junqiang Xi

Abstract: Driving style is usually used to characterize driving behavior for a driver or a group of drivers. However, it remains unclear how one individual's driving style shares certain common grounds with other drivers. Our insight is that driving behavior is a sequence of responses to the weighted mixture of latent driving styles that are shareable within and between individuals. To this end, this paper develops a hierarchical latent model to learn the relationship between driving behavior and driving styles. We first propose a fragment-based approach to represent complex sequential driving behavior, allowing for sufficiently representing driving behavior in a low-dimension feature space. Then, we provide an analytical formulation for the interaction of driving behavior and shareable driving style with a hierarchical latent model by introducing the mechanism of Dirichlet allocation. Our developed model is finally validated and verified with 100 drivers in naturalistic driving settings with urban and highways. Experimental results reveal that individuals share driving styles within and between them. We also analyzed the influence of personalities (e.g., age, gender, and driving experience) on driving styles and found that a naturally aggressive driver would not always keep driving aggressively (i.e., could behave calmly sometimes) but with a higher proportion of aggressiveness than other types of drivers.

Submitted 24 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2309.11725 [pdf, other]

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Authors: Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

Abstract: Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, the current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency in the context and original utterance. To maintain the speech fluency, we propose a fluency speech editing model, termed FluentEditor, by considering fluency-aware training criterion in the TSE training. Specifically, the acoustic consistency constraint aims to smooth the transition between the edited region and its neighboring acoustic segments consistent with the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited regions remain consistent with the overall style of the original utterance. The subjective and objective experimental results on VCTK demonstrate that our FluentEditor outperforms all advanced baselines in terms of naturalness and fluency. The audio samples and code are available at https://github.com/Ai-S2-Lab/FluentEditor.

Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP'2024

arXiv:2308.12834 [pdf]

A Blockchain based Fund Management System for Construction Projects -- A Comprehensive Case Study in Xiong'an New Area China

Authors: Wenlue Song, Hanyuan Wu, Hongwei Meng, Evan Bian, Cong Tang, Jiaqi Xi, Haogang Zhu

Abstract: As large scale construction projects become increasingly complex, the use and integration of advanced technologies are being emphasized more and more. However, the construction industry often lags behind most industries in the application of digital technologies. In recent years, a decentralized, peer-to-peer blockchain technology has attracted widespread attention from academia and industry. This paper provides a solution that combines blockchain technology with construction project fund management. The system involves participants such as the owner's unit, construction companies, government departments, banks, etc., adopting the technical architecture of the Xiong'an Blockchain Underlying System. The core business and key logic processing are all implemented through smart contracts, ensuring the transparency and traceability of the fund payment process. The goal of ensuring investment quality, standardizing investment behavior, and strengthening cost control is achieved through blockchain technology. The application of this system in the management of Xiong'an construction projects has verified that blockchain technology plays a significant positive role in strengthening fund management, enhancing fund supervision, and ensuring fund safety in the construction process of engineering projects. It helps to eliminate the common problems of multi-party trust and transparent supervision in the industry and can further improve the investment benefits of government investment projects and improve the management system and operation mechanism of investment projects.

Submitted 24 August, 2023; originally announced August 2023.

Comments: Accepted to the 8th International Conference on Smart Finance (ICSF 2023)

arXiv:2307.10233 [pdf, other]

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Authors: Yifei Shi, Junhua Xi, Dewen Hu, Zhiping Cai, Kai Xu

Abstract: Learning-based multi-view stereo (MVS) has by far centered around 3D convolution on cost volumes. Due to the high computation and memory consumption of 3D CNN, the resolution of output depth is often considerably limited. Different from most existing works dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization which is much more light-weight than full cost volume optimization. In particular, we propose RayMVSNet which learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth. This sequential modeling, conducted based on transformer features, essentially learns the epipolar line search in traditional multi-view stereo. We devise a multi-task learning for better optimization convergence and depth accuracy. We found the monotonicity property of the SDFs along each ray greatly benefits the depth estimation. Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods, achieving an overall reconstruction score of 0.33mm on DTU and an F-score of 59.48% on Tanks & Temples. It is able to produce high-quality depth estimation and point cloud reconstruction in challenging scenarios such as objects/scenes with non-textured surface, severe occlusion, and highly varying depth range. Further, we propose RayMVSNet++ to enhance contextual feature aggregation for each ray through designing an attentional gating unit to select semantically relevant neighboring rays within the local frustum around that ray. RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset. In particular, it attains an AbsRel of 0.058m and produces accurate results on the two subsets of textureless regions and large depth variation.

Submitted 15 July, 2023; originally announced July 2023.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: substantial text overlap with arXiv:2204.01320

arXiv:2204.01320 [pdf, other]

RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Authors: Junhua Xi, Yifei Shi, Yijie Wang, Yulan Guo, Kai Xu

Abstract: Learning-based multi-view stereo (MVS) has by far centered around 3D convolution on cost volumes. Due to the high computation and memory consumption of 3D CNN, the resolution of output depth is often considerably limited. Different from most existing works dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range (depth) finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization which is much more light-weight than full cost volume optimization. In particular, we propose RayMVSNet which learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth. This sequential modeling, conducted based on transformer features, essentially learns the epipolar line search in traditional multi-view stereo. We also devise a multi-task learning for better optimization convergence and depth accuracy. Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods, achieving overall reconstruction score of 0.33mm on DTU and F-score of 59.48% on Tanks & Temples.

Submitted 4 April, 2022; originally announced April 2022.

Comments: CVPR 2022, 11 pages

arXiv:2112.07097 [pdf, other]

Grant Free MIMO-NOMA with Differential Modulation for Machine Type Communications

Authors: Yuanyuan Zhang, Zhengdao Yuan, Qinghua Guo, Zhongyong Wang, Jiangtao Xi, Yanguang Yu, Yonghui Li

Abstract: This paper considers a challenging scenario of machine type communications, where we assume internet of things (IoT) devices send short packets sporadically to an access point (AP) and the devices are not synchronized in the packet level. High transmission efficiency and low latency are concerned. Motivated by the great potential of multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) in massive access, we design a grant-free MIMO-NOMA scheme, and in particular differential modulation is used so that expensive channel estimation at the receiver (AP) can be bypassed. The receiver at AP needs to carry out active device detection and multi-device data detection. The active user detection is formulated as the estimation of the common support of sparse signals, and a message passing based sparse Bayesian learning (SBL) algorithm is designed to solve the problem. Due to the use of differential modulation, we investigate the problem of non-coherent multi-device data detection, and develop a message passing based Bayesian data detector, where the constraint of differential modulation is exploited to drastically improve the detection performance, compared to the conventional non-coherent detection scheme. Simulation results demonstrate the effectiveness of the proposed active device detector and non-coherent multi-device data detector.
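
Note: the reason differential modulation lets a receiver skip channel estimation can be shown with a toy single-link example. The sketch below uses differential BPSK over a flat channel with one unknown complex gain; it is not the paper's MIMO-NOMA receiver, and the function names and the channel value are illustrative assumptions.

```python
import cmath

def dbpsk_encode(bits, ref=1 + 0j):
    """Encode bits as phase changes: bit 1 flips the sign of the previous symbol."""
    symbols = [ref]
    for b in bits:
        symbols.append(symbols[-1] * (-1 if b else 1))
    return symbols

def dbpsk_decode(received):
    """Non-coherent detection: compare each symbol with its predecessor."""
    bits = []
    for prev, cur in zip(received, received[1:]):
        # The unknown channel gain h cancels up to |h|^2 in cur * conj(prev).
        bits.append(1 if (cur * prev.conjugate()).real < 0 else 0)
    return bits

# Hypothetical channel: unknown attenuation 0.7 and phase rotation 2.1 rad.
h = cmath.rect(0.7, 2.1)
sent_bits = [1, 0, 1, 1, 0]
received = [h * s for s in dbpsk_encode(sent_bits)]
decoded = dbpsk_decode(received)
```

Because the information sits in the change between consecutive symbols, `decoded` matches `sent_bits` without the receiver ever learning `h`, as long as the channel stays constant across adjacent symbols.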

Submitted 11 June, 2024; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2008.10771 [pdf, other]

MuCo: Publishing Microdata with Privacy Preservation through Mutual Cover

Authors: Boyu Li, Jianfeng Ma, Junhua Xi, Lili Zhang, Tao Xie, Tongfei Shang

Abstract: We study the anonymization technique of k-anonymity family for preserving privacy in the publication of microdata. Although existing approaches based on generalization can provide good enough protections, the generalized table always suffers from considerable information loss, mainly because the distributions of QI (Quasi-Identifier) values are barely preserved and the results of query statements are groups rather than specific tuples. To this end, we propose a novel technique, called the Mutual Cover (MuCo), to prevent the adversary from matching the combination of QI values in published microdata. The rationale is to replace some original QI values with random values according to random output tables, making similar tuples to cover for each other with the minimum cost. As a result, MuCo can prevent both identity disclosure and attribute disclosure while retaining the information utility more effectively than generalization. The effectiveness of MuCo is verified with extensive experiments.
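
Note: the k-anonymity property that this abstract builds on can be stated concretely: every combination of quasi-identifier (QI) values must be shared by at least k records. The sketch below only checks that property on a generalized table; it does not implement MuCo, and the table contents and column names are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, qi_columns, k):
    """Return True if every QI-value combination occurs in at least k records."""
    groups = Counter(tuple(r[c] for c in qi_columns) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical generalized microdata: "age" and "zip" are the quasi-identifiers.
table = [
    {"age": "30-39", "zip": "100**", "disease": "flu"},
    {"age": "30-39", "zip": "100**", "disease": "cold"},
    {"age": "40-49", "zip": "200**", "disease": "flu"},
]
```

Here the third record forms a group of size one, so the table is 1-anonymous but not 2-anonymous with respect to `age` and `zip`.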

Submitted 29 March, 2024; v1 submitted 24 August, 2020; originally announced August 2020.

arXiv:2004.03462 [pdf]

Efficient Task Mapping for Manycore Systems

Authors: Xiqian Wang, Jiajin Xi, Yinghao Wang, Paul Bogdan, Shahin Nazarian

Abstract: System-on-chip (SoC) has migrated from single core to manycore architectures to cope with the increasing complexity of real-life applications. Application task mapping has a significant impact on the efficiency of manycore system (MCS) computation and communication. We present WAANSO, a scalable framework that incorporates a Wavelet Clustering based approach to cluster application tasks. We also introduce Ant Swarm Optimization (ASO) based on iterative execution of Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) for task clustering and mapping to the MCS processing elements. We have shown that WAANSO can significantly increase the MCS energy and performance efficiencies. Based on our experiments on a 64-core system, WAANSO improves energy efficiency by 19%, compared to baseline approaches, namely DPSO, ACO and branch and bound (B&B). Additionally, the performance improves by 65.86% compared to Density-Based Spatial Clustering of Applications with Noise (DBSCAN) baseline.

Submitted 5 April, 2020; originally announced April 2020.

Comments: This paper is accepted to appear in ISCAS 2020

arXiv:2003.01307 [pdf, other]

Bayesian Receiver Design for Grant-Free NOMA with Message Passing Based Structured Signal Estimation

Authors: Yuanyuan Zhang, Zhengdao Yuan, Qinghua Guo, Zhongyong Wang, Jiangtao Xi, Yonghui Li

Abstract: Grant-free non-orthogonal multiple access (NOMA) is promising to achieve low latency massive access in Internet of Things (IoT) applications. In grant-free NOMA, pilot signals are often used for user activity detection (UAD) and channel estimation (CE) prior to multiuser detection (MUD) of active users. However, the pilot overhead makes the communications inefficient for IoT devices with sporadic transmissions and short data packets, or when the channel coherence time is short. Hence, it is desirable to improve the efficiency by avoiding the use of pilot signals, which can also further achieve lower latency. This work focuses on Bayesian receiver design for grant-free low density signature orthogonal frequency division multiplexing (LDS-OFDM), where each user is allocated a unique low density spreading sequence. We propose to use the low density spreading sequences for active user detection, thereby avoiding the use of pilot signals. Firstly, the task of joint UAD, CE and MUD is formulated as a structured signal estimation problem. Then message passing based Bayesian approach is developed to solve the structured signal estimation problem. In particular, belief propagation (BP), expectation propagation (EP) and mean field (MF) message passing are used to develop efficient hybrid message passing algorithms to achieve trade-off between performance and complexity. Simulation results demonstrate the effectiveness of the proposed receiver for grant-free LDS-OFDM without the use of pilot signals.

Submitted 2 March, 2020; originally announced March 2020.

arXiv:2003.00759 [pdf, other]

doi 10.1109/TITS.2021.3057645

Spatiotemporal Learning of Multivehicle Interaction Patterns in Lane-Change Scenarios

Authors: Chengyuan Zhang, Jiacheng Zhu, Wenshuo Wang, Junqiang Xi

Abstract: Interpretation of common-yet-challenging interaction scenarios can benefit well-founded decisions for autonomous vehicles. Previous research achieved this using their prior knowledge of specific scenarios with predefined models, limiting their adaptive capabilities. This paper describes a Bayesian nonparametric approach that leverages continuous (i.e., Gaussian processes) and discrete (i.e., Dirichlet processes) stochastic processes to reveal underlying interaction patterns of the ego vehicle with other nearby vehicles. Our model relaxes dependency on the number of surrounding vehicles by developing an acceleration-sensitive velocity field based on Gaussian processes. The experiment results demonstrate that the velocity field can represent the spatial interactions between the ego vehicle and its surroundings. Then, a discrete Bayesian nonparametric model, integrating Dirichlet processes and hidden Markov models, is developed to learn the interaction patterns over the temporal space by segmenting and clustering the sequential interaction data into interpretable granular patterns automatically. We then evaluate our approach in the highway lane-change scenarios using the highD dataset collected from real-world settings. Results demonstrate that our proposed Bayesian nonparametric approach provides an insight into the complicated lane-change interactions of the ego vehicle with multiple surrounding traffic participants based on the interpretable interaction patterns and their transition properties in temporal relationships. Our proposed approach sheds light on efficiently analyzing other kinds of multi-agent interactions, such as vehicle-pedestrian interactions. View the demos via https://youtu.be/z_vf9UHtdAM.

Submitted 5 September, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: for the supplements, see https://chengyuan-zhang.github.io/Multivehicle-Interaction/

arXiv:2002.05645 [pdf, other]

Training Large Neural Networks with Constant Memory using a New Execution Algorithm

Authors: Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Sujeeth Bharadwaj

Abstract: Widely popular transformer-based NLP models such as BERT and Turing-NLG have enormous capacity trending to billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high speed interconnectivity for data parallelism. In this paper, we introduce a new relay-style execution technique called L2L (layer-to-layer) where at any given moment, the device memory is primarily populated only with the executing layer(s)'s footprint. The model resides in the DRAM memory attached to either a CPU or an FPGA as an entity we call eager param-server (EPS). To overcome the bandwidth issues of shuttling parameters to and from EPS, the model is executed a layer at a time across many micro-batches instead of the conventional method of minibatches over the whole model. L2L is implemented using 16GB V100 devices for BERT-Large, running it with a device batch size of up to 256. Our results show a 45% reduction in memory and a 40% increase in throughput compared to the state-of-the-art baseline. L2L is also able to fit models up to 50 billion parameters on a machine with a single 16GB V100 and 512GB CPU memory, without requiring any model partitioning. L2L scales to arbitrary depth, allowing researchers to develop on affordable devices, which is a big step toward democratizing AI. By running the optimizer in the host EPS, we show a new form of mixed precision for faster throughput and convergence. In addition, the EPS enables dynamic neural architecture approaches by varying layers across iterations. Finally, we also propose and demonstrate a constant memory variation of L2L and we propose future enhancements. This work has been performed on GPUs first, but is also targeted towards all high TFLOPS/Watt accelerators.

Submitted 4 June, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

arXiv:1909.08974 [pdf, ps, other]

doi 10.1002/rnc.4941

Robust time-varying formation design for multi-agent systems with disturbances: Extended-state-observer method

Authors: Le Wang, Jianxiang Xi, Ming He, Guangbin Liu

Abstract: Robust time-varying formation design problems for second-order multi-agent systems subjected to external disturbances are investigated. Firstly, by constructing an extended state observer, the disturbance compensation is estimated, which is a critical term in the proposed robust time-varying formation control protocol. Then, an explicit expression of the formation center function is determined and impacts of disturbance compensations on the formation center function are presented. With the formation feasibility conditions, robust time-varying formation design criteria are derived to determine the gain matrix of the formation control protocol by utilizing the algebraic Riccati equation technique. Furthermore, the tracking performance and the robustness property of multi-agent systems are analyzed. Finally, the numerical simulation is provided to illustrate the effectiveness of theoretical results.

Submitted 19 September, 2019; originally announced September 2019.

Comments: 14 pages, 5 figures

Journal ref: International Journal of Robust and Nonlinear Control, Volume 30, Issue 7, Pages 2796-2808, published May 10, 2020

arXiv:1909.08345 [pdf, other]

doi 10.1109/TCYB.2019.2963172

Limited-budget output consensus for descriptor multiagent systems with energy constraints

Authors: Jianxiang Xi, Cheng Wang, Xiaojun Yang, Bailong Yang

Abstract: The current paper deals with limited-budget output consensus for descriptor multiagent systems with two types of switching communication topologies; that is, switching connected ones and jointly connected ones. Firstly, a singular dynamic output feedback control protocol with switching communication topologies is proposed on the basis of the observable decomposition, where an energy constraint is involved and protocol states of neighboring agents are utilized to derive a new two-step design approach of gain matrices. Then, limited-budget output consensus problems are transformed into asymptotic stability ones and a valid candidate of the output consensus function is determined. Furthermore, sufficient conditions for limited-budget output consensus design for two types of switching communication topologies are proposed, respectively. Finally, two numerical simulations are shown to demonstrate theoretical conclusions.

Submitted 18 September, 2019; originally announced September 2019.

Comments: 10 pages, 5 figures

Journal ref: IEEE Transactions on Cybernetics 2020

arXiv:1906.09841 [pdf, ps, other]

On the Performance of Massive MIMO Systems With Low-Resolution ADCs Over Rician Fading Channels

Authors: Tianle Liu, Jun Tong, Qinghua Guo, Jiangtao Xi, Yanguang Yu, Zhitao Xiao

Abstract: This paper considers uplink massive multiple-input multiple-output (MIMO) systems with low-resolution analog-to-digital converters (ADCs) over Rician fading channels. Maximum-ratio-combining (MRC) and zero-forcing (ZF) receivers are considered under the assumption of perfect and imperfect channel state information (CSI). Low-resolution ADCs are considered for both data detection and channel estimation, and the resulting performance is analyzed. Asymptotic approximations of the spectrum efficiency (SE) for large systems are derived based on random matrix theory. With these results, we can provide insights into the trade-offs between the SE and the ADC resolution and study the influence of the Rician K-factors on the performance. It is shown that a large value of K-factors may lead to better performance and alleviate the influence of quantization noise on channel estimation. Moreover, we investigate the power scaling laws for both receivers under imperfect CSI and show that when the number of base station (BS) antennas is very large, without loss of SE performance, the transmission power can be scaled by the number of BS antennas for both receivers while the overall performance is limited by the resolution of ADCs. The asymptotic analysis is validated by numerical results. Besides, it is also shown that the SE gap between the two receivers is narrowed down when the K-factor is increased. We also show that ADCs with moderate resolutions lead to better energy efficiency (EE) than high-resolution or extremely low-resolution ADCs, and that ZF receivers achieve higher EE than MRC receivers.

Submitted 24 June, 2019; originally announced June 2019.

Comments: 12 pages, 7 figures

arXiv:1901.05104 [pdf]

A Comprehensive Performance Evaluation for 3D Transformation Estimation Techniques

Authors: Bao Zhao, Xiaobo Chen, Xinyi Le, Juntong Xi

Abstract: 3D local feature extraction and matching is the basis for solving many tasks in the area of computer vision, such as 3D registration, modeling, recognition and retrieval. However, this process commonly produces false correspondences due to noise, limited features, occlusion, incomplete surfaces, etc. In order to estimate accurate transformation based on these corrupted correspondences, numerous transformation estimation techniques have been proposed. However, the merits, demerits and appropriate application for these methods are unclear because no comprehensive evaluation of their performance has been conducted. This paper evaluates eleven state-of-the-art transformation estimation proposals on both descriptor based and synthetic correspondences. On descriptor based correspondences, several evaluation items (including the performance on different datasets, robustness to different overlap ratios and the performance of these techniques combined with Iterative Closest Point (ICP), different local features and LRF/A techniques) of these methods are tested on four popular datasets acquired with different devices. On synthetic correspondences, the robustness of these methods to varying percentages of correct correspondences (PCC) is evaluated. In addition, we also evaluate the efficiencies of these methods. Finally, the merits, demerits and application guidance of these tested transformation estimation methods are summarized.

Submitted 15 January, 2019; originally announced January 2019.

arXiv:1810.08360 [pdf, ps, other]

doi 10.1016/j.sigpro.2018.02.026

Linear Shrinkage Estimation of Covariance Matrices Using Low-Complexity Cross-Validation

Authors: Jun Tong, Rui Hu, Jiangtao Xi, Zhitao Xiao, Qinghua Guo, Yanguang Yu

Abstract: Shrinkage can effectively improve the condition number and accuracy of covariance matrix estimation, especially for low-sample-support applications with the number of training samples smaller than the dimensionality. This paper investigates parameter choice for linear shrinkage estimators. We propose data-driven, leave-one-out cross-validation (LOOCV) methods for automatically choosing the shrinkage coefficients, aiming to minimize the Frobenius norm of the estimation error. A quadratic loss is used as the prediction error for LOOCV. The resulting solutions can be found analytically or by solving optimization problems of small sizes and thus have low complexities. Our proposed methods are compared with various existing techniques. We show that the LOOCV method achieves near-oracle performance for shrinkage designs using sample covariance matrix (SCM) and several typical shrinkage targets. Furthermore, the LOOCV method provides low-complexity solutions for estimators that use general shrinkage targets, multiple targets, and/or ordinary least squares (OLS)-based covariance matrix estimation. We also show applications of our proposed techniques to several different problems in array signal processing.

Submitted 19 October, 2018; originally announced October 2018.

Comments: 12 pages, 6 figures. Published in Signal Processing

Journal ref: J. Tong, R. Hu, J. Xi, Z. Xiao, Q. Guo, and Y. Yu, "Linear shrinkage estimation of covariance matrices using low-complexity cross-validation," Signal Processing, vol.148, pp. 223-233, July 2018

arXiv:1810.04788 [pdf, ps, other]

doi 10.1109/ACCESS.2018.2877432

Matrix Completion-Based Channel Estimation for MmWave Communication Systems With Array-Inherent Impairments

Authors: Rui Hu, Jun Tong, Jiangtao Xi, Qinghua Guo, Yanguang Yu

Abstract: Hybrid massive MIMO structures with reduced hardware complexity and power consumption have been widely studied as a potential candidate for millimeter wave (mmWave) communications. Channel estimators that require knowledge of the array response, such as those using compressive sensing (CS) methods, may suffer from performance degradation when array-inherent impairments bring unknown phase errors and gain errors to the antenna elements. In this paper, we design matrix completion (MC)-based channel estimation schemes which are robust against the array-inherent impairments. We first design an open-loop training scheme that can sample entries from the effective channel matrix randomly and is compatible with the phase shifter-based hybrid system. Leveraging the low-rank property of the effective channel matrix, we then design a channel estimator based on the generalized conditional gradient (GCG) framework and the alternating minimization (AltMin) approach. The resulting estimator is immune to array-inherent impairments and can be implemented to systems with any array shapes for its independence of the array response. In addition, we extend our design to sample a transformed channel matrix following the concept of inductive matrix completion (IMC), which can be solved efficiently using our proposed estimator and achieve similar performance with a lower requirement of the dynamic range of the transmission power per antenna. Numerical results demonstrate the advantages of our proposed MC-based channel estimators in terms of estimation performance, computational complexity and robustness against array-inherent impairments over the orthogonal matching pursuit (OMP)-based CS channel estimator.

Submitted 10 October, 2018; originally announced October 2018.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1806.09757 [pdf, ps, other]

Adaptive guaranteed-performance consensus design for high-order multiagent systems

Authors: Jianxiang Xi, Jie Yang, Hao Liu, Tang Zheng

Abstract: The current paper addresses the distributed guaranteed-performance consensus design problems for general high-order linear multiagent systems with leaderless and leader-follower structures, respectively. The information about the Laplacian matrix of the interaction topology or its minimum nonzero eigenvalue is usually required in existing works on the guaranteed-performance consensus, which means that their conclusions are not completely distributed. A new translation-adaptive strategy is proposed to realize the completely distributed guaranteed-performance consensus control by using the structure feature of a complete graph in the current paper. For the leaderless case, an adaptive guaranteed-performance consensualization criterion is given in terms of Riccati inequalities and a regulation approach of the consensus control gain is presented by linear matrix inequalities. Extensions to the leader-follower cases are further investigated. Especially, the guaranteed-performance costs for leaderless and leader-follower cases are determined, respectively, which are associated with the intrinsic structure characteristic of the interaction topologies. Finally, two numerical examples are provided to demonstrate theoretical results.

Submitted 25 June, 2018; originally announced June 2018.

arXiv:1802.07923 [pdf, ps, other]

Dynamic Output Feedback Guaranteed-Cost Synchronization for Multiagent Networks with Given Cost Budgets

Authors: Jianxiang Xi, Cheng Wang, Hao Liu, Zhong Wang

Abstract: The current paper addresses the distributed guaranteed-cost synchronization problems for general high-order linear multiagent networks. Existing works on the guaranteed-cost synchronization usually require all state information of neighboring agents and cannot give the cost budget in advance. For both leaderless and leader-following interaction topologies, the current paper firstly proposes a dynamic output feedback synchronization protocol with guaranteed-cost constraints, which can realize the tradeoff design between the energy consumption and the synchronization regulation performance with the given cost budget. Then, according to different structure features of interaction topologies, leaderless and leader-following guaranteed-cost synchronization analysis and design criteria are presented, respectively, and an algorithm is proposed to deal with the impacts of nonlinear terms by using both synchronization analysis and design criteria. In particular, an explicit expression of the synchronization function is shown for leaderless cases, which is independent of protocol states and the given cost budget. Finally, numerical examples are presented to demonstrate theoretical results.

Submitted 22 February, 2018; originally announced February 2018.

Comments: 12 pages

arXiv:1801.03905 [pdf, other]

doi 10.1109/TVT.2018.2793889

Learning and Inferring a Driver's Braking Action in Car-Following Scenarios

Authors: Wenshuo Wang, Junqiang Xi, Ding Zhao

Abstract: Accurately predicting and inferring a driver's decision to brake is critical for designing warning systems and avoiding collisions. In this paper we focus on predicting a driver's intent to brake in car-following scenarios from a perception-decision-action perspective according to his/her driving history. A learning-based inference method, using onboard data from CAN-Bus, radar and cameras as explanatory variables, is introduced to infer drivers' braking decisions by combining a Gaussian mixture model (GMM) with a hidden Markov model (HMM). The GMM is used to model stochastic relationships among variables, while the HMM is applied to infer drivers' braking actions based on the GMM. Real-case driving data from 49 drivers (more than three years' driving data per driver on average) have been collected from the University of Michigan Safety Pilot Model Deployment database. We compare the GMM-HMM method to a support vector machine (SVM) method and an SVM-Bayesian filtering method. The experimental results are evaluated by employing three performance metrics: accuracy, sensitivity, and specificity. The comparison results show that the GMM-HMM obtains the best performance, with an accuracy of 90%, sensitivity of 84%, and specificity of 97%. Thus, we believe that this method has great potential for real-world active safety systems.

Submitted 11 January, 2018; originally announced January 2018.

arXiv:1711.05368 [pdf]

A Novel SDASS Descriptor for Fully Encoding the Information of 3D Local Surface

Authors: Bao Zhao, Xinyi Le, Juntong Xi

Abstract: Local feature description is a fundamental yet challenging task in 3D computer vision. This paper proposes a novel descriptor, named Statistic of Deviation Angles on Subdivided Space (SDASS), for encoding geometrical and spatial information of a local surface on a Local Reference Axis (LRA). In terms of encoding geometrical information, considering that surface normals, which are usually used for encoding geometrical information of a local surface, are vulnerable to various nuisances (e.g., noise, varying mesh resolutions, etc.), we propose a robust geometrical attribute, called Local Minimum Axis (LMA), to replace the normals for generating the geometrical feature in our SDASS descriptor. For encoding spatial information, we use two spatial features for fully encoding the spatial information of a local surface based on the LRA, which usually presents higher overall repeatability than the Local Reference Frame (LRF). Besides, an improved LRA is proposed for increasing the robustness of our SDASS to noise and varying mesh resolutions. The performance of the SDASS descriptor is rigorously tested on four popular datasets. The results show that our descriptor has high descriptiveness and strong robustness, and that it outperforms existing algorithms by a large margin. Finally, the proposed descriptor is applied to 3D registration. The accurate result further confirms the effectiveness of our SDASS method.

Submitted 26 June, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

Comments: 21 pages, 15 figures

arXiv:1708.08986 [pdf, other]

Driving Style Analysis Using Primitive Driving Patterns With Bayesian Nonparametric Approaches

Authors: Wenshuo Wang, Junqiang Xi, Ding Zhao

Abstract: Analysis and recognition of driving styles are profoundly important to intelligent transportation and vehicle calibration. This paper presents a novel driving style analysis framework using the primitive driving patterns learned from naturalistic driving data. In order to achieve this, first, a Bayesian nonparametric learning method based on a hidden semi-Markov model (HSMM) is introduced to extract primitive driving patterns from time series driving data without prior knowledge of the number of these patterns. In the Bayesian nonparametric approach, we utilize a hierarchical Dirichlet process (HDP) instead of learning the unknown number of smooth dynamical modes of HSMM, thus generating the primitive driving patterns. Each primitive pattern is clustered and then labeled using behavioral semantics according to drivers' physical and psychological perception thresholds. For each driver, 75 primitive driving patterns in car-following scenarios are learned and semantically labeled. In order to show the HDP-HSMM's utility to learn primitive driving patterns, two other Bayesian nonparametric approaches, HDP-HMM and sticky HDP-HMM, are compared. The naturalistic driving data of 18 drivers were collected from the University of Michigan Safety Pilot Model Deployment (SPMD) database. The individual driving styles are discussed according to distribution characteristics of the learned primitive driving patterns, and the differences in driving styles among drivers are evaluated using the Kullback-Leibler divergence. The experiment results demonstrate that the proposed primitive pattern-based method can allow one to semantically understand driver behaviors and driving styles.

Submitted 15 August, 2017; originally announced August 2017.

arXiv:1702.01228 [pdf, other]

A Learning-Based Approach for Lane Departure Warning Systems with a Personalized Driver Model

Authors: Wenshuo Wang, Ding Zhao, Junqiang Xi, Wei Han

Abstract: Misunderstanding of driver correction behaviors (DCB) is the primary reason for false warnings of lane-departure-prediction systems. We propose a learning-based approach to predicting unintended lane-departure behaviors (LDB) and the chance for drivers to bring the vehicle back to the lane. First, in this approach, a personalized driver model for lane-departure and lane-keeping behavior is established by combining the Gaussian mixture model and the hidden Markov model. Second, based on this model, we develop an online model-based prediction algorithm to predict the forthcoming vehicle trajectory and judge whether the driver will demonstrate an LDB or a DCB. We also develop a warning strategy based on the model-based prediction algorithm that allows the lane-departure warning system to be acceptable for drivers according to the predicted trajectory. In addition, the naturalistic driving data of 10 drivers is collected through the University of Michigan Safety Pilot Model Deployment program to train the personalized driver model and validate this approach. We compare the proposed method with a basic time-to-lane-crossing (TLC) method and a TLC-directional sequence of piecewise lateral slopes (TLC-DSPLS) method. The results show that the proposed approach can reduce the false-warning rate to 3.07%.

Submitted 3 February, 2017; originally announced February 2017.

Comments: 12 pages, 13 figures, Journal

arXiv:1609.05693 [pdf, ps, other]

Low-Complexity and Basis-Free Channel Estimation for Switch-Based mmWave MIMO Systems via Matrix Completion

Authors: Rui Hu, Jun Tong, Jiangtao Xi, Qinghua Guo, Yanguang Yu

Abstract: Recently, a switch-based hybrid massive MIMO structure that aims to reduce the hardware complexity and power consumption has been proposed as a potential candidate for millimeter wave (mmWave) communications. Exploiting the sparse nature of the mmWave channel, compressive sensing (CS)-based channel estimators have been proposed. When applied to real mmWave channels, the CS-based channel estimators may encounter heavy computational burden due to the high dimensionality of the basis. Meanwhile, knowledge about the response of the antenna array, which is needed for constructing the basis of the CS estimators, may not be perfect due to array uncertainties such as phase mismatch among array elements. This can result in the loss of sparse representation and hence the degraded performance of the CS estimator. In this paper, we propose a novel matrix completion (MC)-based low-complexity channel estimator. The proposed scheme is compatible with switch-based hybrid structures, does not need to specify a basis, and can avoid the basis mismatch issue. Compared with the existing CS-based estimator, the proposed basis-free scheme is immune to array response mismatch and exhibits a significantly lower complexity.

Submitted 23 November, 2016; v1 submitted 19 September, 2016; originally announced September 2016.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1606.01284 [pdf, other]

doi 10.1049/iet-its.2017.0379

Statistical Pattern Recognition for Driving Styles Based on Bayesian Probability and Kernel Density Estimation

Authors: Wenshuo Wang, Junqiang Xi, Xiaohan Li

Abstract: Driving styles have a great influence on vehicle fuel economy, active safety, and drivability. To recognize driving styles of path-tracking behaviors for different drivers, a statistical pattern-recognition method is developed to deal with the uncertainty of driving styles or characteristics based on probability density estimation. First, to describe driver path-tracking styles, vehicle speed and throttle opening are selected as the discriminative parameters, and a conditional kernel density function of vehicle speed and throttle opening is built, respectively, to describe the uncertainty and probability of two representative driving styles, e.g., aggressive and normal. Meanwhile, a posterior probability of each element in the feature vector is obtained using full Bayesian theory. Second, a Euclidean distance method is used to decide which class the driver belongs to, instead of calculating the complex covariance between every two elements of the feature vectors. By comparing the Euclidean distance between every element in the feature vector, driving styles are classified into seven levels ranging from low normal to high aggressive. Subsequently, to show the benefits of the proposed pattern-recognition method, a cross-validation method is used to compare it with a fuzzy logic-based pattern-recognition method. The experimental results show that the proposed statistical pattern-recognition method for driving styles based on kernel density estimation is more efficient and stable than the fuzzy logic-based method.

Submitted 3 June, 2016; originally announced June 2016.

Comments: 10 pages, 9 figures. Submitted to International Journal of Automotive Technology

Journal ref: IET Intelligent Transportation Systems, 2018

arXiv:1605.06742 [pdf, other]

doi 10.1109/ACC.2016.7526495

A Rapid Pattern-Recognition Method for Driving Types Using Clustering-Based Support Vector Machines

Authors: Wenshuo Wang, Junqiang Xi

Abstract: A rapid pattern-recognition approach to characterize drivers' curve-negotiating behavior is proposed. To shorten the recognition time and improve the recognition of driving styles, a k-means clustering-based support vector machine (kMC-SVM) method is developed and used for classifying drivers into two types: aggressive and moderate. First, vehicle speed and throttle opening are treated as the feature parameters to reflect the driving styles. Second, to discriminate driver curve-negotiating behaviors and reduce the number of support vectors, the k-means clustering method is used to extract and gather the two types of driving data and shorten the recognition time. Then, based on the clustering results, a support vector machine approach is utilized to generate the hyperplane for judging and predicting which type a human driver belongs to. Lastly, to verify the validity of the kMC-SVM method, a cross-validation experiment is designed and conducted. The research results show that kMC-SVM is an effective method to classify driving styles in a short time compared with the SVM method.

Submitted 22 May, 2016; originally announced May 2016.

Comments: 6 pages, 9 figures, 2 tables. To appear in 2016 American Control Conference, Boston, MA, USA, 2016

Journal ref: 2017 American Control Conference

arXiv:1504.04799 [pdf, ps, other]

Approximate Message Passing with Unitary Transformation

Authors: Qinghua Guo, Jiangtao Xi

Abstract: Approximate message passing (AMP) and its variants, developed based on loopy belief propagation, are attractive for estimating a vector x from a noisy version of z = Ax, which arises in many applications. For a large A with i.i.d. elements, AMP can be characterized by the state evolution and exhibits fast convergence. However, it has been shown that AMP may easily diverge for a generic A. In this work, we develop a new variant of AMP based on a unitary transformation of the original model (hence the variant is called UT-AMP), where the unitary matrix is available for any matrix A, e.g., the conjugate transpose of the left singular matrix of A, or a normalized DFT (discrete Fourier transform) matrix for any circulant A. We prove that, in the case of Gaussian priors, UT-AMP always converges for any matrix A. It is observed that UT-AMP is much more robust than the original AMP for difficult A and exhibits fast convergence. A special form of UT-AMP with a circulant A was used in our previous work [13] for turbo equalization. This work extends it to a generic A, and provides a theoretical investigation on the convergence.

Submitted 19 April, 2015; originally announced April 2015.

Comments: 5 pages

Showing 1–34 of 34 results for author: Xi, J