Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Authors: Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu

Abstract: Thermal infrared (TIR) tracking is an essential topic in computer vision tasks because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking that lacks texture and color information. In this paper, to address these issues, we apply natural language modeling to TIR tracking and propose a novel model called NLMTrack, which enhances the utilization of coordinate and temporal information. NLMTrack applies an encoder that unifies feature extraction and feature fusion, which simplifies the TIR tracking pipeline. To address the challenge of low detail and low contrast in TIR images, on the one hand, we design a multi-level progressive fusion module that enhances the semantic representation and incorporates multi-scale features. On the other hand, the decoder combines the TIR features and the coordinate sequence features using a causal transformer to generate the target sequence step by step. Moreover, we explore an adaptive loss aimed at elevating tracking accuracy and a simple template update strategy to accommodate the target's appearance variations. Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks. The code is publicly available at https://github.com/ELOESZHANG/NLMTrack.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.00297 [pdf]

UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Authors: Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

Abstract: Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5 mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose an uncertainty-aware dual-stream network (UADSN). UADSN consists of a 2D segmentation stream and a 3D segmentation stream. Predictions from the two streams are used to identify uncertain regions, and a consistency loss is employed to supervise the segmentation of these regions. In addition, we introduce channel squeeze & spatial excitation modules into the skip connections of U-shaped networks to extract meaningful spatial information. To account for topology preservation, a clDice loss is introduced into the supervised loss function. Experimental results on the facial nerve dataset demonstrate the effectiveness of UADSN and our submodules.

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.19649 [pdf]

AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Authors: Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

Abstract: Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). First, we design an adversarial consistency regularization (ACR) approach to enhance knowledge transfer and strengthen prediction consistency under varying perturbation intensities. Second, we apply a feature matching loss for adversarial training to incorporate high-level consistency regularization. Additionally, we present the pyramid channel attention (PCA) and efficient channel and spatial attention (ECSA) modules to improve the discriminator's performance. Finally, we propose an adaptive self-training (AST) approach to ensure the pseudo-labels' quality. The proposed AstMatch has been extensively evaluated against cutting-edge SSL methods on three publicly available datasets. The experimental results under different labeled ratios indicate that AstMatch outperforms other existing methods, achieving new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/AstMatch.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17538 [pdf, other]

SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Authors: Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng, Jing Zhang

Abstract: Micro-expressions (MEs) are subtle facial movements that occur spontaneously when people try to conceal their real emotions. Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy. However, MER is challenging since MEs have low intensity and ME datasets are small in size. To this end, a three-stream temporal-shift attention network based on self-knowledge distillation (SKD-TSTSAN) is proposed in this paper. Firstly, to address the low intensity of ME muscle movements, we utilize learning-based motion magnification modules to enhance the intensity of ME muscle movements. Secondly, we employ efficient channel attention (ECA) modules in the local-spatial stream to make the network focus on facial regions that are highly relevant to MEs. In addition, temporal shift modules (TSMs) are used in the dynamic-temporal stream, which enables temporal modeling with no additional parameters by mixing ME motion information from two different temporal domains. Furthermore, we introduce self-knowledge distillation (SKD) into the MER task by introducing auxiliary classifiers and using the deepest section of the network for supervision, encouraging all blocks to fully explore the features of the training set. Finally, extensive experiments are conducted on four ME datasets: CASME II, SAMM, MMEW, and CAS(ME)3. The experimental results demonstrate that our SKD-TSTSAN outperforms other existing methods and achieves new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/SKD-TSTSAN.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.03971 [pdf, other]

Unified End-to-End V2X Cooperative Autonomous Driving

Authors: Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

Abstract: V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issues of autonomous driving. To address this challenge, this paper introduces the UniE2EV2X framework, a V2X-integrated end-to-end autonomous driving system that consolidates key driving modules within a unified network. The framework employs a deformable attention-based data fusion strategy, effectively facilitating cooperation between vehicles and infrastructure. The main advantages include: 1) significantly enhancing agents' perception and motion prediction capabilities, thereby improving the accuracy of accident predictions; 2) ensuring high reliability in the data fusion process; 3) superior end-to-end perception compared to modular approaches. Furthermore, we implement the UniE2EV2X framework on the challenging DeepAccident, a simulation dataset designed for V2X cooperative driving.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2403.10145 [pdf, other]

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Authors: Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

Abstract: The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind spots. To achieve high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to provide practical area-coverage roadside perception for restricted traffic areas. RCooper has its own domain-specific challenges, but further exploration is hindered by the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to foster research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, including two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperative perception and indicate directions for further research. Code and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.

Submitted 31 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024. 10 pages with 6 figures

ACM Class: I.4.8; I.5.4

arXiv:2401.00271 [pdf, other]

HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

Authors: Yilan Dong, Chunlin Yu, Ruiyang Ha, Ye Shi, Yuexin Ma, Lan Xu, Yanwei Fu, Jingya Wang

Abstract: Existing gait recognition benchmarks mostly include minor clothing variations in laboratory environments, but lack persistent changes in appearance over time and space. In this paper, we propose the first in-the-wild benchmark CCGait for cloth-changing gait recognition, which incorporates diverse clothing changes, indoor and outdoor scenes, and multi-modal statistics over 92 days. To further address the coupling effect of clothing and viewpoint variations, we propose a hybrid approach HybridGait that exploits both temporal dynamics and the projected 2D information of 3D human meshes. Specifically, we introduce a Canonical Alignment Spatial-Temporal Transformer (CA-STT) module to encode human joint position-aware features, and fully exploit 3D dense priors via a Silhouette-guided Deformation with 3D-2D Appearance Projection (SilD) strategy. Our contributions are twofold: we provide a challenging benchmark CCGait that captures realistic appearance changes across expanded time and space, and we propose a hybrid framework HybridGait that outperforms prior works on CCGait and Gait3D benchmarks. Our project page is available at https://github.com/HCVLab/HybridGait.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2308.01551 [pdf, other]

Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning

Authors: Yang Wenkai, Ji Ruihang, Zhang Yuxiang, Lei Hao, Zhao Zijie

Abstract: This paper presents a pre-training deep reinforcement learning (DRL) approach for map-free avoidance navigation of mobile robots, which maps raw sensor data to control variables and navigates in an unknown environment. An efficient offline training strategy is proposed to speed up the inefficient random exploration of the early stage, and we also collect a universal dataset including expert experience for offline training, which is of some significance for other navigation training work. The pre-training and prioritized expert experience are proposed to reduce training time by 80% and have been verified to improve the reward of DRL by 2 times. The advanced Gazebo simulation, with real physical modelling and dynamic equations, reduces the sim-to-real gap. We train our model in a corridor environment and evaluate it in different environments, obtaining the same effect. Compared to traditional navigation methods, we can confirm that the trained model can be directly applied to different scenarios and is able to navigate without collisions. It was demonstrated that our DRL model has a universal, general capacity in different environments.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2305.19278 [pdf, other]

Enhancing Human Capabilities through Symbiotic Artificial Intelligence with Shared Sensory Experiences

Authors: Rui Hao, Dianbo Liu, Linmei Hu

Abstract: The merging of human intelligence and artificial intelligence has long been a subject of interest in both science fiction and academia. In this paper, we introduce a novel concept in Human-AI interaction called Symbiotic Artificial Intelligence with Shared Sensory Experiences (SAISSE), which aims to establish a mutually beneficial relationship between AI systems and human users through shared sensory experiences. By integrating multiple sensory input channels and processing human experiences, SAISSE fosters a strong human-AI bond, enabling AI systems to learn from and adapt to individual users, providing personalized support, assistance, and enhancement. Furthermore, we discuss the incorporation of memory storage units for long-term growth and development of both the AI system and its human user. As we address user privacy and ethical guidelines for responsible AI-human symbiosis, we also explore potential biases and inequalities in AI-human symbiosis and propose strategies to mitigate these challenges. Our research aims to provide a comprehensive understanding of the SAISSE concept and its potential to effectively support and enhance individual human users through symbiotic AI systems. This position article aims at discussing potential topics related to AI-human interaction within the scientific community, rather than providing experimental or theoretical results.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2304.12998 [pdf, other]

ChatLLM Network: More brains, More intelligence

Authors: Rui Hao, Linmei Hu, Weijian Qi, Qingliu Wu, Yirui Zhang, Liqiang Nie

Abstract: Dialogue-based language models mark a huge milestone in the field of artificial intelligence, by their impressive ability to interact with users, as well as a series of challenging tasks prompted by customized instructions. However, the prevalent large-scale dialogue-based language models like ChatGPT still have room for improvement, such as unstable responses to questions and the inability to think cooperatively like humans. Considering the ability of dialogue-based language models in conversation and their inherent randomness in thinking, we propose ChatLLM network that allows multiple dialogue-based language models to interact, provide feedback, and think together. We design the network of ChatLLMs based on ChatGPT. Specifically, individual instances of ChatGPT may possess distinct perspectives towards the same problem, and by consolidating these diverse viewpoints via a separate ChatGPT, the ChatLLM network system can conduct decision-making more objectively and comprehensively. In addition, a language-based feedback mechanism comparable to backpropagation is devised to update the ChatGPTs within the network. Experiments on two datasets demonstrate that our network attains significant improvements in problem-solving, leading to observable progress amongst each member.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2208.05872 [pdf, other]

Optimizing Irregular-Shaped Matrix-Matrix Multiplication on Multi-Core DSPs

Authors: Shangfei Yin, Qinglin Wang, Ruochen Hao, Tianyang Zhou, Songzhu Mei, Jie Liu

Abstract: General Matrix Multiplication (GEMM) has a wide range of applications in scientific simulation and artificial intelligence. Although traditional libraries can achieve high performance on large regular-shaped GEMMs, they often perform poorly on irregular-shaped GEMMs, which are often found in new algorithms and applications of high-performance computing (HPC). Due to energy efficiency constraints, low-power multi-core digital signal processors (DSPs) have become an alternative architecture in HPC systems. Targeting multi-core DSPs in FT-m7032, a prototype CPU-DSPs heterogeneous processor for HPC, an efficient implementation - ftIMM - for three types of irregular-shaped GEMMs is proposed. FtIMM supports automatic generation of assembly micro-kernels, two parallelization strategies, and auto-tuning of block sizes and parallelization strategies. The experiments show that ftIMM achieves better performance than the traditional GEMM implementations on multi-core DSPs in FT-m7032, yielding up to 7.2x performance improvement on irregular-shaped GEMMs. In addition, ftIMM on multi-core DSPs far outperforms the open source library on multi-core CPUs in FT-m7032, delivering up to 3.1x higher efficiency.

Submitted 11 August, 2022; originally announced August 2022.

Comments: 11 pages

arXiv:2206.12124 [pdf, ps, other]

Towards Effective Depthwise Convolutions on ARMv8 Architecture

Authors: Ruochen Hao, Qinglin Wang, Shangfei Yin, Tianyang Zhou, Siqi Shen, Songzhu Mei, Jie Liu

Abstract: Depthwise convolutions are widely used in lightweight convolutional neural networks (CNNs). The performance of depthwise convolutions is mainly bounded by memory access rather than by the arithmetic operations that bound classic convolutions, so direct algorithms are often more efficient than indirect ones (matrix multiplication-, Winograd-, and FFT-based convolutions) with additional memory accesses. However, the existing direct implementations of depthwise convolutions on ARMv8 architectures feature a bad trade-off between register-level reuse of different tensors, which usually leads to sub-optimal performance. In this paper, we propose new direct implementations of depthwise convolutions by means of implicit padding, register tiling, etc., which contain forward propagation, backward propagation and weight gradient update procedures. Compared to the existing ones, our new implementations incur much less communication overhead between registers and cache. Experimental results on two ARMv8 CPUs show that our implementations deliver on average 4.88x and 16.4x performance improvement over the existing direct ones in open source libraries and the matrix multiplication-based ones in PyTorch, respectively.

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2205.01278 [pdf, other]

doi 10.1109/TITS.2023.3243940

Real-time Cooperative Vehicle Coordination at Unsignalized Road Intersections

Authors: Jiping Luo, Tingting Zhang, Rui Hao, Donglin Li, Chunsheng Chen, Zhenyu Na, Qinyu Zhang

Abstract: Cooperative coordination at unsignalized road intersections, which aims to improve the driving safety and traffic throughput for connected and automated vehicles, has attracted increasing interest in recent years. However, most existing investigations either suffer from computational complexity or cannot harness the full potential of the road infrastructure. To this end, we first present a dedicated intersection coordination framework, where the involved vehicles hand over their control authorities and follow instructions from a centralized coordinator. Then a unified cooperative trajectory optimization problem is formulated to maximize the traffic throughput while ensuring the driving safety and long-term stability of the coordination system. To address the key computational challenges in real-world deployment, we reformulate this non-convex sequential decision problem into a model-free Markov Decision Process (MDP) and tackle it by devising a Twin Delayed Deep Deterministic Policy Gradient (TD3)-based strategy in the deep reinforcement learning (DRL) framework. Simulation and practical experiments show that the proposed strategy could achieve near-optimal performance in sub-static coordination scenarios and significantly improve the traffic throughput in realistic continuous traffic flow. The most remarkable advantage is that our strategy could reduce the computation time to milliseconds, and is shown to be scalable when the number of road lanes increases.

Submitted 22 March, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: in IEEE Transactions on Intelligent Transportation Systems

arXiv:2109.12259 [pdf, ps, other]

NUMA-aware FFT-based Convolution on ARMv8 Many-core CPUs

Authors: Xiandong Huang, Qinglin Wang, Shuyu Lu, Ruochen Hao, Songzhu Mei, Jie Liu

Abstract: Convolutional Neural Networks (CNNs), one of the most representative algorithms of deep learning, are widely used in various artificial intelligence applications. Convolution operations often take most of the computational overhead of CNNs. The FFT-based algorithm can improve the efficiency of convolution by reducing its algorithmic complexity, and there is a large body of work on the high-performance implementation of FFT-based convolution on many-core CPUs. However, there is no optimization for the non-uniform memory access (NUMA) characteristics in many-core CPUs. In this paper, we present a NUMA-aware FFT-based convolution implementation on ARMv8 many-core CPUs with NUMA architectures. The implementation can reduce the number of remote memory accesses through the data reordering of FFT transformations and the three-level parallelization of the complex matrix multiplication. The experimental results on an ARMv8 many-core CPU with NUMA architectures demonstrate that our NUMA-aware implementation has much better performance than the state-of-the-art work in most cases.

Submitted 24 September, 2021; originally announced September 2021.

Comments: Accepted by the 19th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2021)

arXiv:2102.04706 [pdf, other]

PyART: Python API Recommendation in Real-Time

Authors: Xincheng He, Lei Xu, Xiangyu Zhang, Rui Hao, Yang Feng, Baowen Xu

Abstract: API recommendation in real-time is challenging for dynamic languages like Python. Many existing API recommendation techniques are highly effective, but they mainly support static languages. A few Python IDEs provide API recommendation functionalities based on type inference and training on a large corpus of Python libraries and third-party libraries. As such, they may fail to recommend or make poor recommendations when type information is missing or target APIs are project-specific. In this paper, we propose a novel approach, PyART, to recommend APIs for Python programs in real-time. It features a light-weight analysis to derive so-called optimistic data-flow, which is neither sound nor complete, but simulates the local data-flow information humans can derive. It extracts three kinds of features: data-flow, token similarity, and token co-occurrence, in the context of the program point where a recommendation is solicited. A predictive model is trained on these features using the Random Forest algorithm. Evaluation on 8 popular Python projects demonstrates that PyART can provide effective API recommendations. When historic commits can be leveraged, which is the target scenario of a state-of-the-art tool APIREC, our average top-1 accuracy is over 50% and average top-10 accuracy over 70%, outperforming APIREC and Intellicode (i.e., the recommendation component in Visual Studio) by 28.48%-39.05% for top-1 accuracy and 24.41%-30.49% for top-10 accuracy. In other applications, such as when historic commits are not available and cross-project recommendation, PyART also shows better overall performance. The time to make a recommendation is less than a second on average, satisfying the real-time requirement.

Submitted 9 February, 2021; originally announced February 2021.

Comments: 12 pages

arXiv:2011.09265 [pdf]

A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Authors: Ruqian Hao, Khashayar Namdar, Lin Liu, Farzad Khalvati

Abstract: Brain tumor is one of the leading causes of cancer-related death globally among children and adults. Precise classification of brain tumor grade (low-grade and high-grade glioma) at an early stage plays a key role in successful prognosis and treatment planning. With recent advances in deep learning, Artificial Intelligence-enabled brain tumor grading systems can assist radiologists in the interpretation of medical images within seconds. The performance of deep learning techniques is, however, highly dependent on the size of the annotated dataset. It is extremely challenging to label a large quantity of medical images given the complexity and volume of medical data. In this work, we propose a novel transfer learning based active learning framework to reduce the annotation cost while maintaining stability and robustness of the model performance for brain tumor classification. We employed a 2D slice-based approach to train and finetune our model on the Magnetic Resonance Imaging (MRI) training dataset of 203 patients and a validation dataset of 66 patients which was used as the baseline. With our proposed method, the model achieved Area Under Receiver Operating Characteristic (ROC) Curve (AUC) of 82.89% on a separate test dataset of 66 patients, which was 2.92% higher than the baseline AUC while saving at least 40% of labeling cost. In order to further examine the robustness of our method, we created a balanced dataset, which underwent the same procedure. The model achieved AUC of 82% compared with AUC of 78.48% for the baseline, which reaffirms the robustness and stability of our proposed transfer learning augmented with active learning framework while significantly reducing the size of training data.

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2006.01693 [pdf]

A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Authors: Ruqian Hao, Khashayar Namdar, Lin Liu, Masoom A. Haider, Farzad Khalvati

Abstract: Data augmentation refers to a group of techniques whose goal is to battle the limited amount of available data to improve model generalization and push sample distribution toward the true distribution. While different augmentation strategies and their combinations have been investigated for various computer vision tasks in the context of deep learning, dedicated work in the domain of medical imaging is rare and, to the best of our knowledge, there has been no dedicated work on exploring the effects of various augmentation methods on the performance of deep learning models in prostate cancer detection. In this work, we have statically applied the five most frequently used augmentation techniques (random rotation, horizontal flip, vertical flip, random crop, and translation) to a prostate Diffusion-weighted Magnetic Resonance Imaging training dataset of 217 patients separately and evaluated the effect of each method on the accuracy of prostate cancer detection. The augmentation algorithms were applied independently to each data channel and a shallow as well as a deep Convolutional Neural Network (CNN) were trained on the five augmented sets separately. We used Area Under Receiver Operating Characteristic (ROC) curve (AUC) to evaluate the performance of the trained CNNs on a separate test set of 95 patients, using a validation set of 102 patients for finetuning. The shallow network outperformed the deep network with the best 2D slice-based AUC of 0.85 obtained by the rotation method.

Submitted 1 June, 2020; originally announced June 2020.

arXiv:1910.00852 [pdf, other]

Strong Menger connectedness of augmented $k$-ary $n$-cubes

Authors: Mei-Mei Gu, Jou-Ming Chang, Rong-Xia Hao

Abstract: A connected graph $G$ is called strongly Menger (edge) connected if for any two distinct vertices $x,y$ of $G$, there are $\min \{{\rm deg}_G(x), {\rm deg}_G(y)\}$ vertex(edge)-disjoint paths between $x$ and $y$. In this paper, we consider strong Menger (edge) connectedness of the augmented $k$-ary $n$-cube $AQ_{n,k}$, which is a variant of $k$-ary $n$-cube $Q_n^k$. By exploring the topological properties of $AQ_{n,k}$, we show that $AQ_{n,3}$ for $n\geq 4$ (resp.\ $AQ_{n,k}$ for $n\geq 2$ and $k\geq 4$) is still strongly Menger connected even when there are $4n-9$ (resp.\ $4n-8$) faulty vertices and $AQ_{n,k}$ is still strongly Menger edge connected even when there are $4n-4$ faulty edges for $n\geq 2$ and $k\geq 3$. Moreover, under the restricted condition that each vertex has at least two fault-free edges, we show that $AQ_{n,k}$ is still strongly Menger edge connected even when there are $8n-10$ faulty edges for $n\geq 2$ and $k\geq 3$. These results are all optimal in the sense of the maximum number of tolerated vertex (resp.\ edge) faults.

Submitted 2 October, 2019; originally announced October 2019.

Comments: 18 pages, 4 figures

arXiv:1812.00617 [pdf, other]

doi 10.1109/ACCESS.2019.2929238

The Component Connectivity of Alternating Group Graphs and Split-Stars

Authors: Mei-Mei Gu, Rong-Xia Hao, Jou-Ming Chang

Abstract: For an integer $\ell\geqslant 2$, the $\ell$-component connectivity of a graph $G$, denoted by $κ_{\ell}(G)$, is the minimum number of vertices whose removal from $G$ results in a disconnected graph with at least $\ell$ components or a graph with fewer than $\ell$ vertices. This is a natural generalization of the classical connectivity of graphs defined in terms of the minimum vertex-cut and is a good measure of robustness for the graph corresponding to a network. So far, the exact values of $\ell$-connectivity are known only for a few classes of networks and small $\ell$'s. It has been pointed out in~[Component connectivity of the hypercubes, Int. J. Comput. Math. 89 (2012) 137--145] that determining $\ell$-connectivity is still unsolved for most interconnection networks, such as alternating group graphs and star graphs. In this paper, by exploring the combinatorial properties and fault-tolerance of the alternating group graphs $AG_n$ and a variation of the star graphs called split-stars $S_n^2$, we study their $\ell$-component connectivities. We obtain the following results: (i) $κ_3(AG_n)=4n-10$ and $κ_4(AG_n)=6n-16$ for $n\geqslant 4$, and $κ_5(AG_n)=8n-24$ for $n\geqslant 5$; (ii) $κ_3(S_n^2)=4n-8$, $κ_4(S_n^2)=6n-14$, and $κ_5(S_n^2)=8n-20$ for $n\geqslant 4$.

Submitted 3 December, 2018; originally announced December 2018.

Journal ref: IEEE Access, Vol. 7, (2019) pp. 97745-97759
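
The definition of $\ell$-component connectivity quoted in the abstract above can be checked by exhaustive search on very small graphs. The following is a minimal sketch, not the authors' method: the function names and the 6-cycle test graph are illustrative assumptions, and the search is exponential in the number of vertices, so it is only practical for toy instances rather than for $AG_n$ or $S_n^2$ at realistic sizes.

```python
from itertools import combinations


def count_components(vertices, adj):
    """Count connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    components = 0
    while remaining:
        stack = [remaining.pop()]
        components += 1
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in remaining:
                    remaining.remove(w)
                    stack.append(w)
    return components


def component_connectivity(adj, ell):
    """Brute-force ell-component connectivity (hypothetical helper).

    Returns the smallest number of removed vertices that leaves either
    at least `ell` components or fewer than `ell` vertices.
    """
    nodes = list(adj)
    for size in range(len(nodes) + 1):
        if len(nodes) - size < ell:
            return size
        for removed in combinations(nodes, size):
            rest = set(nodes) - set(removed)
            if count_components(rest, adj) >= ell:
                return size
    return len(nodes)


def cycle_graph(n):
    """Adjacency sets of the cycle on n vertices (toy example, not from the paper)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


if __name__ == "__main__":
    c6 = cycle_graph(6)
    for ell in (2, 3, 4):
        print(ell, component_connectivity(c6, ell))
```

On the 6-cycle, removing three alternating vertices leaves three isolated vertices, which is why the search stops at size 3 for $\ell=3$.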

arXiv:1702.00259 [pdf, ps, other]

Fault diagnosability of data center networks

Authors: Mei-Mei Gu, Rong-Xia Hao, Shuming Zhou

Abstract: The data center network $D_{n,k}$, proposed in 2008, has many desirable features such as high network capacity. A kind of generalization of diagnosability for network $G$ is $g$-good-neighbor diagnosability which is denoted by $t_g(G)$. Let $κ^g(G)$ be the $R^g$-connectivity. Lin et al. in [IEEE Trans. on Reliability, 65 (3) (2016) 1248--1262] and Xu et al. in [Theor. Comput. Sci. 659 (2017) 53--63] independently posed the same problem: the relationship between the $R^g$-connectivity $κ^g(G)$ and $t_g(G)$ of a general graph $G$ needs to be studied in the future. In this paper, this open problem is solved for general regular graphs. We first establish the relationship of $κ^g(G)$ and $t_g(G)$, and obtain that $t_g(G)=κ^g(G)+g$ under some conditions. Secondly, we obtain the $g$-good-neighbor diagnosability of $D_{k,n}$, which is $t_g(D_{k,n})=(g+1)(k-1)+n+g$ for $1\leq g\leq n-1$ under both the PMC model and the MM model. Furthermore, we show that $D_{k,n}$ is tightly super $(n+k-1)$-connected for $n\geq 2$ and $k\geq 2$ and we also prove that the largest connected component of the survival graph contains almost all of the remaining vertices in $D_{k,n}$ when $2k+n-2$ vertices are removed.

Submitted 29 January, 2017; originally announced February 2017.

Comments: 16 pages, 2 figures

MSC Class: 68R10; 05C90 ACM Class: G.2.2

arXiv:1309.5083 [pdf, ps, other]

3-extra connectivity of 3-ary n-cube networks

Authors: Meimei Gu, Rongxia Hao

Abstract: Let G be a connected graph and S be a set of vertices. The h-extra connectivity of G is the cardinality of a minimum set S such that G-S is disconnected and each component of G-S has at least h+1 vertices. The h-extra connectivity is an important parameter to measure the reliability and fault tolerance ability of large interconnection networks. The h-extra connectivity for h=1,2 of the k-ary n-cube was obtained by Hsieh et al. in [Theoretical Computer Science, 443 (2012) 63-69] for k>=4 and Zhu et al. in [Theory of Computing Systems, arxiv.org/pdf/1105.0991v1 [cs.DM] 5 May 2011] for k=3. In this paper, we show that the h-extra connectivity of the 3-ary n-cube networks for h=3 is equal to 8n-12, where n>=3.

Submitted 19 September, 2013; originally announced September 2013.

Comments: 20 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1309.4961
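
The h-extra connectivity defined in the abstract above can be illustrated by brute force on a tiny instance. The sketch below is an assumption-laden toy, not the paper's proof technique: the helper names are hypothetical, the k-ary n-cube is built with the usual convention that vertices are n-tuples over Z_k joined when they differ by ±1 (mod k) in exactly one coordinate, and the search enumerates all vertex subsets. That keeps it feasible for the 9-vertex 3-ary 2-cube only; the paper's result 8n-12 applies to n>=3, which is beyond what this exhaustive search can reasonably reach.

```python
from itertools import combinations, product


def k_ary_n_cube(k, n):
    """Adjacency sets of the k-ary n-cube (assumed standard definition)."""
    adj = {}
    for v in product(range(k), repeat=n):
        nbrs = set()
        for i in range(n):
            for step in (1, -1):
                w = list(v)
                w[i] = (w[i] + step) % k
                nbrs.add(tuple(w))
        adj[v] = nbrs
    return adj


def component_sizes(vertices, adj):
    """Sizes of the connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    sizes = []
    while remaining:
        stack = [remaining.pop()]
        size = 1
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w in remaining:
                    remaining.remove(w)
                    stack.append(w)
                    size += 1
        sizes.append(size)
    return sizes


def extra_connectivity(adj, h):
    """Brute-force h-extra connectivity (hypothetical helper).

    Returns the minimum |S| such that G-S is disconnected and every
    component of G-S has at least h+1 vertices, or None if no such S exists.
    """
    nodes = list(adj)
    for size in range(len(nodes) + 1):
        for removed in combinations(nodes, size):
            sizes = component_sizes(set(nodes) - set(removed), adj)
            if len(sizes) >= 2 and min(sizes) >= h + 1:
                return size
    return None


if __name__ == "__main__":
    q = k_ary_n_cube(3, 2)
    for h in (0, 1, 2):
        print(h, extra_connectivity(q, h))
```

For the 3-ary 2-cube the search gives 4 for h=0 (the ordinary connectivity) and 5 for h=1; for h=2 no valid set exists on so small a graph, so the function returns None.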

arXiv:1309.4961

On 3-extra connectivity of k-ary n-cubes

Authors: Mei-Mei Gu, Rong-Xia Hao, Jian-Bing Liu

Abstract: Given a graph G, a non-negative integer h and a set of vertices S, the h-extra connectivity of G is the cardinality of a minimum set S such that G-S is disconnected and each component of G-S has at least h+1 vertices. The 2-extra connectivity of the k-ary n-cube was obtained by Hsieh et al. in [Theoretical Computer Science, 443 (2012) 63-69]. In this paper, we obtain the h-extra connectivity of the k-ary n-cube networks for h=3.

Submitted 13 September, 2014; v1 submitted 19 September, 2013; originally announced September 2013.

Comments: This paper has been withdrawn by the author due to a crucial sign error in the main theorem

arXiv:1201.0219 [pdf]

A-GPS Assisted Wi-Fi Access Point Discovery on Mobile Devices for Energy Saving

Authors: Feng Xia, Wei Zhang, Fangwei Ding, Ruonan Hao

Abstract: Mobile devices have been shipped with multiple wireless network interfaces in order to meet their diverse communication and networking demands. In this paper, we propose an A-GPS assisted scheme that discovers the nearest Wi-Fi network access points (APs) by using the user's location information. This allows the user to switch to the Wi-Fi interface in an intelligent manner when she/he arrives at the nearest Wi-Fi network AP. Therefore, it avoids the long periods in idle state and greatly reduces the number of unnecessary Wi-Fi scans on the mobile device. The experimental results demonstrate that our scheme effectively saves energy for mobile devices integrated with Wi-Fi and cellular interfaces.

Submitted 30 December, 2011; originally announced January 2012.

Comments: IEEE Global Information Infrastructure Symposium (GIIS 2011), August 2011, Da Nang, Vietnam

MSC Class: 68M10 ACM Class: C.2

arXiv:1201.0215 [pdf, ps, other]

ART-GAS: An Adaptive and Real-Time GTS Allocation Scheme for IEEE 802.15.4

Authors: Feng Xia, Ruonan Hao, Yang Cao, Lei Xue

Abstract: IEEE 802.15.4 supports a Guaranteed Time Slot (GTS) allocation mechanism for time-critical and delay-sensitive data transmissions in Wireless Personal Area Networks (WPANs). However, the inflexible first-come-first-served GTS allocation policy and the passive deallocation mechanism significantly reduce network efficiency. In this paper, we propose an Adaptive and Real-Time GTS Allocation Scheme (ART-GAS) to provide differentiated services for devices with different priorities, which guarantees data transmissions for time-sensitive and high-traffic devices. The bandwidth utilization in IEEE 802.15.4-based PAN is improved. Simulation results show that our ART-GAS algorithm significantly outperforms the existing GTS mechanism specified in IEEE 802.15.4.

Submitted 30 December, 2011; originally announced January 2012.

Comments: The Asian Internet Engineering Conference (AINTEC 2011), ACM, November 2011, Bangkok, Thailand

MSC Class: 68M12 ACM Class: C.2.2

arXiv:1201.0210 [pdf]

Real-Time Performance Analysis of Infrastructure-based IEEE 802.11 Distributed Coordination Function

Authors: Feng Xia, Ruixia Gao, Linqiang Wang, Ruonan Hao

Abstract: With the increasing popularity of wireless networks, wireless local area networks (WLANs), which play a critical role in providing anywhere and anytime connectivity, have attracted significant research interest. For WLANs the IEEE 802.11 standard is the most mature technology and has been widely adopted for wireless networks. This paper analyzes real-time performance of the IEEE 802.11 standard that adopts the MAC protocol of Distributed Coordination Function (DCF) operating in infrastructure mode. Extensive simulations have been done to examine how the network performance in terms of real-time metrics including effective data rate, latency and packet loss rate will be impacted by some critical parameters (e.g. CWmin and packet payload). The results are presented and analyzed. The analysis of simulation results can provide support for parameter configuration and optimization of WLANs for real-time applications.

Submitted 30 December, 2011; originally announced January 2012.

MSC Class: 68M20 ACM Class: C.2.2

Journal ref: Control Engineering and Applied Informatics, Vol.13, No.3, pp. 74-81, 2011

Showing 1–26 of 26 results for author: Hao, R