subscribe to arXiv mailings

Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment

Authors: Zijia Song, Zelin Zang, Yelin Wang, Guozheng Yang, Jiangbin Zheng, Kaicheng yu, Wanyu Chen, Stan Z. Li

Abstract: Multimodal fusion breaks through the barriers between diverse modalities and has already yielded numerous impressive performances. However, in various specialized fields, it is struggling to obtain sufficient alignment data for the training process, which seriously limits the use of previously elegant models. Thus, semi-supervised learning attempts to achieve multimodal alignment with fewer matche… ▽ More Multimodal fusion breaks through the barriers between diverse modalities and has already yielded numerous impressive performances. However, in various specialized fields, it is struggling to obtain sufficient alignment data for the training process, which seriously limits the use of previously elegant models. Thus, semi-supervised learning attempts to achieve multimodal alignment with fewer matched pairs but traditional methods like pseudo-labeling are difficult to apply in domains with no label information. To address these problems, we transform semi-supervised multimodal alignment into a manifold matching problem and propose a new method based on CLIP, named Gentle-CLIP. Specifically, we design a novel semantic density distribution loss to explore implicit semantic alignment information from unpaired multimodal data by constraining the latent representation distribution with fine granularity, thus eliminating the need for numerous strictly matched pairs. Meanwhile, we introduce multi-kernel maximum mean discrepancy as well as self-supervised contrastive loss to pull separate modality distributions closer and enhance the stability of the representation distribution. In addition, the contrastive loss used in CLIP is employed on the supervised matched data to prevent negative optimization. Extensive experiments conducted on a range of tasks in various fields, including protein, remote sensing, and the general vision-language field, demonstrate the effectiveness of our proposed Gentle-CLIP. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.01627 [pdf, other]

GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models

Authors: Zicheng Liu, Jiahui Li, Siyuan Li, Zelin Zang, Cheng Tan, Yufei Huang, Yajing Bai, Stan Z. Li

Abstract: The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their application across a spectrum of downstream applications. Despite advancements, a lack of evaluation framework makes it difficult to ensure equitable assessment due to experimental settings, model intricacy, benchmark datasets, and… ▽ More The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their application across a spectrum of downstream applications. Despite advancements, a lack of evaluation framework makes it difficult to ensure equitable assessment due to experimental settings, model intricacy, benchmark datasets, and reproducibility challenges. In the absence of standardization, comparative analyses risk becoming biased and unreliable. To surmount this impasse, we introduce GenBench, a comprehensive benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. Through systematic evaluations of datasets spanning diverse biological domains with a particular emphasis on both short-range and long-range genomic tasks, firstly including the three most important DNA tasks covering Coding Region, Non-Coding Region, Genome Structure, etc. Moreover, We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance. Our findings reveal an interesting observation: independent of the number of parameters, the discernible difference in preference between the attention-based and convolution-based models on short- and long-range tasks may provide insights into the future design of GFM. △ Less

Submitted 5 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.16258 [pdf, other]

USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Zelin Zang

Abstract: Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both nor… ▽ More Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both normal and abnormal states, thereby diminishing discriminative performance. Our innovation addresses this limitation by introducing a combination of data augmentation and soft contrastive learning, specifically designed to capture the multifaceted nature of state behaviors more accurately. The data augmentation process enriches the dataset with varied representations of normal states, while soft contrastive learning fine-tunes the model's sensitivity to the subtle differences between normal and abnormal patterns, enabling it to recognize a broader spectrum of anomalies. This dual strategy significantly boosts the model's ability to distinguish between normal and abnormal states, leading to a marked improvement in fault detection performance across multiple datasets and settings, thereby setting a new benchmark for unsupervised fault detection in complex systems. The code of our method is available at \url{https://github.com/zangzelin/code_USD.git}. △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: 19 pages, 7 figures, under review

arXiv:2404.13288 [pdf, other]

PoseINN: Realtime Visual-based Pose Regression and Localization with Invertible Neural Networks

Authors: Zirui Zang, Ahmad Amine, Rahul Mangharam

Abstract: Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and… ▽ More Estimating ego-pose from cameras is an important problem in robotics with applications ranging from mobile robotics to augmented reality. While SOTA models are becoming increasingly accurate, they can still be unwieldy due to high computational costs. In this paper, we propose to solve the problem by using invertible neural networks (INN) to find the mapping between the latent space of images and poses for a given scene. Our model achieves similar performance to the SOTA while being faster to train and only requiring offline rendering of low-resolution synthetic data. By using normalizing flows, the proposed method also provides uncertainty estimation for the output. We also demonstrated the efficiency of this method by deploying the model on a mobile robot. △ Less

Submitted 7 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.10667 [pdf, other]

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Authors: Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo

Abstract: We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the percep… ▽ More We introduce VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: Tech Report. Project webpage: https://www.microsoft.com/en-us/research/project/vasa-1/

arXiv:2403.11283 [pdf, other]

doi 10.1145/3597926.3598038

Pattern-Based Peephole Optimizations with Java JIT Tests

Authors: Zhiqiang Zang, Aditya Thimmaiah, Milos Gligoric

Abstract: We present JOG, a framework that facilitates developing Java JIT peephole optimizations alongside JIT tests. JOG enables developers to write a pattern, in Java itself, that specifies desired code transformations by writing code before and after the optimization, as well as any necessary preconditions. Such patterns can be written in the same way that tests of the optimization are already written i… ▽ More We present JOG, a framework that facilitates developing Java JIT peephole optimizations alongside JIT tests. JOG enables developers to write a pattern, in Java itself, that specifies desired code transformations by writing code before and after the optimization, as well as any necessary preconditions. Such patterns can be written in the same way that tests of the optimization are already written in OpenJDK. JOG translates each pattern into C/C++ code that can be integrated as a JIT optimization pass. JOG also generates Java tests for optimizations from patterns. Furthermore, JOG can automatically detect possible shadow relation between a pair of optimizations where the effect of the shadowed optimization is overridden by another. Our evaluation shows that JOG makes it easier to write readable JIT optimizations alongside tests without decreasing the effectiveness of JIT optimizations. We wrote 162 patterns, including 68 existing optimizations in OpenJDK, 92 new optimizations adapted from LLVM, and two new optimizations that we proposed. We opened eight pull requests (PRs) for OpenJDK, including six for new optimizations, one on removing shadowed optimizations, and one for newly generated JIT tests; seven PRs have already been integrated into the master branch of OpenJDK. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures, 3 tables, published in ISSTA 2023 (Research Papers track)

arXiv:2403.11281 [pdf, other]

doi 10.1145/3643777

Java JIT Testing with Template Extraction

Authors: Zhiqiang Zang, Fu-Yao Yu, Aditya Thimmaiah, August Shi, Milos Gligoric

Abstract: We present LeJit, a template-based framework for testing Java just-in-time (JIT) compilers. Like recent template-based frameworks, LeJit executes a template -- a program with holes to be filled -- to generate concrete programs given as inputs to Java JIT compilers. LeJit automatically generates template programs from existing Java code by converting expressions to holes, as well as generating nece… ▽ More We present LeJit, a template-based framework for testing Java just-in-time (JIT) compilers. Like recent template-based frameworks, LeJit executes a template -- a program with holes to be filled -- to generate concrete programs given as inputs to Java JIT compilers. LeJit automatically generates template programs from existing Java code by converting expressions to holes, as well as generating necessary glue code (i.e., code that generates instances of non-primitive types) to make generated templates executable. We have successfully used LeJit to test a range of popular Java JIT compilers, revealing five bugs in HotSpot, nine bugs in OpenJ9, and one bug in GraalVM. All of these bugs have been confirmed by Oracle and IBM developers, and 11 of these bugs were previously unknown, including two CVEs (Common Vulnerabilities and Exposures). Our comparison with several existing approaches shows that LeJit is complementary to them and is a powerful technique for ensuring Java JIT compiler correctness. △ Less

Submitted 7 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: 23 pages, 6 figures, 8 tables, accepted in FSE 2024 (Research Papers track)

arXiv:2402.18233 [pdf, other]

Zero-Shot Aerial Object Detection with Visual Description Regularization

Authors: Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang, Jiancheng Lv

Abstract: Existing object detection models are mainly trained on large-scale labeled datasets. However, annotating data for novel aerial object classes is expensive since it is time-consuming and may require expert knowledge. Thus, it is desirable to study label-efficient object detection methods on aerial images. In this work, we propose a zero-shot method for aerial object detection named visual Descripti… ▽ More Existing object detection models are mainly trained on large-scale labeled datasets. However, annotating data for novel aerial object classes is expensive since it is time-consuming and may require expert knowledge. Thus, it is desirable to study label-efficient object detection methods on aerial images. In this work, we propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. Concretely, we identify the weak semantic-visual correlation of the aerial objects and aim to address the challenge with prior descriptions of their visual appearance. Instead of directly encoding the descriptions into class embedding space which suffers from the representation gap problem, we propose to infuse the prior inter-class visual similarity conveyed in the descriptions into the embedding learning. The infusion process is accomplished with a newly designed similarity-aware triplet loss which incorporates structured regularization on the representation space. We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art ZSD methods with complex projection designs and generative frameworks, e.g., DescReg outperforms best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in HM. We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture. △ Less

Submitted 1 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 13 pages, 3 figures

arXiv:2402.16901 [pdf, other]

FGBERT: Function-Driven Pre-trained Gene Language Model for Metagenomics

Authors: ChenRui Duan, Zelin Zang, Yongjie Xu, Hang He, Zihan Liu, Zijia Song, Ju-Sheng Zheng, Stan Z. Li

Abstract: Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer representations, limiting the capture of structurally relevant gene contexts. To address these limitations and further our understanding of complex relationships between metage… ▽ More Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer representations, limiting the capture of structurally relevant gene contexts. To address these limitations and further our understanding of complex relationships between metagenomic sequences and their functions, we introduce a protein-based gene representation as a context-aware and structure-relevant tokenizer. Our approach includes Masked Gene Modeling (MGM) for gene group-level pre-training, providing insights into inter-gene contextual information, and Triple Enhanced Metagenomic Contrastive Learning (TEM-CL) for gene-level pre-training to model gene sequence-function relationships. MGM and TEM-CL constitute our novel metagenomic language model {\NAME}, pre-trained on 100 million metagenomic sequences. We demonstrate the superiority of our proposed {\NAME} on eight datasets. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.13144 [pdf, other]

Neural Network Parameter Diffusion

Authors: Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

Abstract: Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also \textit{generate high-performing neural network parameters}. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion mod… ▽ More Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also \textit{generate high-performing neural network parameters}. Our approach is simple, utilizing an autoencoder and a standard latent diffusion model. The autoencoder extracts latent representations of a subset of the trained network parameters. A diffusion model is then trained to synthesize these latent parameter representations from random noise. It then generates new representations that are passed through the autoencoder's decoder, whose outputs are ready to use as new subsets of network parameters. Across various architectures and datasets, our diffusion process consistently generates models of comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained networks. Our results encourage more exploration on the versatile use of diffusion models. △ Less

Submitted 28 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: We introduce a novel approach for parameter generation, named neural network parameter diffusion (\textbf{p-diff}), which employs a standard latent diffusion model to synthesize a new set of parameters

arXiv:2402.09416 [pdf, other]

Deep Manifold Transformation for Protein Representation Learning

Authors: Bozhen Hu, Zelin Zang, Cheng Tan, Stan Z. Li

Abstract: Protein representation learning is critical in various tasks in biology, such as drug design and protein structure or function prediction, which has primarily benefited from protein language models and graph neural networks. These models can capture intrinsic patterns from protein sequences and structures through masking and task-related losses. However, the learned protein representations are usu… ▽ More Protein representation learning is critical in various tasks in biology, such as drug design and protein structure or function prediction, which has primarily benefited from protein language models and graph neural networks. These models can capture intrinsic patterns from protein sequences and structures through masking and task-related losses. However, the learned protein representations are usually not well optimized, leading to performance degradation due to limited data, difficulty adapting to new tasks, etc. To address this, we propose a new \underline{d}eep \underline{m}anifold \underline{t}ransformation approach for universal \underline{p}rotein \underline{r}epresentation \underline{l}earning (DMTPRL). It employs manifold learning strategies to improve the quality and adaptability of the learned embeddings. Specifically, we apply a novel manifold learning loss during training based on the graph inter-node similarity. Our proposed DMTPRL method outperforms state-of-the-art baselines on diverse downstream tasks across popular datasets. This validates our approach for learning universal and robust protein representations. We promise to release the code after acceptance. △ Less

Submitted 12 January, 2024; originally announced February 2024.

Comments: This work has been accepted by ICASSP 2024

arXiv:2402.09325 [pdf, other]

PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments

Authors: Xiuzhong Hu, Guangming Xiong, Zheng Zang, Peng Jia, Yuxuan Han, Junyi Ma

Abstract: Large-scale 3D scene reconstruction and novel view synthesis are vital for autonomous vehicles, especially utilizing temporally sparse LiDAR frames. However, conventional explicit representations remain a significant bottleneck towards representing the reconstructed and synthetic scenes at unlimited resolution. Although the recently developed neural radiance fields (NeRF) have shown compelling res… ▽ More Large-scale 3D scene reconstruction and novel view synthesis are vital for autonomous vehicles, especially utilizing temporally sparse LiDAR frames. However, conventional explicit representations remain a significant bottleneck towards representing the reconstructed and synthetic scenes at unlimited resolution. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representations, the problem of large-scale 3D scene reconstruction and novel view synthesis using sparse LiDAR frames remains unexplored. To bridge this gap, we propose a 3D scene reconstruction and novel view synthesis framework called parent-child neural radiance field (PC-NeRF). Based on its two modules, parent NeRF and child NeRF, the framework implements hierarchical spatial partitioning and multi-level scene representation, including scene, segment, and point levels. The multi-level scene representation enhances the efficient utilization of sparse LiDAR point cloud data and enables the rapid acquisition of an approximate volumetric scene representation. With extensive experiments, PC-NeRF is proven to achieve high-precision novel LiDAR view synthesis and 3D reconstruction in large-scale scenes. Moreover, PC-NeRF can effectively handle situations with sparse LiDAR frames and demonstrate high deployment efficiency with limited training epochs. Our approach implementation and the pre-trained models are available at https://github.com/biter0088/pc-nerf. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2310.00874

arXiv:2402.05441 [pdf]

Spiking Neural Network Enhanced Hand Gesture Recognition Using Low-Cost Single-photon Avalanche Diode Array

Authors: Zhenya Zang, Xingda Li, David Day Uei Li

Abstract: We present a compact spiking convolutional neural network (SCNN) and spiking multilayer perceptron (SMLP) to recognize ten different gestures in dark and bright light environments, using a $9.6 single-photon avalanche diode (SPAD) array. In our hand gesture recognition (HGR) system, photon intensity data was leveraged to train and test the network. A vanilla convolutional neural network (CNN) was… ▽ More We present a compact spiking convolutional neural network (SCNN) and spiking multilayer perceptron (SMLP) to recognize ten different gestures in dark and bright light environments, using a $9.6 single-photon avalanche diode (SPAD) array. In our hand gesture recognition (HGR) system, photon intensity data was leveraged to train and test the network. A vanilla convolutional neural network (CNN) was also implemented to compare the performance of SCNN with the same network topologies and training strategies. Our SCNN was trained from scratch instead of being converted from the CNN. We tested the three models in dark and ambient light (AL)-corrupted environments. The results indicate that SCNN achieves comparable accuracy (90.8%) to CNN (92.9%) and exhibits lower floating operations with only 8 timesteps. SMLP also presents a trade-off between computational workload and accuracy. The code and collected datasets of this work are available at https://github.com/zzy666666zzy/TinyLiDAR_NET_SNN. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2401.10973 [pdf, other]

T2MAC: Targeted and Trusted Multi-Agent Communication through Selective Engagement and Evidence-Driven Integration

Authors: Chuxiong Sun, Zehua Zang, Jiabao Li, Jiangmeng Li, Xiao Xu, Rui Wang, Changwen Zheng

Abstract: Communication stands as a potent mechanism to harmonize the behaviors of multiple agents. However, existing works primarily concentrate on broadcast communication, which not only lacks practicality, but also leads to information redundancy. This surplus, one-fits-all information could adversely impact the communication efficiency. Furthermore, existing works often resort to basic mechanisms to int… ▽ More Communication stands as a potent mechanism to harmonize the behaviors of multiple agents. However, existing works primarily concentrate on broadcast communication, which not only lacks practicality, but also leads to information redundancy. This surplus, one-fits-all information could adversely impact the communication efficiency. Furthermore, existing works often resort to basic mechanisms to integrate observed and received information, impairing the learning process. To tackle these difficulties, we propose Targeted and Trusted Multi-Agent Communication (T2MAC), a straightforward yet effective method that enables agents to learn selective engagement and evidence-driven integration. With T2MAC, agents have the capability to craft individualized messages, pinpoint ideal communication windows, and engage with reliable partners, thereby refining communication efficiency. Following the reception of messages, the agents integrate information observed and received from different sources at an evidence level. This process enables agents to collectively use evidence garnered from multiple perspectives, fostering trusted and cooperative behaviors. We evaluate our method on a diverse set of cooperative multi-agent tasks, with varying difficulties, involving different scales and ranging from Hallway, MPE to SMAC. The experiments indicate that the proposed model not only surpasses the state-of-the-art methods in terms of cooperative performance and communication efficiency, but also exhibits impressive generalization. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: AAAI24

arXiv:2401.10249 [pdf]

doi 10.1109/FPL60245.2023.00062

Building a Reusable and Extensible Automatic Compiler Infrastructure for Reconfigurable Devices

Authors: Zhenya Zang, Uwe Dolinsky, Pietro Ghiglio, Stefano Cherubin, Mehdi Goli, Shufan Yang

Abstract: Multi-Level Intermediate Representation (MLIR) is gaining increasing attention in reconfigurable hardware communities due to its capability to represent various abstract levels for software compilers. This project aims to be the first to provide an end-to-end framework that leverages open-source, cross-platform compilation technology to generate MLIR from SYCL. Additionally, it aims to explore a l… ▽ More Multi-Level Intermediate Representation (MLIR) is gaining increasing attention in reconfigurable hardware communities due to its capability to represent various abstract levels for software compilers. This project aims to be the first to provide an end-to-end framework that leverages open-source, cross-platform compilation technology to generate MLIR from SYCL. Additionally, it aims to explore a lowering pipeline that converts MLIR to RTL using open-source hardware intermediate representation (IR) and compilers. Furthermore, it aims to couple the generated hardware module with the host CPU using vendor-specific crossbars. Our preliminary results demonstrated the feasibility of lowering customized MLIR to RTL, thus paving the way for an end-to-end compilation. △ Less

Submitted 14 December, 2023; originally announced January 2024.

Comments: 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)

arXiv:2401.07543 [pdf, other]

Must: Maximizing Latent Capacity of Spatial Transcriptomics Data

Authors: Zelin Zang, Liangyu Li, Yongjie Xu, Chenrui Duan, Kai Wang, Yang You, Yi Sun, Stan Z. Li

Abstract: Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodality data in transcriptomic, spatial, and morphological, offering opportunities for understanding tissue biology beyond transcriptomics. However, we identify the modality bias phenomenon in ST data species, i.e., the inconsistent contribution of different modalities… ▽ More Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodality data in transcriptomic, spatial, and morphological, offering opportunities for understanding tissue biology beyond transcriptomics. However, we identify the modality bias phenomenon in ST data species, i.e., the inconsistent contribution of different modalities to the labels leads to a tendency for the analysis methods to retain the information of the dominant modality. How to mitigate the adverse effects of modality bias to satisfy various downstream tasks remains a fundamental challenge. This paper introduces Multiple-modality Structure Transformation, named MuST, a novel methodology to tackle the challenge. MuST integrates the multi-modality information contained in the ST data effectively into a uniform latent space to provide a foundation for all the downstream tasks. It learns intrinsic local structures by topology discovery strategy and topology fusion loss function to solve the inconsistencies among different modalities. Thus, these topology-based and deep learning techniques provide a solid foundation for a variety of analytical tasks while coordinating different modalities. The effectiveness of MuST is assessed by performance metrics and biological significance. The results show that it outperforms existing state-of-the-art methods with clear advantages in the precision of identifying and preserving structures of tissues and biomarkers. MuST offers a versatile toolkit for the intricate analysis of complex biological systems. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: 30 pages and 6 figures, plus 27 pages and 14 figures in appendices

arXiv:2401.06727 [pdf, other]

Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding

Authors: Bozhen Hu, Zelin Zang, Jun Xia, Lirong Wu, Cheng Tan, Stan Z. Li

Abstract: Representing graph data in a low-dimensional space for subsequent tasks is the purpose of attributed graph embedding. Most existing neural network approaches learn latent representations by minimizing reconstruction errors. Rare work considers the data distribution and the topological structure of latent codes simultaneously, which often results in inferior embeddings in real-world graph data. Thi… ▽ More Representing graph data in a low-dimensional space for subsequent tasks is the purpose of attributed graph embedding. Most existing neural network approaches learn latent representations by minimizing reconstruction errors. Rare work considers the data distribution and the topological structure of latent codes simultaneously, which often results in inferior embeddings in real-world graph data. This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) method for attributed graph data to improve the stability and quality of learned representations to tackle the crowding problem. The node-to-node geodesic similarity is preserved between the original and latent space under a pre-defined distribution. The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets, which validates our solutions. We promise to release the code after acceptance. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: This work has been accepted by ICASSP2023, due to download limitations, we upload this work here

Journal ref: In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE

arXiv:2401.05578

Fast Cerebral Blood Flow Analysis via Extreme Learning Machine

Authors: Xi Chen, Zhenya Zang, Xingda Li

Abstract: We introduce a rapid and precise analytical approach for analyzing cerebral blood flow (CBF) using Diffuse Correlation Spectroscopy (DCS) with the application of the Extreme Learning Machine (ELM). Our evaluation of ELM and existing algorithms involves a comprehensive set of metrics. We assess these algorithms using synthetic datasets for both semi-infinite and multi-layer models. The results demo… ▽ More We introduce a rapid and precise analytical approach for analyzing cerebral blood flow (CBF) using Diffuse Correlation Spectroscopy (DCS) with the application of the Extreme Learning Machine (ELM). Our evaluation of ELM and existing algorithms involves a comprehensive set of metrics. We assess these algorithms using synthetic datasets for both semi-infinite and multi-layer models. The results demonstrate that ELM consistently achieves higher fidelity across various noise levels and optical parameters, showcasing robust generalization ability and outperforming iterative fitting algorithms. Through a comparison with a computationally efficient neural network, ELM attains comparable accuracy with reduced training and inference times. Notably, the absence of a back-propagation process in ELM during training results in significantly faster training speeds compared to existing neural network approaches. This proposed strategy holds promise for edge computing applications with online training capabilities. △ Less

Submitted 1 February, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: Not ready to submission. Need further correction

arXiv:2401.02713 [pdf, other]

Graph-level Protein Representation Learning by Structure Knowledge Refinement

Authors: Ge Wang, Zelin Zang, Jiangbin Zheng, Jun Xia, Stan Z. Li

Abstract: This paper focuses on learning representation on the whole graph level in an unsupervised manner. Learning graph-level representation plays an important role in a variety of real-world issues such as molecule property prediction, protein structure feature extraction, and social network analysis. The mainstream method is utilizing contrastive learning to facilitate graph feature extraction, known a… ▽ More This paper focuses on learning representation on the whole graph level in an unsupervised manner. Learning graph-level representation plays an important role in a variety of real-world issues such as molecule property prediction, protein structure feature extraction, and social network analysis. The mainstream method is utilizing contrastive learning to facilitate graph feature extraction, known as Graph Contrastive Learning (GCL). GCL, although effective, suffers from some complications in contrastive learning, such as the effect of false negative pairs. Moreover, augmentation strategies in GCL are weakly adaptive to diverse graph datasets. Motivated by these problems, we propose a novel framework called Structure Knowledge Refinement (SKR) which uses data structure to determine the probability of whether a pair is positive or negative. Meanwhile, we propose an augmentation strategy that naturally preserves the semantic meaning of the original data and is compatible with our SKR framework. Furthermore, we illustrate the effectiveness of our SKR framework through intuition and experiments. The experimental results on the tasks of graph-level classification demonstrate that our SKR framework is superior to most state-of-the-art baselines. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.08374 [pdf, other]

doi 10.1016/j.knosys.2023.111225

Unsupervised Social Event Detection via Hybrid Graph Contrastive Learning and Reinforced Incremental Clustering

Authors: Yuanyuan Guo, Zehua Zang, Hang Gao, Xiao Xu, Rui Wang, Lixiang Liu, Jiangmeng Li

Abstract: Detecting events from social media data streams is gradually attracting researchers. The innate challenge for detecting events is to extract discriminative information from social media data thereby assigning the data into different events. Due to the excessive diversity and high updating frequency of social data, using supervised approaches to detect events from social messages is hardly achieved… ▽ More Detecting events from social media data streams is gradually attracting researchers. The innate challenge for detecting events is to extract discriminative information from social media data thereby assigning the data into different events. Due to the excessive diversity and high updating frequency of social data, using supervised approaches to detect events from social messages is hardly achieved. To this end, recent works explore learning discriminative information from social messages by leveraging graph contrastive learning (GCL) and embedding clustering in an unsupervised manner. However, two intrinsic issues exist in benchmark methods: conventional GCL can only roughly explore partial attributes, thereby insufficiently learning the discriminative information of social messages; for benchmark methods, the learned embeddings are clustered in the latent space by taking advantage of certain specific prior knowledge, which conflicts with the principle of unsupervised learning paradigm. In this paper, we propose a novel unsupervised social media event detection method via hybrid graph contrastive learning and reinforced incremental clustering (HCRC), which uses hybrid graph contrastive learning to comprehensively learn semantic and structural discriminative information from social messages and reinforced incremental clustering to perform efficient clustering in a solidly unsupervised manner. We conduct comprehensive experiments to evaluate HCRC on the Twitter and Maven datasets. The experimental results demonstrate that our approach yields consistent significant performance boosts. In traditional incremental setting, semi-supervised incremental setting and solidly unsupervised setting, the model performance has achieved maximum improvements of 53%, 45%, and 37%, respectively. △ Less

Submitted 15 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted by Knowledge-Based Systems

arXiv:2310.08986 [pdf, other]

VCL Challenges 2023 at ICCV 2023 Technical Report: Bi-level Adaptation Method for Test-time Adaptive Object Detection

Authors: Chenyu Lin, Yusheng He, Zhengqing Zang, Chenwei Tang, Tao Wang, Jiancheng Lv

Abstract: This report outlines our team's participation in VCL Challenges B Continual Test_time Adaptation, focusing on the technical details of our approach. Our primary focus is Testtime Adaptation using bi_level adaptations, encompassing image_level and detector_level adaptations. At the image level, we employ adjustable parameterbased image filters, while at the detector level, we leverage adjustable pa… ▽ More This report outlines our team's participation in VCL Challenges B Continual Test_time Adaptation, focusing on the technical details of our approach. Our primary focus is Testtime Adaptation using bi_level adaptations, encompassing image_level and detector_level adaptations. At the image level, we employ adjustable parameterbased image filters, while at the detector level, we leverage adjustable parameterbased mean teacher modules. Ultimately, through the utilization of these bi_level adaptations, we have achieved a remarkable 38.3% mAP on the target domain of the test set within VCL Challenges B. It is worth noting that the minimal drop in mAP, is mearly 4.2%, and the overall performance is 32.5% mAP. △ Less

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.00874 [pdf, other]

PC-NeRF: Parent-Child Neural Radiance Fields under Partial Sensor Data Loss in Autonomous Driving Environments

Authors: Xiuzhong Hu, Guangming Xiong, Zheng Zang, Peng Jia, Yuxuan Han, Junyi Ma

Abstract: Reconstructing large-scale 3D scenes is essential for autonomous vehicles, especially when partial sensor data is lost. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representations, the large-scale 3D scene reconstruction using partially lost LiDAR point cloud data still needs to be explored. To bridge this gap, we propose a novel 3D scene… ▽ More Reconstructing large-scale 3D scenes is essential for autonomous vehicles, especially when partial sensor data is lost. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representations, the large-scale 3D scene reconstruction using partially lost LiDAR point cloud data still needs to be explored. To bridge this gap, we propose a novel 3D scene reconstruction framework called parent-child neural radiance field (PC-NeRF). The framework comprises two modules, the parent NeRF and the child NeRF, to simultaneously optimize scene-level, segment-level, and point-level scene representations. Sensor data can be utilized more efficiently by leveraging the segment-level representation capabilities of child NeRFs, and an approximate volumetric representation of the scene can be quickly obtained even with limited observations. With extensive experiments, our proposed PC-NeRF is proven to achieve high-precision 3D reconstruction in large-scale scenes. Moreover, PC-NeRF can effectively tackle situations where partial sensor data is lost and has high deployment efficiency with limited training time. Our approach implementation and the pre-trained models will be available at https://github.com/biter0088/pc-nerf. △ Less

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.07909 [pdf, other]

DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan. Z Li, Yang You

Abstract: Unsupervised Contrastive learning has gained prominence in fields such as vision, and biology, leveraging predefined positive/negative samples for representation learning. Data augmentation, categorized into hand-designed and model-based methods, has been identified as a crucial component for enhancing contrastive learning. However, hand-designed methods require human expertise in domain-specific… ▽ More Unsupervised Contrastive learning has gained prominence in fields such as vision, and biology, leveraging predefined positive/negative samples for representation learning. Data augmentation, categorized into hand-designed and model-based methods, has been identified as a crucial component for enhancing contrastive learning. However, hand-designed methods require human expertise in domain-specific data while sometimes distorting the meaning of the data. In contrast, generative model-based approaches usually require supervised or large-scale external data, which has become a bottleneck constraining model training in many domains. To address the problems presented above, this paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion mode-based positive data generation. DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional diffusion model generates new positive samples conditioned on the semantic encoding to serve the training of unsupervised contrast learning. With the help of iterative training of the semantic encoder and diffusion model, DiffAug improves the representation ability in an uninterrupted and unsupervised manner. Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets. The code for review is released at \url{https://github.com/zangzelin/code_diffaug}. △ Less

Submitted 25 May, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: accepted by ICML24

arXiv:2303.17108 [pdf, other]

B-spline freeform surface tailoring for prescribed irradiance based on differentiable ray-tracing

Authors: Haoqiang Wang, Zihan Zang, Yunpeng Xu, Yanjun Han, Hongtao Li, Yi Luo

Abstract: A universal and flexible design method for freeform surface that can modulate the distribution of an zero-étendue source to an arbitrary irradiance distribution is a significant challenge in the field of non-imaging optics. Current design methods typically formulate the problem as a partial differential equation and solve it through sophisticated numerical methods, especially for off-axis situatio… ▽ More A universal and flexible design method for freeform surface that can modulate the distribution of an zero-étendue source to an arbitrary irradiance distribution is a significant challenge in the field of non-imaging optics. Current design methods typically formulate the problem as a partial differential equation and solve it through sophisticated numerical methods, especially for off-axis situations. However, most of the current methods are unsuitable for directly solving multi-freeform surface or hybrid design problems that contains both freeform and spherical surfaces. To address these challenges, we propose the B-spline surface tailoring method, based on a differentiable ray-tracing algorithm. Our method features a computationally efficient B-spline model and a two-step optimization strategy based on optimal transport mapping. This allows for rapid, iterative adjustments to the surface shape based on deviations between the simulated and target distributions while ensuring a smooth resulting surface shape. In experiments, the proposed approach performs well in both paraxial and off-axis situations, and exhibits superior flexibility when applied to hybrid design case. △ Less

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.13694 [pdf, other]

Ensemble Gaussian Processes for Adaptive Autonomous Driving on Multi-friction Surfaces

Authors: Tomáš Nagy, Ahmad Amine, Truong X. Nghiem, Ugo Rosolia, Zirui Zang, Rahul Mangharam

Abstract: Driving under varying road conditions is challenging, especially for autonomous vehicles that must adapt in real-time to changes in the environment, e.g., rain, snow, etc. It is difficult to apply offline learning-based methods in these time-varying settings, as the controller should be trained on datasets representing all conditions it might encounter in the future. While online learning may adap… ▽ More Driving under varying road conditions is challenging, especially for autonomous vehicles that must adapt in real-time to changes in the environment, e.g., rain, snow, etc. It is difficult to apply offline learning-based methods in these time-varying settings, as the controller should be trained on datasets representing all conditions it might encounter in the future. While online learning may adapt a model from real-time data, its convergence is often too slow for fast varying road conditions. We study this problem in autonomous racing, where driving at the limits of handling under varying road conditions is required for winning races. We propose a computationally-efficient approach that leverages an ensemble of Gaussian processes (GPs) to generalize and adapt pre-trained GPs to unseen conditions. Each GP is trained on driving data with a different road surface friction. A time-varying convex combination of these GPs is used within a model predictive control (MPC) framework, where the model weights are adapted online to the current road condition based on real-time data. The predictive variance of the ensemble Gaussian process (EGP) model allows the controller to account for prediction uncertainty and enables safe autonomous driving. Extensive simulations of a full scale autonomous car demonstrated the effectiveness of our proposed EGP-MPC method for providing good tracking performance in varying road conditions and the ability to generalize to unknown maps. △ Less

Submitted 26 May, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 8 pages, 12 figures, accepted for publication in IFAC World Congress 2023

arXiv:2212.00544 [pdf, other]

Towards Explainability in Modular Autonomous Vehicle Software

Authors: Hongrui Zheng, Zirui Zang, Shuo Yang, Rahul Mangharam

Abstract: Safety-critical Autonomous Systems require trustworthy and transparent decision-making process to be deployable in the real world. The advancement of Machine Learning introduces high performance but largely through black-box algorithms. We focus the discussion of explainability specifically with Autonomous Vehicles (AVs). As a safety-critical system, AVs provide the unique opportunity to utilize c… ▽ More Safety-critical Autonomous Systems require trustworthy and transparent decision-making process to be deployable in the real world. The advancement of Machine Learning introduces high performance but largely through black-box algorithms. We focus the discussion of explainability specifically with Autonomous Vehicles (AVs). As a safety-critical system, AVs provide the unique opportunity to utilize cutting-edge Machine Learning techniques while requiring transparency in decision making. Interpretability in every action the AV takes becomes crucial in post-hoc analysis where blame assignment might be necessary. In this paper, we provide positioning on how researchers could consider incorporating explainability and interpretability into design and optimization of separate Autonomous Vehicle modules including Perception, Planning, and Control. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.15478 [pdf, other]

EVNet: An Explainable Deep Network for Dimension Reduction

Authors: Zelin Zang, Shenghui Cheng, Linyan Lu, Hanchen Xia, Liangyu Li, Yaoting Sun, Yongjie Xu, Lei Shang, Baigui Sun, Stan Z. Li

Abstract: Dimension reduction (DR) is commonly utilized to capture the intrinsic structure and transform high-dimensional data into low-dimensional space while retaining meaningful properties of the original data. It is used in various applications, such as image recognition, single-cell sequencing analysis, and biomarker discovery. However, contemporary parametric-free and parametric DR techniques suffer f… ▽ More Dimension reduction (DR) is commonly utilized to capture the intrinsic structure and transform high-dimensional data into low-dimensional space while retaining meaningful properties of the original data. It is used in various applications, such as image recognition, single-cell sequencing analysis, and biomarker discovery. However, contemporary parametric-free and parametric DR techniques suffer from several significant shortcomings, such as the inability to preserve global and local features and the pool generalization performance. On the other hand, regarding explainability, it is crucial to comprehend the embedding process, especially the contribution of each part to the embedding process, while understanding how each feature affects the embedding results that identify critical components and help diagnose the embedding process. To address these problems, we have developed a deep neural network method called EVNet, which provides not only excellent performance in structural maintainability but also explainability to the DR therein. EVNet starts with data augmentation and a manifold-based loss function to improve embedding performance. The explanation is based on saliency maps and aims to examine the trained EVNet parameters and contributions of components during the embedding process. The proposed techniques are integrated with a visual interface to help the user to adjust EVNet to achieve better DR performance and explainability. The interactive visual interface makes it easier to illustrate the data features, compare different DR techniques, and investigate DR. An in-depth experimental comparison shows that EVNet consistently outperforms the state-of-the-art methods in both performance measures and explainability. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Comments: 18 pages, 15 figures, accepted by TVCG

arXiv:2211.11262 [pdf, other]

Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier

Authors: Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li

Abstract: Unsupervised domain adaptation (UDA) has proven to be highly effective in transferring knowledge from a label-rich source domain to a label-scarce target domain. However, the presence of additional novel categories in the target domain has led to the development of open-set domain adaptation (ODA) and universal domain adaptation (UNDA). Existing ODA and UNDA methods treat all novel categories as a… ▽ More Unsupervised domain adaptation (UDA) has proven to be highly effective in transferring knowledge from a label-rich source domain to a label-scarce target domain. However, the presence of additional novel categories in the target domain has led to the development of open-set domain adaptation (ODA) and universal domain adaptation (UNDA). Existing ODA and UNDA methods treat all novel categories as a single, unified unknown class and attempt to detect it during training. However, we found that domain variance can lead to more significant view-noise in unsupervised data augmentation, which affects the effectiveness of contrastive learning (CL) and causes the model to be overconfident in novel category discovery. To address these issues, a framework named Soft-contrastive All-in-one Network (SAN) is proposed for ODA and UNDA tasks. SAN includes a novel data-augmentation-based soft contrastive learning (SCL) loss to fine-tune the backbone for feature transfer and a more human-intuitive classifier to improve new class discovery capability. The SCL loss weakens the adverse effects of the data augmentation view-noise problem which is amplified in domain transfer tasks. The All-in-One (AIO) classifier overcomes the overconfidence problem of current mainstream closed-set and open-set classifiers. Visualization and ablation experiments demonstrate the effectiveness of the proposed innovations. Furthermore, extensive experiment results on ODA and UNDA show that SAN outperforms existing state-of-the-art methods. △ Less

Submitted 23 July, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted by ICCV

arXiv:2211.03576 [pdf, ps, other]

DAD vision: opto-electronic co-designed computer vision with division adjoint method

Authors: Zihan Zang, Haoqiang Wang, Yunpeng Xu

Abstract: The miniaturization and mobility of computer vision systems are limited by the heavy computational burden and the size of optical lenses. Here, we propose to use a ultra-thin diffractive optical element to implement passive optical convolution. A division adjoint opto-electronic co-design method is also proposed. In our simulation experiments, the first few convolutional layers of the neural netwo… ▽ More The miniaturization and mobility of computer vision systems are limited by the heavy computational burden and the size of optical lenses. Here, we propose to use a ultra-thin diffractive optical element to implement passive optical convolution. A division adjoint opto-electronic co-design method is also proposed. In our simulation experiments, the first few convolutional layers of the neural network can be replaced by optical convolution in a classification task on the CIFAR-10 dataset with no power consumption, while similar performance can be obtained. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2209.11925 [pdf, other]

Local_INN: Implicit Map Representation and Localization with Invertible Neural Networks

Authors: Zirui Zang, Hongrui Zheng, Johannes Betz, Rahul Mangharam

Abstract: Robot localization is an inverse problem of finding a robot's pose using a map and sensor measurements. In recent years, Invertible Neural Networks (INNs) have successfully solved ambiguous inverse problems in various fields. This paper proposes a framework that solves the localization problem with INN. We design an INN that provides implicit map representation in the forward path and localization… ▽ More Robot localization is an inverse problem of finding a robot's pose using a map and sensor measurements. In recent years, Invertible Neural Networks (INNs) have successfully solved ambiguous inverse problems in various fields. This paper proposes a framework that solves the localization problem with INN. We design an INN that provides implicit map representation in the forward path and localization in the inverse path. By sampling the latent space in evaluation, Local\_INN outputs robot poses with covariance, which can be used to estimate the uncertainty. We show that the localization performance of Local\_INN is on par with current methods with much lower latency. We show detailed 2D and 3D map reconstruction from Local\_INN using poses exterior to the training set. We also provide a global localization algorithm using Local\_INN to tackle the kidnapping problem. △ Less

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2209.11181 [pdf, other]

Teaching Autonomous Systems Hands-On: Leveraging Modular Small-Scale Hardware in the Robotics Classroom

Authors: Johannes Betz, Hongrui Zheng, Zirui Zang, Florian Sauerbeck, Krzysztof Walas, Velin Dimitrov, Madhur Behl, Rosa Zheng, Joydeep Biswas, Venkat Krovi, Rahul Mangharam

Abstract: Although robotics courses are well established in higher education, the courses often focus on theory and sometimes lack the systematic coverage of the techniques involved in developing, deploying, and applying software to real hardware. Additionally, most hardware platforms for robotics teaching are low-level toys aimed at younger students at middle-school levels. To address this gap, an autonomo… ▽ More Although robotics courses are well established in higher education, the courses often focus on theory and sometimes lack the systematic coverage of the techniques involved in developing, deploying, and applying software to real hardware. Additionally, most hardware platforms for robotics teaching are low-level toys aimed at younger students at middle-school levels. To address this gap, an autonomous vehicle hardware platform, called F1TENTH, is developed for teaching autonomous systems hands-on. This article describes the teaching modules and software stack for teaching at various educational levels with the theme of "racing" and competitions that replace exams. The F1TENTH vehicles offer a modular hardware platform and its related software for teaching the fundamentals of autonomous driving algorithms. From basic reactive methods to advanced planning algorithms, the teaching modules enhance students' computational thinking through autonomous driving with the F1TENTH vehicle. The F1TENTH car fills the gap between research platforms and low-end toy cars and offers hands-on experience in learning the topics in autonomous systems. Four universities have adopted the teaching modules for their semester-long undergraduate and graduate courses for multiple years. Student feedback is used to analyze the effectiveness of the F1TENTH platform. More than 80% of the students strongly agree that the hardware platform and modules greatly motivate their learning, and more than 70% of the students strongly agree that the hardware-enhanced their understanding of the subjects. The survey results show that more than 80% of the students strongly agree that the competitions motivate them for the course. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: 15 pages, 12 figures, 3 tables

arXiv:2209.04514 [pdf, other]

doi 10.1145/3551349.3556958

Compiler Testing using Template Java Programs

Authors: Zhiqiang Zang, Nathan Wiatrek, Milos Gligoric, August Shi

Abstract: We present JAttack, a framework that enables template-based testing for compilers. Using JAttack, a developer writes a template program that describes a set of programs to be generated and given as test inputs to a compiler. Such a framework enables developers to incorporate their domain knowledge on testing compilers, giving a basic program structure that allows for exploring complex programs tha… ▽ More We present JAttack, a framework that enables template-based testing for compilers. Using JAttack, a developer writes a template program that describes a set of programs to be generated and given as test inputs to a compiler. Such a framework enables developers to incorporate their domain knowledge on testing compilers, giving a basic program structure that allows for exploring complex programs that can trigger sophisticated compiler optimizations. A developer writes a template program in the host language (Java) that contains holes to be filled by JAttack. Each hole, written using a domain-specific language, constructs a node within an extended abstract syntax tree (eAST). An eAST node defines the search space for the hole, i.e., a set of expressions and values. JAttack generates programs by executing templates and filling each hole by randomly choosing expressions and values (available within the search space defined by the hole). Additionally, we introduce several optimizations to reduce JAttack's generation cost. While JAttack could be used to test various compiler features, we demonstrate its capabilities in helping test just-in-time (JIT) Java compilers, whose optimizations occur at runtime after a sufficient number of executions. Using JAttack, we have found six critical bugs that were confirmed by Oracle developers. Four of them were previously unknown, including two unknown CVEs (Common Vulnerabilities and Exposures). JAttack shows the power of combining developers' domain knowledge (via templates) with random testing to detect bugs in JIT compilers. △ Less

Submitted 9 September, 2022; originally announced September 2022.

Comments: 13 pages, 6 figures, 2 tables, accepted in ASE 2022 (Research Papers track)

arXiv:2207.03809 [pdf, other]

UDRN: Unified Dimensional Reduction Neural Network for Feature Selection and Feature Projection

Authors: Zelin Zang, Yongjie Xu, Linyan Lu, Yulan Geng, Senqiao Yang, Stan Z. Li

Abstract: Dimensional reduction~(DR) maps high-dimensional data into a lower dimensions latent space with minimized defined optimization objectives. The DR method usually falls into feature selection~(FS) and feature projection~(FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). On the other hand, FP combines all the input features into lower… ▽ More Dimensional reduction~(DR) maps high-dimensional data into a lower dimensions latent space with minimized defined optimization objectives. The DR method usually falls into feature selection~(FS) and feature projection~(FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). On the other hand, FP combines all the input features into lower dimensions space, aiming to maintain the data structure; but lacks interpretability and sparsity. FS and FP are traditionally incompatible categories; thus, they have not been unified into an amicable framework. We propose that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. In this work, we develop a unified framework, Unified Dimensional Reduction Neural-network~(UDRN), that integrates FS and FP in a compatible, end-to-end way. We improve the neural network structure by implementing FS and FP tasks separately using two stacked sub-networks. In addition, we designed data augmentation of the DR process to improve the generalization ability of the method when dealing with extensive feature datasets and designed loss functions that can cooperate with the data augmentation. Extensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of DRN over existing methods~(FS, FP, and FS\&FP pipeline), especially in downstream tasks such as classification and visualization. △ Less

Submitted 22 November, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 14 pages, 7 figures

arXiv:2207.03160 [pdf, other]

DLME: Deep Local-flatness Manifold Embedding

Authors: Zelin Zang, Siyuan Li, Di Wu, Ge Wang, Lei Shang, Baigui Sun, Hao Li, Stan Z. Li

Abstract: Manifold learning (ML) aims to seek low-dimensional embedding from high-dimensional data. The problem is challenging on real-world datasets, especially with under-sampling data, and we find that previous methods perform poorly in this case. Generally, ML methods first transform input data into a low-dimensional embedding space to maintain the data's geometric structure and subsequently perform dow… ▽ More Manifold learning (ML) aims to seek low-dimensional embedding from high-dimensional data. The problem is challenging on real-world datasets, especially with under-sampling data, and we find that previous methods perform poorly in this case. Generally, ML methods first transform input data into a low-dimensional embedding space to maintain the data's geometric structure and subsequently perform downstream tasks therein. The poor local connectivity of under-sampling data in the former step and inappropriate optimization objectives in the latter step leads to two problems: structural distortion and underconstrained embedding. This paper proposes a novel ML framework named Deep Local-flatness Manifold Embedding (DLME) to solve these problems. The proposed DLME constructs semantic manifolds by data augmentation and overcomes the structural distortion problem using a smoothness constrained based on a local flatness assumption about the manifold. To overcome the underconstrained embedding problem, we design a loss and theoretically demonstrate that it leads to a more suitable embedding based on the local flatness. Experiments on three types of datasets (toy, biological, and image) for various downstream tasks (classification, clustering, and visualization) show that our proposed DLME outperforms state-of-the-art ML and contrastive learning methods. △ Less

Submitted 25 July, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: 16 pages, 7 figures

arXiv:2206.00770 [pdf, other]

doi 10.1109/IV51971.2022.9827162

Winning the 3rd Japan Automotive AI Challenge -- Autonomous Racing with the Autoware.Auto Open Source Software Stack

Authors: Zirui Zang, Renukanandan Tumu, Johannes Betz, Hongrui Zheng, Rahul Mangharam

Abstract: The 3rd Japan Automotive AI Challenge was an international online autonomous racing challenge where 164 teams competed in December 2021. This paper outlines the winning strategy to this competition, and the advantages and challenges of using the Autoware.Auto open source autonomous driving platform for multi-agent racing. Our winning approach includes a lane-switching opponent overtaking strategy,… ▽ More The 3rd Japan Automotive AI Challenge was an international online autonomous racing challenge where 164 teams competed in December 2021. This paper outlines the winning strategy to this competition, and the advantages and challenges of using the Autoware.Auto open source autonomous driving platform for multi-agent racing. Our winning approach includes a lane-switching opponent overtaking strategy, a global raceline optimization, and the integration of various tools from Autoware.Auto including a Model-Predictive Controller. We describe the use of perception, planning and control modules for high-speed racing applications and provide experience-based insights on working with Autoware.Auto. While our approach is a rule-based strategy that is suitable for non-interactive opponents, it provides a good reference and benchmark for learning-enabled approaches. △ Less

Submitted 4 June, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted at Autoware Workshop at IV 2022

arXiv:2205.13943 [pdf, other]

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Authors: Siyuan Li, Di Wu, Fang Wu, Zelin Zang, Stan. Z. Li

Abstract: Masked image modeling, an emerging self-supervised pre-training method, has shown impressive success across numerous downstream vision tasks with Vision transformers. Its underlying idea is simple: a portion of the input image is masked out and then reconstructed via a pre-text task. However, the working principle behind MIM is not well explained, and previous studies insist that MIM primarily wor… ▽ More Masked image modeling, an emerging self-supervised pre-training method, has shown impressive success across numerous downstream vision tasks with Vision transformers. Its underlying idea is simple: a portion of the input image is masked out and then reconstructed via a pre-text task. However, the working principle behind MIM is not well explained, and previous studies insist that MIM primarily works for the Transformer family but is incompatible with CNNs. In this work, we observe that MIM essentially teaches the model to learn better middle-order interactions among patches for more generalized feature extraction. We then propose an Architecture-Agnostic Masked Image Modeling framework (A$^2$MIM), which is compatible with both Transformers and CNNs in a unified way. Extensive experiments on popular benchmarks show that A$^2$MIM learns better representations without explicit design and endows the backbone model with the stronger capability to transfer to various downstream tasks. △ Less

Submitted 2 June, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: ICML 2023 (poster). The source code will be released in https://github.com/Westlake-AI/A2MIM

arXiv:2203.13754 [pdf]

Fast fluorescence lifetime imaging analysis via extreme learning machine

Authors: Zhenya Zang, Dong Xiao, Quan Wang, Zinuo Li, Wujun Xie, Yu Chen, David Day Uei Li

Abstract: We present a fast and accurate analytical method for fluorescence lifetime imaging microscopy (FLIM) using the extreme learning machine (ELM). We used extensive metrics to evaluate ELM and existing algorithms. First, we compared these algorithms using synthetic datasets. Results indicate that ELM can obtain higher fidelity, even in low-photon conditions. Afterwards, we used ELM to retrieve lifetim… ▽ More We present a fast and accurate analytical method for fluorescence lifetime imaging microscopy (FLIM) using the extreme learning machine (ELM). We used extensive metrics to evaluate ELM and existing algorithms. First, we compared these algorithms using synthetic datasets. Results indicate that ELM can obtain higher fidelity, even in low-photon conditions. Afterwards, we used ELM to retrieve lifetime components from human prostate cancer cells loaded with gold nanosensors, showing that ELM also outperforms the iterative fitting and non-fitting algorithms. By comparing ELM with a computational efficient neural network, ELM achieves comparable accuracy with less training and inference time. As there is no back-propagation process for ELM during the training phase, the training speed is much higher than existing neural network approaches. The proposed strategy is promising for edge computing with online training. △ Less

Submitted 25 March, 2022; originally announced March 2022.

Comments: 14 pages, 9 figures

arXiv:2110.14553 [pdf, other]

GenURL: A General Framework for Unsupervised Representation Learning

Authors: Siyuan Li, Zicheng Liu, Zelin Zang, Di Wu, Zhiyuan Chen, Stan Z. Li

Abstract: Unsupervised representation learning (URL), which learns compact embeddings of high-dimensional data without supervision, has made remarkable progress recently. However, the development of URLs for different requirements is independent, which limits the generalization of the algorithms, especially prohibitive as the number of tasks grows. For example, dimension reduction methods, t-SNE, and UMAP o… ▽ More Unsupervised representation learning (URL), which learns compact embeddings of high-dimensional data without supervision, has made remarkable progress recently. However, the development of URLs for different requirements is independent, which limits the generalization of the algorithms, especially prohibitive as the number of tasks grows. For example, dimension reduction methods, t-SNE, and UMAP optimize pair-wise data relationships by preserving the global geometric structure, while self-supervised learning, SimCLR, and BYOL focus on mining the local statistics of instances under specific augmentations. To address this dilemma, we summarize and propose a unified similarity-based URL framework, GenURL, which can smoothly adapt to various URL tasks. In this paper, we regard URL tasks as different implicit constraints on the data geometric structure that help to seek optimal low-dimensional representations that boil down to data structural modeling (DSM) and low-dimensional transformation (LDT). Specifically, DMS provides a structure-based submodule to describe the global structures, and LDT learns compact low-dimensional embeddings with given pretext tasks. Moreover, an objective function, General Kullback-Leibler divergence (GKL), is proposed to connect DMS and LDT naturally. Comprehensive experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GE), and dimension reduction. △ Less

Submitted 16 April, 2024; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: TNNLS 2024 version with 13 pages and 14 figures

arXiv:2110.10482 [pdf, other]

doi 10.1145/3488560.3498481

Surrogate Representation Learning with Isometric Mapping for Gray-box Graph Adversarial Attacks

Authors: Zihan Liu, Yun Luo, Zelin Zang, Stan Z. Li

Abstract: Gray-box graph attacks aim at disrupting the performance of the victim model by using inconspicuous attacks with limited knowledge of the victim model. The parameters of the victim model and the labels of the test nodes are invisible to the attacker. To obtain the gradient on the node attributes or graph structure, the attacker constructs an imaginary surrogate model trained under supervision. How… ▽ More Gray-box graph attacks aim at disrupting the performance of the victim model by using inconspicuous attacks with limited knowledge of the victim model. The parameters of the victim model and the labels of the test nodes are invisible to the attacker. To obtain the gradient on the node attributes or graph structure, the attacker constructs an imaginary surrogate model trained under supervision. However, there is a lack of discussion on the training of surrogate models and the robustness of provided gradient information. The general node classification model loses the topology of the nodes on the graph, which is, in fact, an exploitable prior for the attacker. This paper investigates the effect of representation learning of surrogate models on the transferability of gray-box graph adversarial attacks. To reserve the topology in the surrogate embedding, we propose Surrogate Representation Learning with Isometric Mapping (SRLIM). By using Isometric mapping method, our proposed SRLIM can constrain the topological structure of nodes from the input layer to the embedding space, that is, to maintain the similarity of nodes in the propagation process. Experiments prove the effectiveness of our approach through the improvement in the performance of the adversarial attacks generated by the gradient-based attacker in untargeted poisoning gray-box setups. △ Less

Submitted 22 February, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

Journal ref: WSDM22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining February 2022

arXiv:2106.15788 [pdf, other]

Exploring Localization for Self-supervised Fine-grained Contrastive Learning

Authors: Di Wu, Siyuan Li, Zelin Zang, Stan Z. Li

Abstract: Self-supervised contrastive learning has demonstrated great potential in learning visual representations. Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored. We point out that current contrastive methods are prone to memorizing background/foreground texture and therefore… ▽ More Self-supervised contrastive learning has demonstrated great potential in learning visual representations. Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored. We point out that current contrastive methods are prone to memorizing background/foreground texture and therefore have a limitation in localizing the foreground object. Analysis suggests that learning to extract discriminative texture information and localization are equally crucial for fine-grained self-supervised pre-training. Based on our findings, we introduce cross-view saliency alignment (CVSA), a contrastive learning framework that first crops and swaps saliency regions of images as a novel view generation and then guides the model to localize on foreground objects via a cross-view alignment loss. Extensive experiments on both small- and large-scale fine-grained classification benchmarks show that CVSA significantly improves the learned representation. △ Less

Submitted 11 October, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: BMVC 2022 camera-ready. 15 pages (main) with 5 pages appendix

arXiv:2104.13048 [pdf, other]

Unsupervised Deep Manifold Attributed Graph Embedding

Authors: Zelin Zang, Siyuan Li, Di Wu, Jianzhu Guo, Yongjie Xu, Stan Z. Li

Abstract: Unsupervised attributed graph representation learning is challenging since both structural and feature information are required to be represented in the latent space. Existing methods concentrate on learning latent representation via reconstruction tasks, but cannot directly optimize representation and are prone to oversmoothing, thus limiting the applications on downstream tasks. To alleviate the… ▽ More Unsupervised attributed graph representation learning is challenging since both structural and feature information are required to be represented in the latent space. Existing methods concentrate on learning latent representation via reconstruction tasks, but cannot directly optimize representation and are prone to oversmoothing, thus limiting the applications on downstream tasks. To alleviate these issues, we propose a novel graph embedding framework named Deep Manifold Attributed Graph Embedding (DMAGE). A node-to-node geodesic similarity is proposed to compute the inter-node similarity between the data space and the latent space and then use Bergman divergence as loss function to minimize the difference between them. We then design a new network structure with fewer aggregation to alleviate the oversmoothing problem and incorporate graph structure augmentation to improve the representation's stability. Our proposed DMAGE surpasses state-of-the-art methods by a significant margin on three downstream tasks: unsupervised visualization, node clustering, and link prediction across four popular datasets. △ Less

Submitted 27 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:2007.01594 by other authors

arXiv:2012.00481 [pdf, other]

Consistent Representation Learning for High Dimensional Data Analysis

Authors: Stan Z. Li, Lirong Wu, Zelin Zang

Abstract: High dimensional data analysis for exploration and discovery includes three fundamental tasks: dimensionality reduction, clustering, and visualization. When the three associated tasks are done separately, as is often the case thus far, inconsistencies can occur among the tasks in terms of data geometry and others. This can lead to confusing or misleading data interpretation. In this paper, we prop… ▽ More High dimensional data analysis for exploration and discovery includes three fundamental tasks: dimensionality reduction, clustering, and visualization. When the three associated tasks are done separately, as is often the case thus far, inconsistencies can occur among the tasks in terms of data geometry and others. This can lead to confusing or misleading data interpretation. In this paper, we propose a novel neural network-based method, called Consistent Representation Learning (CRL), to accomplish the three associated tasks end-to-end and improve the consistencies. The CRL network consists of two nonlinear dimensionality reduction (NLDR) transformations: (1) one from the input data space to the latent feature space for clustering, and (2) the other from the clustering space to the final 2D or 3D space for visualization. Importantly, the two NLDR transformations are performed to best satisfy local geometry preserving (LGP) constraints across the spaces or network layers, to improve data consistencies along with the processing flow. Also, we propose a novel metric, clustering-visualization inconsistency (CVI), for evaluating the inconsistencies. Extensive comparative results show that the proposed CRL neural network method outperforms the popular t-SNE and UMAP-based and other contemporary clustering and visualization algorithms in terms of evaluation metrics and visualization. △ Less

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2010.14831 [pdf, other]

Deep Manifold Transformation for Nonlinear Dimensionality Reduction

Authors: Stan Z. Li, Zelin Zang, Lirong Wu

Abstract: Manifold learning-based encoders have been playing important roles in nonlinear dimensionality reduction (NLDR) for data exploration. However, existing methods can often fail to preserve geometric, topological and/or distributional structures of data. In this paper, we propose a deep manifold learning framework, called deep manifold transformation (DMT) for unsupervised NLDR and embedding learning… ▽ More Manifold learning-based encoders have been playing important roles in nonlinear dimensionality reduction (NLDR) for data exploration. However, existing methods can often fail to preserve geometric, topological and/or distributional structures of data. In this paper, we propose a deep manifold learning framework, called deep manifold transformation (DMT) for unsupervised NLDR and embedding learning. DMT enhances deep neural networks by using cross-layer local geometry-preserving (LGP) constraints. The LGP constraints constitute the loss for deep manifold learning and serve as geometric regularizers for NLDR network training. Extensive experiments on synthetic and real-world data demonstrate that DMT networks outperform existing leading manifold-based NLDR methods in terms of preserving the structures of data. △ Less

Submitted 3 May, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.04012 [pdf, other]

Invertible Manifold Learning for Dimension Reduction

Authors: Siyuan Li, Haitao Lin, Zelin Zang, Lirong Wu, Jun Xia, Stan Z. Li

Abstract: Dimension reduction (DR) aims to learn low-dimensional representations of high-dimensional data with the preservation of essential information. In the context of manifold learning, we define that the representation after information-lossless DR preserves the topological and geometric properties of data manifolds formally, and propose a novel two-stage DR method, called invertible manifold learning… ▽ More Dimension reduction (DR) aims to learn low-dimensional representations of high-dimensional data with the preservation of essential information. In the context of manifold learning, we define that the representation after information-lossless DR preserves the topological and geometric properties of data manifolds formally, and propose a novel two-stage DR method, called invertible manifold learning (inv-ML) to bridge the gap between theoretical information-lossless and practical DR. The first stage includes a homeomorphic sparse coordinate transformation to learn low-dimensional representations without destroying topology and a local isometry constraint to preserve local geometry. In the second stage, a linear compression is implemented for the trade-off between the target dimension and the incurred information loss in excessive DR scenarios. Experiments are conducted on seven datasets with a neural network implementation of inv-ML, called i-ML-Enc. Empirically, i-ML-Enc achieves invertible DR in comparison with typical existing methods as well as reveals the characteristics of the learned manifolds. Through latent space interpolation on real-world datasets, we find that the reliability of tangent space approximated by the local neighborhood is the key to the success of manifold-based DR algorithms. △ Less

Submitted 30 June, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: ECML-PKDD 2021 camera-ready. 15 pages (main) with 10 pages appendix

arXiv:2009.09590 [pdf, other]

Generalized Clustering and Multi-Manifold Learning with Geometric Structure Preservation

Authors: Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan. Z Li

Abstract: Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space. In this paper, we propose a novel Generalized Clustering and Multi-manifold Learning (GCML) framework with geometric structure preservation for generalized… ▽ More Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space. In this paper, we propose a novel Generalized Clustering and Multi-manifold Learning (GCML) framework with geometric structure preservation for generalized data, i.e., not limited to 2-D image data and has a wide range of applications in speech, text, and biology domains. In the proposed framework, manifold clustering is done in the latent space guided by a clustering loss. To overcome the problem that the clustering-oriented loss may deteriorate the geometric structure of the latent space, an isometric loss is proposed for preserving intra-manifold structure locally and a ranking loss for inter-manifold structure globally. Extensive experimental results have shown that GCML exhibits superior performance to counterparts in terms of qualitative visualizations and quantitative metrics, which demonstrates the effectiveness of preserving geometric structure. △ Less

Submitted 8 October, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

arXiv:2006.08256 [pdf, other]

Markov-Lipschitz Deep Learning

Authors: Stan Z. Li, Zelin Zang, Lirong Wu

Abstract: We propose a novel framework, called Markov-Lipschitz deep learning (MLDL), to tackle geometric deterioration caused by collapse, twisting, or crossing in vector-based neural network transformations for manifold-based representation learning and manifold data generation. A prior constraint, called locally isometric smoothness (LIS), is imposed across-layers and encoded into a Markov random field (… ▽ More We propose a novel framework, called Markov-Lipschitz deep learning (MLDL), to tackle geometric deterioration caused by collapse, twisting, or crossing in vector-based neural network transformations for manifold-based representation learning and manifold data generation. A prior constraint, called locally isometric smoothness (LIS), is imposed across-layers and encoded into a Markov random field (MRF)-Gibbs distribution. This leads to the best possible solutions for local geometry preservation and robustness as measured by locally geometric distortion and locally bi-Lipschitz continuity. Consequently, the layer-wise vector transformations are enhanced into well-behaved, LIS-constrained metric homeomorphisms. Extensive experiments, comparisons, and ablation study demonstrate significant advantages of MLDL for manifold learning and manifold data generation. MLDL is general enough to enhance any vector transformation-based networks. The code is available at https://github.com/westlake-cairi/Markov-Lipschitz-Deep-Learning. △ Less

Submitted 30 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2003.02631 [pdf, other]

Machine Learning for Predictive Deployment of UAVs with Multiple Access

Authors: Linyan Lu, Zhaohui Yang, Mingzhe Chen, Zelin Zang, Mohammad Shikh-Bahaei

Abstract: In this paper, a machine learning based deployment framework of unmanned aerial vehicles (UAVs) is studied. In the considered model, UAVs are deployed as flying base stations (BS) to offload heavy traffic from ground BSs. Due to time-varying traffic distribution, a long short-term memory (LSTM) based prediction algorithm is introduced to predict the future cellular traffic. To predict the user ser… ▽ More In this paper, a machine learning based deployment framework of unmanned aerial vehicles (UAVs) is studied. In the considered model, UAVs are deployed as flying base stations (BS) to offload heavy traffic from ground BSs. Due to time-varying traffic distribution, a long short-term memory (LSTM) based prediction algorithm is introduced to predict the future cellular traffic. To predict the user service distribution, a KEG algorithm, which is a joint K-means and expectation maximization (EM) algorithm based on Gaussian mixture model (GMM), is proposed for determining the service area of each UAV. Based on the predicted traffic, the optimal UAV positions are derived and three multi-access techniques are compared so as to minimize the total transmit power. Simulation results show that the proposed method can reduce up to 24\% of the total power consumption compared to the conventional method without traffic prediction. Besides, rate splitting multiple access (RSMA) has the lower required transmit power compared to frequency domain multiple access (FDMA) and time domain multiple access (TDMA). △ Less

Submitted 30 July, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

arXiv:1605.09458 [pdf, other]

Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables

Authors: Hui Shen, Dehua Li, Hong Wu, Zhaoxiang Zang

Abstract: Auto-encoders are often used as building blocks of deep network classifier to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper,via dropping the task-irrelevant input variables the performance of auto-encoders can be obviously improved .Specifically, an importance-based… ▽ More Auto-encoders are often used as building blocks of deep network classifier to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper,via dropping the task-irrelevant input variables the performance of auto-encoders can be obviously improved .Specifically, an importance-based variable selection method is proposed to aim at finding the task-irrelevant input variables and dropping them.It firstly estimates importance of each variable,and then drops the variables with importance value lower than a threshold. In order to obtain better performance, the method can be employed for each layer of stacked auto-encoders. Experimental results show that when combined with our method the stacked denoising auto-encoders achieves significantly improved performance on three challenging datasets. △ Less

Submitted 30 May, 2016; originally announced May 2016.

arXiv:1604.07704 [pdf]

Tournament selection in zeroth-level classifier systems based on average reward reinforcement learning

Authors: Zhaoxiang Zang, Zhao Li, Junying Wang, Zhiping Dan

Abstract: As a genetics-based machine learning technique, zeroth-level classifier system (ZCS) is based on a discounted reward reinforcement learning algorithm, bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large-size ones. There are some undiscounted reinforcement learning methods available, such as R-l… ▽ More As a genetics-based machine learning technique, zeroth-level classifier system (ZCS) is based on a discounted reward reinforcement learning algorithm, bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large-size ones. There are some undiscounted reinforcement learning methods available, such as R-learning, which optimize the average reward per time step. In this paper, R-learning is used as the reinforcement learning employed by ZCS, to replace its discounted reward reinforcement learning approach, and tournament selection is used to replace roulette wheel selection in ZCS. The modification results in classifier systems that can support long action chains, and thus is able to solve large multi-step problems. △ Less

Submitted 26 April, 2016; originally announced April 2016.

Comments: 14 pages, 3 figures

ACM Class: I.2

arXiv:1211.0424 [pdf]

doi 10.1007/s00500-014-1357-y

Learning classifier systems with memory condition to solve non-Markov problems

Authors: Zhaoxiang Zang, Dehua Li, Junying Wang

Abstract: In the family of Learning Classifier Systems, the classifier system XCS has been successfully used for many applications. However, the standard XCS has no memory mechanism and can only learn optimal policy in Markov environments, where the optimal action is determined solely by the state of current sensory input. In practice, most environments are partially observable environments on agent's sensa… ▽ More In the family of Learning Classifier Systems, the classifier system XCS has been successfully used for many applications. However, the standard XCS has no memory mechanism and can only learn optimal policy in Markov environments, where the optimal action is determined solely by the state of current sensory input. In practice, most environments are partially observable environments on agent's sensation, which are also known as non-Markov environments. Within these environments, XCS either fails, or only develops a suboptimal policy, since it has no memory. In this work, we develop a new classifier system based on XCS to tackle this problem. It adds an internal message list to XCS as the memory list to record input sensation history, and extends a small number of classifiers with memory conditions. The classifier's memory condition, as a foothold to disambiguate non-Markov states, is used to sense a specified element in the memory list. Besides, a detection method is employed to recognize non-Markov states in environments, to avoid these states controlling over classifiers' memory conditions. Furthermore, four sets of different complex maze environments have been tested by the proposed method. Experimental results show that our system is one of the best techniques to solve partially observable environments, compared with some well-known classifier systems proposed for these environments. △ Less

Submitted 2 November, 2012; originally announced November 2012.

Comments: 34 pages, 15 figures, 1 table

Showing 1–50 of 50 results for author: Zang, Z