Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

Authors: Zhaohui Yin, Jingguang Tian, Xinhui Hu, Xinkang Xu, Yang Xiang

Abstract: Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains, so their robustness lacks a benchmark for evaluation and their accuracy remains inadequate in realistic acoustic environments. To solve these problems, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system, named CF-OSD with LSL, based on the Conformer network and LSL. In our study, a large-scale test set consisting of 151h of labeled speech of different styles, languages and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed and used to evaluate the effectiveness of LSL in OSD tasks and define the OSD model of our general OSD system. The experiment results show that LSL can significantly improve the accuracy and robustness of OSD systems, and the CF-OSD with LSL system significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system has also achieved state-of-the-art performance on existing small dataset benchmarks, reaching 81.6% and 53.8% on the Alimeeting test set and the DIHARD II evaluation set, respectively.

Submitted 7 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2308.04805 [pdf, other]

doi 10.1145/3581783.3613750

DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

Authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to achieve this because they cannot produce mappings diverse enough to make up for the information missed by the gold labels. Based on the observation that such missing information may already be present in user comments, we propose to study automated music labeling in an essential but under-explored setting, where the model is required to harvest more diverse and valid labels from the users' comments given limited gold labels. To this end, we design an iterative framework (DiVa) to harvest more $\underline{\text{Di}}$verse and $\underline{\text{Va}}$lid labels from user comments for music. The framework makes a classifier able to form complete sets of labels for songs via pseudo-labels inferred from pre-trained classifiers and a novel joint score function. The experiment on a densely annotated testing set reveals the superiority of DiVa over state-of-the-art solutions in producing more diverse labels missed by the gold labels. We hope our work can inspire future research on automated music labeling.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, published to ACM MM 2023

arXiv:2308.03152 [pdf, other]

AI-GOMS: Large AI-Driven Global Ocean Modeling System

Authors: Wei Xiong, Yanfei Xiang, Hao Wu, Shuyi Zhou, Yuze Sun, Muyuan Ma, Xiaomeng Huang

Abstract: Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean, which is the foundation for marine science research and operational oceanography. Modern numerical ocean modeling mainly consists of governing equations and numerical algorithms. Nonlinear instability, computational expense, low reusability efficiency and high coupling costs have gradually become the main bottlenecks for the further development of numerical ocean modeling. Recently, artificial intelligence-based modeling in scientific computing has shown revolutionary potential for digital twins and scientific simulations, but the bottlenecks of numerical ocean modeling have not been further solved. Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction. AI-GOMS consists of a backbone model with the Fourier-based Masked Autoencoder structure for basic ocean variable prediction and lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules. AI-GOMS has achieved the best performance in 30 days of prediction for the global ocean basic variables with 15 depth layers at 1/4° spatial resolution. Beyond the good performance in statistical metrics, AI-GOMS realizes the simulation of mesoscale eddies in the Kuroshio region at 1/12° spatial resolution and ocean stratification in the tropical Pacific Ocean. AI-GOMS provides a new backbone-downstream paradigm for Earth system modeling, which makes the system transferable, scalable and reusable.

Submitted 10 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2308.02121 [pdf, other]

Model Provenance via Model DNA

Authors: Xin Mu, Yu Wang, Yehong Zhang, Jiaqi Zhang, Hui Wang, Yang Xiang, Yue Yu

Abstract: Understanding the life cycle of the machine learning (ML) model is an intriguing area of research (e.g., understanding where the model comes from, how it is trained, and how it is used). This paper focuses on a novel problem within this field, namely Model Provenance (MP), which concerns the relationship between a target model and its pre-training model and aims to determine whether a source model serves as the provenance for a target model. This is an important problem that has significant implications for ensuring the security and intellectual property of machine learning models but has not received much attention in the literature. To fill in this gap, we introduce a novel concept of Model DNA which represents the unique characteristics of a machine learning model. We utilize a data-driven and model-driven representation learning method to encode the model's training data and input-output information as a compact and comprehensive representation (i.e., DNA) of the model. Using this model DNA, we develop an efficient framework for model provenance identification, which enables us to identify whether a source model is a pre-training model of a target model. We conduct evaluations on both computer vision and natural language processing tasks using various models, datasets, and scenarios to demonstrate the effectiveness of our approach in accurately identifying model provenance.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2308.01463 [pdf, other]

SemDiff: Binary Similarity Detection by Diffing Key-Semantics Graphs

Authors: Zian Liu, Zhi Zhang, Siqi Ma, Dongxi Liu, Jun Zhang, Chao Chen, Shigang Liu, Muhammad Ejaz Ahmed, Yang Xiang

Abstract: Binary similarity detection is a critical technique that has been applied in many real-world scenarios where source code is not available, e.g., bug search, malware analysis, and code plagiarism detection. Existing works are ineffective in detecting similar binaries in cases where different compiling optimizations, compilers, source code versions, or obfuscation are deployed. We observe that none of these cases changes a binary's key code behaviors, although they significantly modify its syntax and structure. With this key observation, we extract a set of key instructions from a binary to capture its key code behaviors. By detecting the similarity between two binaries' key instructions, we can address the ineffectiveness of existing works. Specifically, we translate each extracted key instruction into a self-defined key expression, generating a key-semantics graph based on the binary's control flow. Each node in the key-semantics graph denotes a key instruction, and the node attribute is the key expression. To quantify the similarity between two given key-semantics graphs, we first serialize each graph into a sequence of key expressions by topological sort. Then, we tokenize and concatenate key expressions to generate token lists. We calculate the locality-sensitive hash value for all token lists and quantify their similarity. We implement a prototype, called SemDiff, consisting of two modules: graph generation and graph diffing. The first module generates a pair of key-semantics graphs and the second module diffs the graphs. Our evaluation results show that overall, SemDiff outperforms state-of-the-art tools when detecting the similarity of binaries generated from different optimization levels, compilers, and obfuscations. SemDiff is also effective for library version search and finding similar vulnerabilities in firmware.

Submitted 2 August, 2023; originally announced August 2023.

Comments: 12 pages, conference paper
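
The serialize, tokenize, and hash steps described in the SemDiff abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the locality-sensitive hash, so MinHash (an LSH family for Jaccard similarity) is swapped in here as an assumption, and the graph encoding, the whitespace tokenizer, and the function names `topological_order`, `token_list`, `minhash_signature`, and `similarity` are all hypothetical.

```python
import hashlib
from collections import deque


def topological_order(graph):
    """Kahn's algorithm over {node: [successors]}; ties are broken by node id."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for succ in sorted(graph.get(node, [])):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; loops must be handled before sorting")
    return order


def token_list(graph, expressions):
    """Serialize the graph by topological order, then tokenize and concatenate."""
    tokens = []
    for node in topological_order(graph):
        tokens.extend(expressions[node].split())  # assumed whitespace tokenizer
    return tokens


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature of the token set (assumed stand-in for the paper's LSH)."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot hash an empty token list")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                "big",
            )
            for tok in token_set
        ))
    return signature


def similarity(sig_a, sig_b):
    """Fraction of matching signature slots; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical key-semantics graph: node -> successors, node -> key expression.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
exprs_a = {0: "load r0 mem", 1: "cmp r0 imm", 2: "call ext", 3: "store mem r0"}
exprs_b = {0: "xor a b", 1: "jmp target", 2: "push c", 3: "ret"}

sig_a = minhash_signature(token_list(graph, exprs_a))
sig_b = minhash_signature(token_list(graph, exprs_b))
print(similarity(sig_a, sig_a), similarity(sig_a, sig_b))
```

Note that MinHash over a token set discards token order, so the topological serialization only matters here for producing a deterministic token list; an order-sensitive variant would hash n-grams of the list instead.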

arXiv:2307.16107 [pdf, other]

doi 10.3847/1538-4357/ace166

Fermi-LAT detection of A new starburst galaxy candidate: IRAS 13052-5711

Authors: Yunchuan Xiang, Qingquan Jiang, Xiaofei Lan

Abstract: A likely starburst galaxy (SBG), IRAS 13052-5711, which is the most distant SBG candidate discovered to date, was found by analyzing 14.4 years of data from the Fermi Large Area Telescope (Fermi-LAT). This SBG's significance level is approximately 6.55$σ$ in the 0.1-500 GeV band. Its spatial position is close to that of 4FGL J1308.9-5730, determined from the Fermi Large Area Telescope Fourth Source Catalog (4FGL). Its power-law spectral index is approximately 2.1, and its light curve (LC) for 14.4 years has no significant variability. These characteristics are highly similar to those of SBGs found in the past. We calculate the SBG's star formation rate (SFR) to be 29.38 $\rm M_{\odot}\ yr^{-1}$, which is within the SFR range of SBGs found to date. Therefore, IRAS 13052-5711 is considered to be a likely SBG. In addition, its 0.1-500 GeV luminosity is (3.28 $\pm$ 0.67) $\times 10^{42}\ \rm erg\ s^{-1}$, which deviates from the empirical relationship of the $γ$-ray luminosity and the total infrared luminosity. We considered a hadronic model to explain the GeV spectrum of IRAS 13052-5711.

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.13653 [pdf, ps, other]

A threshold dislocation dynamics method

Authors: Xiaoxue Qin, Alfonso H. W. Ngan, Yang Xiang

Abstract: The Merriman-Bence-Osher threshold dynamics method is an efficient algorithm to simulate the motion by mean curvature. It has the advantages of being easy to implement and highly efficient. In this paper, we propose a threshold dynamics method for dislocation dynamics in a slip plane, in which the spatial operator is essentially an anisotropic fractional Laplacian. We show that this threshold dislocation dynamics method is able to give two correct leading orders in dislocation velocity, including both the $O(\log\varepsilon)$ local curvature force and the $O(1)$ nonlocal force due to the long-range stress field generated by the dislocations as well as the force due to the applied stress, where $\varepsilon$ is the dislocation core size, if the time step is set to be $Δt=\varepsilon$. This generalizes the available result of threshold dynamics with the corresponding fractional Laplacian, which is on the leading order $O(\log Δt)$ local curvature velocity under the isotropic kernel. We also propose a numerical method based on spatial variable stretching to correct the mobility and to rescale the velocity for efficient and accurate simulations, which can be applied generally to any threshold dynamics method. We validate the proposed threshold dislocation dynamics method by numerical simulations of various motions and interaction of dislocations.

Submitted 15 October, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: 35 pages, 13 figures

MSC Class: 65R20; 65N12; 74A50; 35R11

arXiv:2307.13264 [pdf, other]

Extreme events generated in microcavity lasers and their predictions by reservoir computing

Authors: T. Wang, H. X. Zhou, Q. Fang, Y. N. Han, X. X. Guo, Y. H. Zhang, C. Qian, H. S. Chen, S. Barland, S. Y. Xiang, G. L. Lippi

Abstract: Extreme events generated by complex systems have been intensively studied in many fields due to their great impact on scientific research and our daily lives. However, their prediction is still a challenge in spite of the tremendous progress that model-free machine learning has brought to the field. We experimentally generate, and theoretically model, extreme events in a current-modulated, single-mode microcavity laser operating on orthogonal polarizations, where their strongly differing thresholds -- due to cavity birefringence -- give rise to giant light pulses initiated by spontaneous emission. Applying reservoir-computing techniques, we identify in advance the emergence of an extreme event from a time series, in spite of coarse sampling and limited sample length. Performance is optimized through new hybrid configurations that we introduce in this paper. Advance warning times can reach 5ns, i.e. approximately ten times the rise time of the individual extreme event.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.12287 [pdf, other]

Decentralized Adaptive Formation via Consensus-Oriented Multi-Agent Communication

Authors: Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

Abstract: Adaptive multi-agent formation control, which requires the formation to flexibly adjust along with the quantity variations of agents in a decentralized manner, is one of the most challenging issues in multi-agent systems, especially under communication-limited constraints. In this paper, we propose a novel Consensus-based Decentralized Adaptive Formation (Cons-DecAF) framework. Specifically, we develop a novel multi-agent reinforcement learning method, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish the consensus from local states by effectively aggregating neighbor messages. Afterwards, we leverage policy distillation to accomplish the adaptive formation adjustment. Meanwhile, instead of pre-assigning specific positions of agents, we employ a displacement-based formation by Hausdorff distance to significantly improve the formation efficiency. The experimental results through extensive simulations validate that the proposed method has achieved outstanding performance in terms of both speed and stability.

Submitted 23 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures
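
The Cons-DecAF abstract measures formation quality with the Hausdorff distance instead of pre-assigned agent positions. The sketch below shows only the standard set-to-set distance and why it needs no agent-to-slot assignment; how the paper turns this distance into a learning signal is not stated in the abstract, and the names `directed_hausdorff`, `hausdorff`, `targets`, and `agents` are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical unit-square target formation and agents shifted by 0.5 along x.
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
agents = [(0.5, 0.0), (1.5, 0.0), (0.5, 1.0), (1.5, 1.0)]

print(hausdorff(agents, targets))        # 0.5
print(hausdorff(agents[::-1], targets))  # 0.5, agent ordering does not matter
```

Because the distance is defined over sets, any agent may fill any slot, which matches the abstract's point about avoiding pre-assigned positions.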

arXiv:2307.10580 [pdf, other]

Intelligent model for offshore China sea fog forecasting

Authors: Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang

Abstract: Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods often prove inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, in regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).

Submitted 20 July, 2023; originally announced July 2023.

Comments: 19 pages, 9 figures
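
The two verification scores and the focal loss named in the sea fog abstract have standard definitions, sketched below under assumptions. POD is hits / (hits + misses) and FAR is false alarms / (hits + false alarms); the focal loss follows the usual form $-\alpha_t (1-p_t)^{\gamma} \log p_t$. The defaults `gamma=2.0` and `alpha=0.25` are common choices, not values taken from the paper, and the function names and sample data are hypothetical.

```python
import math


def pod_far(observed, predicted):
    """Probability of detection and false alarm ratio for binary fog events."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


def focal_loss(prob_fog, is_fog, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; down-weights easy, confident cases."""
    p_t = prob_fog if is_fog else 1.0 - prob_fog
    alpha_t = alpha if is_fog else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical observations and forecasts (1 = fog with visibility <= 1 km).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
print(pod_far(observed, predicted))

# A confident correct forecast costs far less than a confident wrong one.
print(focal_loss(0.9, True), focal_loss(0.1, True))
```

With rare events such as fog, plain cross-entropy is dominated by the many easy no-fog samples; the $(1-p_t)^{\gamma}$ factor shrinks their contribution, which is the imbalance remedy the abstract refers to.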

arXiv:2307.09850 [pdf, ps, other]

Communication-Efficient Distribution-Free Inference Over Networks

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: Consider a star network where each local node possesses a set of test statistics that exhibit a symmetric distribution around zero when their corresponding null hypothesis is true. This paper investigates statistical inference problems in networks concerning the aggregation of this general type of statistics and global error rate control under communication constraints in various scenarios. The study proposes communication-efficient algorithms that are built on established non-parametric methods, such as the Wilcoxon and sign tests, as well as modern inference methods such as the Benjamini-Hochberg (BH) and Barber-Candes (BC) procedures, coupled with sampling and quantization operations. The proposed methods are evaluated through extensive simulation studies.

Submitted 28 November, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: Presented in the Asilomar Conference on Signals, Systems, and Computers (2023)
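
For reference, the Benjamini-Hochberg (BH) procedure named in the abstract above can be sketched as follows. This is the textbook centralized step-up procedure on p-values only; it does not reproduce the paper's communication-efficient variants with sampling and quantization, and the function name and sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is below its threshold alpha * rank / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Step-up: reject every hypothesis ranked at or below k, even if some
    # smaller p-values individually missed their own thresholds.
    return sorted(order[:k])


# Hypothetical p-values gathered at the center node of a star network.
p = [0.02, 0.03, 0.035, 0.9]
print(benjamini_hochberg(p, alpha=0.05))  # [0, 1, 2]
```

In this example the two smallest p-values exceed their own thresholds (0.0125 and 0.025), yet they are rejected because the third one passes its threshold of 0.0375; that step-up behavior is what distinguishes BH from a per-rank cutoff.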

arXiv:2307.07736 [pdf, ps, other]

Identifying Direct Causes using Intervened Target Variable

Authors: Kang Du, Yu Xiang, Ilya Soloveychik

Abstract: Identifying the direct causes or causal parents of a target variable is crucial for scientific discovery. Focusing on linear models, the invariant prediction framework was built upon the invariance principle, namely, the conditional distribution of the target variable given its causal parents is invariant across multiple environments or experimental conditions. However, their identifiability results for causal parents can be restrictive with respect to the underlying graph structure and the experimental conditions for generating interventional data. Motivated by a recent alternative formulation of invariance, called the invariant matching property, we establish identifiability results under relatively mild assumptions, which leads to a simple yet effective procedure for identifying causal parents. We demonstrate the performance of the proposed method over various synthetic and real datasets.

Submitted 17 July, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: Accepted to the 57th Asilomar Conference on Signals, Systems, and Computers

arXiv:2307.03073 [pdf, other]

Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning

Authors: Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang

Abstract: We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by unimodal prototypical networks for few-shot learning, we introduce Proto-CLIP which utilizes image prototypes and text prototypes for few-shot learning. Specifically, Proto-CLIP adapts the image and text encoder embeddings from CLIP in a joint fashion using few-shot examples. The embeddings from the two encoders are used to compute the respective prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of the corresponding classes. Such alignment is beneficial for few-shot classification due to the reinforced contributions from both types of prototypes. Proto-CLIP has both training-free and fine-tuned variants. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning, as well as in the real world for robot perception. The project page is available at https://irvlutd.github.io/Proto-CLIP

Submitted 14 July, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
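
The idea of classifying with both image prototypes and text prototypes can be illustrated with a toy sketch. This is not the Proto-CLIP implementation: the embeddings below are made-up 2-D vectors rather than CLIP outputs, and combining the two cosine similarities with a fixed `weight` is an assumption, since the abstract does not say how the two prototype scores are merged. All function and variable names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(query, support, text_embeddings, weight=0.5):
    """Pick the class whose image and text prototypes best match the query."""
    scores = {}
    for label, examples in support.items():
        image_proto = mean_vector(examples)
        text_proto = text_embeddings[label]
        scores[label] = (weight * cosine(query, image_proto)
                         + (1.0 - weight) * cosine(query, text_proto))
    return max(scores, key=scores.get)


# Made-up few-shot support set and per-class text embeddings.
support = {
    "mug": [[1.0, 0.1], [0.9, 0.0]],
    "bowl": [[0.0, 1.0], [0.1, 0.9]],
}
text_embeddings = {"mug": [1.0, 0.0], "bowl": [0.0, 1.0]}

print(classify([0.8, 0.2], support, text_embeddings))  # mug
```

This corresponds loosely to a training-free setting, where prototypes are computed directly from frozen embeddings; the fine-tuned variant mentioned in the abstract would additionally adapt the embeddings, which is not shown here.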

arXiv:2307.02061 [pdf, other]

doi 10.1103/PhysRevLett.132.080201

Randomness Certification from Multipartite Quantum Steering for Arbitrary Dimensional Systems

Authors: Yi Li, Yu Xiang, Xiao-Dong Yu, H. Chau Nguyen, Otfried Gühne, Qiongyi He

Abstract: Entanglement in bipartite systems has been applied for the generation of secure random numbers, which play an important role in cryptography and scientific numerical simulations. Here, we propose to use multipartite entanglement distributed between trusted and untrusted parties for generating randomness of arbitrary dimensional systems. We show that the distributed structure of several parties leads to additional protection against possible attacks by an eavesdropper, resulting in more secure randomness generated than in the corresponding bipartite scenario. In particular, randomness can be certified in the group of untrusted parties, even when no randomness exists in either of them individually. We prove that the necessary and sufficient resource for quantum randomness in this scenario is multipartite quantum steering when two measurement settings are performed on the untrusted parties. However, the sufficiency no longer holds with more measurement settings. Finally, we apply our analysis to some experimentally realized states and show that more randomness can be extracted in comparison to the existing analysis.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 14 pages, 5 figures

Journal ref: Phys. Rev. Lett. 132, 080201 (2024)

arXiv:2307.01879 [pdf, other]

Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow

Authors: Chuqi Chen, Yue Wu, Yang Xiang

Abstract: In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cramér GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.

Submitted 7 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.15620 [pdf, other]

SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes

Authors: Ninad Khargonkar, Sai Haneesh Allu, Yangxiao Lu, Jishnu Jaykumar P, Balakrishnan Prabhakaran, Yu Xiang

Abstract: We present a new reproducible benchmark for evaluating robot manipulation in the real world, specifically focusing on pick-and-place. Our benchmark uses the YCB objects, a commonly used dataset in the robotics community, to ensure that our results are comparable to other studies. Additionally, the benchmark is designed to be easily reproducible in the real world, making it accessible to researchers and practitioners. We also provide our experimental results and analyses for model-based and model-free 6D robotic grasping on the benchmark, where representative algorithms are evaluated for object perception, grasping planning, and motion planning. We believe that our benchmark will be a valuable tool for advancing the field of robot manipulation. By providing a standardized evaluation framework, researchers can more easily compare different techniques and algorithms, leading to faster progress in developing robot manipulation methods.

Submitted 11 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted to ICRA 2024. Project page is available at https://irvlutd.github.io/SceneReplica

arXiv:2306.10490 [pdf, other]

doi 10.1145/3580305.3599485

Rapid Image Labeling via Neuro-Symbolic Learning

Authors: Yifeng Wang, Zhi Tu, Yiwen Xiang, Shiyuan Zhou, Xiyuan Chen, Bingxuan Li, Tianyi Zhang

Abstract: The success of Computer Vision (CV) relies heavily on manually annotated data. However, it is prohibitively expensive to annotate images in key domains such as healthcare, where data labeling requires significant domain expertise and cannot be easily delegated to crowd workers. To address this challenge, we propose a neuro-symbolic approach called Rapid, which infers image labeling rules from a small amount of labeled data provided by domain experts and automatically labels unannotated data using the rules. Specifically, Rapid combines pre-trained CV models and inductive logic learning to infer the logic-based labeling rules. Rapid achieves a labeling accuracy of 83.33% to 88.33% on four image labeling tasks with only 12 to 39 labeled samples. In particular, Rapid significantly outperforms finetuned CV models in two highly specialized tasks. These results demonstrate the effectiveness of Rapid in learning from small data and its capability to generalize among different tasks. Code and our dataset are publicly available at https://github.com/Neural-Symbolic-Image-Labeling/

Submitted 18 June, 2023; originally announced June 2023.

Comments: This paper was accepted by the 2023 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

arXiv:2306.08303 [pdf, other]

Pedestrian Recognition with Radar Data-Enhanced Deep Learning Approach Based on Micro-Doppler Signatures

Authors: Haoming Li, Yu Xiang, Haodong Xu, Wenyong Wang

Abstract: As a hot topic in recent years, the ability of pedestrians identification based on radar micro-Doppler signatures is limited by the lack of adequate training data. In this paper, we propose a data-enhanced multi-characteristic learning (DEMCL) model with data enhancement (DE) module and multi-characteristic learning (MCL) module to learn more complementary pedestrian micro-Doppler (m-D) signatures. In DE module, a range-Doppler generative adversarial network (RDGAN) is proposed to enhance free walking datasets, and MCL module with multi-scale convolution neural network (MCNN) and radial basis function neural network (RBFNN) is trained to learn m-D signatures extracted from enhanced datasets. Experimental results show that our model is 3.33% to 10.24% more accurate than other studies and has a short run time of 0.9324 seconds on a 25-minute walking dataset.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 6 pages, 17 figures

arXiv:2305.12747 [pdf, other]

The "code" of Ethics: A Holistic Audit of AI Code Generators

Authors: Wanlun Ma, Yiliao Song, Minhui Xue, Sheng Wen, Yang Xiang

Abstract: AI-powered programming language generation (PLG) models have gained increasing attention due to their ability to generate source code of programs in a few seconds with a plain program description. Despite their remarkable performance, many concerns are raised over the potential risks of their development and deployment, such as legal issues of copyright infringement induced by training usage of licensed code, and malicious consequences due to the unregulated use of these models. In this paper, we present the first-of-its-kind study to systematically investigate the accountability of PLG models from the perspectives of both model development and deployment. In particular, we develop a holistic framework not only to audit the training data usage of PLG models, but also to identify neural code generated by PLG models as well as determine its attribution to a source model. To this end, we propose using membership inference to audit whether a code snippet used is in the PLG model's training data. In addition, we propose a learning-based method to distinguish between human-written code and neural code. In neural code attribution, through both empirical and theoretical analysis, we show that it is impossible to reliably attribute the generation of one code snippet to one model. We then propose two feasible alternative methods: one is to attribute one neural code snippet to one of the candidate PLG models, and the other is to verify whether a set of neural code snippets can be attributed to a given PLG model.
The proposed framework thoroughly examines the accountability of PLG models which are verified by extensive experiments. The implementations of our proposed framework are also encapsulated into a new artifact, named CodeForensic, to foster further research.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11884 [pdf, other]

Novel deep learning methods for 3D flow field segmentation and classification

Authors: Xiaorui Bai, Wenyong Wang, Jun Zhang, Yueqing Wang, Yu Xiang

Abstract: Flow field segmentation and classification help researchers to understand vortex structure and thus turbulent flow. Existing deep learning methods are mainly based on global information and focus on 2D circumstances. Based on flow field theory, we propose novel flow field segmentation and classification deep learning methods in three-dimensional space. We construct segmentation criterion based on local velocity information and classification criterion based on the relationship between local vorticity and vortex wake, to identify vortex structure in 3D flow field, and further classify the type of vortex wakes accurately and rapidly. Simulation experiment results showed that, compared with existing methods, our segmentation method can identify the vortex area more accurately, while the time consumption is reduced by more than 50%; our classification method can reduce the time consumption by more than 90% while maintaining the same classification accuracy level.

Submitted 14 June, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 13 pages, 23 figures

arXiv:2305.11202 [pdf]

LLM-based Frameworks for Power Engineering from Routine to Novel Tasks

Authors: Ran Li, Chuanqing Pu, Junyi Tao, Canbing Li, Feilong Fan, Yue Xiang, Sijie Chen

Abstract: The digitalization of energy sectors has expanded the coding responsibilities for power engineers and researchers. This research article explores the potential of leveraging Large Language Models (LLMs) to alleviate this burden. Here, we propose LLM-based frameworks for different programming tasks in power systems. For well-defined and routine tasks like the classic unit commitment (UC) problem, we deploy an end-to-end framework to systematically assess four leading LLMs (ChatGPT 3.5, ChatGPT 4.0, Claude, and Google Bard) in terms of success rate, consistency, and robustness. For complex tasks with limited prior knowledge, we propose a human-in-the-loop framework to enable engineers and LLMs to collaboratively solve the problem through interactive-learning of method recommendation, problem de-composition, subtask programming and synthesis. Through a comparative study between two frameworks, we find that human-in-the-loop features like web access, problem decomposition with field knowledge and human-assisted code synthesis are essential as LLMs currently still fall short in acquiring cutting-edge and domain-specific knowledge to complete a holistic problem-solving project.

Submitted 19 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.08466 [pdf, other]

Nearly Optimal VC-Dimension and Pseudo-Dimension Bounds for Deep Neural Network Derivatives

Authors: Yahong Yang, Haizhao Yang, Yang Xiang

Abstract: This paper addresses the problem of nearly optimal Vapnik-Chervonenkis dimension (VC-dimension) and pseudo-dimension estimations of the derivative functions of deep neural networks (DNNs). Two important applications of these estimations include: 1) Establishing a nearly tight approximation result of DNNs in the Sobolev space; 2) Characterizing the generalization error of machine learning methods with loss functions involving function derivatives. This theoretical investigation fills the gap of learning error estimations for a wide range of physics-informed machine learning models and applications including generative models, solving partial differential equations, operator learning, network compression, distillation, regularization, etc.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.06674 [pdf, other]

doi 10.1145/3563703.3596621

Decentralized Governance for Virtual Community (DeGov4VC): Optimal Policy Design of Human-plant Symbiosis Co-creation

Authors: Yan Xiang, Qianhui Fan, Kejiang Qian, Jiajie Li, Yuying Tang, Ze Gao

Abstract: Does the decentralized nature of user behavior in interactive virtual communities help create rules promoting user engagement? Through scenarios like planting, this framework suggests a new paradigm for mutual influence that allows users to impact communities' political decisions. Sixteen participants in the first round of interviews were involved in the framework's creation. Then we developed and implemented our framework in the community with the help of other stakeholders. This proof-of-concept creates user groups using information from users' daily activities as input and grows the green plants in a virtual environment. Finally, we involved AI agents and stakeholders in the framework test and iterations. Our study's user evaluation of a few key stakeholders demonstrates how our strategy enhances user viscosity and experience. Via human-planting ecosystems in a virtual community, this research gives a fresh viewpoint on decentralized governance and an engaging method for co-creating interactive ecological communities.

Submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted In Designing Interactive Systems Conference (DIS Companion 23), July 10-14, 2023, Pittsburgh, PA, USA. ACM, New York, NY, USA, 7 pages

arXiv:2305.05986 [pdf, other]

Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences

Authors: Jie Qiao, Ruichu Cai, Siyu Wu, Yu Xiang, Keli Zhang, Zhifeng Hao

Abstract: Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable beyond applications, especially when dealing with discrete-time event sequences in low-resolution; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., the causal relationship that occurred simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structure Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among events type in discrete-time event sequence. The proposed method is featured with the minorization-maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.

Submitted 10 May, 2023; originally announced May 2023.

Comments: Accepted by IJCAI 2023

arXiv:2305.04269 [pdf, other]

Dual Residual Attention Network for Image Denoising

Authors: Wencong Wu, Shijie Liu, Yi Zhou, Yungang Zhang, Yu Xiang

Abstract: In image denoising, deep convolutional neural networks (CNNs) can obtain favorable performance on removing spatially invariant noise. However, many of these networks cannot perform well on removing the real noise (i.e. spatially variant noise) generated during image acquisition or transmission, which severely sets back their application in practical image denoising tasks. Instead of continuously increasing the network depth, many researchers have revealed that expanding the width of networks can also be a useful way to improve model performance. It also has been verified that feature filtering can promote the learning ability of the models. Therefore, in this paper, we propose a novel Dual-branch Residual Attention Network (DRANet) for image denoising, which has both the merits of a wide model architecture and attention-guided feature learning. The proposed DRANet includes two different parallel branches, which can capture complementary features to enhance the learning ability of the model. We designed a new residual attention block (RAB) and a novel hybrid dilated residual attention block (HDRAB) for the upper and the lower branches, respectively. The RAB and HDRAB can capture rich local features through multiple skip connections between different convolutional layers, and the unimportant features are dropped by the residual attention modules. Meanwhile, the long skip connections in each branch, and the global feature fusion between the two parallel branches can capture the global features as well.
Moreover, the proposed DRANet uses downsampling operations and dilated convolutions to increase the size of the receptive field, which can enable DRANet to capture more image context information. Extensive experiments demonstrate that compared with other state-of-the-art denoising methods, our DRANet can produce competitive denoising performance both on synthetic and real-world noise removal.

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2305.03729 [pdf, other]

Score-based Transport Modeling for Mean-Field Fokker-Planck Equations

Authors: Jianfeng Lu, Yue Wu, Yang Xiang

Abstract: We use the score-based transport modeling method to solve the mean-field Fokker-Planck equations, which we call MSBTM. We establish an upper bound on the time derivative of the Kullback-Leibler (KL) divergence to MSBTM numerical estimation from the exact solution, thus validating the MSBTM approach. Besides, we provide an error analysis for the algorithm. In numerical experiments, we study two types of mean-field Fokker-Planck equation and their corresponding dynamics of particles in interacting systems. The MSBTM algorithm is numerically validated through qualitative and quantitative comparison between the MSBTM solutions, the results of integrating the associated stochastic differential equation and the analytical solutions if available.

Submitted 20 April, 2023; originally announced May 2023.

arXiv:2304.08863 [pdf, other]

doi 10.1002/lpor.202300103

Remote preparation of optical cat states based on Gaussian entanglement

Authors: Dongmei Han, Fengxiao Sun, Na Wang, Yu Xiang, Meihong Wang, Mingsheng Tian, Qiongyi He, Xiaolong Su

Abstract: Remote state preparation enables one to prepare and manipulate quantum state non-locally. As an essential quantum resource, optical cat state is usually prepared locally by subtracting photons from a squeezed vacuum state. For remote quantum information processing, it is essential to prepare and manipulate optical cat states remotely based on Gaussian entanglement, which remains a challenge. Here, we present experimental preparation of optical cat states based on a remotely distributed two-mode Gaussian entangled state in a lossy channel. By performing photon subtraction and homodyne projective measurement at Alice's station, an optical cat state is prepared remotely at Bob's station. Furthermore, the prepared cat state is rotated by changing Alice's measurement basis of homodyne detection, which demonstrates the remote manipulation of it. By distributing two modes of the two-mode Gaussian entangled state in lossy channels, we demonstrate that the remotely prepared cat state can tolerate much more loss in Alice's channel than that in Bob's channel. We also show that cat states with amplitudes larger than 2 can be prepared by increasing the squeezing level and subtracting photon numbers. Our results make a crucial step toward remote hybrid quantum information processing involving discrete- and continuous-variable techniques.

Submitted 18 April, 2023; originally announced April 2023.

Journal ref: Laser & Photonics Reviews 2300103 (2023)

arXiv:2304.05604 [pdf, ps, other]

doi 10.1016/j.ijplas.2023.103700

A Continuum Model for Dislocation Climb

Authors: Chutian Huang, Shuyang Dai, Xiaohua Niu, Tianpeng Jiang, Zhijian Yang, Yejun Gu, Yang Xiang

Abstract: Dislocation climb plays an important role in understanding plastic deformation of metallic materials at high temperature. In this paper, we present a continuum formulation for dislocation climb velocity based on densities of dislocations. The obtained continuum formulation is an accurate approximation of the Green's function based discrete dislocation dynamics method (Gu et al. J. Mech. Phys. Solids 83:319-337, 2015). The continuum dislocation climb formulation has the advantage of accounting for both the long-range effect of vacancy bulk diffusion and that of the Peach-Koehler climb force, and the two longrange effects are canceled into a short-range effect (integral with fast-decaying kernel) and in some special cases, a completely local effect. This significantly simplifies the calculation in the Green's function based discrete dislocation dynamics method, in which a linear system has to be solved over the entire system for the long-range effect of vacancy diffusion and the long-range Peach-Koehler climb force has to be calculated. This obtained continuum dislocation climb velocity can be applied in any available continuum dislocation dynamics frameworks. We also present numerical validations for this continuum climb velocity and simulation examples for implementation in continuum dislocation dynamics frameworks.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.04210 [pdf, other]

Filtering one-way Einstein-Podolsky-Rosen steering

Authors: Ze-Yan Hao, Yan Wang, Jia-Kun Li, Yu Xiang, Qiong-Yi He, Zheng-Hao Liu, Mu Yang, Kai Sun, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo

Abstract: Einstein-Podolsky-Rosen (EPR) steering, a fundamental concept of quantum nonlocality, describes one observer's capability to remotely affect another distant observer's state by local measurements. Unlike quantum entanglement and Bell nonlocality, both associated with the symmetric quantum correlation, EPR steering depicts the unique asymmetric property of quantum nonlocality. With the local filter operation in which some system components are discarded, quantum nonlocality can be distilled to enhance the nonlocal correlation, and even the hidden nonlocality can be activated. However, asymmetric quantum nonlocality in the filter operation still lacks a well-rounded investigation, especially considering the discarded parts where quantum nonlocal correlations may still exist with probabilities. Here, in both theory and experiment, we investigate the effect of reusing the discarded particles from local filter. We observe all configurations of EPR steering simultaneously and other intriguing evolution of asymmetric quantum nonlocality, such as reversing the direction of one-way EPR steering. This work provides a perspective to answer "What is the essential role of utilizing quantum steering as a resource?", and demonstrates a practical toolbox for manipulating asymmetric quantum systems with significant potential applications in quantum information tasks.

Submitted 3 January, 2024; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: 11 pages, 6 figures

arXiv:2304.03292 [pdf, other]

SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets

Authors: Borui Cai, Guangyan Huang, Shuiqiao Yang, Yong Xiang, Chi-Hung Chi

Abstract: Shapelets that discriminate time series using local features (subsequences) are promising for time series clustering. Existing time series clustering methods may fail to capture representative shapelets because they discover shapelets from a large pool of uninformative subsequences, and thus result in low clustering accuracy. This paper proposes a Semi-supervised Clustering of Time Series Using Representative Shapelets (SE-Shapelets) method, which utilizes a small number of labeled and propagated pseudo-labeled time series to help discover representative shapelets, thereby improving the clustering accuracy. In SE-Shapelets, we propose two techniques to discover representative shapelets for the effective clustering of time series. 1) A salient subsequence chain (SSC) that can extract salient subsequences (as candidate shapelets) of a labeled/pseudo-labeled time series, which helps remove massive uninformative subsequences from the pool. 2) A linear discriminant selection (LDS) algorithm to identify shapelets that can capture representative local features of time series in different classes, for convenient clustering. Experiments on UCR time series datasets demonstrate that SE-shapelets discovers representative shapelets and achieves higher clustering accuracy than counterpart semi-supervised time series clustering methods.

Submitted 14 November, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.18174 [pdf, other]

Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection

Authors: Chuer Yu, Xuhong Zhang, Yuxuan Duan, Senbo Yan, Zonghui Wang, Yang Xiang, Shouling Ji, Wenzhi Chen

Abstract: Despite the fact that DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often face disastrous performance degradation when generalized to an unseen manipulation. Some recent works show improvement in generalization but rely on features fragile to image distortions such as compression. To this end, we propose Diff-ID, a concise and effective approach that explains and measures the identity loss induced by facial manipulations. When testing on an image of a specific person, Diff-ID utilizes an authentic image of that person as a reference and aligns them to the same identity-insensitive attribute feature space by applying a face-swapping generator. We then visualize the identity loss between the test and the reference image from the image differences of the aligned pairs, and design a custom metric to quantify the identity loss. The metric is then proved to be effective in distinguishing the forgery images from the real ones. Extensive experiments show that our approach achieves high detection performance on DeepFake images and state-of-the-art generalization ability to unknown forgery methods, while also being robust to image distortions.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.17997 [pdf, other]

Switching classical and quantum nonreciprocities with spinning photonics

Authors: Yonglin Xiang, Yunlan Zuo, Xun-Wei Xu, Ran Huang, Hui Jing

Abstract: We study how to achieve, manipulate, and switch classical or quantum nonreciprocal effects of light with a spinning Kerr resonator. In particular, we show that even when there is no classical nonreciprocity (i.e., with the same mean number of photons for both clockwise and counterclockwise propagating modes), it is still possible to realize nonreciprocity of quantum correlations of photons in such a device. Also, by tuning the angular velocity and the optical backscattering strength, higher-order quantum nonreciprocity can appear, featuring qualitatively different third-order optical correlations, even in the absence of any nonreciprocity for both the mean photon number and its second-order correlations. The possibility to switch a single device between a classical isolator and a purely quantum directional system can provide more functions for nonreciprocal materials and new opportunities to realize novel quantum effects and applications, such as nonreciprocal multi-photon blockade, one-way photon bundles, and backaction-immune quantum communications.

Submitted 28 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2303.16552 [pdf, other]

Visual Content Privacy Protection: A Survey

Authors: Ruoyu Zhao, Yushu Zhang, Tao Wang, Wenying Wen, Yong Xiang, Xiaochun Cao

Abstract: Vision is the most important sense for people, and it is also one of the main ways of cognition. As a result, people tend to utilize visual content to capture and share their life experiences, which greatly facilitates the transfer of information. Meanwhile, it also increases the risk of privacy violations, e.g., an image or video can reveal different kinds of privacy-sensitive information. Researchers have been working continuously to develop targeted privacy protection solutions, and there are several surveys to summarize them from certain perspectives. However, these surveys are either problem-driven, scenario-specific, or technology-specific, making it difficult for them to summarize the existing solutions in a macroscopic way. In this survey, a framework that encompasses various concerns and solutions for visual privacy is proposed, which allows for a macro understanding of privacy concerns from a comprehensive level. It is based on the fact that privacy concerns have corresponding adversaries, and divides privacy protection into three categories, based on computer vision (CV) adversary, based on human vision (HV) adversary, and based on CV & HV adversary. For each category, we analyze the characteristics of the main approaches to privacy protection, and then systematically review representative solutions. Open challenges and future directions for visual privacy protection are also discussed.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 24 pages, 13 figures

arXiv:2303.14524 [pdf, other]

Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System

Authors: Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, Jiawei Zhang

Abstract: Large language models (LLMs) have demonstrated their significant potential to be applied for addressing various application tasks. However, traditional recommender systems continue to face great challenges such as poor interactivity and explainability, which actually also hinder their broad deployment in real-world systems. To address these limitations, this paper proposes a novel paradigm called Chat-Rec (ChatGPT Augmented Recommender System) that innovatively augments LLMs for building conversational recommender systems by converting user profiles and historical interactions into prompts. Chat-Rec is demonstrated to be effective in learning user preferences and establishing connections between users and products through in-context learning, which also makes the recommendation process more interactive and explainable. What's more, within the Chat-Rec framework, user's preferences can transfer to different products for cross-domain recommendations, and prompt-based injection of information into LLMs can also handle the cold-start scenarios with new items. In our experiments, Chat-Rec effectively improves the results of top-k recommendations and performs better in zero-shot rating prediction task. Chat-Rec offers a novel approach to improving recommender systems and presents new practical scenarios for the implementation of AIGC (AI generated content) in recommender system studies.

Submitted 3 April, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.12816 [pdf, other]

From Wide to Deep: Dimension Lifting Network for Parameter-efficient Knowledge Graph Embedding

Authors: Borui Cai, Yong Xiang, Longxiang Gao, Di Wu, He Zhang, Jiong Jin, Tom Luan

Abstract: Knowledge graph embedding (KGE) that maps entities and relations into vector representations is essential for downstream applications. Conventional KGE methods require high-dimensional representations to learn the complex structure of knowledge graphs, but lead to oversized model parameters. Recent advances reduce parameters by using low-dimensional entity representations, while developing techniques (e.g., knowledge distillation or reinvented representation forms) to compensate for the reduced dimension. However, such operations introduce complicated computations and model designs that may not benefit large knowledge graphs. To seek a simple strategy to improve the parameter efficiency of conventional KGE models, we take inspiration from the fact that deeper neural networks require exponentially fewer parameters to achieve expressiveness comparable to wider networks for compositional structures. We view all entity representations as a single-layer embedding network, and conventional KGE methods that adopt high-dimensional entity representations amount to widening the embedding network to gain expressiveness. To achieve parameter efficiency, we instead propose a deeper embedding network for entity representations, i.e., a narrow entity embedding layer plus a multi-layer dimension lifting network (LiftNet). Experiments on three public datasets show that by integrating LiftNet, four conventional KGE methods with 16-dimensional representations achieve link prediction accuracy comparable to that of the original models that adopt 512-dimensional representations, saving 68.4% to 96.9% of parameters.

Submitted 13 November, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.10553 [pdf, other]

Elastic Interaction Energy-Based Generative Model: Approximation in Feature Space

Authors: Chuqi Chen, Yue Wu, Yang Xiang

Abstract: In this paper, we propose a novel approach to generative modeling using a loss function based on elastic interaction energy (EIE), which is inspired by the elastic interaction between defects in crystals. The utilization of the EIE-based metric presents several advantages, including its long range property that enables consideration of global information in the distribution. Moreover, its inclusion of a self-interaction term helps to prevent mode collapse and captures all modes of distribution. To overcome the difficulty of the relatively scattered distribution of high-dimensional data, we first map the data into a latent feature space and approximate the feature distribution instead of the data distribution. We adopt the GAN framework and replace the discriminator with a feature transformation network to map the data into a latent space. We also add a stabilizing term to the loss of the feature transformation network, which effectively addresses the issue of unstable training in GAN-based algorithms. Experimental results on popular datasets, such as MNIST, FashionMNIST, CIFAR-10, and CelebA, demonstrate that our EIEG GAN model can mitigate mode collapse, enhance stability, and improve model performance.

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.07048 [pdf, other]

doi 10.1016/j.knosys.2023.111079

Hybrid Variational Autoencoder for Time Series Forecasting

Authors: Borui Cai, Shuiqiao Yang, Longxiang Gao, Yong Xiang

Abstract: Variational autoencoders (VAE) are powerful generative models that learn the latent representations of input data as random variables. Recent studies show that VAE can flexibly learn the complex temporal dynamics of time series and achieve more promising forecasting results than deterministic models. However, a major limitation of existing works is that they fail to jointly learn the local patterns (e.g., seasonality and trend) and temporal dynamics of time series for forecasting. Accordingly, we propose a novel hybrid variational autoencoder (HyVAE) to integrate the learning of local patterns and temporal dynamics by variational inference for time series forecasting. Experimental results on four real-world datasets show that the proposed HyVAE achieves better forecasting results than various counterpart methods, as well as two HyVAE variants that only learn the local patterns or temporal dynamics of time series, respectively.

Submitted 13 March, 2023; originally announced March 2023.

Journal ref: Knowledge-Based Systems. 281 (2023) 111079

arXiv:2303.02892 [pdf, other]

Differentially Private Confidence Interval for Extrema of Parameters

Authors: Xiaowen Fu, Yang Xiang, Xinzhou Guo

Abstract: This paper aims to construct a valid and efficient confidence interval for the extrema of parameters under privacy protection. The usual statistical inference on the extrema of parameters often suffers from the selection bias issue, and the problem becomes more acute, as in many application scenarios of extrema parameters, we often need to protect the privacy of the data. In this paper, we focus on the exponential family of distributions and propose a privatized parametric bootstrap method to address selection bias in the extrema of parameters problem under the scheme of differential privacy. While the usual privatized parametric bootstrap does not address selection bias appropriately, we prove that with a privatized bias correction term, the proposed parametric bootstrap method can lead to a valid and efficient confidence interval for the extrema of parameters. We illustrate the proposed method with the Gaussian case and regression case and demonstrate the advantages of the proposed method via numerical experiments.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2302.13929 [pdf, other]

Efficient Informed Proposals for Discrete Distributions via Newton's Series Approximation

Authors: Yue Xiang, Dongyao Zhu, Bowen Lei, Dongkuan Xu, Ruqi Zhang

Abstract: Gradients have been exploited in proposal distributions to accelerate the convergence of Markov chain Monte Carlo algorithms on discrete distributions. However, these methods require a natural differentiable extension of the target discrete distribution, which often does not exist or does not provide effective gradient guidance. In this paper, we develop a gradient-like proposal for any discrete distribution without this strong requirement. Built upon a locally-balanced proposal, our method efficiently approximates the discrete likelihood ratio via Newton's series expansion to enable a large and efficient exploration in discrete spaces. We show that our method can also be viewed as a multilinear extension, thus inheriting its desired properties. We prove that our method has a guaranteed convergence rate with or without the Metropolis-Hastings step. Furthermore, our method outperforms a number of popular alternatives in several different experiments, including the facility location problem, extractive text summarization, and image retrieval.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Published at AISTATS 2023

arXiv:2302.11771 [pdf, ps, other]

doi 10.1140/epjd/s10053-023-00613-9

Multipartite quantum cryptography based on the violation of Svetlichny's inequality

Authors: Yang Xiang

Abstract: Multipartite cryptography is useful for some particular missions. In this paper, we present a quantum key distribution scheme in which three separated observers can securely share a set of keys by using a sequence of $3$-particle GHZ states. We prove that the violation of Svetlichny's inequality can be utilized to test for eavesdropping, and even when the eavesdropper can completely control the outcomes of two participants' measurements, our scheme still ensures the security of the key distribution. This scheme can be easily extended to the case of $N$-party key distribution, and the violation of the $N$-partite Svetlichny's inequality guarantees the security of the generalized scheme. Since the GHZ state has maximum entanglement, its perfect monogamy guarantees the device-independent security of our protocol. However, quantum entanglement is a vulnerable resource that often decays during transmission, so we need to derive the secret-key rate of our protocol when quantum states with non-maximal entanglement are used. We then calculate the extractable secret-key rate of the three-party key distribution protocol for the Werner state in the device-independent scenario. We find that the extractable secret-key rate monotonically approaches $1$ as the visibility of the Werner state increases, and it reaches its maximum value $1$ when the Werner state becomes the GHZ state.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 9 pages, 3 figures

Journal ref: Eur. Phys. J. D 77, 31 (2023)

arXiv:2302.08774 [pdf, other]

Vision, Deduction and Alignment: An Empirical Study on Multi-modal Knowledge Graph Alignment

Authors: Yangning Li, Jiaoyan Chen, Yinghui Li, Yuejia Xiang, Xi Chen, Hai-Tao Zheng

Abstract: Entity alignment (EA) for knowledge graphs (KGs) plays a critical role in knowledge engineering. Existing EA methods mostly focus on utilizing the graph structures and entity attributes (including literals), but ignore images that are common in modern multi-modal KGs. In this study we first constructed Multi-OpenEA -- eight large-scale, image-equipped EA benchmarks, and then evaluated some existing embedding-based methods for utilizing images. In view of the complementary nature of visual modal information and logical deduction, we further developed a new multi-modal EA method named LODEME using logical deduction and multi-modal KG embedding, with state-of-the-art performance achieved on Multi-OpenEA and other existing multi-modal EA benchmarks.

Submitted 12 March, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP2023

arXiv:2302.08271 [pdf, ps, other]

LiQuiD-MIMO Radar: Distributed MIMO Radar with Low-Bit Quantization

Authors: Yikun Xiang, Feng Xi, Shengyao Chen

Abstract: Distributed MIMO radar is known to achieve superior sensing performance by employing widely separated antennas. However, it is challenging to implement a low-complexity distributed MIMO radar due to the complex operations at both the receivers and the fusion center. This work proposes a low-bit quantized distributed MIMO (LiQuiD-MIMO) radar to significantly reduce the burden of signal acquisition and data transmission. In the LiQuiD-MIMO radar, the widely separated receivers are restricted to operating with low-resolution ADCs and deliver the low-bit quantized data to the fusion center. At the fusion center, the induced quantization distortion is explicitly compensated via digital processing. By exploiting the inherent structure of our problem, a quantized version of the robust principal component analysis (RPCA) problem is formulated to simultaneously recover the low-rank target information matrices as well as the sparse data transmission errors. A least squares-based method is then employed to estimate the targets' positions and velocities from the recovered target information matrices. Numerical experiments demonstrate that the proposed LiQuiD-MIMO radar, configured with the developed algorithm, can achieve accurate target parameter estimation.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 5 pages, 4 figures

arXiv:2302.03793 [pdf, other]

Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction

Authors: Yangxiao Lu, Ninad Khargonkar, Zesheng Xu, Charles Averill, Kamalesh Palanisamy, Kaiyu Hang, Yunhui Guo, Nicholas Ruozzi, Yu Xiang

Abstract: We introduce a novel robotic system for improving unseen object instance segmentation in the real world by leveraging long-term robot interaction with objects. Previous approaches either grasp or push an object and then obtain the segmentation mask of the grasped or pushed object after one action. Instead, our system defers the decision on segmenting objects until after a sequence of robot pushing actions. By applying multi-object tracking and video object segmentation on the images collected via robot pushing, our system can generate segmentation masks of all the objects in these images in a self-supervised way. These include images where objects are very close to each other, on which existing object segmentation networks usually make segmentation errors. We demonstrate the usefulness of our system by fine-tuning segmentation networks trained on synthetic data with real-world data collected by our system. We show that, after fine-tuning, the segmentation accuracy of the networks is significantly improved both in the same domain and across different domains. In addition, we verify that the fine-tuned networks improve top-down robotic grasping of unseen objects in the real world.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 11 pages, 7 figures, 5 tables

arXiv:2302.02604 [pdf, ps, other]

doi 10.1088/0256-307X/40/6/067401

Temperature dependent anisotropy and two-band superconductivity revealed by lower critical field in organic superconductor $κ$-(BEDT-TTF)$_{2}$Cu[N(CN)$_{2}$]Br

Authors: Huijing Mu, Jin Si, Qingui Yang, Ying Xiang, Haipeng Yang, Hai-Hu Wen

Abstract: Resistivity and magnetization have been measured at different temperatures and magnetic fields in the organic superconductor $κ$-(BEDT-TTF)$_{2}$Cu[N(CN)$_{2}$]Br. The lower critical field and upper critical field are determined, which allows a complete phase diagram to be drawn. Through the comparison between the upper critical fields with magnetic field perpendicular and parallel to the conducting ac-planes, and the scaling of the in-plane resistivity with field along different directions, we found that the anisotropy $Γ$ is strongly temperature dependent. It is found that $Γ$ is quite large (above 20) near $T_{c}$, which satisfies the 2D model, but approaches a small value in the low-temperature region. The 2D-Tinkham model can also be used to fit the data at high temperatures. This is explained as a crossover from the orbital depairing mechanism in the high-temperature and low-field region to the paramagnetic depairing mechanism in the high-field and low-temperature region. The temperature dependence of the lower critical field $H_{c1} (T)$ shows a concave shape over a wide temperature region. It is found that neither a single $d$-wave nor a single $s$-wave gap can fit $H_{c1} (T)$; however, a two-gap model containing an $s$-wave and a $d$-wave can fit the data rather well, suggesting two-band superconductivity and an unconventional pairing mechanism in this organic superconductor.

Submitted 6 February, 2023; originally announced February 2023.

Journal ref: Chinese Physics Letters 40, 067401 (2023)

arXiv:2302.00633 [pdf, other]

Deep Dependency Networks for Multi-Label Classification

Authors: Shivvrat Arya, Yu Xiang, Vibhav Gogate

Abstract: We propose a simple approach which combines the strengths of probabilistic graphical models and deep learning architectures for solving the multi-label classification task, focusing specifically on image and video data. First, we show that the performance of previous approaches that combine Markov Random Fields with neural networks can be modestly improved by leveraging more powerful methods such as iterative join graph propagation, integer linear programming, and $\ell_1$ regularization-based structure learning. Then we propose a new modeling framework called deep dependency networks, which augments a dependency network, a model that is easy to train and learns more accurate dependencies but is limited to Gibbs sampling for inference, to the output layer of a neural network. We show that despite its simplicity, jointly learning this new architecture yields significant improvements in performance over the baseline neural network. In particular, our experimental evaluation on three video activity classification datasets: Charades, Textually Annotated Cooking Scenes (TACoS), and Wetlab, and three multi-label image classification datasets: MS-COCO, PASCAL VOC, and NUS-WIDE show that deep dependency networks are almost always superior to pure neural architectures that do not use dependency networks.

Submitted 6 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2302.00426 [pdf, other]

doi 10.1093/mnras/stad363

Magnetic topologies of two weak-line T Tauri stars TAP 4 and TAP 40

Authors: Yue Xiang, Shenghong Gu, J. -F. Donati, G. A. J. Hussain, A. Collier Cameron, the MaTYSSE collaboration

Abstract: We present a Zeeman-Doppler imaging study of two weak-line T Tauri stars TAP 4 and TAP 40, based on the high-resolution spectropolarimetric observations with ESPaDOnS at the Canada-France-Hawaii Telescope in November 2013, in the framework of the MaTYSSE large programme. We apply two Zeeman-Doppler imaging codes to the Stokes I and V profiles to reconstruct their brightness and large-scale magnetic field images. The results given by the two imaging codes are in good agreement with each other. TAP 4 shows a large polar cool spot and several intermediate-latitude warm spots on its surface, whereas TAP 40 exhibits very weak variations in its Stokes I profiles suggesting a mostly unspotted photosphere. We detect Zeeman signatures in the Stokes V profiles of both stars. The reconstructed magnetic maps reveal dominantly toroidal fields, which enclose about 60 per cent of the total magnetic energy for both TAP 4 and TAP 40. Both stars show prominent circular ring features of the azimuthal magnetic field. We derive a solar-like surface differential rotation on TAP 4 from the tomographic modelling. The brightness image of TAP 4 is used to predict the radial velocity jitters induced by its activity. After filtering out the activity jitter, the RMS of its RVs is reduced from 1.7 km s$^{-1}$ to 0.2 km s$^{-1}$, but we do not detect any periodic signals in the filtered RVs of TAP 4, implying that it is unlikely to host a close-in exoplanet more massive than $\sim$3.5 M$_{\rm Jup}$ at 0.1 au.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 10 pages, 11 figures, accepted for publication in MNRAS

arXiv:2301.07409 [pdf, other]

doi 10.1109/TPAMI.2024.3386985

Representing Noisy Image Without Denoising

Authors: Shuren Qi, Yushu Zhang, Chao Wang, Tao Xiang, Xiaochun Cao, Yong Xiang

Abstract: A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples in the training phase (i.e., data augmentation) or 2) pre-processing the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods generally exhibit inefficient processing and unstable results, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive a robust representation directly from noisy images, without denoising as pre-processing. Here, the noise-robust representation is designed as Fractional-order Moments in Radon space (FMR), which also has the beneficial properties of orthogonality and rotation invariance. Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases, and the introduced fractional-order parameter offers time-frequency analysis capability that is not available in classical methods. Formally, both implicit and explicit paths for constructing the FMR are discussed in detail. Extensive simulation experiments and an image security application are provided to demonstrate the uniqueness and usefulness of our FMR, especially for noise robustness, rotation invariance, and time-frequency discriminability.

Submitted 19 June, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

arXiv:2301.05975 [pdf, ps, other]

Generalized Invariant Matching Property via LASSO

Authors: Kang Du, Yu Xiang

Abstract: Learning under distribution shifts is a challenging task. One principled approach is to exploit the invariance principle via the structural causal models. However, the invariance principle is violated when the response is intervened, making it a difficult setting. In a recent work, the invariant matching property has been developed to shed light on this scenario and shows promising performance. In this work, by formulating a high-dimensional problem with intrinsic sparsity, we generalize the invariant matching property for an important setting when only the target is intervened. We propose a more robust and computation-efficient algorithm by leveraging a variant of Lasso, improving upon the existing algorithms.

Submitted 11 March, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

Comments: Accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

arXiv:2301.03398 [pdf, other]

Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration

Authors: Chao Yu, Xinyi Yang, Jiaxuan Gao, Jiayu Chen, Yunfei Li, Jijia Liu, Yunfei Xiang, Ruixin Huang, Huazhong Yang, Yi Wu, Yu Wang

Abstract: We consider the problem of cooperative exploration where multiple robots need to cooperatively explore an unknown region as fast as possible. Multi-agent reinforcement learning (MARL) has recently become a trending paradigm for solving this challenge. However, existing MARL-based methods adopt action-making steps as the metric for exploration efficiency by assuming all the agents are acting in a fully synchronous manner: i.e., every single agent produces an action simultaneously and every single action is executed instantaneously at each time step. Despite its mathematical simplicity, such a synchronous MARL formulation can be problematic for real-world robotic applications. Different robots typically take slightly different wall-clock times to accomplish an atomic action, or may even periodically get lost due to hardware issues. Simply waiting for every robot to be ready for the next action can be particularly time-inefficient. Therefore, we propose an asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle this real-world challenge. We first extend a classical MARL algorithm, multi-agent PPO (MAPPO), to the asynchronous setting and additionally apply action-delay randomization to make the learned policy generalize better to varying action delays in the real world. Moreover, each navigation agent is represented as a team-size-invariant CNN-based policy, which greatly benefits real-robot deployment by handling possible robot loss and allows bandwidth-efficient intra-agent communication through low-dimensional CNN features. We first validate our approach in a grid-based scenario. Both simulation and real-robot results show that ACE reduces actual exploration time by over 10% compared with classical approaches. We also apply our framework to a high-fidelity visual-based environment, Habitat, achieving 28% improvement in exploration efficiency.

Submitted 11 April, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: This paper is accepted by AAMAS 2023. The source code can be found in https://github.com/yang-xy20/async_mappo

arXiv:2301.00308 [pdf, other]

High-Accuracy Absolute-Position-Aided Code Phase Tracking Based on RTK/INS Deep Integration in Challenging Static Scenarios

Authors: Yiran Luo, Li-Ta Hsu, Yang Jiang, Baoyu Liu, Zhetao Zhang, Yan Xiang, Naser El-Sheimy

Abstract: Many multi-sensor navigation systems urgently demand accurate positioning initialization from global navigation satellite systems (GNSSs) in challenging static scenarios. However, ground blockages against line-of-sight (LOS) signal reception make this difficult for GNSS users. Steering local codes in GNSS basebands is a desirable way to correct instantaneous signal phase misalignment, efficiently gathering useful signal power and increasing positioning accuracy. Besides, inertial navigation systems (INSs) have long been used as a complementary dead reckoning (DR) sensor for GNSS receivers in kinematic scenarios to resist various interferences. But little work focuses on whether the INS can improve GNSS receivers in static scenarios. Thus, this paper proposes an enhanced navigation system deeply integrated with low-cost INS solutions and GNSS high-accuracy carrier-based positioning. First, an absolute code phase is predicted from base station information and an integrated solution of the INS DR and real-time kinematic (RTK) results through an extended Kalman filter (EKF). Then, a numerically controlled oscillator (NCO) leverages the predicted code phase to improve the alignment between instantaneous local code phases and received ones. The proposed algorithm is realized in a vector-tracking GNSS software-defined radio (SDR). Real-world experiments demonstrate the performance of the proposed SDR in terms of time-of-arrival (TOA) estimation and positioning accuracy.

Submitted 31 December, 2022; originally announced January 2023.

Comments: 27 pages, 18 figures

Showing 101–150 of 553 results for author: Xiang, Y