subscribe to arXiv mailings

Constrained Reinforcement Learning Under Model Mismatch

Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Abstract: Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constr… ▽ More Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfy the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints. △ Less

Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.01255 [pdf, ps, other]

Note on holographic torus stress tensor correlators in $AdS_3$ gravity

Authors: Song He, Yi Li, Yun-Ze Li, Yunda Zhang

Abstract: In the AdS$_3$/CFT$_2$ framework, the Euclidean BTZ black hole corresponds to the dominant high-temperature phase of its dual field theory. We initially employ perturbative methods to solve the Einstein equations as boundary value problems, providing correlators for the energy-momentum tensor operator at low points. Utilizing operator equations established in our previous work, we further compute… ▽ More In the AdS$_3$/CFT$_2$ framework, the Euclidean BTZ black hole corresponds to the dominant high-temperature phase of its dual field theory. We initially employ perturbative methods to solve the Einstein equations as boundary value problems, providing correlators for the energy-momentum tensor operator at low points. Utilizing operator equations established in our previous work, we further compute arbitrary high-point correlators for the energy-momentum tensor operator in the high-temperature phase and recursive relations for these high-point functions. Concurrently, we employ the Chern-Simons formalism to derive consistent results. Further, using the cut-off AdS/$T\bar{T}$-deformed CFT duality, we calculate the energy-momentum tensor correlators, contributing to the comprehensive understanding of the system's dynamics. Finally, stress tensor correlators enable us to ascertain the corresponding KdV operator correlators at low-temperature. △ Less

Submitted 25 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 46 pages

arXiv:2405.00476 [pdf, other]

A Comprehensive Survey of Dynamic Graph Neural Networks: Models, Frameworks, Benchmarks, Experiments and Challenges

Authors: ZhengZhao Feng, Rui Wang, TianXing Wang, Mingli Song, Sai Wu, Shuibing He

Abstract: Dynamic Graph Neural Networks (GNNs) combine temporal information with GNNs to capture structural, temporal, and contextual relationships in dynamic graphs simultaneously, leading to enhanced performance in various applications. As the demand for dynamic GNNs continues to grow, numerous models and frameworks have emerged to cater to different application needs. There is a pressing need for a compr… ▽ More Dynamic Graph Neural Networks (GNNs) combine temporal information with GNNs to capture structural, temporal, and contextual relationships in dynamic graphs simultaneously, leading to enhanced performance in various applications. As the demand for dynamic GNNs continues to grow, numerous models and frameworks have emerged to cater to different application needs. There is a pressing need for a comprehensive survey that evaluates the performance, strengths, and limitations of various approaches in this domain. This paper aims to fill this gap by offering a thorough comparative analysis and experimental evaluation of dynamic GNNs. It covers 81 dynamic GNN models with a novel taxonomy, 12 dynamic GNN training frameworks, and commonly used benchmarks. We also conduct experimental results from testing representative nine dynamic GNN models and three frameworks on six standard graph datasets. Evaluation metrics focus on convergence accuracy, training efficiency, and GPU memory usage, enabling a thorough comparison of performance across various models and frameworks. From the analysis and evaluation results, we identify key challenges and offer principles for future research to enhance the design of models and frameworks in the dynamic GNNs field. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: Under review of PVLDB2025

arXiv:2404.18419 [pdf]

Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning

Authors: Jiajie Yuan, Linxiao Wu, Yulu Gong, Zhou Yu, Ziang Liu, Shuyao He

Abstract: This paper combines Struts and Hibernate two architectures together, using DAO (Data Access Object) to store and access data. Then a set of dual-mode humidity medical image library suitable for deep network is established, and a dual-mode medical image assisted diagnosis method based on the image is proposed. Through the test of various feature extraction methods, the optimal operating characteris… ▽ More This paper combines Struts and Hibernate two architectures together, using DAO (Data Access Object) to store and access data. Then a set of dual-mode humidity medical image library suitable for deep network is established, and a dual-mode medical image assisted diagnosis method based on the image is proposed. Through the test of various feature extraction methods, the optimal operating characteristic under curve product (AUROC) is 0.9985, the recall rate is 0.9814, and the accuracy is 0.9833. This method can be applied to clinical diagnosis, and it is a practical method. Any outpatient doctor can register quickly through the system, or log in to the platform to upload the image to obtain more accurate images. Through the system, each outpatient physician can quickly register or log in to the platform for image uploading, thus obtaining more accurate images. The segmentation of images can guide doctors in clinical departments. Then the image is analyzed to determine the location and nature of the tumor, so as to make targeted treatment. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17379 [pdf]

Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning

Authors: Hao Liu, Yi Shen, Wenjing Zhou, Yuelin Zou, Chang Zhou, Shuyao He

Abstract: In order to solve the problem of frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demo… ▽ More In order to solve the problem of frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demonstrates significant enhancements in maneuvering capabilities without frequent decelerations. Experiments conducted in simulated environments with varying obstacle densities confirm the effectiveness of the proposed method in achieving more stable and efficient path planning. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC. △ Less

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16611 [pdf, ps, other]

Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources

Authors: Shizhao He, Jungang Ge, Ying-Chang Liang, Dusit Niyato

Abstract: The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on bo… ▽ More The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on both the overall system performance and individual benefits of operators. Motivated by emerging symbiotic communication facilitating mutual benefits across different radio systems, we investigate the resource and service sharing in SAGIN from a symbiotic communication perspective in this paper. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). Specifically, we aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. Besides, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when only focusing on maximizing the WSR. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. Then, we develop a centralized algorithm based on the successive convex approximation (SCA) method. Considering that the centralized algorithm is difficult to implement, we propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM). Finally, we provide extensive numerical simulations to demonstrate the effectiveness of the two proposed algorithms, and the distributed optimization algorithm can approach the performance of the centralized one. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16571 [pdf, other]

MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

Authors: Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Fan Huang, Qiang Ding, Xinxia Feng, Mei Liu, Qiang Li

Abstract: Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts onl… ▽ More Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure lost, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on four endoscopic datasets demonstrate that our proposed MonoPCC shows a great robustness to the brightness inconsistency, and exceeds other state-of-the-arts by reducing the absolute relative error by at least 7.27%, 9.38%, 9.90% and 3.17%, respectively. △ Less

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 11 pages, 10 figures

arXiv:2404.15127 [pdf, other]

MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

Authors: Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen

Abstract: The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to con… ▽ More The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets. Based on the constructed dataset, we developed MedDr, a generalist foundation model for healthcare capable of handling diverse medical data modalities, including radiology, pathology, dermatology, retinography, and endoscopy. Moreover, during inference, we propose a simple but effective retrieval-augmented medical diagnosis strategy, which enhances the model's generalization ability. Extensive experiments on visual question answering, medical report generation, and medical image diagnosis demonstrate the superiority of our method. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14741 [pdf, other]

Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering

Authors: Yao Xu, Shizhu He, Jiabei Chen, Zihao Wang, Yangqiu Song, Hanghang Tong, Kang Liu, Jun Zhao

Abstract: To address the issue of insufficient knowledge and the tendency to generate hallucination in Large Language Models (LLMs), numerous studies have endeavored to integrate LLMs with Knowledge Graphs (KGs). However, all these methods are evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where the factual triples involved in each question are entirely covered by the… ▽ More To address the issue of insufficient knowledge and the tendency to generate hallucination in Large Language Models (LLMs), numerous studies have endeavored to integrate LLMs with Knowledge Graphs (KGs). However, all these methods are evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where the factual triples involved in each question are entirely covered by the given KG. In this situation, LLM mainly acts as an agent to find answer entities by exploring the KG, rather than effectively integrating internal and external knowledge sources. However, in real-world scenarios, KGs are often incomplete to cover all the knowledge required to answer questions. To simulate real-world scenarios and evaluate the ability of LLMs to integrate internal and external knowledge, in this paper, we propose leveraging LLMs for QA under Incomplete Knowledge Graph (IKGQA), where the given KG doesn't include all the factual triples involved in each question. To handle IKGQA, we propose a training-free method called Generate-on-Graph (GoG) that can generate new factual triples while exploring on KGs. Specifically, we propose a selecting-generating-answering framework, which not only treat the LLM as an agent to explore on KGs, but also treat it as a KG to generate new facts based on the explored subgraph and its inherent knowledge. Experimental results on two datasets demonstrate that our GoG can solve IKGQA to a certain extent, while almost all previous methods cannot perform well on IKGQA. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14700 [pdf, other]

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large… ▽ More Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/. △ Less

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Efficient zero-shot speech synthesis

arXiv:2404.12170 [pdf, other]

Secure Semantic Communication for Image Transmission in the Presence of Eavesdroppers

Authors: Shunpu Tang, Chen Liu, Qianqian Yang, Shibo He, Dusit Niyato

Abstract: Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdropping, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication… ▽ More Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdropping, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication (SemCom) approach for image transmission, which integrates steganography technology to conceal private information within non-private images (host images). Specifically, we propose an invertible neural network (INN)-based signal steganography approach, which embeds channel input signals of a private image into those of a host image before transmission. This ensures that the original private image can be reconstructed from the received signals at the legitimate receiver, while the eavesdropper can only decode the information of the host image. Simulation results demonstrate that the proposed approach maintains comparable reconstruction quality of both host and private images at the legitimate receiver, compared to scenarios without any secure mechanisms. Experiments also show that the eavesdropper is only able to reconstruct host images, showcasing the enhanced security provided by our approach. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12109 [pdf, other]

Revisiting holographic model for thermal and dense QCD with a critical point

Authors: Qingxuan Fu, Song He, Li Li, Zhibin Li

Abstract: To quantitatively provide reliable predictions for the hot and dense QCD matter, a holographic model should be adjusted to describe first-principles lattice results available at vanishing baryon chemical potential. The equation of state from two well-known lattice groups, the HotQCD collaboration and the Wuppertal-Budapest (WB) collaboration, shows visible differences at high temperatures. We revi… ▽ More To quantitatively provide reliable predictions for the hot and dense QCD matter, a holographic model should be adjusted to describe first-principles lattice results available at vanishing baryon chemical potential. The equation of state from two well-known lattice groups, the HotQCD collaboration and the Wuppertal-Budapest (WB) collaboration, shows visible differences at high temperatures. We revisit the Einstein-Maxwell-dilaton (EMD) holographic model for hot QCD with 2+1 flavors and physical quark masses by fitting lattice QCD data from the WB collaboration. Using the parameterization for the scalar potential and gauge coupling proposed in our work [Phys.Rev.D 106 (2022) 12, L121902], the equation of state, the higher order baryon number susceptibilities, and the chiral condensates are in quantitative agreement with state-of-the-art lattice results. We find that the critical endpoint (CEP) obtained from fitting the WB collaboration data is nearly identical to the one from the HotQCD collaboration, suggesting the robustness of the location of the CEP. Moreover, our holographic prediction for the CEP location is in accord with more recent Bayesian analysis on a large number of holographic EMD models and an effective potential approach of QCD from gap equations. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2404.11105 [pdf, other]

XMiner: Efficient Directed Subgraph Matching with Pattern Reduction

Authors: Pingpeng Yuan, Yujiang Wang, Tianyu Ma, Siyuan He, Ling Liu

Abstract: Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. The state-of-the-art graph matching algorithms and systems are mainly designed for undirected graphs. Directed graph matching is more complex than undirected graph matching because the edge direction must be taken into account before the exploration of each dir… ▽ More Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. The state-of-the-art graph matching algorithms and systems are mainly designed for undirected graphs. Directed graph matching is more complex than undirected graph matching because the edge direction must be taken into account before the exploration of each directed edge. Thus, the technologies (e.g. storage, exploiting symmetry for graph matching) for undirected graph matching may not be fully applicable to directed graphs. For example, the redundancy implied in directed graph pattern can not be detected using the symmetry breaking for undirected pattern graph. Here, we present XMiner for efficient directed graph pattern matching whose core idea is 'pattern reduction'. It first analyzes the relationship between constraints implied in a pattern digraph. Then it reduces the pattern graph into a simplified form by finding a minimum constraint cover. Finally, XMiner generates an execution plan and follows it to extract matchings of the pattern graph. So, XMiner works on simplified pattern graph and avoids much data access and redundant computation throughout the matching process. Our experimental results show that XMiner outperforms state-of the-art stand-alone graph matching systems, and scales to complex graph pattern matching tasks on larger graph. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10378 [pdf, other]

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw , et al. (33 additional authors not shown)

Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data… ▽ More Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which synthetic data from DCFace and GANDiffFace methods was only allowed to train face recognition systems, in this 2nd edition we propose new sub-tasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking contribute significantly to the application of synthetic data to face recognition. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2311.10476

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

arXiv:2404.10365 [pdf, other]

Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments

Authors: Yongming Huang, Xiaohu You, Hang Zhan, Shiwen He, Ningning Fu, Wei Xu

Abstract: Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only… ▽ More Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them imposes significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 12 pages,11 figures

arXiv:2404.10144 [pdf]

Grain boundary metastability controls irradiation resistance in nanocrystalline metals

Authors: Osman El-Atwani, Annie K. Barnett, Enrique Martinez, Jian Han, Asher C. Leff, Chang-Yu Hung, James E. Nathaniel, Sicong He, Emily H. Mang, Larissa M. Woryk, Khalid Hattar, Blas P. Uberuaga, David J. Srolovitz, Michael L. Falk, Jaime Marian, Mitra L. Taheri

Abstract: Grain boundaries (GBs) in polycrystalline materials are powerful sinks for irradiation defects. While standard theories assume that the sink efficiency of a grain boundary is defined solely by its character before irradiation, recent evidence conclusively shows that the irradiation sink efficiency is a highly dynamic property controlled by the intrinsic metastability of GBs under far-from-equilibr… ▽ More Grain boundaries (GBs) in polycrystalline materials are powerful sinks for irradiation defects. While standard theories assume that the sink efficiency of a grain boundary is defined solely by its character before irradiation, recent evidence conclusively shows that the irradiation sink efficiency is a highly dynamic property controlled by the intrinsic metastability of GBs under far-from-equilibrium irradiation conditions. In this paper, we reveal that the denuded (i.e., defect-free) zone, typically the signature of a strong sink, can collapse as irradiation damage accumulates. We propose a radiation damage evolution model that captures this behavior based on the emergence of a series of irradiation defect-enabled metastable GB microstate changes that dynamically alter the ability of the GB to absorb further damage. We show that these microstate changes control further defect absorption and give rise to the formation of a defect network that manifests itself as a net Nye-tensor signal detectable via lattice curvature experiments. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09125 [pdf]

Achieving High Yield of Perpendicular SOT-MTJ Manufactured on 300 mm Wafers

Authors: Wenlong Yang, Zhenghui Ji, Yang Gao, Kaiyuan Zhou, Qijun Guo, Dinggui Zeng, Shasha Wang, Ming Wang, Lijie Shen, Guilin Chen, Yihui Sun, Enlong Liu, Shikun He

Abstract: The large-scale fabrication of three-terminal magnetic tunnel junctions (MTJs) with high yield is becoming increasingly crucial, especially with the growing interest in spin-orbit torque (SOT) magnetic random access memory (MRAM) as the next generation of MRAM technology. To achieve high yield and consistent device performance in MTJs with perpendicular magnetic anisotropy, an integration flow has… ▽ More The large-scale fabrication of three-terminal magnetic tunnel junctions (MTJs) with high yield is becoming increasingly crucial, especially with the growing interest in spin-orbit torque (SOT) magnetic random access memory (MRAM) as the next generation of MRAM technology. To achieve high yield and consistent device performance in MTJs with perpendicular magnetic anisotropy, an integration flow has been developed that incorporates special MTJ etching technique and other CMOS-compatible processes on a 300 mm wafer manufacturing platform. Systematic studies have been conducted on device performance and statistical uniformity, encompassing magnetic properties, electrical switching behavior, and reliability. Achievements include a switching current of 680 uA at 2 ns, a TMR as high as 119%, ultra-high endurance (over 1012 cycles), and excellent uniformity in the fabricated SOT-MTJ devices, with a yield of up to 99.6%. The proposed integration process, featuring high yield, is anticipated to streamline the mass production of SOT-MRAM. △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures

ACM Class: J.2.6

arXiv:2404.08943 [pdf, other]

A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws

Authors: Yunan Wang, Chuxiong Hu, Yujie Lin, Zeyang Li, Shize Lin, Suqin He

Abstract: Most existing necessary conditions for optimal control based on adjoining methods require both state information and costate information, yet the lack of costates for a given feasible trajectory in practice impedes the determination of optimality. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems, proposing the augmented switching law that… ▽ More Most existing necessary conditions for optimal control based on adjoining methods require both state information and costate information, yet the lack of costates for a given feasible trajectory in practice impedes the determination of optimality. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems, proposing the augmented switching law that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the disturbed trajectory under the constraints of augmented switching law is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first order necessary condition is proposed that the Jacobian matrix of the augmented switching law is not full row rank, which also results in an approach to optimizing a given feasible trajectory further. The proposed necessary condition is applied to the chain-of-integrators systems with full box constraints, contributing to some conclusions challenging to reason by traditional costate-based necessary conditions. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08896 [pdf, other]

Approximate Cluster-Based Sparse Document Retrieval with Segmented Maximum Term Weights

Authors: Yifan Qiao, Shanxiu He, Yingrui Yang, Parker Carlson, Tao Yang

Abstract: This paper revisits cluster-based retrieval that partitions the inverted index into multiple groups and skips the index partially at cluster and document levels during online inference using a learned sparse representation. It proposes an approximate search scheme with two parameters to control the rank-safeness competitiveness of pruning with segmented maximum term weights within each cluster. Cl… ▽ More This paper revisits cluster-based retrieval that partitions the inverted index into multiple groups and skips the index partially at cluster and document levels during online inference using a learned sparse representation. It proposes an approximate search scheme with two parameters to control the rank-safeness competitiveness of pruning with segmented maximum term weights within each cluster. Cluster-level maximum weight segmentation allows an improvement in the rank score bound estimation and threshold-based pruning to be approximately adaptive to bound estimation tightness, resulting in better relevance and efficiency. The experiments with MS MARCO passage ranking and BEIR datasets demonstrate the usefulness of the proposed scheme with a comparison to the baselines. This paper presents the design of this approximate retrieval scheme with rank-safeness analysis, compares clustering and segmentation options, and reports evaluation results. △ Less

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08217 [pdf, other]

Avoid Arguments and Escape with Your Self: Expressive Subtyping and Decidable Bidirectional Checking for Reachability Types

Authors: Songlin Jia, Guannan Wei, Siyuan He, Yuyan Bao, Tiark Rompf

Abstract: Despite Rust's success in systems programming, its ``shared XOR mutable'' principle significantly restricts how mutable values can be used, precluding many useful functional programming idioms. Reachability types are a recent proposal to address the key limitations of Rust-style approaches by tracking, rather than prohibiting, shared, escaping, and mutable data, even in the presence of higher-orde… ▽ More Despite Rust's success in systems programming, its ``shared XOR mutable'' principle significantly restricts how mutable values can be used, precluding many useful functional programming idioms. Reachability types are a recent proposal to address the key limitations of Rust-style approaches by tracking, rather than prohibiting, shared, escaping, and mutable data, even in the presence of higher-order functions and polymorphic types. The key to enabling such expressiveness is the notion of self-references in reachability qualifiers. However, self-references present major challenges in designing expressive subtyping and decidable type checking algorithms, since self-references are neither fully covariant nor fully contravariant, yet still need to vary in certain circumstances. This lack of an effective type checking algorithm is a key impediment toward making reachability types truly practical, and leveraging them to bring the benefits of programming with lifetimes and sharing to practical higher-level languages. In this paper, we investigate the issues of subtyping and type checking of self-references for reachability types. We address key gaps in previous work by proposing a refined notion of subtyping, which more smoothly supports features such as Church-encoded datatypes, making the overall system more expressive. We also develop a sound and decidable bidirectional type checking algorithm, implemented and verified in Coq. △ Less

Submitted 15 July, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.08214 [pdf, other]

doi 10.1364/OE.515903

Entanglement signatures for quantum synchronization with single-ion phonon laser

Authors: Si-Wen He, Zhi-Jiao Deng, Yi Xie, Yan-Yi Wang, Ping-Xing Chen

Abstract: The entanglement properties of quantum synchronization, based on a single-ion phonon laser subjected to an external drive, have been studied. It is found that the maximum value of steady-state entanglement between the ion's internal and external states occurs near the noiseless boundary from synchronization to unsynchronization, accompanied by noticeable oscillatory behaviors during the correspond… ▽ More The entanglement properties of quantum synchronization, based on a single-ion phonon laser subjected to an external drive, have been studied. It is found that the maximum value of steady-state entanglement between the ion's internal and external states occurs near the noiseless boundary from synchronization to unsynchronization, accompanied by noticeable oscillatory behaviors during the corresponding time evolution of entanglement. In addition, the later time dynamics of entanglement also indicates the occurrence of frequency entrainment, as evidenced by the strong consistency between the bending of the observed frequency and the emergence of Liouvillian exceptional points (LEPs) in the first two eigenvalues of the Liouvillian eigenspectrum. Moreover, the emergence of LEPs, which is intimately associated with frequency entrainment, should be widely observed in quantum synchronization and can be explored in LEPs-based applications. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures, Supplement 6 pages, 2 figures

Journal ref: Optics Express 32, 13998 (2024)

arXiv:2404.05948 [pdf, other]

On the robustness of double-word addition algorithms

Authors: Yuanyuan Yang, XinYu Lyu, Sida He, Xiliang Lu, Ji Qi, Zhihao Li

Abstract: We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when rounding function is round… ▽ More We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when rounding function is round-to-nearest. The relative error bound also holds in directed rounding, but certain additional conditions are required. Consequently, in double-word multiplication and addition operations, we can safely omit the normalization step of double-word multiplication and replace the accurate addition algorithm with the sloppy one. Numerical experiments confirm that this approach nearly doubles the performance of double-word multiplication and addition operations, with negligible precision costs. Moreover, in directed rounding mode, the signs of the errors of the two algorithms are consistent with the rounding direction, even in the presence of input overlap. This allows us to avoid changing the rounding mode in interval arithmetic. We also prove that the relative error bound of the sloppy addition algorithm exceeds $3u^2$ if and only if the input meets the condition of Sterbenz's Lemma when rounding to nearest. These findings suggest that the two addition algorithms are more robust than previously believed. △ Less

Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05387 [pdf, ps, other]

On the existence and rigidity of critical Z2 eigenvalues

Authors: Jiahuang Chen, Siqi He

Abstract: In this article, we study the eigenvalues and eigenfunction problems for the Laplace operator on multivalued functions, defined on the complement of the 2n points on the round sphere. These eigenvalues and eigensections could also be viewed as functions on the configuration spaces of points, introduced and systematically studied by Taubes-Wu. Critical eigenfunctions, which serve as local singulari… ▽ More In this article, we study the eigenvalues and eigenfunction problems for the Laplace operator on multivalued functions, defined on the complement of the 2n points on the round sphere. These eigenvalues and eigensections could also be viewed as functions on the configuration spaces of points, introduced and systematically studied by Taubes-Wu. Critical eigenfunctions, which serve as local singularity models for gauge theoretical problems, are of particular interest. Our study focuses on the existence and rigidity problems pertaining to these critical eigenfunctions. We prove that for generic configurations, the critical eigenfunctions do not exist. Furthermore, for each n>1, we construct infinitely many configurations that admit critical eigensections. Additionally, we show that the Taubes-Wu tetrahedral eigensections are deformation rigid and non-degenerate. Our main tools are algebraic identities developed by Taubes-Wu and finite group representation theory. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 46 pages

MSC Class: 53C07

arXiv:2404.03645 [pdf, other]

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Authors: Shuting He, Henghui Ding

Abstract: Referring video segmentation relies on natural language expressions to identify and segment objects, often emphasizing motion clues. Previous works treat a sentence as a whole and directly perform identification at the video-level, mixing up static image-level cues with temporal motion cues. However, image-level features cannot well comprehend motion cues in sentences, and static cues are not cruc… ▽ More Referring video segmentation relies on natural language expressions to identify and segment objects, often emphasizing motion clues. Previous works treat a sentence as a whole and directly perform identification at the video-level, mixing up static image-level cues with temporal motion cues. However, image-level features cannot well comprehend motion cues in sentences, and static cues are not crucial for temporal perception. In fact, static cues can sometimes interfere with temporal perception by overshadowing motion cues. In this work, we propose to decouple video-level referring expression understanding into static and motion perception, with a specific emphasis on enhancing temporal comprehension. Firstly, we introduce an expression-decoupling module to make static cues and motion cues perform their distinct role, alleviating the issue of sentence embeddings overlooking motion cues. Secondly, we propose a hierarchical motion perception module to capture temporal information effectively across varying timescales. Furthermore, we employ contrastive learning to distinguish the motions of visually similar objects. These contributions yield state-of-the-art performance across five datasets, including a remarkable $\textbf{9.2%}$ $\mathcal{J\&F}$ improvement on the challenging $\textbf{MeViS}$ dataset. Code is available at https://github.com/heshuting555/DsHmp. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: CVPR 2024, code: https://github.com/heshuting555/DsHmp

arXiv:2404.02424 [pdf, other]

Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration

Authors: Shwai He, Ang Li, Tianlong Chen

Abstract: Vision-Language Models (VLMs) integrate information from multiple modalities and have shown remarkable success across various tasks. However, deploying large-scale VLMs in resource-constrained scenarios is challenging. Pruning followed by finetuning offers a potential solution but remains underexplored for VLMs. This study addresses two key questions: how to distribute sparsity across different mo… ▽ More Vision-Language Models (VLMs) integrate information from multiple modalities and have shown remarkable success across various tasks. However, deploying large-scale VLMs in resource-constrained scenarios is challenging. Pruning followed by finetuning offers a potential solution but remains underexplored for VLMs. This study addresses two key questions: how to distribute sparsity across different modality-specific models, and how to restore the performance of pruned sparse VLMs. Our preliminary studies identified two effective pruning settings: applying the same sparsity to both vision and language models, and pruning only the language models. While LoRA finetuning aims to restore sparse models, it faces challenges due to incompatibility with sparse models, disrupting the pruned sparsity. To overcome these issues, we propose SparseLoRA, which applies sparsity directly to LoRA weights. Our experimental results demonstrate significant improvements, including an 11.3\% boost under 2:4 sparsity and a 47.6\% enhancement under unstructured 70\% sparsity. Code is released at: \url{https://github.com/Shwai-He/VLM-Compression}. △ Less

Submitted 24 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01958 [pdf, other]

MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

Authors: Lilin Xu, Chaojie Gu, Rui Tan, Shibo He, Jiming Chen

Abstract: Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data a… ▽ More Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data available during the HAR model design phase for unimodal HAR enhancement during the deployment phase. From a study on the impact of supervised multimodal fusion on unimodal feature extraction, MESEN is designed to feature a multi-task mechanism during the multimodal-aided pre-training stage. With the proposed mechanism integrating cross-modal feature contrastive learning and multimodal pseudo-classification aligning, MESEN exploits unlabeled multimodal data to extract effective unimodal features for each modality. Subsequently, MESEN can adapt to downstream unimodal HAR with only a few labeled samples. Extensive experiments on eight public multimodal datasets demonstrate that MESEN achieves significant performance improvements over state-of-the-art baselines in enhancing unimodal HAR by exploiting multimodal data. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to the 21th ACM Conference on Embedded Networked Sensor Systems (SenSys 2023)

arXiv:2404.01268 [pdf, other]

Mapping the Increasing Use of LLMs in Scientific Papers

Authors: Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent… ▽ More Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM-modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggests that LLMs are being broadly used in scientific writings. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.01050 [pdf, other]

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Authors: Haofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He

Abstract: Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present Dr… ▽ More Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.00769 [pdf, other]

An Active Perception Game for Robust Autonomous Exploration

Authors: Siming He, Yuezhan Tao, Igor Spasojevic, Vijay Kumar, Pratik Chaudhari

Abstract: We formulate active perception for an autonomous agent that explores an unknown environment as a two-player zero-sum game: the agent aims to maximize information gained from the environment while the environment aims to minimize the information gained by the agent. In each episode, the environment reveals a set of actions with their potentially erroneous information gain. In order to select the be… ▽ More We formulate active perception for an autonomous agent that explores an unknown environment as a two-player zero-sum game: the agent aims to maximize information gained from the environment while the environment aims to minimize the information gained by the agent. In each episode, the environment reveals a set of actions with their potentially erroneous information gain. In order to select the best action, the robot needs to recover the true information gain from the erroneous one. The robot does so by minimizing the discrepancy between its estimate of information gain and the true information gain it observes after taking the action. We propose an online convex optimization algorithm that achieves sub-linear expected regret $O(T^{3/4})$ for estimating the information gain. We also provide a bound on the regret of active perception performed by any (near-)optimal prediction and trajectory selection algorithms. We evaluate this approach using semantic neural radiance fields (NeRFs) in simulated realistic 3D environments to show that the robot can discover up to 12% more objects using the improved estimate of the information gain. On the M3ED dataset, the proposed algorithm reduced the error of information gain prediction in occupancy map by over 67%. In real-world experiments using occupancy maps on a Jackal ground robot, we show that this approach can calculate complicated trajectories that efficiently explore all occluded regions. △ Less

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.20276 [pdf, other]

Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment

Authors: R. Xu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to… ▽ More We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

arXiv:2403.20263 [pdf, other]

Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment

Authors: Z. H. Zhang, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as ``$χ$") has been found in DM direct detection (DD) experiments to date. There is a novel concept that detecting $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses ran… ▽ More Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as ``$χ$") has been found in DM direct detection (DD) experiments to date. There is a novel concept that detecting $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV from our results. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: 8 pages, 6 figures

arXiv:2403.19975 [pdf, other]

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Authors: Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen

Abstract: Tracking by natural language specification (TNL) aims to consistently localize a target in a video sequence given a linguistic description in the initial frame. Existing methodologies perform language-based and template-based matching for target reasoning separately and merge the matching results from two sources, which suffer from tracking drift when language and visual templates miss-align with… ▽ More Tracking by natural language specification (TNL) aims to consistently localize a target in a video sequence given a linguistic description in the initial frame. Existing methodologies perform language-based and template-based matching for target reasoning separately and merge the matching results from two sources, which suffer from tracking drift when language and visual templates miss-align with the dynamic target state and ambiguity in the later merging stage. To tackle the issues, we propose a joint multi-modal tracking framework with 1) a prompt modulation module to leverage the complementarity between temporal visual templates and language expressions, enabling precise and context-aware appearance and linguistic cues, and 2) a unified target decoding module to integrate the multi-modal reference cues and executes the integrated queries on the search image to predict the target location in an end-to-end manner directly. This design ensures spatio-temporal consistency by leveraging historical visual information and introduces an integrated solution, generating predictions in a single step. Extensive experiments conducted on TNL2K, OTB-Lang, LaSOT, and RefCOCOg validate the efficacy of our proposed approach. The results demonstrate competitive performance against state-of-the-art methods for both tracking and grounding. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.19414 [pdf, other]

BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation

Authors: Yuhong He, Yongqi Zhang, Shizhu He, Jun Wan

Abstract: Medical dialogue generation (MDG) has gained increasing attention due to its substantial practical value. Previous works typically employ a sequence-to-sequence framework to generate medical responses by modeling dialogue context as sequential text with annotated medical entities. While these methods have been successful in generating fluent responses, they fail to provide process explanations of… ▽ More Medical dialogue generation (MDG) has gained increasing attention due to its substantial practical value. Previous works typically employ a sequence-to-sequence framework to generate medical responses by modeling dialogue context as sequential text with annotated medical entities. While these methods have been successful in generating fluent responses, they fail to provide process explanations of reasoning and require extensive entity annotation. To address these limitations, we propose the method Bootstrap Prompting for Explicit Reasoning in MDG (BP4ER), which explicitly model MDG's multi-step reasoning process and iteratively enhance this reasoning process. We employ a least-to-most prompting strategy to guide a large language model (LLM) in explicit reasoning, breaking down MDG into simpler sub-questions. These sub-questions build on answers from previous ones. Additionally, we also introduce two distinct bootstrapping techniques for prompting, which autonomously correct errors and facilitate the LLM's explicit reasoning. This approach eliminates the need for entity annotation and increases the transparency of the MDG process by explicitly generating the intermediate reasoning chain. The experimental findings on the two public datasets indicate that BP4ER outperforms state-of-the-art methods in terms of both objective and subjective evaluation metrics. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2403.17675 [pdf, other]

Chattering Phenomena in Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints

Authors: Yunan Wang, Chuxiong Hu, Zeyang Li, Yujie Lin, Shize Lin, Suqin He

Abstract: Time-optimal control for high-order chain-of-integrators systems with full state constraints remains an open and challenging problem in the optimal control theory domain. The behaviors of optimal control in high-order problems lack precision characterization, even where the existence of the chattering phenomenon remains unknown and overlooked. This paper establishes a theoretical framework for cha… ▽ More Time-optimal control for high-order chain-of-integrators systems with full state constraints remains an open and challenging problem in the optimal control theory domain. The behaviors of optimal control in high-order problems lack precision characterization, even where the existence of the chattering phenomenon remains unknown and overlooked. This paper establishes a theoretical framework for chattering phenomena in the considered problem, providing novel findings on the uniqueness of state constraints inducing chattering, the upper bound on switching times in an unconstrained arc during chattering, and the convergence of states and costates to the chattering limit point. For the first time, this paper proves the existence of the chattering phenomenon in the considered problem. The chattering optimal control for 4th order problems with velocity constraints is precisely solved, providing an approach to plan strictly time-optimal snap-limited trajectories. Other cases of order $n\leq4$ are proved not to allow chattering. The conclusions correct the longstanding misconception in the industry regarding the time-optimality of S-shaped trajectories with minimal switching times. △ Less

Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17638 [pdf, other]

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

Authors: Yingjie Xu, Bangzhen Liu, Hao Tang, Bailin Deng, Shengfeng He

Abstract: We propose a voxel-based optimization framework, ReVoRF, for few-shot radiance fields that strategically address the unreliability in pseudo novel view synthesis. Our method pivots on the insight that relative depth relationships within neighboring regions are more reliable than the absolute color values in disoccluded areas. Consequently, we devise a bilateral geometric consistency loss that care… ▽ More We propose a voxel-based optimization framework, ReVoRF, for few-shot radiance fields that strategically address the unreliability in pseudo novel view synthesis. Our method pivots on the insight that relative depth relationships within neighboring regions are more reliable than the absolute color values in disoccluded areas. Consequently, we devise a bilateral geometric consistency loss that carefully navigates the trade-off between color fidelity and geometric accuracy in the context of depth consistency for uncertain regions. Moreover, we present a reliability-guided learning strategy to discern and utilize the variable quality across synthesized views, complemented by a reliability-aware voxel smoothing algorithm that smoothens the transition between reliable and unreliable data patches. Our approach allows for a more nuanced use of all available data, promoting enhanced learning from regions previously considered unsuitable for high-quality reconstruction. Extensive experiments across diverse datasets reveal that our approach attains significant gains in efficiency and accuracy, delivering rendering speeds of 3 FPS, 7 mins to train a $360^\circ$ scene, and a 5\% improvement in PSNR over existing few-shot methods. Code is available at https://github.com/HKCLynn/ReVoRF. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024 final version

arXiv:2403.16517 [pdf, other]

Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study

Authors: Shawn He, Surangika Ranathunga, Stephen Cranefield, Bastin Tony Roy Savarimuthu

Abstract: Norms are an important component of the social fabric of society by prescribing expected behaviour. In Multi-Agent Systems (MAS), agents interacting within a society are equipped to possess social capabilities such as reasoning about norms and trust. Norms have long been of interest within the Normative Multi-Agent Systems community with researchers studying topics such as norm emergence, norm vio… ▽ More Norms are an important component of the social fabric of society by prescribing expected behaviour. In Multi-Agent Systems (MAS), agents interacting within a society are equipped to possess social capabilities such as reasoning about norms and trust. Norms have long been of interest within the Normative Multi-Agent Systems community with researchers studying topics such as norm emergence, norm violation detection and sanctioning. However, these studies have some limitations: they are often limited to simple domains, norms have been represented using a variety of representations with no standard approach emerging, and the symbolic reasoning mechanisms generally used may suffer from a lack of extensibility and robustness. In contrast, Large Language Models (LLMs) offer opportunities to discover and reason about norms across a large range of social situations. This paper evaluates the capability of LLMs to detecting norm violations. Based on simulated data from 80 stories in a household context, with varying complexities, we investigated whether 10 norms are violated. For our evaluations we first obtained the ground truth from three human evaluators for each story. Then, the majority result was compared against the results from three well-known LLM models (Llama 2 7B, Mixtral 7B and ChatGPT-4). Our results show the promise of ChatGPT-4 for detecting norm violations, with Mixtral some distance behind. Also, we identify areas where these models perform poorly and discuss implications for future work. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16495 [pdf, other]

doi 10.1016/j.knosys.2024.111637

LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting

Authors: Qinyao Luo, Silu He, Xing Han, Yuhan Wang, Haifeng Li

Abstract: Accurate traffic forecasting is a fundamental problem in intelligent transportation systems and learning long-range traffic representations with key information through spatiotemporal graph neural networks (STGNNs) is a basic assumption of current traffic flow prediction models. However, due to structural limitations, existing STGNNs can only utilize short-range traffic flow data; therefore, the m… ▽ More Accurate traffic forecasting is a fundamental problem in intelligent transportation systems and learning long-range traffic representations with key information through spatiotemporal graph neural networks (STGNNs) is a basic assumption of current traffic flow prediction models. However, due to structural limitations, existing STGNNs can only utilize short-range traffic flow data; therefore, the models cannot adequately learn the complex trends and periodic features in traffic flow. Besides, it is challenging to extract the key temporal information from the long historical traffic series and obtain a compact representation. To solve the above problems, we propose a novel LSTTN (Long-Short Term Transformer-based Network) framework comprehensively considering the long- and short-term features in historical traffic flow. First, we employ a masked subseries Transformer to infer the content of masked subseries from a small portion of unmasked subseries and their temporal context in a pretraining manner, forcing the model to efficiently learn compressed and contextual subseries temporal representations from long historical series. Then, based on the learned representations, long-term trend is extracted by using stacked 1D dilated convolution layers, and periodic features are extracted by dynamic graph convolution layers. For the difficulties in making time-step level prediction, LSTTN adopts a short-term trend extractor to learn fine-grained short-term temporal features. Finally, LSTTN fuses the long-term trend, periodic features and short-term features to obtain the prediction results. Experiments on four real-world datasets show that in 60-minute-ahead long-term forecasting, the LSTTN model achieves a minimum improvement of 5.63\% and a maximum improvement of 16.78\% over baseline models. The source code is available at https://github.com/GeoX-Lab/LSTTN. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: 15 pages, 10 figures, 6 tables

Journal ref: Knowledge-Based Systems 2024

arXiv:2403.16303 [pdf]

Large Language Models in Biomedical and Health Informatics: A Bibliometric Review

Authors: Huizi Yu, Lizhou Fan, Lingyao Li, Jiayan Zhou, Zihui Ma, Lu Xian, Wenyue Hua, Sijia He, Mingyu Jin, Yongfeng Zhang, Ashvin Gandhi, Xin Ma

Abstract: Large Language Models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), enabling new ways to analyze data, treat patients, and conduct research. This bibliometric review aims to provide a panoramic view of how LLMs have been used in BHI by examining research articles and collaboration networks from 2022 to 2023. It further explores how LLMs can improve Natural… ▽ More Large Language Models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), enabling new ways to analyze data, treat patients, and conduct research. This bibliometric review aims to provide a panoramic view of how LLMs have been used in BHI by examining research articles and collaboration networks from 2022 to 2023. It further explores how LLMs can improve Natural Language Processing (NLP) applications in various BHI areas like medical diagnosis, patient engagement, electronic health record management, and personalized medicine. To do this, our bibliometric review identifies key trends, maps out research networks, and highlights major developments in this fast-moving field. Lastly, it discusses the ethical concerns and practical challenges of using LLMs in BHI, such as data privacy and reliable medical recommendations. Looking ahead, we consider how LLMs could further transform biomedical research as well as healthcare delivery and patient outcomes. This bibliometric review serves as a resource for stakeholders in healthcare, including researchers, clinicians, and policymakers, to understand the current state and future potential of LLMs in BHI. △ Less

Submitted 23 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 51 pages, 7 figures, 4 tables

arXiv:2403.15268 [pdf, other]

Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models

Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao

Abstract: Retrieval-Augmented-Generation and Gener-ation-Augmented-Generation have been proposed to enhance the knowledge required for question answering over Large Language Models (LLMs). However, the former relies on external resources, and both require incorporating explicit documents into the context, which increases execution costs and susceptibility to noise data. Recent works indicate that LLMs have… ▽ More Retrieval-Augmented-Generation and Gener-ation-Augmented-Generation have been proposed to enhance the knowledge required for question answering over Large Language Models (LLMs). However, the former relies on external resources, and both require incorporating explicit documents into the context, which increases execution costs and susceptibility to noise data. Recent works indicate that LLMs have modeled rich knowledge, albeit not effectively triggered or awakened. Inspired by this, we propose a novel knowledge-augmented framework, Imagination-Augmented-Generation (IAG), which simulates the human capacity to compensate for knowledge deficits while answering questions solely through imagination, thereby awakening relevant knowledge in LLMs without relying on external resources. Guided by IAG, we propose an imagine richer context method for question answering (IMcQA). IMcQA consists of two modules: explicit imagination, which generates a short dummy document by learning from long context compression, and implicit imagination, which creates flexible adapters by distilling from a teacher model with a long context. Experimental results on three datasets demonstrate that IMcQA exhibits significant advantages in both open-domain and closed-book settings, as well as in out-of-distribution generalization. Our code will be available at https://github.com/Xnhyacinth/IAG. △ Less

Submitted 18 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.15157 [pdf, other]

AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models

Authors: Chaoyun Zhang, Zicheng Ma, Yuhao Wu, Shilin He, Si Qin, Minghua Ma, Xiaoting Qin, Yu Kang, Yuyi Liang, Xiaoyu Gou, Yajie Xue, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

Abstract: Verbatim feedback constitutes a valuable repository of user experiences, opinions, and requirements essential for software development. Effectively and efficiently extracting valuable insights from such data poses a challenging task. This paper introduces Allhands , an innovative analytic framework designed for large-scale feedback analysis through a natural language interface, leveraging large la… ▽ More Verbatim feedback constitutes a valuable repository of user experiences, opinions, and requirements essential for software development. Effectively and efficiently extracting valuable insights from such data poses a challenging task. This paper introduces Allhands , an innovative analytic framework designed for large-scale feedback analysis through a natural language interface, leveraging large language models (LLMs). Allhands adheres to a conventional feedback analytic workflow, initially conducting classification and topic modeling on the feedback to convert them into a structurally augmented format, incorporating LLMs to enhance accuracy, robustness, generalization, and user-friendliness. Subsequently, an LLM agent is employed to interpret users' diverse questions in natural language on feedback, translating them into Python code for execution, and delivering comprehensive multi-modal responses, including text, code, tables, and images. We evaluate Allhands across three diverse feedback datasets. The experiments demonstrate that Allhands achieves superior efficacy at all stages of analysis, including classification and topic modeling, eventually providing users with an "ask me anything" experience with comprehensive, correct and human-readable response. To the best of our knowledge, Allhands stands as the first comprehensive feedback analysis framework that supports diverse and customized requirements for insight extraction through a natural language interface. △ Less

Submitted 3 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12624 [pdf, other]

Large-scale metric objects filtering for binary classification with application to abnormal brain connectivity detection

Authors: Shuaida He, Jiaqi Li, Xin Chen

Abstract: The classification of random objects within metric spaces without a vector structure has attracted increasing attention. However, the complexity inherent in such non-Euclidean data often restricts existing models to handle only a limited number of features, leaving a gap in real-world applications. To address this, we propose a data-adaptive filtering procedure to identify informative features fro… ▽ More The classification of random objects within metric spaces without a vector structure has attracted increasing attention. However, the complexity inherent in such non-Euclidean data often restricts existing models to handle only a limited number of features, leaving a gap in real-world applications. To address this, we propose a data-adaptive filtering procedure to identify informative features from large-scale random objects, leveraging a novel Kolmogorov-Smirnov-type statistic defined on the metric space. Our method, applicable to data in general metric spaces with binary labels, exhibits remarkable flexibility. It enjoys a model-free property, as its implementation does not rely on any specified classifier. Theoretically, it controls the false discovery rate while guaranteeing the sure screening property. Empirically, equipped with a Wasserstein metric, it demonstrates superior sample performance compared to Euclidean competitors. When applied to analyze a dataset on autism, our method identifies significant brain regions associated with the condition. Moreover, it reveals distinct interaction patterns among these regions between individuals with and without autism, achieved by filtering hundreds of thousands of covariance matrices representing various brain connectivities. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11957 [pdf, ps, other]

Relative aspherical conjecture and higher codimensional obstruction to positive scalar curvature

Authors: Shihang He

Abstract: Motivated by the solution of the aspherical conjecture up to dimension 5 [CL20][Gro20], we want to study a relative version of the aspherical conjecture. We present a natural condition generalizing the model $X\times\mathbb{T}^k$ to the relative aspherical setting. Such model is closely related to submanifold obstruction of positive scalar curvature (PSC), and would be in similar spirit as [HPS15]… ▽ More Motivated by the solution of the aspherical conjecture up to dimension 5 [CL20][Gro20], we want to study a relative version of the aspherical conjecture. We present a natural condition generalizing the model $X\times\mathbb{T}^k$ to the relative aspherical setting. Such model is closely related to submanifold obstruction of positive scalar curvature (PSC), and would be in similar spirit as [HPS15][CRZ23] in codim 2 case. In codim 3 and 4, we prove results on how 3-manifold obstructs the existence of PSC under our relative aspherical condition, the proof of which relies on a newly introduced geometric quantity called the it spherical width. This could be regarded as a relative version extension of the aspherical conjecture up to dim 5. △ Less

Submitted 26 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 23 pages. All comments are welcome!

arXiv:2403.11098 [pdf]

Single-Shot Single-Beam Coherent Raman Scattering Thermometry Based on Air Lasing

Authors: Xu Lu, Yewei Chen, Francesco Mazza, Siyi He, Zihan Li, Shunlin Huang, Quanjun Wang, Ning Zhang, Bo Shen, Yuzhu Wu, Jinping Yao, Ya Cheng

Abstract: Thermometric techniques with high accuracy, fast response speed and ease of implementation are desirable for the study of dynamic combustion environments, transient reacting flows, and non-equilibrium plasmas. Herein, single-shot single-beam coherent Raman scattering (SS-CRS) thermometry is developed, for the first time to our knowledge, by using air lasing as a probe. It's proved that the air-las… ▽ More Thermometric techniques with high accuracy, fast response speed and ease of implementation are desirable for the study of dynamic combustion environments, transient reacting flows, and non-equilibrium plasmas. Herein, single-shot single-beam coherent Raman scattering (SS-CRS) thermometry is developed, for the first time to our knowledge, by using air lasing as a probe. It's proved that the air-lasing-assisted CRS signal has a high signal-to-noise ratio enabling single-shot measurements at a 1 kHz repetition rate. The SS-CRS thermometry consistently exhibits precision better than 2% at different temperatures, but the inaccuracy grows with the increase in temperature. The high detection precision, 1 kHz acquisition rate and easy-to-implement single-beam scheme are achieved thanks to the unique temporal, spectral and spatial characteristics of air lasing. This work opens a novel avenue for high-speed CRS thermometry, holding tremendous potential for fast diagnostics of transient reacting flows and plasmas. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: 15 pages, 4 figures

arXiv:2403.10897 [pdf, other]

Rethinking Multi-view Representation Learning via Distilled Disentangling

Authors: Guanzhou Ke, Bo Wang, Xiaoli Wang, Shengfeng He

Abstract: Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for mul… ▽ More Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'. Our method introduces the concept of masked cross-view prediction, enabling the extraction of compact, high-quality view-consistent representations from various sources without incurring extra computational overhead. Additionally, we develop a distilled disentangling module that efficiently filters out consistency-related information from multi-view representations, resulting in purer view-specific representations. This approach significantly reduces redundancy between view-consistent and view-specific representations, enhancing the overall efficiency of the learning process. Our empirical evaluations reveal that higher mask ratios substantially improve the quality of view-consistent representations. Moreover, we find that reducing the dimensionality of view-consistent representations relative to that of view-specific representations further refines the quality of the combined representations. Our code is accessible at: https://github.com/Guanzhou-Ke/MRDD. △ Less

Submitted 29 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.09190 [pdf, other]

Intention-aware Denoising Diffusion Model for Trajectory Prediction

Authors: Chen Liu, Shibo He, Haoyu Liu, Jiming Chen

Abstract: Trajectory prediction is an essential component in autonomous driving, particularly for collision avoidance systems. Considering the inherent uncertainty of the task, numerous studies have utilized generative models to produce multiple plausible future trajectories for each agent. However, most of them suffer from restricted representation ability or unstable training issues. To overcome these lim… ▽ More Trajectory prediction is an essential component in autonomous driving, particularly for collision avoidance systems. Considering the inherent uncertainty of the task, numerous studies have utilized generative models to produce multiple plausible future trajectories for each agent. However, most of them suffer from restricted representation ability or unstable training issues. To overcome these limitations, we propose utilizing the diffusion model to generate the distribution of future trajectories. Two cruxes are to be settled to realize such an idea. First, the diversity of intention is intertwined with the uncertain surroundings, making the true distribution hard to parameterize. Second, the diffusion process is time-consuming during the inference phase, rendering it unrealistic to implement in a real-time driving system. We propose an Intention-aware denoising Diffusion Model (IDM), which tackles the above two problems. We decouple the original uncertainty into intention uncertainty and action uncertainty and model them with two dependent diffusion processes. To decrease the inference time, we reduce the variable dimensions in the intention-aware diffusion process and restrict the initial distribution of the action-aware diffusion process, which leads to fewer diffusion steps. To validate our approach, we conduct experiments on the Stanford Drone Dataset (SDD) and ETH/UCY dataset. Our methods achieve state-of-the-art results, with an FDE of 13.83 pixels on the SDD dataset and 0.36 meters on the ETH/UCY dataset. Compared with the original diffusion model, IDM reduces inference time by two-thirds. Interestingly, our experiments further reveal that introducing intention information is beneficial in modeling the diffusion process of fewer steps. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures

arXiv:2403.09035 [pdf, other]

DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

Authors: Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma

Abstract: Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we i… ▽ More Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture, where the selector routes each input sample to the appropriate classifier for classification. DiTMoS is grounded on a key insight: a composition of weak models can exhibit high diversity and the union of them can significantly boost the accuracy upper bound. To approach the upper bound, DiTMoS introduces three strategies including diverse training data splitting to increase the classifiers' diversity, adversarial selector-classifiers training to ensure synergistic interactions thereby maximizing their complementarity, and heterogeneous feature aggregation to improve the capacity of classifiers. We further propose a network slicing technique to alleviate the extra memory overhead incurred by feature aggregation. We deploy DiTMoS on the Neucleo STM32F767ZI board and evaluate it based on three time-series datasets for human activity recognition, keywords spotting, and emotion recognition, respectively. The experiment results manifest that: (a) DiTMoS achieves up to 13.4% accuracy improvement compared to the best baseline; (b) network slicing almost completely eliminates the memory overhead incurred by feature aggregation with a marginal increase of latency. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08855 [pdf, other]

A universal splitting of string and particle scattering amplitudes

Authors: Qu Cao, Jin Dong, Song He, Canxin Shi

Abstract: We propose a new splitting behavior of tree-level string/particle amplitudes for scalars, gluons and gravitons. We identify certain subspaces in the space of Mandelstam variables, where the universal Koba-Nielsen factor splits into two parts (each with an off-shell leg). Both open- and closed-string amplitudes with Parke-Taylor factors naturally factorize into two stringy ${\it currents}$, which i… ▽ More We propose a new splitting behavior of tree-level string/particle amplitudes for scalars, gluons and gravitons. We identify certain subspaces in the space of Mandelstam variables, where the universal Koba-Nielsen factor splits into two parts (each with an off-shell leg). Both open- and closed-string amplitudes with Parke-Taylor factors naturally factorize into two stringy ${\it currents}$, which implies the splitting of bi-adjoint $φ^3$ amplitudes and via a simple deformation to unified stringy amplitudes, the splitting of amplitudes in the non-linear sigma model and Yang-Mills-scalar theory; the same splitting holds for scalar amplitudes without color such as the special Galileon. Remarkably, if we impose similar constraints on Lorentz products involving polarizations, gluon and graviton amplitudes in bosonic string and superstring theories also split into two (stringy) currents. A special case of the splitting implies soft theorems, and more generally it extends recently proposed smooth splittings and new factorizations near zeros to all these theories. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures

arXiv:2403.07380 [pdf, other]

Gabor-guided transformer for single image deraining

Authors: Sijin He, Guangfeng Lin

Abstract: Image deraining have have gained a great deal of attention in order to address the challenges posed by the effects of harsh weather conditions on visual tasks. While convolutional neural networks (CNNs) are popular, their limitations in capturing global information may result in ineffective rain removal. Transformer-based methods with self-attention mechanisms have improved, but they tend to disto… ▽ More Image deraining have have gained a great deal of attention in order to address the challenges posed by the effects of harsh weather conditions on visual tasks. While convolutional neural networks (CNNs) are popular, their limitations in capturing global information may result in ineffective rain removal. Transformer-based methods with self-attention mechanisms have improved, but they tend to distort high-frequency details that are crucial for image fidelity. To solve this problem, we propose the Gabor-guided tranformer (Gabformer) for single image deraining. The focus on local texture features is enhanced by incorporating the information processed by the Gabor filter into the query vector, which also improves the robustness of the model to noise due to the properties of the filter. Extensive experiments on the benchmarks demonstrate that our method outperforms state-of-the-art approaches. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Showing 51–100 of 1,185 results for author: He, S