
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation

Authors: Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang

Abstract: Although Model Predictive Control (MPC) can effectively predict the future states of a system and thus is widely used in robotic manipulation tasks, it does not have the capability of environmental perception, leading to failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework which takes advantage of the powerful perception capability of a vision language model (VLM) and integrates it with MPC. Specifically, we propose a conditional action sampling module which takes as input a goal image or a language instruction and leverages the VLM to sample a set of candidate action sequences. Then, a lightweight action-conditioned video prediction model is designed to generate a set of future frames conditioned on the candidate action sequences. VLMPC produces the optimal action sequence with the assistance of the VLM through a hierarchical cost function that formulates both pixel-level and knowledge-level consistency between the current observation and the goal image. We demonstrate that VLMPC outperforms the state-of-the-art methods on public benchmarks. More importantly, our method showcases excellent performance in various real-world tasks of robotic manipulation. Code is available at https://github.com/PPjmchen/VLMPC.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Accepted by RSS2024
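
The sample, predict, score, and replan loop that the VLMPC abstract describes can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a 1D integer state, and every name in it (`predict`, `cost`, `mpc_step`, `run_mpc`) is hypothetical. It also makes three substitutions: an exhaustive candidate set over the actions {-1, 0, 1} replaces VLM-guided sampling, a simple additive model replaces the video prediction model, and a summed distance to the goal replaces the hierarchical cost.

```python
from itertools import product


def predict(state, actions):
    """Toy stand-in for a learned predictor: predicted state after each action."""
    states = []
    for a in actions:
        state += a
        states.append(state)
    return states


def cost(predicted_states, goal):
    """Toy stand-in for the cost: summed distance to the goal over the horizon."""
    return sum(abs(s - goal) for s in predicted_states)


def mpc_step(state, goal, candidates):
    """Score every candidate sequence and return the first action of the best one."""
    best = min(candidates, key=lambda seq: cost(predict(state, seq), goal))
    return best[0]


def run_mpc(start, goal, steps, horizon=3):
    """Receding-horizon loop: execute one action, then replan from the new state."""
    # Hypothetical candidate set: every sequence over the actions {-1, 0, 1}.
    candidates = list(product((-1, 0, 1), repeat=horizon))
    state = start
    for _ in range(steps):
        state += mpc_step(state, goal, candidates)
    return state
```

Calling `run_mpc(0, 5, 8)` moves the toy state one unit per step until it reaches the goal and then holds it there. Only the first action of each plan is executed, which is what makes the loop receding-horizon.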

arXiv:2407.06901 [pdf, other]

RespEar: Earable-Based Robust Respiratory Rate Monitoring

Authors: Yang Liu, Kayla-Jade Butkow, Jake Stuchbury-Wass, Adam Pullin, Dong Ma, Cecilia Mascolo

Abstract: Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still, or while breathing heavily). Yet, performing accurate, continuous and non-obtrusive RR monitoring across diverse daily routines and activities remains challenging. In this work, we present RespEar, an earable-based system for robust RR monitoring. By leveraging the unique properties of in-ear microphones in earbuds, RespEar enables the use of Respiratory Sinus Arrhythmia (RSA) and Locomotor Respiratory Coupling (LRC), physiological couplings between cardiovascular activity, gait and respiration, to indirectly determine RR. This effectively addresses the challenges posed by the almost imperceptible breathing signals under daily activities. We further propose a suite of meticulously crafted signal processing schemes to improve RR estimation accuracy and robustness. With data collected from 18 subjects over 8 activities, RespEar measures RR with a mean absolute error (MAE) of 1.48 breaths per minute (BPM) and a mean absolute percent error (MAPE) of 9.12% in sedentary conditions, and an MAE of 2.28 BPM and a MAPE of 11.04% in active conditions, which is unprecedented for a method capable of generalizing across conditions with a single modality.

Submitted 9 July, 2024; originally announced July 2024.
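
For readers unfamiliar with the two error metrics quoted in the RespEar abstract, the sketch below shows how MAE and MAPE are conventionally computed from paired reference and estimated respiratory rates. The function names and the sample values in the usage note are illustrative assumptions, not data or code from the paper.

```python
def mae(reference, estimate):
    """Mean absolute error, in the same unit as the inputs (here BPM)."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def mape(reference, estimate):
    """Mean absolute percent error; assumes every reference value is non-zero."""
    ratios = (abs(r - e) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * sum(ratios) / len(reference)
```

With made-up reference rates `[10.0, 20.0, 40.0]` and estimates `[11.0, 18.0, 40.0]`, `mae` gives 1.0 BPM and `mape` gives about 6.67%. MAPE normalizes each error by the true rate, so the same 1 BPM error weighs more at low breathing rates than at high ones.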

arXiv:2407.05391 [pdf, other]

Interference Management in MIMO-ISAC Systems: A Transceiver Design Approach

Authors: Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan

Abstract: The integrated sensing and communication (ISAC) system under multi-input multi-output (MIMO) architecture achieves dual functionalities of sensing and communication on the same platform by utilizing spatial gain, which provides a feasible paradigm facing spectrum congestion. However, the dual functionalities of sensing and communication operating simultaneously in the same platform bring severe interference in the ISAC systems. Facing this challenge, we propose a joint optimization framework for transmit beamforming and receive filter design for ISAC systems with MIMO architecture. We aim to maximize the signal-to-clutter-plus-noise ratio (SCNR) at the receiver while considering various constraints such as waveform similarity, power budget, and communication performance requirements to ensure the integration of the dual functionalities. In particular, the overall transmit beamforming is refined into sensing beamforming and communication beamforming, and a quadratic transformation (QT) is introduced to relax and convert the complex non-convex optimization objective. An efficient algorithm based on covariance matrix tapers (CMT) is proposed to restructure the clutter covariance matrix considering the mismatched steering vector, thereby improving the robustness of the ISAC transceiver design. Numerical simulations are provided to demonstrate the effectiveness of the proposed algorithm.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05250 [pdf, other]

CLIMB: A Benchmark of Clinical Bias in Large Language Models

Authors: Yubo Zhang, Shudi Hou, Mingyu Derek Ma, Wei Wang, Muhao Chen, Jieyu Zhao

Abstract: Large language models (LLMs) are increasingly applied to clinical decision-making. However, their potential to exhibit bias poses significant risks to clinical equity. Currently, there is a lack of benchmarks that systematically evaluate such clinical bias in LLMs. While some biases of LLMs can be avoided in downstream tasks, for example by instructing the model to answer "I'm not sure...", the internal bias hidden within the model has not yet been studied in depth. We introduce CLIMB (shorthand for A Benchmark of Clinical Bias in Large Language Models), a pioneering comprehensive benchmark to evaluate both intrinsic (within LLMs) and extrinsic (on downstream tasks) bias in LLMs for clinical decision tasks. Notably, for intrinsic bias, we introduce a novel metric, AssocMAD, to assess the disparities of LLMs across multiple demographic groups. Additionally, we leverage counterfactual intervention to evaluate extrinsic bias in a task of clinical diagnosis prediction. Our experiments across popular and medically adapted LLMs, particularly from the Mistral and LLaMA families, unveil prevalent behaviors with both intrinsic and extrinsic bias. This work underscores the critical need to mitigate clinical bias and sets a new standard for future evaluations of LLMs' clinical bias.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.03906 [pdf]

Color-map recommendation for MR relaxometry maps

Authors: Miha Fuderer, Barbara Wichtmann, Fabio Crameri, Nandita M. deSouza, Bettina Baeßler, Vikas Gulani, Meiyun Wang, Dirk Poot, Ruud de Boer, Matt Cashmore, Wolter de Graaf, Kathryn E. Keenan, Dan Ma, Carolin Pirkl, Nico Sollmann, Sebastian Weingärtner, Stefano Mandija, Xavier Golay

Abstract: Purpose: To harmonize the use of color for MR relaxometry maps and therefore recommend the use of specific color-maps for representing T1 and T2 maps. Methods: Perceptually linearized color-maps were chosen to have similar color settings as those proposed by Griswold et al. in 2018. A Delphi process, polling the opinion of a panel of 81 experts, was used to generate consensus on the suitability of these maps. Results: Consensus was reached on the suitability of the logarithm-processed Lipari color-map for T1 and the logarithm-processed Navia color-map for T2. There was consensus on color bars being mandatory and on the use of a specific value indicating invalidity. There was no consensus on whether the ranges should be fixed per anatomy. Conclusion: The authors recommend the use of the logarithm-processed Lipari color map for displaying quantitative T1 maps and R1 maps; likewise, the authors recommend the logarithm-processed Navia color-map for displaying T2, T2*, R2 and R2* maps.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 22 pages; embedded are 5 figures and 5 tables; contact the first author for supplementary material. Submitted to Magnetic Resonance in Medicine

arXiv:2407.03688 [pdf, other]

Adaptive sampling strategy for tolerance analysis of freeform optical surfaces based on critical ray aiming

Authors: Rundong Fan, Shili Wei, Zhuang Qian, Huiru Ji, Hao Tan, Yan Mo, Donglin Ma

Abstract: The tolerance analysis of freeform surfaces plays a crucial role in the development of advanced imaging systems. However, the intricate relationship between surface error and imaging quality poses significant challenges, necessitating dense sampling of featured rays during the computation process to ensure an accurate tolerance for different fields of view (FOVs). Here, we propose an adaptive sampling strategy called "Critical Ray Aiming" for surface tolerance analysis. By identifying the most sensitive ray to wave aberration at each surface point, our methodology facilitates flexible sampling of the FOVs and entrance pupil (EP), achieving computational efficiency without compromising accuracy in determining tolerable surface error. We demonstrate the effectiveness of our method through tolerance analysis of two different freeform imaging systems.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01231 [pdf, other]

MIRAI: Evaluating LLM Agents for Event Forecasting

Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, there is increasing interest in employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is a lack of a rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously source and integrate critical information from large global databases; 2) write code using domain-specific APIs and libraries for tool-use; and 3) jointly reason over historical knowledge from diverse formats and time to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relation analysis.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

arXiv:2406.18147 [pdf, ps, other]

Correlation entropy of free semigroup actions

Authors: Xiaojiang Ye, Yanjie Tang, Dongkui Ma

Abstract: This paper introduces the concepts of correlation entropy and local correlation entropy for free semigroup actions on compact metric spaces, and explores their fundamental properties. Thereafter, we generalize some classical results on correlation entropy and local correlation entropy to apply to free semigroup actions. Finally, we establish the relationship between topological entropy, measure-theoretic entropy, correlation entropy, and local correlation entropy for free semigroup actions under various conditions.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 35 pages

arXiv:2406.16847 [pdf, other]

Realizing a spatially correlated lattice interferometer

Authors: Peng Peng, Dekai Mao, Yi Liang, Guoling Yin, Hongmian Shui, Bo Song, Xiaoji Zhou

Abstract: Atom interferometers provide a powerful tool for measuring physical constants and testing fundamental physics with unprecedented precision. Conventional atom interferometry focuses on the phase difference between two paths and utilizes matter waves with fixed coherence. Here, we report on realizing a Ramsey-Bordé interferometer of coherent matter waves dressed by a moving optical lattice in the gravity direction, and explore the resulting interference along multiple paths with tunable coherence. We investigate spatial correlations of atoms both within the lattice and between two arms by interferometry, and observe the emerging multiple interference peaks owing to the long-range coherence nature of the Bose-Einstein condensate. Our findings agree well with theoretical simulations, paving the way for high-precision interferometry with ultracold atoms.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.13448 [pdf, other]

Demonstration of High-Efficiency Microwave Heating Producing Record Highly Charged Xenon Ion Beams with Superconducting ECR Ion Sources

Authors: X. Wang, J. B. Li, V. Mironov, J. W. Guo, X. Z. Zhang, O. Tarvainen, Y. C. Feng, L. X. Li, J. D. Ma, Z. H. Zhang, W. Lu, S. Bogomolov, L. Sun, H. W. Zhao

Abstract: Intense highly charged ion beam production is essential for high-power heavy ion accelerators. A novel movable Vlasov launcher for superconducting high charge state Electron Cyclotron Resonance (ECR) ion sources has been devised that can affect the microwave power effectiveness by a factor of about 4 in terms of highly charged ion beam production. This approach, based on a dedicated microwave launching system instead of the traditional coupling scheme, has led to new insight into microwave-plasma interaction. With this new understanding, the world record highly charged xenon ion beam currents have been enhanced by up to a factor of 2, which could directly and significantly enhance the performance of heavy ion accelerators and provide many new research opportunities in nuclear physics, atomic physics and other disciplines.

Submitted 14 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12323 [pdf, other]

Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and planar wavefront model. Considering the hybrid digital-analog structure inherent to modular arrays, we formulate a joint analog-digital beamforming design problem based on the communication spectral efficiency and sensing signal-to-clutter-plus-noise ratio (SCNR). By exploring the structural similarity of the communication and sensing channels, it is proved that the optimal transmit covariance matrix lies in the subspace spanned by the subarray response vectors, yielding a closed-form solution for the optimal analog beamformer. Consequently, the joint design problem is transformed into a low-dimensional rank-constrained digital beamformer optimization. We first propose a manifold optimization method that directly optimizes the digital beamformer on the rank-constrained Stiefel manifold. Additionally, we develop a semidefinite relaxation (SDR)-based approach that relaxes the rank constraint and employs the randomization technique to obtain a near-optimal solution. Simulation results demonstrate the effectiveness of the proposed modular XL-MIMO ISAC framework and algorithms.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11816 [pdf, other]

VideoLLM-online: Online Video Large Language Model for Streaming Video

Authors: Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou

Abstract: Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models typically treat videos as predetermined clips, making them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-Stream (LIVE) framework, which enables temporally aligned, long-context, and real-time conversation within a continuous video stream. Our LIVE framework comprises comprehensive approaches to achieve video streaming dialogue, encompassing: (1) a training objective designed to perform language modeling for continuous streaming inputs, (2) a data generation scheme that converts offline temporal annotations into a streaming dialogue format, and (3) an optimized inference pipeline to speed up the model responses in real-world video streams. With our LIVE framework, we built the VideoLLM-online model upon Llama-2/Llama-3 and demonstrate its significant advantages in processing streaming videos. For instance, on average, our model can support streaming dialogue in a 5-minute video clip at over 10 FPS on an A100 GPU. Moreover, it also showcases state-of-the-art performance on public offline video benchmarks, such as recognition, captioning, and forecasting. The code, model, data, and demo have been made available at https://showlab.github.io/videollm-online.

Submitted 17 June, 2024; originally announced June 2024.

Comments: CVPR 2024. This arXiv version is upgraded with Llama-3

arXiv:2406.09923 [pdf, other]

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

Authors: Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophisticated, patient-specific decisions need to be made. Current evaluations of LLMs in this field are often narrow in scope, focusing on specific diseases or specialties and employing simplified diagnostic tasks. To bridge this gap, we introduce CliBench, a novel benchmark developed from the MIMIC IV dataset, offering a comprehensive and realistic assessment of LLMs' capabilities in clinical diagnosis. This benchmark not only covers diagnoses from a diverse range of medical cases across various specialties but also incorporates tasks of clinical significance: treatment procedure identification, lab test ordering and medication prescriptions. Supported by structured output ontologies, CliBench enables a precise and multi-granular evaluation, offering an in-depth understanding of LLMs' capabilities on diverse clinical tasks of desired granularity. We conduct a zero-shot evaluation of leading LLMs to assess their proficiency in clinical decision-making. Our preliminary results shed light on the potential and limitations of current LLMs in clinical settings, providing valuable insights for future advancements in LLM-powered healthcare.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Project page: https://clibench.github.io

arXiv:2406.09411 [pdf, other]

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a pairwise manner, where each standard instance is paired with an unanswerable variant that has minimal semantic differences, to enable a reliable assessment. Evaluated on 20 recent multimodal LLMs, our results reveal that even the best-performing models like GPT-4o and Gemini Pro find it challenging to solve MuirBench, achieving 68.0% and 49.3% in accuracy, respectively. Open-source multimodal LLMs trained on single images can hardly generalize to multi-image questions, hovering below 33.3% in accuracy. These results highlight the importance of MuirBench in encouraging the community to develop multimodal LLMs that can look beyond a single image, suggesting potential pathways for future improvements.

Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

arXiv:2406.07552 [pdf, ps, other]

Cohomology of a restricted Lie algebra with a restricted derivation in characteristic 2

Authors: Dan Mao, Liangyun Chen

Abstract: This paper mainly studies the ResLieDer pair in characteristic 2, that is, a restricted Lie algebra with a restricted derivation. We define the restricted representation of a ResLieDer pair and the corresponding cohomology complex. We show that a ResLieDer pair is rigid if the second cohomology group is trivial and a deformation of order $n$ is extensible if and only if its obstruction class is trivial. Moreover, we prove that the central extensions of a ResLieDer pair are classified by the second cohomology group. Finally, we show that a pair of restricted derivations is extensible if and only if its obstruction class is trivial.

Submitted 12 February, 2024; originally announced June 2024.

Comments: 26 pages

arXiv:2406.06962 [pdf, other]

Evolving Subnetwork Training for Large Language Models

Authors: Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu

Abstract: Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language model and from commonly used modules within each layer, Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP). By gradually increasing the size of the subnetworks during the training process, EST can save the cost of training. We apply EST to train the GPT2 and TinyLlama models, resulting in 26.7% FLOPs saving for GPT2 and 25.0% for TinyLlama without an increase in loss on the pre-training dataset. Moreover, EST leads to performance improvements in downstream tasks, indicating that it benefits generalization. Additionally, we provide intuitive theoretical studies based on training dynamics and Dropout theory to ensure the feasibility of EST. Our code is available at https://github.com/OpenDFM/EST.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to ICML 2024

arXiv:2406.05357 [pdf, other]

Classification of Fermi Gamma-Ray Bursts Based on Machine Learning

Authors: Si-Yuan Zhu, Wan-Peng Sun, Da-Ling Ma, Fu-Wen Zhang

Abstract: Gamma-ray bursts (GRBs) are typically classified into long and short GRBs based on their durations. However, there is a significant overlap in the duration distributions of these two categories. In this paper, we apply the unsupervised dimensionality reduction algorithms t-SNE and UMAP to classify 2061 Fermi GRBs based on four observed quantities: duration, peak energy, fluence, and peak flux. The map results of t-SNE and UMAP show a clear division of these GRBs into two clusters. We mark the two clusters as GRBs-I and GRBs-II, and find that all GRBs associated with supernovae are classified as GRBs-II. This includes the peculiar short GRB 200826A, which was confirmed to originate from the death of a massive star. Furthermore, except for two extreme events, GRB 211211A and GRB 230307A, all GRBs associated with kilonovae fall into the GRBs-I population. Compared with the traditional classification of short and long GRBs, the duration distributions of GRBs-I and GRBs-II do not have a fixed boundary. We find that more than 10% of GRBs-I have a duration greater than 2 seconds, while approximately 1% of GRBs-II have a duration shorter than 2 seconds.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures, revised version submitted to MNRAS

Report number: https://doi.org/10.1093/mnras/stae1594

Journal ref: MNRAS, 2024, 532, 1434-1443

arXiv:2406.01392 [pdf, other]

Sparsity-Accelerated Training for Large Language Models

Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

Abstract: Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Findings

arXiv:2405.19338 [pdf, other]

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is constrained by the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited and the imaging dose is unnecessarily high, which is unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-model framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework takes kV images as the sole input and can synthesize accurate, full-size 3D CT in real time (within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality (MAE: <45 HU), dosimetric accuracy (Gamma passing rate (2%/2mm/10%) >97%) and patient position uncertainty (shift error: <0.4 mm). The proposed framework can generate accurate 3D CT faithfully mirroring real-time patient position, thus significantly improving patient setup accuracy, keeping imaging dose to a minimum, and maintaining treatment veracity.

Submitted 1 April, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures and tables

arXiv:2405.09116 [pdf, other]

Atomic transport dynamics in crossed optical dipole trap

Authors: Peng Peng, Zhengxi Zhang, Yaoyuan Fan, Guoling Yin, Dekai Mao, Xuzong Chen, Wei Xiong, Xiaoji Zhou

Abstract: We study the dynamical evolution of cold atoms in a crossed optical dipole trap theoretically and experimentally. The atomic transport process is accompanied by two competing physical mechanisms, atomic loading and atomic loss. The loading process is normally negligible in evaporative cooling experiments on the ground, while it is significant in the preparation of ultra-cold atoms in the space station. Normally, the atomic loading process is much weaker than the atomic loss process, and the atomic number in the center region of the trap decreases monotonically, as reported in previous research. However, when the atomic loading process is comparable to the atomic loss process, the atomic number in the center region of the trap initially increases to a maximum value and then slowly decreases, a phenomenon that we are the first to observe. The increase of atomic number in the center region of the trap shows the presence of the loading process, which will be especially significant under microgravity conditions. We build a theoretical model to analyze the competitive relationship, and it agrees well with the experimental results. Furthermore, we give the predicted evolutionary behaviors under different conditions. This research provides a solid foundation for further understanding of the atomic transport process in traps. The analysis of the loading process is of significant importance for the preparation of ultra-cold atoms in a crossed optical dipole trap under microgravity conditions.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09022 [pdf, other]

doi 10.1109/JIOT.2024.3413687

Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems

Authors: Chunwei Meng, Zhiqing Wei, Dingyou Ma, Wanli Ni, Liyan Su, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasible problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06909 [pdf, ps, other]

Fairness in Reinforcement Learning: A Survey

Authors: Anka Reuel, Devin Ma

Abstract: While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 10 pages

ACM Class: A.1; I.2

arXiv:2405.05983 [pdf]

Real-Time Pill Identification for the Visually Impaired Using Deep Learning

Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu

Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices. The system incorporates Text-to-Speech (TTS) to provide immediate auditory feedback, enhancing usability and independence for visually impaired users. Our study evaluates the application's effectiveness in terms of detection accuracy and user experience, highlighting its potential to improve medication management and safety among the visually impaired community. Keywords: Deep Learning; YOLO Framework; Mobile Application; Visual Impairment; Pill Identification; Healthcare

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04768 [pdf, other]

Circularly polarized light irradiated ferromagnetic MnBi$_2$Te$_4$: the long-sought ideal Weyl semimetal

Authors: Shuai Fan, Shengpu Huang, Zhuo Chen, Fangyang Zhan, Xian-Yong Ding, Da-Shuai Ma, Rui Wang

Abstract: The interaction between light and non-trivial energy band topology allows for the precise manipulation of topological quantum states, which has attracted intensive interest in condensed matter physics. In this work, using first-principles calculations, we studied the topological transition of ferromagnetic (FM) MnBi$_2$Te$_4$ upon irradiation with circularly polarized light (CPL). We revealed that MnBi$_2$Te$_4$ can be driven from an FM insulator to a Weyl semimetal with a minimum number of Weyl points (WPs), i.e., two WPs in systems without time-reversal symmetry. More importantly, in FM MnBi$_2$Te$_4$ with an out-of-plane easy magnetization axis, we found that the band dispersion of the WPs evolves from Type-II to Type-III and finally to Type-I as the light intensity increases. Moreover, we show that the profile of the characteristic Fermi arc of the Weyl semimetal phase is sensitive to changes in light intensity, which enables efficient manipulation of the Fermi arc length of FM MnBi$_2$Te$_4$ in experiments. In addition, for FM MnBi$_2$Te$_4$ with an in-plane easy magnetization axis, the system becomes a Type-I Weyl semimetal under CPL irradiation. With controllable band dispersion, length of Fermi arc, and minimum number of WPs, our results indicate that CPL-irradiated FM MnBi$_2$Te$_4$ is an ideal platform to study novel transport phenomena in Weyl semimetals with distinct band dispersion.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00513 [pdf]

3D MR Fingerprinting for Dynamic Contrast-Enhanced Imaging of Whole Mouse Brain

Authors: Yuran Zhu, Guanhua Wang, Yuning Gu, Walter Zhao, Jiahao Lu, Junqing Zhu, Christina J. MacAskill, Andrew Dupuis, Mark A. Griswold, Dan Ma, Chris A. Flask, Xin Yu

Abstract: Quantitative MRI enables direct quantification of contrast agent concentrations in contrast-enhanced scans. However, the lengthy scan times required by conventional methods are inadequate for tracking contrast agent transport dynamically in mouse brain. We developed a 3D MR fingerprinting (MRF) method for simultaneous T1 and T2 mapping across the whole mouse brain with 4.3-min temporal resolution. We designed a 3D MRF sequence with variable acquisition segment lengths and magnetization preparations on a 9.4T preclinical MRI scanner. Model-based reconstruction approaches were employed to improve the accuracy and speed of MRF acquisition. The method's accuracy for T1 and T2 measurements was validated in vitro, while its repeatability of T1 and T2 measurements was evaluated in vivo (n=3). The utility of the 3D MRF sequence for dynamic tracking of intracisternally infused Gd-DTPA in the whole mouse brain was demonstrated (n=5). Phantom studies confirmed accurate T1 and T2 measurements by 3D MRF with an undersampling factor up to 48. Dynamic contrast-enhanced (DCE) MRF scans achieved a spatial resolution of 192 × 192 × 500 μm³ and a temporal resolution of 4.3 min, allowing for the analysis and comparison of dynamic changes in concentration and transport kinetics of intracisternally infused Gd-DTPA across brain regions. The sequence also enabled highly repeatable, high-resolution T1 and T2 mapping of the whole mouse brain (192 × 192 × 250 μm³) in 30 min. We present the first dynamic and multi-parametric approach for quantitatively tracking contrast agent transport in the mouse brain using 3D MRF.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.12634 [pdf]

Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment

Authors: Danqing Ma, Meng Wang, Ao Xiang, Zongqing Qi, Qin Yang

Abstract: This study proposes Multitrans, a multi-modal fusion framework based on the Transformer architecture and self-attention mechanism. The architecture combines non-contrast computed tomography (NCCT) images and discharge diagnosis reports of patients undergoing stroke treatment, using a variety of Transformer-based methods to predict functional outcomes of stroke treatment. The results show that single-modal text classification performs significantly better than single-modal image classification, but the multi-modal combination performs better than any single modality. Although the Transformer model performs worse on imaging data alone, when imaging data are combined with clinical meta-diagnostic information, the two modalities provide complementary information and contribute to accurately predicting stroke treatment effects.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.07506 [pdf, other]

Flexible Control of Chiral Superconductivity in Optically Driven Nodal Point Superconductors with Antiferromagnetism

Authors: Zhen Ning, Junjie Zeng, Da-Shuai Ma, Dong-Hui Xu, Rui Wang

Abstract: Magnet-superconductor hybrid systems with emergent topological superconductivity have recently attracted widespread attention. Here, we present the Floquet engineering of realistic two-dimensional topological nodal-point superconductors that are composed of antiferromagnetic monolayers in proximity to an s-wave superconductor. We show that Floquet chiral topological superconductivity arises naturally due to light-induced breaking of the effective time-reversal symmetry. More strikingly, we find that the Floquet chiral topological superconducting phases can be flexibly controlled by irradiating elliptically polarized light, with the photon-dressed quasi-energy spectrum carrying different Chern numbers. Such an optically switchable topological transition is attributed to the simultaneous creation (or annihilation) of valley pairs. Our findings provide a feasible approach for achieving Floquet chiral topological superconductivity with flexible tunability, which would draw extensive attention in experiments.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07472 [pdf, other]

doi 10.1109/LWC.2024.3406577

Cramer-Rao Bounds for Near-Field Sensing: A Generic Modular Architecture

Authors: Chunwei Meng, Dingyou Ma, Xu Chen, Zhiyong Feng, Yuanwei Liu

Abstract: A generic modular array architecture is proposed, featuring uniform/non-uniform subarray layouts that allow for flexible deployment. The bistatic near-field sensing system is considered, where the target is located in the near-field of the whole modular array and the far-field of each subarray. Then, the closed-form expressions of Cramer-Rao bounds (CRBs) for range and angle estimations are derived based on the hybrid spherical and planar wave model (HSPM). Simulation results validate the accuracy of the derived closed-form CRBs and demonstrate that: i) the HSPM with varying angles of arrival (AoAs) between subarrays can reduce the CRB for range estimation compared to the traditional HSPM with shared AoA; and ii) the proposed generic modular architecture with subarrays positioned closer to the edges can significantly reduce the CRBs compared to the traditional modular architecture with uniform subarray layout, when the array aperture is fixed.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.19081 [pdf]

Surface variation analysis of freeform optical systems over surface frequency bands for prescribed wavefront errors

Authors: Rundong Fan, Shili Wei, Huiru JI, Zhuang Qian, Hao Tan, Yan Mo, Donglin MA

Abstract: The surface errors of freeform surfaces reflect the manufacturing complexities and significantly impact the feasibility of processing designed optical systems. With multiple degrees of freedom, freeform surfaces pose challenges in surface tolerance analysis in the field. Nevertheless, current research has neglected the influence of surface slopes on the directions of ray propagation. A sudden alteration in the surface slope will lead to a corresponding abrupt shift in the wavefront, even when the change in surface sag is minimal. Moreover, within the realm of freeform surface manufacturing, variation in surface slope across different frequency bands may give rise to unique surface variation. Within the context of this study, we propose a tolerance analysis method to analyze surface variation in freeform surfaces considering surface frequency band slopes based on real ray data. This approach utilizes real ray data to rapidly evaluate surface variation within a specified frequency band of surface slopes. Crucially, our proposed method yields the capability to obtain system surface variation with significant wavefront aberration, in contrast to previous methodologies. The feasibility and advantages of this framework are assessed by analyzing a single-mirror system with a single field and an off-axis two-mirror system. We expect to integrate the proposed methodology with freeform surface design and manufacturing, thereby expanding the scope of freeform optics.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18349 [pdf, other]

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

Authors: Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu

Abstract: Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduct a comprehensive examination of the role of rejection, introducing the notion of model reliability along with corresponding metrics. These metrics measure the model's ability to provide accurate responses while adeptly rejecting questions exceeding its knowledge boundaries, thereby minimizing hallucinations. To improve the inherent reliability of LLMs, we present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF). RLKF leverages knowledge feedback to dynamically determine the model's knowledge boundary and trains a reliable reward model to encourage the refusal of out-of-knowledge questions. Experimental results on mathematical questions affirm the substantial efficacy of RLKF in significantly enhancing LLM reliability.

Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17421 [pdf, other]

MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

Authors: Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin

Abstract: The objective of search result diversification (SRD) is to ensure that selected documents cover as many different subtopics as possible. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time. These approaches tend to be inefficient and are easily trapped in a suboptimal state. In addition, some other methods aim to approximately optimize the diversity metric, such as α-NDCG, but the results still remain suboptimal. To address these challenges, we introduce Multi-Agent reinforcement learning (MARL) for search result DIVersity, called MA4DIV. In this approach, each document is an agent and the search result diversification is modeled as a cooperative task among multiple agents. This approach allows for directly optimizing the diversity metrics, such as α-NDCG, while achieving high training efficiency. We conducted preliminary experiments on public TREC datasets to demonstrate the effectiveness and potential of MA4DIV. Considering the limited number of queries in public TREC datasets, we construct a large-scale dataset from industry sources and show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines on an industrial-scale dataset.

Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.14483 [pdf, other]

doi 10.54254/2755-2721/75/20240503

Utilizing the LightGBM Algorithm for Operator User Credit Assessment Research

Authors: Shaojie Li, Xinqi Dong, Danqing Ma, Bo Dang, Hengyi Zang, Yulu Gong

Abstract: Mobile Internet user credit assessment is an important way for communication operators to establish decisions and formulate measures, and it is also a guarantee for operators to obtain expected benefits. However, credit evaluation methods have long been monopolized by financial industries such as banks and credit. As supporters and providers of platform network technology and network resources, communication operators are also builders and maintainers of communication networks. Internet data improves the user's credit evaluation strategy. This paper uses the massive data provided by communication operators to carry out research on the operator's user credit evaluation model based on the fusion LightGBM algorithm. First, for the massive data related to user evaluation provided by operators, key features are extracted by data preprocessing and feature engineering methods, and a multi-dimensional feature set with statistical significance is constructed; then, linear regression, decision tree, LightGBM, and other machine learning algorithms are used to build multiple basic models and find the best basic model; finally, Averaging, Voting, Blending, Stacking, and other ensemble algorithms are integrated to refine multiple fusion models and establish the most suitable fusion model for operator user evaluation.

Submitted 21 March, 2024; originally announced March 2024.

Journal ref: ACE (2024) Vol. 75: 36-47

arXiv:2403.13703 [pdf]

Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization

Authors: Danqing Ma, Shaojie Li, Bo Dang, Hengyi Zang, Xinqi Dong

Abstract: Transmission line detection technology is crucial for automatic monitoring and ensuring the safety of electrical facilities. The YOLOv5 series is currently one of the most advanced and widely used methods for object detection. However, it faces inherent challenges, such as high computational load on devices and insufficient detection accuracy. To address these concerns, this paper presents an enhanced lightweight YOLOv5 technique customized for mobile devices, specifically intended for identifying objects associated with transmission lines. The C3Ghost module is integrated into the convolutional network of YOLOv5 to reduce floating point operations (FLOPs) in the feature channel fusion process and improve feature expression performance. In addition, a FasterNet module is introduced to replace the C3 module in the YOLOv5 backbone. The FasterNet module uses partial convolutions to process only a portion of the input channels, improving feature extraction efficiency and reducing computational overhead. To address the imbalance between simple and challenging samples in the dataset and the diversity of aspect ratios of bounding boxes, the wIoU v3 loss is adopted as the loss function. To validate the performance of the proposed approach, experiments are conducted on a custom dataset of transmission line poles. The results show that the proposed model achieves a 1% increase in detection accuracy, a 13% reduction in FLOPs, and a 26% decrease in model parameters compared to the existing YOLOv5. In the ablation experiment, it was also discovered that while the FasterNet module and the C3Ghost module improved the precision of the original YOLOv5 baseline model, they caused a decrease in the mAP@.5-.95 metric. However, the adoption of the wIoU v3 loss function significantly mitigated the decline of the mAP@.5-.95 metric.
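The partial-convolution idea mentioned in the abstract above (convolve only a portion of the input channels and pass the rest through) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the tensor sizes, and the choice of convolving 2 of 8 channels are all assumptions made for the example.

```python
import numpy as np

def conv3x3_same(x, w):
    """Naive 3x3 convolution with zero padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd), dtype=x.dtype)
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def partial_conv(x, w, n_conv):
    """Convolve only the first n_conv channels; leave the others untouched."""
    head = conv3x3_same(x[:n_conv], w)
    return np.concatenate([head, x[n_conv:]], axis=0)

def conv_macs(c_in, c_out, h, w, k=3):
    """Multiply-accumulate count of a dense k x k convolution."""
    return h * w * k * k * c_in * c_out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))               # hypothetical feature map
n_conv = 2                                       # hypothetical: convolve 1/4 of the channels
w = rng.standard_normal((n_conv, n_conv, 3, 3))  # hypothetical weights
y = partial_conv(x, w, n_conv)

full_macs = conv_macs(8, 8, 5, 5)                # dense conv over all channels
part_macs = conv_macs(n_conv, n_conv, 5, 5)      # conv over the selected channels only
```

With a quarter of the channels convolved, the multiply-accumulate count in this toy setting drops to one sixteenth of the dense convolution, which is the kind of saving the abstract attributes to processing only a portion of the input channels.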

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12574 [pdf, other]

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

Authors: Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang

Abstract: Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling remains largely unaddressed. Spiking Neural Networks (SNNs), which operate on an event-driven paradigm through sparse spike communication, emerge as a natural fit for addressing this challenge. In this study, we discover that the neural dynamics of spiking neurons align closely with the behavior of an ideal temporal event sampler. Motivated by this insight, we propose a novel adaptive sampling module that leverages recurrent convolutional SNNs enhanced with temporal memory, facilitating a fully end-to-end learnable framework for event-based detection. Additionally, we introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution and address performance degradation encountered in spike-based sampling modules. Through rigorous testing on neuromorphic datasets for event-based detection, our approach demonstrably surpasses existing state-of-the-art spike-based methods, achieving superior performance with significantly fewer parameters and time steps. For instance, our method achieves a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and three time steps. Moreover, the applicability and effectiveness of our adaptive sampling methodology extend beyond SNNs, as demonstrated through further validation on conventional non-spiking detection models.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12332 [pdf, other]

A maximum penalised likelihood approach for semiparametric accelerated failure time models with time-varying covariates and partly interval censoring

Authors: Aishwarya Bhaskaran, Ding Ma, Benoit Liquet, Angela Hong, Serigne N Lo, Stephane Heritier, Jun Ma

Abstract: Accelerated failure time (AFT) models are frequently used for modelling survival data. This approach is attractive as it quantifies the direct relationship between the time until an event occurs and various covariates. It asserts that the failure times experience either acceleration or deceleration through a multiplicative factor when these covariates are present. While existing literature provides numerous methods for fitting AFT models with time-fixed covariates, adapting these approaches to scenarios involving both time-varying covariates and partly interval-censored data remains challenging. In this paper, we introduce a maximum penalised likelihood approach to fit a semiparametric AFT model. This method, designed for survival data with partly interval-censored failure times, accommodates both time-fixed and time-varying covariates. We utilise Gaussian basis functions to construct a smooth approximation of the nonparametric baseline hazard and fit the model via a constrained optimisation approach. To illustrate the effectiveness of our proposed method, we conduct a comprehensive simulation study. We also present an implementation of our approach on a randomised clinical trial dataset on advanced melanoma patients.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 31 pages, 5 figures, 4 tables

arXiv:2403.09035 [pdf, other]

DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

Authors: Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma

Abstract: Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to the constrained on-chip resources. Current methodologies primarily focus on compressing larger models yet at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective by constructing small/weak models directly and improving their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture, where the selector routes each input sample to the appropriate classifier for classification. DiTMoS is grounded on a key insight: a composition of weak models can exhibit high diversity and the union of them can significantly boost the accuracy upper bound. To approach the upper bound, DiTMoS introduces three strategies including diverse training data splitting to increase the classifiers' diversity, adversarial selector-classifiers training to ensure synergistic interactions thereby maximizing their complementarity, and heterogeneous feature aggregation to improve the capacity of classifiers. We further propose a network slicing technique to alleviate the extra memory overhead incurred by feature aggregation. We deploy DiTMoS on the Nucleo STM32F767ZI board and evaluate it based on three time-series datasets for human activity recognition, keyword spotting, and emotion recognition, respectively. The experimental results show that: (a) DiTMoS achieves up to 13.4% accuracy improvement compared to the best baseline; (b) network slicing almost completely eliminates the memory overhead incurred by feature aggregation with a marginal increase of latency.
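The "accuracy upper bound" of a union of weak classifiers, as described in the abstract above, can be illustrated with a toy calculation: a sample counts as solvable if at least one classifier gets it right, which is what an ideal selector could achieve. The correctness table below is invented for illustration and is not data from the paper.

```python
# Hypothetical per-sample correctness (1 = correct) for three weak classifiers
# on six samples; these values are made up for the example.
correct = [
    [1, 1, 0, 0, 1, 0],  # classifier A
    [0, 1, 1, 0, 0, 1],  # classifier B
    [1, 0, 0, 0, 1, 1],  # classifier C
]

def accuracy(mask):
    """Fraction of samples marked correct."""
    return sum(mask) / len(mask)

# Accuracy of each classifier on its own.
individual = [accuracy(row) for row in correct]

# A sample is covered by the union if any classifier is correct on it.
union = [any(column) for column in zip(*correct)]

# Accuracy an ideal selector would reach by always routing to a correct classifier.
upper_bound = accuracy(union)
```

In this toy table each classifier is right on half of the samples, while the union covers five of six, so diversity among weak models raises the ceiling that a selector can aim for.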

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08511 [pdf]

A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product

Authors: Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, Danqing Ma

Abstract: This paper introduces a new multi-modal model based on the Transformer architecture and tensor product fusion strategy, combining BERT's text vectors and ViT's image vectors to classify students' psychological conditions, with an accuracy of 93.65%. The purpose of the study is to accurately analyze the mental health status of students from various data sources. This paper discusses modal fusion methods, including early, late and intermediate fusion, to overcome the challenges of integrating multi-modal information. Ablation studies compare the performance of different models and fusion techniques, showing that the proposed model outperforms existing methods such as CLIP and ViLBERT in terms of accuracy and inference speed. Conclusions indicate that while this model has significant advantages in emotion recognition, its potential to incorporate other data modalities provides areas for future research.

Submitted 19 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08499 [pdf]

Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks

Authors: Zongqing Qi, Danqing Ma, Jingyu Xu, Ao Xiang, Hedi Qu

Abstract: In recent years, there have been frequent incidents of foreign objects intruding into railways and airport runways. These objects can include pedestrians, vehicles, animals, and debris. This paper introduces an improved YOLOv5 architecture incorporating FasterNet and attention mechanisms to enhance the detection of foreign objects on railways and airport runways. This study proposes a new dataset, AARFOD (Aero and Rail Foreign Object Detection), which combines two public datasets for detecting foreign objects in aviation and railway systems. The dataset aims to improve the recognition capabilities of foreign object targets. Experimental results on this large dataset have demonstrated significant performance improvements of the proposed model over the baseline YOLOv5 model, while reducing computational requirements. The improved YOLO model shows an improvement in precision of 1.2%, in recall of 1.0%, and in mAP@.5 of 0.6%, while mAP@.5-.95 remained unchanged. The parameters were reduced by approximately 25.12%, and GFLOPs were reduced by about 10.63%. In the ablation experiment, it is found that the FasterNet module can significantly reduce the number of parameters of the model, and the introduction of the attention mechanism can mitigate the performance loss caused by the lightweight design.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.02586 [pdf, other]

Improving Event Definition Following For Zero-Shot Event Detection

Authors: Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

Abstract: Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In this work, we aim to improve zero-shot event detection by training models to better follow event definitions. We hypothesize that a diverse set of event types and definitions is the key for models to learn to follow event definitions, while existing event extraction datasets focus on annotating many high-quality examples for a few event types. To verify our hypothesis, we construct an automatically generated Diverse Event Definition (DivED) dataset and conduct comparative studies. Our experiments reveal that a large number of event types (200) and diverse event definitions can significantly boost event extraction performance; on the other hand, the performance does not scale with over ten examples per event type. Beyond scaling, we incorporate event ontology information and hard-negative samples during training, further boosting the performance. Based on these findings, we fine-tuned a LLaMA-2-7B model on our DivED dataset, yielding performance that surpasses SOTA large language models like GPT-3.5 across three open benchmarks on zero-shot event detection.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.18262 [pdf, other]

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

Authors: Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu

Abstract: The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of these modalities presents challenges for neural networks. In this paper, we introduce WebLM, a multimodal pre-training network designed to address the limitations of solely modeling text and structure modalities of HTML in webpages. Instead of processing document images as unified natural images, WebLM integrates the hierarchical structure of document images to enhance the understanding of markup-language-based documents. Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively. Empirical results demonstrate that the pre-trained WebLM significantly surpasses previous state-of-the-art pre-trained models across several webpage understanding tasks. The pre-trained models and code are available at https://github.com/X-LANCE/weblm.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15725 [pdf, other]

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Authors: Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Abstract: Human language can be expressed in either written or spoken form, i.e. text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, the quest for speech pre-trained models to leverage unpaired text has just started. In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks. Specifically, we propose a novel pre-training method, text-guided HuBERT, or T-HuBERT, which performs self-supervised learning over speech to derive phoneme-like discrete representations. And these phoneme-like pseudo-label sequences are firstly derived from speech via the generative adversarial networks (GAN) to be statistically similar to those from additional unpaired textual data. In this way, we build a bridge between unpaired speech and text in an unsupervised manner. Extensive experiments demonstrate the significant superiority of our proposed method over various strong baselines, which achieves up to 15.3% relative Word Error Rate (WER) reduction on the LibriSpeech dataset.

Submitted 28 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: 5 pages, 1 figure, 5 tables, submitted to IEEE Signal Processing Letters (SPL)

arXiv:2402.14774 [pdf]

Dominant 1/3-filling Correlated Insulator States and Orbital Geometric Frustration in Twisted Bilayer Graphene

Authors: Haidong Tian, Emilio Codecido, Dan Mao, Kevin Zhang, Shi Che, Kenji Watanabe, Takashi Taniguchi, Dmitry Smirnov, Eun-Ah Kim, Marc Bockrath, Chun Ning Lau

Abstract: Geometric frustration is a phenomenon in a lattice system where not all interactions can be satisfied, the simplest example being antiferromagnetically coupled spins on a triangular lattice. Frustrated systems are characterized by their many nearly degenerate ground states, leading to non-trivial phases such as spin ice and spin liquids. To date most studies are on geometric frustration of spins; much less explored is orbital geometric frustration. For electrons in twisted bilayer graphene (tBLG) at denominator 3 fractional filling, Coulomb interactions and the Wannier orbital shapes are predicted to strongly constrain spatial charge ordering, leading to geometrically frustrated ground states that produce a new class of correlated insulators (CIs). Here we report the observation of dominant denominator 3 fractional filling insulating states in large angle tBLG; these states persist in magnetic fields and display magnetic ordering signatures and tripled unit cell reconstruction. These results are in agreement with a strong-coupling theory of symmetry-breaking of geometrically frustrated fractional states.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14677 [pdf, other]

Influence of thermal effects on atomic Bloch oscillation

Authors: Guoling Yin, Chi-Kin Lai, Nana Chang, Yi Liang, Dekai Mao, Xiaoji Zhou

Abstract: Advancements in the experimental toolbox of cold atoms have enabled the meticulous control of atomic Bloch oscillation within optical lattices, thereby enhancing the capabilities of gravity interferometers. This work delves into the impact of thermal effects on Bloch oscillation in 1D accelerated optical lattices aligned with gravity by varying the system's initial temperature. Through the application of Raman cooling, we effectively reduce the longitudinal thermal effect, stabilizing the longitudinal coherence length over the timescale of its lifetime. The atomic losses over multiple Bloch oscillations are measured and are primarily attributed to transverse excitation. Furthermore, we identify two distinct inverse scaling behaviors in the oscillation lifetime scaled by the corresponding density with respect to temperatures, implying diverse equilibrium processes within or outside the Bose-Einstein condensate regime. The competition between the system's coherence and atomic density leads to a relatively smooth variation in the actual lifetime versus temperature. Our findings provide valuable insights into the interaction between thermal effects and Bloch oscillation, offering avenues for the refinement of quantum measurement technologies.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 8 pages, 7 figures

arXiv:2402.09264 [pdf, other]

UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers

Authors: Hong Jia, Young D. Kwon, Dong Ma, Nhat Pham, Lorena Qendro, Tam Vu, Cecilia Mascolo

Abstract: Traditional machine learning techniques are prone to generating inaccurate predictions when confronted with shifts in the distribution of data between the training and testing phases. This vulnerability can lead to severe consequences, especially in applications such as mobile healthcare. Uncertainty estimation has the potential to mitigate this issue by assessing the reliability of a model's output. However, existing uncertainty estimation techniques often require substantial computational resources and memory, making them impractical for implementation on microcontrollers (MCUs). This limitation hinders the feasibility of many important on-device wearable event detection (WED) applications, such as heart attack detection. In this paper, we present UR2M, a novel Uncertainty and Resource-aware event detection framework for MCUs. Specifically, we (i) develop an uncertainty-aware WED based on evidential theory for accurate event detection and reliable uncertainty estimation; (ii) introduce a cascade ML framework to achieve efficient model inference via early exits, by sharing shallower model layers among different event models; (iii) optimize the deployment of the model and MCU library for system efficiency. We conducted extensive experiments and compared UR2M to traditional uncertainty baselines using three wearable datasets. Our results demonstrate that UR2M achieves up to 864% faster inference speed, 857% energy-saving for uncertainty estimation, 55% memory saving on two popular MCUs, and a 22% improvement in uncertainty quantification performance. UR2M can be deployed on a wide range of MCUs, significantly expanding real-time and reliable WED applications.

Submitted 12 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.07971 [pdf, other]

Quasicrystalline Spin Liquid

Authors: Sunghoon Kim, Mohammad Saad, Dan Mao, Adhip Agarwala, Debanjan Chowdhury

Abstract: The interplay of electronic interactions and frustration in crystalline systems leads to a panoply of correlated phases, including exotic Mott insulators with non-trivial patterns of entanglement. Disorder introduces additional quantum interference effects that can drive localization phenomena. Quasicrystals, which are neither disordered nor perfectly crystalline, are interesting playgrounds for studying the effects of interaction, frustration, and quantum interference. Here we consider a solvable example of a quantum spin liquid on a tri-coordinated quasicrystal. We extend Kitaev's original construction for the spin model to our quasicrystalline setting and perform a large scale flux-sampling to find the ground-state configuration in terms of the emergent Majorana fermions and flux excitations. This reveals a fully gapped quantum spin liquid, regardless of the exchange anisotropies, accompanied by a tendency towards non-trivial (de-)localization at the edge and the bulk. The advent of moiré materials and a variety of quantum simulators provide a new platform to bring phases of quasicrystalline quantum matter to life in a controlled fashion.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures. Supplementary material: 5 pages, 7 figures

arXiv:2402.06177 [pdf, other]

Hamiltonicity of Sparse Pseudorandom Graphs

Authors: Asaf Ferber, Jie Han, Dingjia Mao, Roman Vershynin

Abstract: We show that every $(n,d,λ)$-graph contains a Hamilton cycle for sufficiently large $n$, assuming that $d\geq \log^{10}n$ and $λ\leq cd$, where $c=\frac{1}{9000}$. This significantly improves a recent result of Glock, Correia and Sudakov, who obtain a similar result for $d$ that grows polynomially with $n$. The proof is based on the absorption technique combined with a new result regarding the second largest eigenvalue of the adjacency matrix of a subgraph induced by a random subset of vertices. We believe that the latter result is of independent interest and will have further applications.
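For readers unfamiliar with the notation in the abstract above, an $(n,d,λ)$-graph is a $d$-regular graph on $n$ vertices whose adjacency eigenvalues, other than the largest one (which equals $d$), all have absolute value at most $λ$. The sketch below computes $d$ and the smallest valid $λ$ for a small example; the helper names are hypothetical, and the complete graph $K_5$ is used only to illustrate the definition (it is far too small and dense to be a case the theorem targets).

```python
import numpy as np

def degree_and_lambda(adj):
    """Return (d, lam) for a d-regular graph given its symmetric 0/1 adjacency matrix.
    lam is the largest absolute value among all eigenvalues except the top one."""
    degrees = adj.sum(axis=1)
    if not np.all(degrees == degrees[0]):
        raise ValueError("graph is not regular")
    eig = np.linalg.eigvalsh(adj)  # eigenvalues in ascending order
    # eig[-1] equals d for a d-regular graph; the extremes of the rest are eig[0] and eig[-2].
    lam = max(abs(eig[0]), abs(eig[-2]))
    return int(degrees[0]), float(lam)

def spectral_condition_holds(d, lam, c=1 / 9000):
    """Check only the condition lam <= c * d quoted in the abstract."""
    return lam <= c * d

n = 5
adj = np.ones((n, n)) - np.eye(n)  # complete graph K_5
d, lam = degree_and_lambda(adj)
holds = spectral_condition_holds(d, lam)
```

For $K_5$ the spectrum is $4$ once and $-1$ four times, so $d = 4$ and $λ = 1$; the condition $λ \leq d/9000$ fails here, which shows how strong a spectral gap the hypothesis demands.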

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04245 [pdf, other]

Direct evidence for cosmic-ray-induced correlated errors in superconducting qubit array

Authors: Xue-Gang Li, Jun-Hua Wang, Yao-Yao Jiang, Guang-Ming Xue, Xiao-Xia Cai, Jun Zhou, Ming Gong, Zhao-Feng Liu, Shuang-Yu Zheng, Deng-Ke Ma, Mo Chen, Wei-Jie Sun, Shuang Yang, Fei Yan, Yi-Rong Jin, Xue-Feng Ding, Hai-Feng Yu

Abstract: Correlated errors can significantly impact the quantum error correction, which challenges the assumption that errors occur in different qubits independently in both space and time. Superconducting qubits have been found to suffer correlated errors across multiple qubits, which could be attributable to ionizing radiations and cosmic rays. Nevertheless, the direct evidence and a quantitative understanding of this relationship are currently lacking. In this work, we propose to continuously monitor multi-qubit simultaneous charge-parity jumps to detect correlated errors and find that they occur more frequently than multi-qubit simultaneous bit flips. Then, we propose to position two cosmic-ray muon detectors directly beneath the sample box in a dilution refrigerator and successfully observe the correlated errors in a superconducting qubit array triggered by muons. By introducing a lead shielding layer on the refrigerator, we also reveal that the majority of other correlated errors are primarily induced by gamma rays. Furthermore, we find the superconducting film with a higher recombination rate of quasiparticles used in the qubits is helpful in reducing the duration of correlated errors. Our results provide experimental evidence of the impact of gamma rays and muons on superconducting quantum computation and offer practical insights into mitigation strategies for quantum error correction. In addition, we observe the average occurrence rate of muon-induced correlated errors in our processor is approximately 0.40 min$^{-1}$cm$^{-2}$, which is comparable to the muon event rate detected by the muon detector with 0.506 min$^{-1}$cm$^{-2}$. This demonstrates the potential applications of superconducting qubit arrays as low-energy threshold sensors in the field of high-energy physics.

Submitted 23 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 7 pages and 5 figures for the main text, 20 pages and 20 figures for the supplementary materials

arXiv:2402.03557 [pdf, other]

Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement

Authors: Dayou Mao, Yuhao Chen, Yifan Wu, Maximilian Gilles, Alexander Wong

Abstract: One of the main motivations of MTL is to develop neural networks capable of inferring multiple tasks simultaneously. While countless methods have been proposed in the past decade investigating robust model architectures and efficient training algorithms, there is still a lack of understanding of these methods when applied on smaller feature extraction backbones, of the generalizability of the commonly used fast approximation technique of replacing parameter-level gradients with feature-level gradients, and of MTL challenges and how one can efficiently and effectively identify them. In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods. We first carry out large-scale experiments of the methods with smaller backbones and on the MetaGraspNet dataset as a new test ground. We also compare the existing methods with and without using the fast gradient surrogate and empirically study the generalizability of this technique. Lastly, we propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL, and propose Ranking Similarity score as an evaluation metric for different identifiers to prove the faithfulness of our method.

Submitted 16 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.00005 [pdf, other]

doi 10.1007/s44214-023-00039-9

1002 km Twin-Field Quantum Key Distribution with Finite-Key Analysis

Authors: Yang Liu, Wei-Jun Zhang, Cong Jiang, Jiu-Peng Chen, Di Ma, Chi Zhang, Wen-Xin Pan, Hao Dong, Jia-Min Xiong, Cheng-Jun Zhang, Hao Li, Rui-Chun Wang, Chao-Yang Lu, Jun Wu, Teng-Yun Chen, Lixing You, Xiang-Bin Wang, Qiang Zhang, Jian-Wei Pan

Abstract: Quantum key distribution (QKD) holds the potential to establish secure keys over long distances. The distance of point-to-point QKD secure key distribution is primarily impeded by the transmission loss inherent to the channel. In the quest to realize a large-scale quantum network, increasing the QKD distance under current technology is of great research interest. Here we adopt the 3-intensity sending-or-not-sending twin-field QKD (TF-QKD) protocol with the actively-odd-parity-pairing method. The experiment demonstrates the feasibility of secure QKD over a 1002 km fibre channel considering the finite size effect. The secure key rate is $3.11\times10^{-12}$ per pulse at this distance. Furthermore, by optimizing parameters for shorter fibre distances, we conducted performance tests on key distribution for fibre lengths ranging from 202 km to 505 km. Notably, the secure key rate at 202 km, the normal distance between major cities, reached 111.74 kbps.

Submitted 1 December, 2023; originally announced February 2024.

Comments: 18 pages, 3 figures

Journal ref: Quantum Front 2, 16 (2023)

arXiv:2401.14818 [pdf, other]

ChemDFM: Dialogue Foundation Model for Chemistry

Authors: Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu

Abstract: Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative SMILES notation, hinders the performance of general-domain LLMs in chemistry. To this end, we develop ChemDFM, the first LLM towards CGI. ChemDFM-13B is trained on 34B tokens from chemical literature, textbooks, and instructions as well as various data from the general domain. Therefore, it can store, understand, and reason over chemical knowledge and languages while still possessing advanced free-form language comprehension capabilities. Extensive quantitative evaluation shows that ChemDFM can significantly outperform the representative open-sourced LLMs. Moreover, ChemDFM can also surpass GPT-4 on a great portion of chemical tasks, despite the significant size difference. Further qualitative evaluations demonstrate the efficiency and effectiveness of ChemDFM in real-world research scenarios. We will open-source the ChemDFM model soon.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 10 pages, 12 figures, 13 tables. Under Review

Showing 1–50 of 467 results for author: Ma, D