subscribe to arXiv mailings

arXiv:2407.11921 [pdf, other]

IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields

Authors: Wenxiang Jiang, Hanwei Zhang, Shuo Zhao, Zhongwen Guo, Hao Wang

Abstract: Neural Radiance Field (NeRF) represents a significant advancement in computer vision, offering implicit neural network-based scene representation and novel view synthesis capabilities. Its applications span diverse fields including robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, etc., some of which are considered high-risk AI applications. However, despite its wi… ▽ More Neural Radiance Field (NeRF) represents a significant advancement in computer vision, offering implicit neural network-based scene representation and novel view synthesis capabilities. Its applications span diverse fields including robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, etc., some of which are considered high-risk AI applications. However, despite its widespread adoption, the robustness and security of NeRF remain largely unexplored. In this study, we contribute to this area by introducing the Illusory Poisoning Attack against Neural Radiance Fields (IPA-NeRF). This attack involves embedding a hidden backdoor view into NeRF, allowing it to produce predetermined outputs, i.e. illusory, when presented with the specified backdoor view while maintaining normal performance with standard inputs. Our attack is specifically designed to deceive users or downstream models at a particular position while ensuring that any abnormalities in NeRF remain undetectable from other viewpoints. Experimental results demonstrate the effectiveness of our Illusory Poisoning Attack, successfully presenting the desired illusory on the specified viewpoint without impacting other views. Notably, we achieve this attack by introducing small perturbations solely to the training set. The code can be found at https://github.com/jiang-wenxiang/IPA-NeRF. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10987 [pdf, ps, other]

Adaptive Digital Twin and Communication-Efficient Federated Learning Network Slicing for 5G-enabled Internet of Things

Authors: Daniel Ayepah-Mensah, Guolin Sun, Yu Pang, Wei Jiang

Abstract: Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has… ▽ More Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has introduced digital twins to map physical systems to digital models for accurate decision-making. In our approach, we first use graph-attention networks to build a digital twin environment for network slices, enabling real-time traffic analysis, monitoring, and demand forecasting. Based on these predictions, we formulate the resource allocation problem as a federated multi-agent reinforcement learning problem and employ a deep deterministic policy gradient to determine the resource allocation policy while preserving the privacy of the slices. Our results demonstrate that the proposed approaches can improve the accuracy of demand prediction for network slices and reduce the communication overhead of dynamic network slicing. △ Less

Submitted 22 June, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures, conference

arXiv:2407.10446 [pdf, other]

DDFAD: Dataset Distillation Framework for Audio Data

Authors: Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu

Abstract: Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem, offering the capability to comp… ▽ More Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem, offering the capability to compress a large dataset into a smaller distilled dataset. The model trained on the distilled dataset can achieve comparable performance to the model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have explored dataset distillation for audio data. In this work, for the first time, we propose a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as extracted features for audio data. After that, the FD-MFCC is distilled through the matching training trajectory distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to reconstruct the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD has promising application prospects in many applications, such as continual learning and neural architecture search. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10445 [pdf, other]

Backdoor Attacks against Image-to-Image Networks

Authors: Wenbo Jiang, Hongwei Li, Jiaming He, Rui Zhang, Guowen Xu, Tianwei Zhang, Rongxing Lu

Abstract: Recently, deep learning-based Image-to-Image (I2I) networks have become the predominant choice for I2I tasks such as image super-resolution and denoising. Despite their remarkable performance, the backdoor vulnerability of I2I networks has not been explored. To fill this research gap, we conduct a comprehensive investigation on the susceptibility of I2I networks to backdoor attacks. Specifically,… ▽ More Recently, deep learning-based Image-to-Image (I2I) networks have become the predominant choice for I2I tasks such as image super-resolution and denoising. Despite their remarkable performance, the backdoor vulnerability of I2I networks has not been explored. To fill this research gap, we conduct a comprehensive investigation on the susceptibility of I2I networks to backdoor attacks. Specifically, we propose a novel backdoor attack technique, where the compromised I2I network behaves normally on clean input images, yet outputs a predefined image of the adversary for malicious input images containing the trigger. To achieve this I2I backdoor attack, we propose a targeted universal adversarial perturbation (UAP) generation algorithm for I2I networks, where the generated UAP is used as the backdoor trigger. Additionally, in the backdoor training process that contains the main task and the backdoor task, multi-task learning (MTL) with dynamic weighting methods is employed to accelerate convergence rates. In addition to attacking I2I tasks, we extend our I2I backdoor to attack downstream tasks, including image classification and object detection. Extensive experiments demonstrate the effectiveness of the I2I backdoor on state-of-the-art I2I network architectures, as well as the robustness against different mainstream backdoor defenses. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10440 [pdf, other]

A novel multi-threaded web crawling model

Authors: Weijie. Jiang

Abstract: This paper proposes a novel model for web crawling suitable for large-scale web data acquisition. This model first divides web data into several sub-data, with each sub-data corresponding to a thread task. In each thread task, web crawling tasks are concurrently executed, and the crawled data are stored in a buffer queue, awaiting further parsing. The parsing process is also divided into several t… ▽ More This paper proposes a novel model for web crawling suitable for large-scale web data acquisition. This model first divides web data into several sub-data, with each sub-data corresponding to a thread task. In each thread task, web crawling tasks are concurrently executed, and the crawled data are stored in a buffer queue, awaiting further parsing. The parsing process is also divided into several threads. By establishing the model and continuously conducting crawler tests, it is found that this model is significantly optimized compared to single-threaded approaches. △ Less

Submitted 9 May, 2024; originally announced July 2024.

arXiv:2407.09915 [pdf, other]

Automatic Parallel Tempering Markov Chain Monte Carlo with Nii-C

Authors: Sheng Jin, Wenxin Jiang, Dong-Hong Wu

Abstract: Due to the high dimensionality or multimodality that is common in modern astronomy, sampling Bayesian posteriors can be challenging. Several publicly available codes based on different sampling algorithms can solve these complex models, but the execution of the code is not always efficient or fast enough. The article introduces a C language general-purpose code, Nii-C (https://github.com/shengjin/… ▽ More Due to the high dimensionality or multimodality that is common in modern astronomy, sampling Bayesian posteriors can be challenging. Several publicly available codes based on different sampling algorithms can solve these complex models, but the execution of the code is not always efficient or fast enough. The article introduces a C language general-purpose code, Nii-C (https://github.com/shengjin/nii-c.git), that implements a framework of Automatic Parallel Tempering Markov Chain Monte Carlo. Automatic in this context means that the parameters that ensure an efficient parallel tempering process can be set by a control system during the initial stages of a sampling process. The auto-tuned parameters consist of two parts, the temperature ladders of all parallel tempering Markov chains and the proposal distributions for all model parameters across all parallel tempering chains. In order to reduce dependencies in the compilation process and increase the code's execution speed, Nii-C code is constructed entirely in the C language and parallelised using the Message-Passing Interface protocol to optimise the efficiency of parallel sampling. These implementations facilitate rapid convergence in the sampling of high-dimensional and multi-modal distributions, as well as expeditious code execution time. The Nii-C code can be used in various research areas to trace complex distributions due to its high sampling efficiency and quick execution speed. This article presents a few applications of the Nii-C code. △ Less

Submitted 13 July, 2024; originally announced July 2024.

Comments: 21 pages, 9 figures, accepted for publication in ApJS

arXiv:2407.07807 [pdf, other]

Revisiting the dead time effects of Insight-HXMT/ME on timing analysis

Authors: Youli Tuo, Xiaobo Li, Ying Tan, Baiyang Wu, Weichun Jiang, Liming Song, Jinlu Qu, Sudeep Gogate, Shuang-Nan Zhang, Andrea Santangelo

Abstract: Dead time is a common instrumental effect of X-ray detectors which would alter the behavior of timing properties of astronomical signals, such as distorting the shape of power density spectra (PDS), affecting the root-mean-square of potential quasi-periodic oscillation signals, etc. We revisit the effects of the dead time of Medium Energy X-ray telescope (ME) onboard Insight-HXMT, based on the sim… ▽ More Dead time is a common instrumental effect of X-ray detectors which would alter the behavior of timing properties of astronomical signals, such as distorting the shape of power density spectra (PDS), affecting the root-mean-square of potential quasi-periodic oscillation signals, etc. We revisit the effects of the dead time of Medium Energy X-ray telescope (ME) onboard Insight-HXMT, based on the simulation of electronic read-out mechanism that causes the dead time, and the real data. We investigate dead time effects on the pulse profile as well as the Quasi-Periodic Oscillation (QPO) signals. The dead time coefficient suggests a linear correlation with the observed count rate in each phase bin of the pulse profile according to the simulation of periodic signal as well as the real data observed on Swift J0243.6+6124. The Fourier-amplitude-difference (FAD) method could well recover the intrinsic shape of the observed PDS in the case that the PDS is from two identical detectors. We apply this technique on ME, by splitting the 9 FPGA modules into 2 groups. The results indicate that the FAD technique suits the case when two groups of detectors are not largely different; and the recovered PDS of Sco X-1 observed by ME slightly enhances the significance of the previously known QPO signal, meanwhile the root-mean-square of QPO is significantly improved. We provide the FAD correction tool implemented in HXMTDAS for users in the future to better analyze QPO signals. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 8 figures, accepted for publication in MNRAS main journal

arXiv:2407.04206 [pdf, other]

Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parame… ▽ More Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parameter sensitivity analysis is complex and inefficient. Inspired by differentiable programming and leveraging the ecosystem benefits of open-source software, we propose an equations system constructor using the computational graph representation, along with its JSON format netlist, to address these limitations. This representation allows for runtime dependencies between signals and subcircuit/device parameters. The proposed method streamlines the model development process and facilitates end-to-end computation of gradients of equations remainders with respect to parameters. This paper discusses in detail the overarching concept of hierarchical subcircuit/device decomposition and nested invocation by drawing parallels to functions in programming languages, and introduces rules for parameters passing and gradient propagation across hierarchical circuit modules. The presented numerical examples, including (1) an uncoupled CMOS model representation using "equivalent circuit decomposition+dynamic parameters" and (2) operational amplifier (OpAmp) auto device sizing, have demonstrated that the proposed method supports circuit simulation and design and particularly subcircuit modeling with improved efficiency, simplicity, and decoupling compared to existing techniques. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03641 [pdf, other]

Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy

Authors: Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok

Abstract: Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups"~\cite{wortsman2022model} via exploring various hyperparameter configurations.The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned mod… ▽ More Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups"~\cite{wortsman2022model} via exploring various hyperparameter configurations.The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned models simultaneously, and (ii) a large computational graph encompassing all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend MEHL-Soup to MEHL-Soup+ in a layer-wise manner. Experimental results on various ViT models and data sets show that MEHL-Soup(+) outperforms Learned-Soup(+) in terms of test accuracy, and also reduces memory usage by more than $13\times$. Moreover, MEHL-Soup(+) can be run on a single GPU and achieves $9\times$ speed up in soup construction compared with the Learned-Soup. The code is released at https://github.com/nblt/MEHL-Soup. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.02805 [pdf, other]

Efficient DNN-Powered Software with Fair Sparse Models

Authors: Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

Abstract: With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable example is the Lottery Ticket Hy… ▽ More With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable example is the Lottery Ticket Hypothesis (LTH), a prevailing model pruning approach. This paper demonstrates that fairness issue of LTHbased pruning arises from both its subnetwork selection and training procedures, highlighting the inadequacy of existing remedies. To address this, we propose a novel pruning framework, Ballot, which employs a novel conflict-detection-based subnetwork selection to find accurate and fair subnetworks, coupled with a refined training process to attain a high-performance model, thereby improving the fairness of DNN-powered software. By means of this procedure, Ballot improves the fairness of pruning by 38.00%, 33.91%, 17.96%, and 35.82% compared to state-of-the-art baselines, namely Magnitude Pruning, Standard LTH, SafeCompress, and FairScratch respectively, based on our evaluation of five popular datasets and three widely used models. Our code is available at https://anonymous.4open.science/r/Ballot-506E. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02228 [pdf, other]

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen

Abstract: Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-t… ▽ More Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at https://github.com/EnVision-Research/MTMamba. △ Less

Submitted 14 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.00955 [pdf, other]

Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

Authors: Xiang Jiao, Dingzhu Wen, Guangxu Zhu, Wei Jiang, Wu Luo, Yuanming Shi

Abstract: Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the e… ▽ More Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multidevice artificial intelligence system. Particularly, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average of all class pairs in feature space, it measures the minimum distance of all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, and thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element in the existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: This paper was accepted by IEEE Transactions on Vehicular Technology on June 30, 2024

arXiv:2406.17841 [pdf, other]

Probing many-body Bell correlation depth with superconducting qubits

Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrating example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound with up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages,6 figures + 14 pages, 6 figures

arXiv:2406.17784 [pdf, other]

Scalable Near-Field Localization Based on Partitioned Large-Scale Antenna Array

Authors: Xiaojun Yuan, Yuqing Zheng, Mingchen Zhang, Boyu Teng, Wenjun Jiang

Abstract: This paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user equipment (UE) residing in its near-field (Fresnel) region. We propose a novel algorithm, named array partitioning-based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assump… ▽ More This paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user equipment (UE) residing in its near-field (Fresnel) region. We propose a novel algorithm, named array partitioning-based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assumption that, by partitioning the ELAA into multiple subarrays, the UE can be approximated as in the far-field region of each subarray. We establish a Bayeian inference framework based on the geometric constraints between the UE location and the angles of arrivals (AoAs) at different subarrays. Then, the APLE algorithm is designed based on the message-passing principle for the localization of the UE. APLE exhibits linear computational complexity with the number of BS antennas, leading to a significant reduction in complexity compared to existing methods. We further propose an enhanced APLE (E-APLE) algorithm that refines the location estimate obtained from APLE by following the maximum likelihood principle. The E-APLE algorithm achieves superior localization accuracy compared to APLE while maintaining a linear complexity with the number of BS antennas. Numerical results demonstrate that the proposed APLE and E-APLE algorithms outperform the existing baselines in terms of localization accuracy. △ Less

Submitted 13 May, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.12342

arXiv:2406.14501 [pdf, other]

Quantum limits of superconducting-photonic links and their extension to mm-waves

Authors: Kevin K. S. Multani, Wentao Jiang, Emilio A. Nanni, Amir H. Safavi-Naeini

Abstract: Photonic addressing of superconducting circuits has been proposed to overcome wiring complexity and heat load challenges, but superconducting-photonic links suffer from an efficiency-noise trade-off that limits scalability. This trade-off arises because increasing power conversion efficiency requires reducing optical power, which makes the converted signal susceptible to shot noise. We analyze thi… ▽ More Photonic addressing of superconducting circuits has been proposed to overcome wiring complexity and heat load challenges, but superconducting-photonic links suffer from an efficiency-noise trade-off that limits scalability. This trade-off arises because increasing power conversion efficiency requires reducing optical power, which makes the converted signal susceptible to shot noise. We analyze this trade-off and find the infidelity of qubit gates driven by photonic signals scales inversely with the number of photons used, and therefore the power efficiency of the converter. While methods like nonlinear detection or squeezed light could mitigate this effect, we consider generating higher frequency electrical signals, such as millimeter-waves (100 GHz), using laser light. At these higher frequencies, circuits have higher operating temperatures and cooling power budgets. We demonstrate an optically-driven cryogenic millimeter-wave source with a power efficiency of $10^{-4}$ that can generate ${1}~\mathrm{μW}$ of RF power at 80 GHz with 1500 thermal photons of added noise at 4 K. Using this source, we perform frequency-domain spectroscopy of superconducting NbTiN resonators at 80-90 GHz. Our results show a promising approach to alleviate the efficiency-noise constraints on optically-driven superconducting circuits while leveraging the benefits of photonic signal delivery. Further optimization of power efficiency and noise at high frequencies could make photonic control of superconducting qubits viable at temperatures exceeding 1 kelvin. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 8 figures

arXiv:2406.14484 [pdf, other]

A two-dimensional optomechanical crystal for quantum transduction

Authors: Felix M. Mayor, Sultan Malik, André G. Primo, Samuel Gyger, Wentao Jiang, Thiago P. M. Alegre, Amir H. Safavi-Naeini

Abstract: Integrated optomechanical systems are one of the leading platforms for manipulating, sensing, and distributing quantum information. The temperature increase due to residual optical absorption sets the ultimate limit on performance for these applications. In this work, we demonstrate a two-dimensional optomechanical crystal geometry, named \textbf{b-dagger}, that alleviates this problem through inc… ▽ More Integrated optomechanical systems are one of the leading platforms for manipulating, sensing, and distributing quantum information. The temperature increase due to residual optical absorption sets the ultimate limit on performance for these applications. In this work, we demonstrate a two-dimensional optomechanical crystal geometry, named \textbf{b-dagger}, that alleviates this problem through increased thermal anchoring to the surrounding material. Our mechanical mode operates at 7.4 GHz, well within the operation range of standard cryogenic microwave hardware and piezoelectric transducers. The enhanced thermalization combined with the large optomechanical coupling rates, $g_0/2π\approx 880~\mathrm{kHz}$, and high optical quality factors, $Q_\text{opt} = 2.4 \times 10^5$, enables the ground-state cooling of the acoustic mode to phononic occupancies as low as $n_\text{m} = 0.35$ from an initial temperature of 3 kelvin, as well as entering the optomechanical strong-coupling regime. Finally, we perform pulsed sideband asymmetry of our devices at a temperature below 10 millikelvin and demonstrate ground-state operation ($n_\text{m} < 0.45$) for repetition rates as high as 3 MHz. Our results extend the boundaries of optomechanical system capabilities and establish a robust foundation for the next generation of microwave-to-optical transducers with entanglement rates overcoming the decoherence rates of state-of-the-art superconducting qubits. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 13 pages, 4 main figures

arXiv:2406.12923 [pdf, other]

Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction

Authors: Wenzhao Jiang, Jindong Han, Hao Liu, Tao Tao, Naiqiang Tan, Hui Xiong

Abstract: Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one of the world's largest ride-hailing platforms, DiDi places great emphasis on the accuracy of congestion prediction to enhance the effectiveness and reliability of their real-time services, such as travel time esti… ▽ More Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one of the world's largest ride-hailing platforms, DiDi places great emphasis on the accuracy of congestion prediction to enhance the effectiveness and reliability of their real-time services, such as travel time estimation and route planning. Despite numerous efforts have been made on congestion prediction, most of them fall short in handling heterogeneous and dynamic spatio-temporal dependencies (e.g., periodic and non-periodic congestions), particularly in the presence of noisy and incomplete traffic data. In this paper, we introduce a Congestion Prediction Mixture-of-Experts, CP-MoE, to address the above challenges. We first propose a sparsely-gated Mixture of Adaptive Graph Learners (MAGLs) with congestion-aware inductive biases to improve the model capacity for efficiently capturing complex spatio-temporal dependencies in varying traffic scenarios. Then, we devise two specialized experts to help identify stable trends and periodic patterns within the traffic data, respectively. By cascading these experts with MAGLs, CP-MoE delivers congestion predictions in a more robust and interpretable manner. Furthermore, an ordinal regression strategy is adopted to facilitate effective collaboration among diverse experts. Extensive experiments on real-world datasets demonstrate the superiority of our proposed method compared with state-of-the-art spatio-temporal prediction models. More importantly, CP-MoE has been deployed in DiDi to improve the accuracy and reliability of the travel time estimation system. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.12385 [pdf, other]

Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal

Authors: Wenqi Jiang, Hang Hu, Torsten Hoefler, Gustavo Alonso

Abstract: Vector search systems are indispensable in large language model (LLM) serving, search engines, and recommender systems, where minimizing online search latency is essential. Among various algorithms, graph-based vector search (GVS) is particularly popular due to its high search performance and quality. To efficiently serve low-latency GVS, we propose a hardware-algorithm co-design solution includin… ▽ More Vector search systems are indispensable in large language model (LLM) serving, search engines, and recommender systems, where minimizing online search latency is essential. Among various algorithms, graph-based vector search (GVS) is particularly popular due to its high search performance and quality. To efficiently serve low-latency GVS, we propose a hardware-algorithm co-design solution including Falcon, a GVS accelerator, and Delayed-Synchronization Traversal (DST), an accelerator-optimized graph traversal algorithm. Falcon implements high-performance GVS operators and reduces memory accesses with an on-chip Bloom filter to track search states. DST improves search performance and quality by relaxing the graph traversal order to maximize accelerator utilization. Evaluation across various graphs and datasets shows that our Falcon prototype on FPGAs, coupled with DST, achieves up to 4.3$\times$ and 19.5$\times$ speedups in latency and up to 8.0$\times$ and 26.9$\times$ improvements in energy efficiency over CPU and GPU-based GVS systems. The remarkable efficiency of Falcon and DST demonstrates their potential to become the standard solutions for future GVS acceleration. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12210 [pdf, other]

Generalized Moving Least-Squares for Solving Vector-valued PDEs on Unknown Manifolds

Authors: Rongji Li, Qile Yan, Shixiao W. Jiang

Abstract: In this paper, we extend the Generalized Moving Least-Squares (GMLS) method in two different ways to solve the vector-valued PDEs on unknown smooth 2D manifolds without boundaries embedded in $\mathbb{R}^{3}$, identified with randomly sampled point cloud data. The two approaches are referred to as the intrinsic method and the extrinsic method. For the intrinsic method which relies on local approxi… ▽ More In this paper, we extend the Generalized Moving Least-Squares (GMLS) method in two different ways to solve the vector-valued PDEs on unknown smooth 2D manifolds without boundaries embedded in $\mathbb{R}^{3}$, identified with randomly sampled point cloud data. The two approaches are referred to as the intrinsic method and the extrinsic method. For the intrinsic method which relies on local approximations of metric tensors, we simplify the formula of Laplacians and covariant derivatives acting on vector fields at the base point by calculating them in a local Monge coordinate system. On the other hand, the extrinsic method formulates tangential derivatives on a submanifold as the projection of the directional derivative in the ambient Euclidean space onto the tangent space of the submanifold. One challenge of this method is that the discretization of vector Laplacians yields a matrix whose size relies on the ambient dimension. To overcome this issue, we reduce the dimension of vector Laplacian matrices by employing an appropriate projection. The complexity of both methods scales well with the dimension of manifolds rather than the ambient dimension. We also present supporting numerical examples, including eigenvalue problems, linear Poisson equations, and nonlinear Burgers' equations, to examine the numerical accuracy of proposed methods on various smooth manifolds. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12019 [pdf]

Hacking Encrypted Wireless Power: Cyber-Security of Dynamic Charging

Authors: Hui Wang, Nima Tashakor, Wei Jiang, Wei Liu, C. Q. Jiang, Stefan M. Goetz

Abstract: Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out se… ▽ More Recently, energy encryption for wireless power transfer has been developed for energy safety, which is important in public places to suppress unauthorized energy extraction. Most techniques vary the frequency so that unauthorized receivers cannot extract energy because of non-resonance. However, this strategy is unreliable. To stimulate the progress of energy encryption technology and point out security holes, this paper proposes a decryption method for the fundamental principle of encrypted frequency-varying wireless power transfer. The paper uses an auxiliary coil to detect the frequency and a switched-capacitor array to adaptively compensate the receiver for a wide frequency range. The switched-capacitor array contains two capacitors and one semi-conductor switch. One capacitor compensates the receiver all the time while the other's active time during one wireless power transfer cycle is regulated by the switch. Thus, the proposed hacking receiver controls the equivalent capacitance of the compensation and steals energy. Finally, a detailed simulation model and experimental results prove the effectiveness of the attack on frequency-hopping energy encryption. Although any nonnegligible energy extracted would be problematic, we achieved to steal 78% to 84% of the energy an authorized receiver could get. When the frequency changes, the interceptor is coarsely tuned very quickly, which can hack fast frequency-varying encrypted system. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 10 pages, 17 figures

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes… ▽ More In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08205 [pdf, other]

What do we know about Hugging Face? A systematic literature review and quantitative validation of qualitative claims

Authors: Jason Jones, Wenxin Jiang, Nicholas Synovic, George K. Thiruvathukal, James C. Davis

Abstract: Background: Collaborative Software Package Registries (SPRs) are an integral part of the software supply chain. Much engineering work synthesizes SPR package into applications. Prior research has examined SPRs for traditional software, such as NPM (JavaScript) and PyPI (Python). Pre-Trained Model (PTM) Registries are an emerging class of SPR of increasing importance, because they support the deep… ▽ More Background: Collaborative Software Package Registries (SPRs) are an integral part of the software supply chain. Much engineering work synthesizes SPR package into applications. Prior research has examined SPRs for traditional software, such as NPM (JavaScript) and PyPI (Python). Pre-Trained Model (PTM) Registries are an emerging class of SPR of increasing importance, because they support the deep learning supply chain. Aims: Recent empirical research has examined PTM registries in ways such as vulnerabilities, reuse processes, and evolution. However, no existing research synthesizes them to provide a systematic understanding of the current knowledge. Some of the existing research includes qualitative claims lacking quantitative analysis. Our research fills these gaps by providing a knowledge synthesis and quantitative analyses. Methods: We first conduct a systematic literature review (SLR). We then observe that some of the claims are qualitative. We identify quantifiable metrics associated with those claims, and measure in order to substantiate these claims. Results: From our SLR, we identify 12 claims about PTM reuse on the HuggingFace platform, 4 of which lack quantitative validation. We successfully test 3 of these claims through a quantitative analysis, and directly compare one with traditional software. Our findings corroborate qualitative claims with quantitative measurements. Our findings are: (1) PTMs have a much higher turnover rate than traditional software, indicating a dynamic and rapidly evolving reuse environment within the PTM ecosystem; and (2) There is a strong correlation between documentation quality and PTM popularity. Conclusions: We confirm qualitative research claims with concrete metrics, supporting prior qualitative and case study research. Our measures show further dynamics of PTM reuse, inspiring research infrastructure and new measures. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08100 [pdf, other]

Multimodal Table Understanding

Authors: Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang

Abstract: Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world sce… ▽ More Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world scenarios, and table images are much more accessible. Therefore, how to directly understand tables using intuitive visual information is a crucial and urgent challenge for developing more practical applications. In this paper, we propose a new problem, multimodal table understanding, where the model needs to generate correct responses to various table-related requests based on the given table image. To facilitate both the model training and evaluation, we construct a large-scale dataset named MMTab, which covers a wide spectrum of table images, instructions and tasks. On this basis, we develop Table-LLaVA, a generalist tabular multimodal large language model (MLLM), which significantly outperforms recent open-source MLLM baselines on 23 benchmarks under held-in and held-out settings. The code and data is available at this https://github.com/SpursGoZmy/Table-LLaVA △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 23 pages, 16 figures, ACL 2024 main conference, camera-ready version

arXiv:2406.05877 [pdf, ps, other]

The Nodal Sets of Solutions to Parabolic Equations

Authors: Yiqi Huang, Wenshuai Jiang

Abstract: In this paper, we study the parabolic equations $\partial_t u=\partial_j\left(a^{ij}(x,t)\partial_iu\right)+b^j(x,t)\partial_ju+c(x,t)u$ in a domain of $\mathbb{R}^n$ under the condition that $a^{ij}$ are Lipschitz continuous. Consider the nodal set $Z_t=\{x: u(x,t)=0\}$ at a time $t$-slice. Simple examples show that the singular set $\mathcal{S}_t=\{x: u(x,t)=|\nabla_x u|(x,t)=0\}$ may coincide w… ▽ More In this paper, we study the parabolic equations $\partial_t u=\partial_j\left(a^{ij}(x,t)\partial_iu\right)+b^j(x,t)\partial_ju+c(x,t)u$ in a domain of $\mathbb{R}^n$ under the condition that $a^{ij}$ are Lipschitz continuous. Consider the nodal set $Z_t=\{x: u(x,t)=0\}$ at a time $t$-slice. Simple examples show that the singular set $\mathcal{S}_t=\{x: u(x,t)=|\nabla_x u|(x,t)=0\}$ may coincide with nodal set. This makes the methods used in the study of nodal sets for elliptic equations fail, rendering the parabolic case much more complicated. The current strongest results in the literature establish the finiteness of the $(n-1)$-dimensional Hausdorff measure of $Z_t$, assuming either $n=1$ by Angenent or that the coefficients are time-independent and analytic by Lin. With general coefficients, the codimension-one estimate was obtained under some doubling assumption by Han-Lin but only for space-time nodal sets. In the first part, we prove that $\mathcal{H}^{n-1}(Z_t) < \infty$ in full generality, i.e. for any dimension, with time-dependent coefficients and with merely Lipschitz regular leading coefficients $a^{ij}$. In the second part, we study the evolutionary behavior of nodal sets. When $n=1$, it is proved by Angenent that the number of nodal points is non-increasing in time. For the $n$-dimensional case, we construct examples showing that measure monotonicity fails. In contrast, we prove dimension monotonicity, i.e., the Hausdorff dimension of the nodal set is non-increasing in time. This is the first monotonicity property for nodal sets in general dimensions. All the assumptions here are sharp. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05874 [pdf, other]

Stealthy Targeted Backdoor Attacks against Image Captioning

Authors: Wenshu Fan, Hongwei Li, Wenbo Jiang, Meng Hao, Shui Yu, Xiao Zhang

Abstract: In recent years, there has been an explosive growth in multimodal learning. Image captioning, a classical multimodal task, has demonstrated promising applications and attracted extensive research attention. However, recent studies have shown that image caption models are vulnerable to some security threats such as backdoor attacks. Existing backdoor attacks against image captioning typically pair… ▽ More In recent years, there has been an explosive growth in multimodal learning. Image captioning, a classical multimodal task, has demonstrated promising applications and attracted extensive research attention. However, recent studies have shown that image caption models are vulnerable to some security threats such as backdoor attacks. Existing backdoor attacks against image captioning typically pair a trigger either with a predefined sentence or a single word as the targeted output, yet they are unrelated to the image content, making them easily noticeable as anomalies by humans. In this paper, we present a novel method to craft targeted backdoor attacks against image caption models, which are designed to be stealthier than prior attacks. Specifically, our method first learns a special trigger by leveraging universal perturbation techniques for object detection, then places the learned trigger in the center of some specific source object and modifies the corresponding object name in the output caption to a predefined target name. During the prediction phase, the caption produced by the backdoored model for input images with the trigger can accurately convey the semantic information of the rest of the whole image, while incorrectly recognizing the source object as the predefined target. Extensive experiments demonstrate that our approach can achieve a high attack success rate while having a negligible impact on model clean performance. In addition, we show our method is stealthy in that the produced backdoor samples are indistinguishable from clean samples in both image and text domains, which can successfully bypass existing backdoor defenses, highlighting the need for better defensive mechanisms against such stealthy backdoor attacks. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04744 [pdf, other]

CRAG -- Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar , et al. (2 additional authors not shown)

Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench… ▽ More Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03787 [pdf, other]

Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization

Authors: Wei Jiang, Sifan Yang, Wenhao Yang, Yibo Wang, Yuanyu Wan, Lijun Zhang

Abstract: This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to mat… ▽ More This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02135 [pdf, other]

Robust Interaction-based Relevance Modeling for Online E-Commerce and LLM-based Retrieval

Authors: Ben Chen, Huangyu Dai, Xiang Ma, Wen Jiang, Wei Ning

Abstract: Semantic relevance calculation is crucial for e-commerce search engines, as it ensures that the items selected closely align with customer intent. Inadequate attention to this aspect can detrimentally affect user experience and engagement. Traditional text-matching techniques are prevalent but often fail to capture the nuances of search intent accurately, so neural networks now have become a prefe… ▽ More Semantic relevance calculation is crucial for e-commerce search engines, as it ensures that the items selected closely align with customer intent. Inadequate attention to this aspect can detrimentally affect user experience and engagement. Traditional text-matching techniques are prevalent but often fail to capture the nuances of search intent accurately, so neural networks now have become a preferred solution to processing such complex text matching. Existing methods predominantly employ representation-based architectures, which strike a balance between high traffic capacity and low latency. However, they exhibit significant shortcomings in generalization and robustness when compared to interaction-based architectures. In this work, we introduce a robust interaction-based modeling paradigm to address these shortcomings. It encompasses 1) a dynamic length representation scheme for expedited inference, 2) a professional terms recognition method to identify subjects and core attributes from complex sentence structures, and 3) a contrastive adversarial training protocol to bolster the model's robustness and matching capabilities. Extensive offline evaluations demonstrate the superior robustness and effectiveness of our approach, and online A/B testing confirms its ability to improve relevance in the same exposure position, resulting in more clicks and conversions. To the best of our knowledge, this method is the first interaction-based approach for large e-commerce search relevance calculation. Notably, we have deployed it for the entire search traffic on alibaba.com, the largest B2B e-commerce platform in the world. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ECML-PKDD'24 as Outstanding Paper. 8 pages, 2 figures, 7 tables

arXiv:2406.01959 [pdf, other]

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Authors: Wei Jiang, Sifan Yang, Yibo Wang, Lijun Zhang

Abstract: This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an opt… ▽ More This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions with our newly designed learning rate strategy. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another innovative adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2} )$, where $n$ represents the number of component functions. Numerical experiments across various tasks validate the effectiveness of our method. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01686 [pdf, other]

Prethermal Time-Crystalline Corner Modes

Authors: Si Jiang, Dong Yuan, Wenjie Jiang, Dong-Ling Deng, Francisco Machado

Abstract: We demonstrate the existence of prethermal discrete time crystals whose sub-harmonic response is entirely localized to zero-dimensional corner modes. Within the exponentially long prethermal regime, we show that the robustness of these corner modes arises from two related, yet distinct mechanisms: the presence of a higher-order symmetry-protected topological phase in the effective Hamiltonian, or… ▽ More We demonstrate the existence of prethermal discrete time crystals whose sub-harmonic response is entirely localized to zero-dimensional corner modes. Within the exponentially long prethermal regime, we show that the robustness of these corner modes arises from two related, yet distinct mechanisms: the presence of a higher-order symmetry-protected topological phase in the effective Hamiltonian, or the emergence of a dynamical constraint that prevents the decay of the corner mode. While the first mechanism ensures the stability of the sub-harmonic response throughout the entirety of the prethermal regime, it is restricted to initial states in the ground state manifold of the effective Hamiltonian. By contrast, the second mechanism enables the observation of the prethermal time-crystalline order for arbitrary initial states, albeit with a time scale that is not only determined by the frequency of the drive, but also the relative energy scale across the system's sublattices. We characterize these two mechanisms by simulating the dynamics of a periodically driven two-dimensional spin model, and discuss natural extensions of our model to all other dimensions. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 7+8 pages, 3+5 figures

arXiv:2406.00966 [pdf, other]

Guaranteeing Data Privacy in Federated Unlearning with Dynamic User Participation

Authors: Ziyao Liu, Yu Jiang, Weifeng Jiang, Jiale Guo, Jun Zhao, Kwok-Yan Lam

Abstract: Federated Unlearning (FU) is gaining prominence for its capacity to eliminate influences of Federated Learning (FL) users' data from trained global FL models. A straightforward FU method involves removing the unlearned users and subsequently retraining a new global FL model from scratch with all remaining users, a process that leads to considerable overhead. To enhance unlearning efficiency, a wid… ▽ More Federated Unlearning (FU) is gaining prominence for its capacity to eliminate influences of Federated Learning (FL) users' data from trained global FL models. A straightforward FU method involves removing the unlearned users and subsequently retraining a new global FL model from scratch with all remaining users, a process that leads to considerable overhead. To enhance unlearning efficiency, a widely adopted strategy employs clustering, dividing FL users into clusters, with each cluster maintaining its own FL model. The final inference is then determined by aggregating the majority vote from the inferences of these sub-models. This method confines unlearning processes to individual clusters for removing a user, thereby enhancing unlearning efficiency by eliminating the need for participation from all remaining users. However, current clustering-based FU schemes mainly concentrate on refining clustering to boost unlearning efficiency but overlook the potential information leakage from FL users' gradients, a privacy concern that has been extensively studied. Typically, integrating secure aggregation (SecAgg) schemes within each cluster can facilitate a privacy-preserving FU. Nevertheless, crafting a clustering methodology that seamlessly incorporates SecAgg schemes is challenging, particularly in scenarios involving adversarial users and dynamic users. In this connection, we systematically explore the integration of SecAgg protocols within the most widely used federated unlearning scheme, which is based on clustering, to establish a privacy-preserving FU framework, aimed at ensuring privacy while effectively managing dynamic user participation. Comprehensive theoretical assessments and experimental results show that our proposed scheme achieves comparable unlearning effectiveness, alongside offering improved privacy protection and resilience in the face of varying user participation. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00602 [pdf, other]

From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions… ▽ More Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions and limited manual unit test cases, inadequately assessing LCGM-generated code quality. This paper investigates code quality differences, specifically effectiveness and efficiency, when employing different natural languages as inputs, focusing on Chinese and English due to their prominent corpora and LCGM availability. Evaluating LCGM-generated code quality under bilingual inputs presents three challenges: (1) lack of high-quality bilingual programming question datasets, (2) insufficient unit test cases for comprehensive correctness verification, and (3) limited support for comparing generated code performance. To address these challenges, we curated a test suite of 52 bilingual programming questions and developed automated input generators for each. We enhanced correctness verification by sampling larger unit test cases and estimated code performance by profiling execution time relative to input size growth. Using this framework, we conducted an empirical study on six state-of-the-art LCGMs. The results revealed that LCGM-generated code exhibits varying bilingual correctness on an average of 10.5% of tasks, with 39.5% of correct code showing diverse bilingual performance differences. Our findings suggested LCGMs may not consistently generate high-quality code across different languages, providing insights for future research directions. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 10 and a quarter pages, 6 figures

arXiv:2406.00538 [pdf, other]

Cost-Effectiveness Analysis and Design of Cost-Efficient Cell-Free Massive MIMO Systems

Authors: Wei Jiang, Hans D. Schotten

Abstract: Cell-free massive multi-input multi-output (MIMO) has recently attracted much attention, attributed to its potential to deliver uniform service quality. However, the adoption of a cell-free architecture raises concerns about the high implementation costs associated with deploying numerous distributed access points (APs) and the need for fronthaul network installation. To ensure the sustainability… ▽ More Cell-free massive multi-input multi-output (MIMO) has recently attracted much attention, attributed to its potential to deliver uniform service quality. However, the adoption of a cell-free architecture raises concerns about the high implementation costs associated with deploying numerous distributed access points (APs) and the need for fronthaul network installation. To ensure the sustainability of next-generation wireless networks, it is crucial to improve cost-effectiveness, alongside achieving high performance. To address this, we conduct a cost analysis of cell-free massive MIMO and build a unified model with varying numbers of antennas per AP. Our objective is to explore whether employing multi-antenna APs could reduce system costs while maintaining performance. The analysis and evaluation result in the identification of a cost-effective design for cell-free massive MIMO, providing valuable insights for practical implementation. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 2024 Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE PIMRC 2024)

arXiv:2406.00489 [pdf, other]

Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Authors: Wei Jiang, Sifan Yang, Wenhao Yang, Lijun Zhang

Abstract: Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to… ▽ More Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00330 [pdf, other]

Magnetic ground state of monolayer CeI$_{2}$: occupation matrix control and DFT+U calculations

Authors: Yue-Fei Hou, Shujing Li, Xinlong Yang, Wei Jiang, Qiuhao Wang, Fawei Zheng, Zhen-Guo Fu, Ping Zhang

Abstract: The magnetic ground state is crucial for the applications of the two-dimension magnets as it decides fundamental magnetic properties of the material, such as magnetic order, magnetic transition temperature, and low-energy excitation of the spin waves. However, the simulations for magnetism of local-electron systems are challenging due to the existence of metastable states. In this study, occupatio… ▽ More The magnetic ground state is crucial for the applications of the two-dimension magnets as it decides fundamental magnetic properties of the material, such as magnetic order, magnetic transition temperature, and low-energy excitation of the spin waves. However, the simulations for magnetism of local-electron systems are challenging due to the existence of metastable states. In this study, occupation matrix control (OMC) and density functional theory plus Hubbard $U$ calculations are applied to investigate the magnetic ground state of monolayer CeI$_{2}$. Following the predicted ferrimagnetic (FM) order, the FM ground state and the FM metastable states are identified and found to have different values of the magnetic parameters. Based on the calculated magnetic parameters of the FM ground state, the Curie temperature is estimated to be $128$ K for monolayer CeI$_{2}$. When spin-orbit coupling (SOC) is considered,the FM ground state is further confirmed to contain both off-plane and in-plane components of magnetization. SOC is shown to be essential for reasonably describing not only magnetic anisotropy but also local electronic orbital state of monolayer CeI$_{2}$. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 4 figures. Comments are welcome

arXiv:2405.18765 [pdf, other]

Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

Authors: Wei-Bang Jiang, Li-Ming Zhao, Bao-Liang Lu

Abstract: The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabi… ▽ More The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: The Twelfth International Conference on Learning Representations

Journal ref: The Twelfth International Conference on Learning Representations, 2024

arXiv:2405.18720 [pdf, other]

Machine-Learning based photon counting for PMT waveforms and its application to the improvement of the energy resolution in large liquid scintillator detectors

Authors: Wei Jiang, Guihong Huang, Zhen Liu, Wuming Luo, Liangjian Wen, Jianyi Luo

Abstract: Photomultiplier tubes (PMTs) are widely used in particle experiments for photon detection. PMT waveform analysis is crucial for high-precision measurement of the position and energy of incident particles in liquid scintillator (LS) detectors. A key factor contributing to the energy resolution in large liquid scintillator detectors with PMTs is the charge smearing of PMTs. This paper presents a mac… ▽ More Photomultiplier tubes (PMTs) are widely used in particle experiments for photon detection. PMT waveform analysis is crucial for high-precision measurement of the position and energy of incident particles in liquid scintillator (LS) detectors. A key factor contributing to the energy resolution in large liquid scintillator detectors with PMTs is the charge smearing of PMTs. This paper presents a machine-learning-based photon counting method for PMT waveforms and its application to the energy reconstruction, using the JUNO experiment as an example. The results indicate that leveraging the photon counting information from the machine learning model can partially mitigate the impact of PMT charge smearing and lead to a relative 2.0% to 2.8% improvement on the energy resolution at different energies. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 9 figures

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

Abstract: We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode… ▽ More We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.17264 [pdf, other]

On the Noise Robustness of In-Context Learning for Text Generation

Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

Abstract: Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significan… ▽ More Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significantly hurt the performance of in-context learning. To circumvent the issue, we propose a simple and effective approach called Local Perplexity Ranking (LPR), which replaces the "noisy" candidates with their nearest neighbors that are more likely to be clean. Our method is motivated by analyzing the perplexity deviation caused by noisy labels and decomposing perplexity into inherent perplexity and matching perplexity. Our key idea behind LPR is thus to decouple the matching perplexity by performing the ranking among the neighbors in semantic space. Our approach can prevent the selected demonstrations from including mismatched input-label pairs while preserving the effectiveness of the original selection methods. Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16854 [pdf, other]

Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

Authors: Zhihao Liu, Xianliang Yang, Zichuan Liu, Yifan Xia, Wei Jiang, Yuanyu Zhang, Lijuan Li, Guoliang Fan, Lei Song, Bian Jiang

Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, the linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While research… ▽ More Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, the linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While researchers have designed a variety of effective algorithms to compress the action space, these methods also introduce new challenges, such as the need for manually designed prior knowledge or reliance on the structure of the problem, which diminishes the applicability of these techniques. In this paper, we introduce Evolutionary action SPAce Reduction with Knowledge (eSpark), an exploration function generation framework driven by large language models (LLMs) to boost exploration and prune unnecessary actions in MARL. Using just a basic prompt that outlines the overall task and setting, eSpark is capable of generating exploration functions in a zero-shot manner, identifying and pruning redundant or irrelevant state-action pairs, and then achieving autonomous improvement from policy feedback. In reinforcement learning tasks involving inventory management and traffic light control encompassing a total of 15 scenarios, eSpark consistently outperforms the combined MARL algorithm in all scenarios, achieving an average performance gain of 34.4% and 9.9% in the two types of tasks respectively. Additionally, eSpark has proven to be capable of managing situations with a large number of agents, securing a 29.7% improvement in scalability challenges that featured over 500 agents. The code can be found in https://github.com/LiuZhihao2022/eSpark.git. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14303 [pdf, other]

Similarity-Navigated Conformal Prediction for Graph Neural Networks

Authors: Jianqing Song, Jianguo Huang, Wenyu Jiang, Baoming Zhang, Shuangjie Li, Chongjun Wang

Abstract: Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically s… ▽ More Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically show that for each node, aggregating the non-conformity scores of nodes with the same label can improve the efficiency of conformal prediction sets. This observation motivates us to propose a novel algorithm named Similarity-Navigated Adaptive Prediction Sets (SNAPS), which aggregates the non-conformity scores based on feature similarity and structural neighborhood. The key idea behind SNAPS is that nodes with high feature similarity or direct connections tend to have the same label. By incorporating adaptive similar nodes information, SNAPS can generate compact prediction sets and increase the singleton hit ratio (correct prediction sets of size one). Moreover, we theoretically provide a finite-sample coverage guarantee of SNAPS. Extensive experiments demonstrate the superiority of SNAPS, improving the efficiency of prediction sets and singleton hit ratio while maintaining valid coverage. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To… ▽ More The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively. △ Less

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11459 [pdf, other]

Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

Authors: Hui Zheng, Hai-Teng Wang, Wei-Bang Jiang, Zhong-Tao Chen, Li He, Pei-Yang Lin, Peng-Hu Wei, Guo-Guang Zhao, Yun-Zhe Liu

Abstract: Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, the… ▽ More Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, their performance on more difficult tasks, e.g., speech decoding, which demands intricate processing in specific brain regions, is yet to be fully investigated. We hypothesize that building multi-variate representations within certain brain regions can better capture the specific neural processing. To explore this hypothesis, we collect a well-annotated Chinese word-reading sEEG dataset, targeting language-related brain networks, over 12 subjects. Leveraging this benchmark dataset, we developed the Du-IN model that can extract contextual embeddings from specific brain regions through discrete codebook-guided mask modeling. Our model achieves SOTA performance on the downstream 61-word classification task, surpassing all baseline models. Model comparison and ablation analysis reveal that our design choices, including (i) multi-variate representation by fusing channels in vSMC and STG regions and (ii) self-supervision by discrete codebook-guided mask modeling, significantly contribute to these performances. Collectively, our approach, inspired by neuroscience findings, capitalizing on multi-variate neural representation from specific brain regions, is suitable for invasive brain modeling. It marks a promising neuro-inspired AI approach in BCI. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.11218 [pdf, other]

Learning-based Block-wise Planar Channel Estimation for Time-Varying MIMO OFDM

Authors: Chenchen Liu, Wenjun Jiang, Xiaojun Yuan

Abstract: In this paper, we propose a learning-based block-wise planar channel estimator (LBPCE) with high accuracy and low complexity to estimate the time-varying frequency-selective channel of a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system. First, we establish a block-wise planar channel model (BPCM) to characterize the correlation of the channel across su… ▽ More In this paper, we propose a learning-based block-wise planar channel estimator (LBPCE) with high accuracy and low complexity to estimate the time-varying frequency-selective channel of a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system. First, we establish a block-wise planar channel model (BPCM) to characterize the correlation of the channel across subcarriers and OFDM symbols. Specifically, adjacent subcarriers and OFDM symbols are divided into several sub-blocks, and an affine function (i.e., a plane) with only three variables (namely, mean, time-domain slope, and frequency-domain slope) is used to approximate the channel in each sub-block, which significantly reduces the number of variables to be determined in channel estimation. Second, we design a 3D dilated residual convolutional network (3D-DRCN) that leverages the time-frequency-space-domain correlations of the channel to further improve the channel estimates of each user. Numerical results demonstrate that the proposed significantly outperforms the state-of-the-art estimators and maintains a relatively low computational complexity. △ Less

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10753 [pdf]

Topological spin-torque diode effect in skyrmion-based magnetic tunnel junctions

Authors: Bin Fang, Riccardo Tomasello, Yuxuan Wu, Aitian Chen, Shuhui Liu, Baoshun Zhang, Emily Darwin, Mario Carpentieri, Wanjun Jiang, Xixiang Zhang, Giovanni Finocchio, Zhongming Zeng

Abstract: The growing market and massive use of Internet of Things nodes is placing unprecedented demands of energy efficient hardware for edge computing and microwave devices. In particular, magnetic tunnel junctions (MTJs), as main building blocks of spintronic microwave technology, can offer a path for the development of compact and high-performance microwave detectors. On the other hand, the fascinating… ▽ More The growing market and massive use of Internet of Things nodes is placing unprecedented demands of energy efficient hardware for edge computing and microwave devices. In particular, magnetic tunnel junctions (MTJs), as main building blocks of spintronic microwave technology, can offer a path for the development of compact and high-performance microwave detectors. On the other hand, the fascinating field of skyrmionics is bridging together concepts from topology and spintronics. Here, we show the proof of concept of a topological spin-torque diode realized with an MTJ on top of a skyrmionic material at room temperature and for a wide region of applied fields, including the zero-field case. Our spin torque diode electrical measurements show the electrical excitation of a skyrmion resonant mode with frequencies near 4 GHz and a selectivity one order of magnitude smaller than the uniform modes excited in the same device. Micromagnetic simulations identify these dynamics with the excitation of the breathing mode and point out the role of thickness dependent magnetic parameters (magnetic anisotropy field and Dzyaloshinkii Moriya interaction) in both stabilizing and exciting the magnetic skyrmions. This work marks a milestone for the development of topological spin-torque diodes. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09951 [pdf, other]

A Review of Multiple Access Techniques for Intelligent Reflecting Surface-Assisted Systems

Authors: Wei Jiang, Hans Schotten

Abstract: Intelligent Reflecting Surface (IRS) is envisioned to be a technical enabler for the sixth-generation (6G) wireless system. Its potential lies in delivering high performance while maintaining both power efficiency and cost-effectiveness. Previous studies have primarily focused on point-to-point IRS communications involving a single user. Nevertheless, a practical system must serve multiple users s… ▽ More Intelligent Reflecting Surface (IRS) is envisioned to be a technical enabler for the sixth-generation (6G) wireless system. Its potential lies in delivering high performance while maintaining both power efficiency and cost-effectiveness. Previous studies have primarily focused on point-to-point IRS communications involving a single user. Nevertheless, a practical system must serve multiple users simultaneously. The unique characteristics of IRS, such as non-frequency-selective reflection and the necessity for joint active/passive beamforming, create obstacles to the use of conventional multiple access (MA) techniques. This motivates us to review various MA techniques to make clear their functionalities in the presence of IRS. Through this paper, our aim is to provide researchers with a comprehensive understanding of challenges and available solutions, offering insights to foster their design of efficient multiple access for IRS-aided systems. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024)

arXiv:2405.09928 [pdf, other]

Unified Modeling and Performance Comparison for Cellular and Cell-Free Massive MIMO

Authors: Wei Jiang, Hans D. Schotten

Abstract: Cell-free massive multi-input multi-output (MIMO) has recently gained a lot of attention due to its high potential in sixth-generation (6G) wireless systems. The goal of this paper is to first present a unified modeling for massive MIMO, encompassing both cellular and cell-free architectures with a variable number of antennas per access point. We derive signal transmission models and achievable sp… ▽ More Cell-free massive multi-input multi-output (MIMO) has recently gained a lot of attention due to its high potential in sixth-generation (6G) wireless systems. The goal of this paper is to first present a unified modeling for massive MIMO, encompassing both cellular and cell-free architectures with a variable number of antennas per access point. We derive signal transmission models and achievable spectral efficiency in both the downlink and uplink using zero-forcing and maximal-ratio schemes. We also provide performance comparisons in terms of per-user and sum spectral efficiency. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024)

arXiv:2405.09905 [pdf, other]

Cell-Free Terahertz Massive MIMO: A Novel Paradigm Beyond Ultra-Massive MIMO

Authors: Wei Jiang, Hans D. Schotten

Abstract: Terahertz (THz) frequencies have recently garnered considerable attention due to their potential to offer abundant spectral resources for communication, as well as distinct advantages in sensing, positioning, and imaging. Nevertheless, practical implementation encounters challenges stemming from the limited distances of signal transmission, primarily due to notable propagation, absorption, and blo… ▽ More Terahertz (THz) frequencies have recently garnered considerable attention due to their potential to offer abundant spectral resources for communication, as well as distinct advantages in sensing, positioning, and imaging. Nevertheless, practical implementation encounters challenges stemming from the limited distances of signal transmission, primarily due to notable propagation, absorption, and blockage losses. To address this issue, the current strategy involves employing ultra-massive multi-input multi-output (UMMIMO) to generate high beamforming gains, thereby extending the transmission range. This paper introduces an alternative solution through the utilization of cell-free massive MIMO (CFmMIMO) architecture, wherein the closest access point is actively chosen to reduce the distance, rather than relying solely on a substantial number of antennas. We compare these two techniques through simulations and the numerical results justify that CFmMIMO is superior to UMMIMO in both spectral and energy efficiency at THz frequencies. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: The Fourth IEEE International Mediterranean Conference on Communications and Networking (IEEE MEDITCOM 2024), July 2024, Madrid, Spain

arXiv:2405.09789 [pdf, other]

LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation

Authors: Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du

Abstract: Due to spatial redundancy in remote sensing images, sparse tokens containing rich information are usually involved in self-attention (SA) to reduce the overall token numbers within the calculation, avoiding the high computational cost issue in Vision Transformers. However, such methods usually obtain sparse tokens by hand-crafted or parallel-unfriendly designs, posing a challenge to reach a better… ▽ More Due to spatial redundancy in remote sensing images, sparse tokens containing rich information are usually involved in self-attention (SA) to reduce the overall token numbers within the calculation, avoiding the high computational cost issue in Vision Transformers. However, such methods usually obtain sparse tokens by hand-crafted or parallel-unfriendly designs, posing a challenge to reach a better balance between efficiency and performance. Different from them, this paper proposes to use learnable meta tokens to formulate sparse tokens, which effectively learn key information meanwhile improving the inference speed. Technically, the meta tokens are first initialized from image tokens via cross-attention. Then, we propose Dual Cross-Attention (DCA) to promote information exchange between image tokens and meta tokens, where they serve as query and key (value) tokens alternatively in a dual-branch structure, significantly reducing the computational complexity compared to self-attention. By employing DCA in the early stages with dense visual tokens, we obtain the hierarchical architecture LeMeViT with various sizes. Experimental results in classification and dense prediction tasks show that LeMeViT has a significant $1.7 \times$ speedup, fewer parameters, and competitive performance compared to the baseline models, and achieves a better trade-off between efficiency and performance. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI'2024. The code is available at https://github.com/ViTAE-Transformer/LeMeViT

arXiv:2405.09179 [pdf, other]

Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

Authors: Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ping Zhang

Abstract: Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sens… ▽ More Integrated sensing and communication (ISAC) is a potential technology of the sixth-generation (6G) mobile communication system, which enables communication base station (BS) with sensing capability. However, the performance of single-BS sensing is limited, which can be overcome by multi-BS cooperative sensing. There are three types of multi-BS cooperative sensing, including cooperative active sensing, cooperative passive sensing, and cooperative active and passive sensing, where the multi-BS cooperative passive sensing has the advantages of low hardware modification cost and large sensing coverage. However, multi-BS cooperative passive sensing faces the challenges of synchronization offsets mitigation and sensing information fusion. To address these challenges, a non-line of sight (NLoS) and line of sight (LoS) signal cross-correlation (NLCC) method is proposed to mitigate carrier frequency offset (CFO) and time offset (TO). Besides, a symbol-level multi-BS sensing information fusion method is proposed. The discrete samplings of echo signals from multiple BSs are matched independently and coherent accumulated to improve sensing accuracy. Moreover, a lowcomplexity joint angle-of-arrival (AoA) and angle-of-departure (AoD) estimation method is proposed to reduce the computational complexity. Simulation results show that symbol-level multi-BS cooperative passive sensing scheme has an order of magnitude higher sensing accuracy than single-BS passive sensing. This work provides a reference for the research on multi-BS cooperative passive sensing. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: 16 pages, 11 figures, Submitted to IEEE Transactions on Mobile Computing

Showing 1–50 of 1,341 results for author: Jiang, W