Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07304 [pdf, other]

Inference Performance Optimization for Large Language Models on CPUs

Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardware resources, optimizing inference performance is necessary. In this paper, we introduce an easily deployable inference performance optimization solution aimed at accelerating LLMs on CPUs. In this solution, we implement an effective way to reduce the KV cache size while ensuring precision. We propose a distributed inference optimization approach and implement it based on oneAPI Collective Communications Library. Furthermore, we propose optimization approaches for LLMs on CPU, and conduct tailored optimizations for the most commonly used models. The code is open-sourced at https://github.com/intel/xFasterTransformer.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 5 pages, 6 figures, ICML 2024 on Foundation Models in the Wild

arXiv:2407.05739 [pdf, other]

Multi-Bit Mechanism: A Novel Information Transmission Paradigm for Spiking Neural Networks

Authors: Yongjun Xiao, Xianlong Tian, Yongqi Ding, Pei He, Mengmeng Jing, Lin Zuo

Abstract: Since proposed, spiking neural networks (SNNs) gain recognition for their high performance, low power consumption and enhanced biological interpretability. However, while bringing these advantages, the binary nature of spikes also leads to considerable information loss in SNNs, ultimately causing performance degradation. We claim that the limited expressiveness of current binary spikes, resulting in substantial information loss, is the fundamental issue behind these challenges. To alleviate this, our research introduces a multi-bit information transmission mechanism for SNNs. This mechanism expands the output of spiking neurons from the original single bit to multiple bits, enhancing the expressiveness of the spikes and reducing information loss during the forward process, while still maintaining the low energy consumption advantage of SNNs. For SNNs, this represents a new paradigm of information transmission. Moreover, to further utilize the limited spikes, we extract effective signals from the previous layer to re-stimulate the neurons, thus encouraging full spikes emission across various bit levels. We conducted extensive experiments with our proposed method using both direct training method and ANN-SNN conversion method, and the results show consistent performance improvements.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2407.02756 [pdf, other]

Probing Krylov Complexity in Scalar Field Theory with General Temperatures

Authors: Peng-Zhang He, Hai-Qing Zhang

Abstract: Krylov complexity characterizes the operator growth in quantum many-body systems or in quantum field theories. The existing literatures have studied the Krylov complexity in the low temperature limit in quantum field theory. In this paper, we extend and systematically study the Krylov complexity and Krylov entropy in a scalar field theory with general temperatures. To this end, we propose a new method to calculate the Wightman power spectrum which allows us to compute the Lanczos coefficients and then to study the Krylov complexity (entropy) extensively. Compared to the previous studies in the low temperature limit, we find that the Lanczos coefficients and Krylov complexity (entropy) in the high temperature limit will behave somewhat differently. Finally, we also discuss the Krylov complexity (entropy) in the symmetry breaking phase with the Higgs scalar field at finite temperatures.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 26 pages, 7 figures

arXiv:2407.00029 [pdf, other]

Distributed Inference Performance Optimization for LLMs on CPUs

Authors: Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui

Abstract: Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and expedite LLM inference performance. To reduce the hardware limitation burden, we proposed an efficient distributed inference optimization solution for LLMs on CPUs. We conduct experiments with the proposed solution on 5th Gen Intel Xeon Scalable Processors, and the result shows the time per output token for the LLM with 72B parameter is 140 ms/token, much faster than the average human reading speed about 200ms per token.

Submitted 16 May, 2024; originally announced July 2024.

Comments: 4 pages, 3 figures, Practical ML for Low Resource Settings Workshop @ ICLR 2024

arXiv:2406.14773 [pdf, other]

Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data

Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, Jiliang Tang

Abstract: Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. However, when the retrieval process involves private data, RAG systems may face severe privacy risks, potentially leading to the leakage of sensitive information. To address this issue, we propose using synthetic data as a privacy-preserving alternative for the retrieval data. We propose SAGE, a novel two-stage synthetic data generation paradigm. In the stage-1, we employ an attribute-based extraction and generation approach to preserve key contextual information from the original data. In the stage-2, we further enhance the privacy properties of the synthetic data through an agent-based iterative refinement process. Extensive experiments demonstrate that using our synthetic data as the retrieval context achieves comparable performance to using the original data while substantially reducing privacy risks. Our work takes the first step towards investigating the possibility of generating high-utility and privacy-preserving synthetic data for RAG, opening up new opportunities for the safe application of RAG systems in various domains.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.11645 [pdf, other]

SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking

Authors: Tianhong Catherine Yu, Manru Zhang, Peter He, Chi-Jung Lee, Cassidy Cheesman, Saif Mahmud, Ruidong Zhang, François Guimbretière, Cheng Zhang

Abstract: Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place the electrodes on the surface of clothing, our solution leverages existing seams inside of a shirt by machine-sewing insulated conductive threads over the seams. The unique invisibilities and placements of the seams afford the sensing shirt to look and wear the same as a conventional shirt while providing exciting pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt. With eight capacitive sensing seams, our customized deep-learning pipeline accurately estimates the upper-body 3D joint positions relative to the pelvis. With a 12-participant user study, we demonstrated promising cross-user and cross-session tracking performance. SeamPose represents a step towards unobtrusive integration of smart clothing for everyday pose estimation.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10794 [pdf, other]

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis

Authors: Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang

Abstract: Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs to output harmful contents. Although there are diverse jailbreak attack strategies, there is no unified understanding on why some methods succeed and others fail. This paper explores the behavior of harmful and harmless prompts in the LLM's representation space to investigate the intrinsic properties of successful jailbreak attacks. We hypothesize that successful attacks share some similar properties: They are effective in moving the representation of the harmful prompt towards the direction to the harmless prompts. We leverage hidden representations into the objective of existing jailbreak attacks to move the attacks along the acceptance direction, and conduct experiments to validate the above hypothesis using the proposed objective. We hope this study provides new insights into understanding how LLMs understand harmfulness information.

Submitted 26 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2405.17229 [pdf, other]

InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning

Authors: Guozheng Li, Peng He, Xinyu Wang, Runfei Li, Chi Harold Liu, Chuangxin Ou, Dong He, Guoren Wang

Abstract: Embedding visual representations within original hierarchical tables can mitigate additional cognitive load stemming from the division of users' attention. The created hierarchical table visualizations can help users understand and explore complex data with multi-level attributes. However, because of many options available for transforming hierarchical tables and selecting subsets for embedding, the design space of hierarchical table visualizations becomes vast, and the construction process turns out to be tedious, hindering users from constructing hierarchical table visualizations with many data insights efficiently. We propose InsigHTable, a mixed-initiative and insight-driven hierarchical table transformation and visualization system. We first define data insights within hierarchical tables, which consider the hierarchical structure in the table headers. Since hierarchical table visualization construction is a sequential decision-making process, InsigHTable integrates a deep reinforcement learning framework incorporating an auxiliary rewards mechanism. This mechanism addresses the challenge of sparse rewards in constructing hierarchical table visualizations. Within the deep reinforcement learning framework, the agent continuously optimizes its decision-making process to create hierarchical table visualizations to uncover more insights by collaborating with analysts. We demonstrate the usability and effectiveness of InsigHTable through two case studies and sets of experiments. The results validate the effectiveness of the deep reinforcement learning framework and show that InsigHTable can facilitate users to construct hierarchical table visualizations and understand underlying data insights.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.04513 [pdf, other]

Switchable Decision: Dynamic Neural Generation Networks

Authors: Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

Abstract: Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classifications. However, they are also known for being slow in inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each data instance. Automatically making decisions on where to skip and how to balance quality and computation cost with constrained optimization, our dynamic neural generation networks enforce the efficient inference path and determine the optimized trade-off. Experiments across question answering, summarization, and classification benchmarks show that our method benefits from less computation cost during inference while keeping the same accuracy. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted to ICML 2024

arXiv:2405.04133 [pdf, other]

Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method

Authors: Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

Abstract: The generative model has made significant advancements in the creation of realistic videos, which causes security issues. However, this emerging risk has not been adequately addressed due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. Besides, typical video lossy operations over network transmission are adopted to generate degraded samples. Then, by analyzing local and global temporal defects of current AI-generated videos, a novel detection framework by adaptively learning local motion information and global appearance variation is constructed to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial and temporal domain detection methods, where the results can serve as the baseline and demonstrate the research challenge for future studies.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03884 [pdf, other]

BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection

Authors: Saket S. Chaturvedi, Lan Zhang, Wenbin Zhang, Pan He, Xiaoyong Yuan

Abstract: 3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting "triggers" to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D triggers are injected to ensure their visibility within the sparse 3D space, rendering them easy to detect and impractical in real-world scenarios. In this paper, we delve into the robustness of 3D object detection, exploring a new backdoor attack surface through 2D cameras. Given the prevalent adoption of camera and LiDAR signal fusion for high-fidelity 3D perception, we investigate the latent potential of camera signals to disrupt the process. Although the dense nature of camera signals enables the use of nearly imperceptible small-sized triggers to mislead 2D object detection, realizing 2D-oriented backdoor attacks against 3D object detection is non-trivial. The primary challenge emerges from the fusion process that transforms camera signals into a 3D space, compromising the association with the 2D trigger to the target output. To tackle this issue, we propose an innovative 2D-oriented backdoor attack against LiDAR-camera fusion methods for 3D object detection, named BadFusion, for preserving trigger effectiveness throughout the entire fusion process. The evaluation demonstrates the effectiveness of BadFusion, achieving a significantly higher attack success rate compared to existing 2D-oriented attacks.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted at IJCAI 2024 Conference

arXiv:2405.03489 [pdf, other]

On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations

Authors: Xiaoxue Ma, Huiqi Zou, Jacky Keung, Pinjia He, Yishu Li, Xiao Yu, Federica Sarro

Abstract: Numerous DL-based approaches have garnered considerable attention in the field of software Log Anomaly Detection. However, a practical challenge persists: the class imbalance in the public data commonly used to train the DL models. This imbalance is characterized by a substantial disparity in the number of abnormal log sequences compared to normal ones, for example, anomalies represent less than 1% of one of the most popular datasets. Previous research has indicated that existing DLLAD approaches may exhibit unsatisfactory performance, particularly when confronted with datasets featuring severe class imbalances. Mitigating class imbalance through data resampling has proven effective for other software engineering tasks, however, it has been unexplored for LAD thus far. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. Firstly, we assess the performance of these DLLAD approaches across three datasets and explore the impact of resampling ratios of normal to abnormal data on ten data resampling methods. Secondly, we evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data. Our findings indicate that oversampling methods generally outperform undersampling and hybrid methods. Data resampling on raw data yields superior results compared to data resampling in the feature space. In most cases, certain undersampling and hybrid methods show limited effectiveness. Additionally, by exploring the resampling ratio of normal to abnormal data, we suggest generating more data for minority classes through oversampling while removing less data from majority classes through undersampling. In conclusion, our study provides valuable insights into the intricate relationship between data resampling methods and DLLAD.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 15 pages, 2 figures

arXiv:2405.00920 [pdf, other]

Identifying Halos in Cosmological Simulations with Continuous Wavelet Analysis: The 2D Case

Authors: Minxing Li, Yun Wang, Ping He

Abstract: Continuous wavelet analysis is gaining popularity in science and engineering for its ability to analyze data across spatial and scale domain simultaneously. In this study, we introduce a wavelet-based method to identify halos and assess its feasibility in two-dimensional (2D) scenarios. We begin with the generation of four pseudo-2D datasets from the SIMBA dark matter simulation by compressing thin slices of three-dimensional (3D) data into 2D. We then calculate the continuous wavelet transform (CWT) directly from the particle distributions, identify local maxima that represent actual halos, and segment the CWT to delineate halo boundaries. A comparison with the traditional Friends-of-Friends (FOF) method shows that our CWT-identified halos, while containing slightly fewer particles, have smoother boundaries and are more compact in dense regions. In contrast, the CWT method can link particles over greater distances to form halos in sparse regions due to its spatial segmentation scheme. The spatial distribution and halo power spectrum of both CWT and FOF halos demonstrate substantial consistency, validating the 2D applicability of CWT for halo detection. Our identification scheme operates with a linear time complexity of $\mathcal{O}(N)$, suggesting its suitability for analyzing significantly larger datasets in the future.

Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 18 pages, 13 figures, 1 table, comments welcome

arXiv:2404.18512 [pdf, other]

Floquet Amorphous Topological Orders in a Rydberg Glass

Authors: Peng He, Jing-Xin Liu, Hong Wu, Z. D. Wang

Abstract: We study the Floquet amorphous topological orders in experimentally accessible one-dimensional array of randomly pointed Rydberg atoms with periodic driving. The filling factor in the chain is tunable by applying a microwave field. We give a complete characterization of the topological properties from both the single-particle and many-body aspect. The periodic driving results in richer topological phases. At the single-particle level, we calculate the real space winding numbers and polarization, confirming robust amorphous topological phases with 0-type and $\pi$-type edge modes. We show a structural disorder induced topological phase transition accompanied with localization transition in the nonequilibrium system. Furthermore, in the many-body case we find the existence of amorphous topological orders of the hardcore bosons half-filled the chain, in contrast described by the topological entanglement entropy and the string order. Possible detection methods are also addressed.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2404.15838 [pdf, other]

doi 10.1140/epjc/s10052-024-12758-x

Abnormal threshold behaviors of photo-pion production off the proton in the GZK region

Authors: Ping He, Bo-Qiang Ma

Abstract: The confirmation of the existence of GZK cut-off was tortuous, leading to activities to explore new physics, such as the cosmic-ray new components, unidentified cosmic-ray origins, unknown propagation mechanism, and the modification of fundamental physics concepts like the tiny Lorentz invariance violation (LV). The confirmation of the GZK cut-off provides an opportunity to constrain the LV effect. We use a phenomenological framework to restudy the GZK mechanism under the Planck scale deformation of the proton and pion dispersion relations. Restudying the photon induced pion production of the proton $\mathrm{p}+\gamma\to\mathrm{p}+\pi^0$, we predict abnormal threshold behaviors of this reaction under different LV modifications. Therefore, we can study the LV effects not only from the conventional GZK cut-off, but also from potentially threshold anomalies of the pion production process. We divide the LV parameter space into three regions, and analyze the constraints from current observations in each region. The current observations have set strict constraints on a certain LV region. However, for other LV regions, further experimental observations and theoretical researches are still needed, and we also find survival space for some theoretical explorations that permit specific LV effects.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 9 latex pages, 3 figures, final version

Journal ref: Euro.Phys.J. C 84 (2024) 401

arXiv:2404.15819 [pdf, other]

APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

Authors: Lin Ding, Song Bian, Penggao He, Yan Xu, Gang Qu, Jiliang Zhang

Abstract: Fully Homomorphic Encryption (FHE) allows one to outsource computation over encrypted data to untrusted servers without worrying about data breaching. Since FHE is known to be extremely computationally-intensive, application-specific accelerators emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexities in FHE schemes per se and multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often face the challenges of low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of both computational resources and memory bandwidth. In addition, we propose a multi-scheme operator compiler to efficiently schedule high-level FHE computations across lower-level functional units. In the experiment, we evaluate APACHE on various FHE applications, such as Lola MNIST, HELR, fully-packed bootstrapping, and fully homomorphic processors. The results illustrate that APACHE outperforms the state-of-the-art ASIC FHE accelerators by 2.4x to 19.8x over a variety of operator and application benchmarks.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13177 [pdf, other]

A Bayesian Hybrid Design with Borrowing from Historical Study

Authors: Zhaohua Lu, John Toso, Girma Ayele, Philip He

Abstract: In early phase drug development of combination therapy, the primary objective is to preliminarily assess whether there is additive activity when a novel agent is combined with an established monotherapy. Due to potential feasibility issues with a large randomized study, uncontrolled single-arm trials have been the mainstream approach in cancer clinical trials. However, such trials often present significant challenges in deciding whether to proceed to the next phase of development. A hybrid design, leveraging data from a completed historical clinical study of the monotherapy, offers a valuable option to enhance study efficiency and improve informed decision-making. Compared to traditional single-arm designs, the hybrid design may significantly enhance power by borrowing external information, enabling a more robust assessment of activity. The primary challenge of hybrid design lies in handling information borrowing. We introduce a Bayesian dynamic power prior (DPP) framework with three components controlling the amount of dynamic borrowing. The framework offers flexible study design options with explicit interpretation of borrowing, allowing customization according to specific needs. Furthermore, the posterior distribution in the proposed framework has a closed form, offering significant advantages in computational efficiency. The proposed framework's utility is demonstrated through simulations and a case study.

Submitted 29 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.11255 [pdf, other]

Turbulence revealed by wavelet transform: power spectrum and intermittency for the velocity field of the cosmic baryonic fluid

Authors: Yun Wang, Ping He

Abstract: We use continuous wavelet transform techniques to construct the global and environment-dependent wavelet statistics, such as energy spectrum and kurtosis, to study the fluctuation and intermittency of the turbulent motion in the cosmic fluid velocity field with the IllustrisTNG simulation data. We find that the peak scales of the energy spectrum and the spectral ratio define two characteristic scales, which can be regarded as the integral scale and the dissipation scale of turbulence, respectively, so that the energy spectrum can be divided into the energy-containing range, the inertial range and the dissipation range of turbulence. The wavelet kurtosis is an increasing function of the wavenumber $k$, growing first rapidly and then slowly with $k$, indicating that the cosmic fluid becomes increasingly intermittent with $k$. In the energy-containing range, the energy spectrum increases significantly from $z = 2$ to $1$, but remains almost unchanged from $z = 1$ to $0$. We find that both the environment-dependent spectrum and kurtosis are similar to the global ones, and the magnitude of the spectrum is smallest in the lowest-density and largest in the highest-density environment, suggesting that the cosmic fluid is more turbulent in a high-density than in a low-density environment. In the inertial range, the energy spectrum's exponent is steeper than both the Kolmogorov and Burgers exponents, indicating more efficient energy transfer compared to Kolmogorov or Burgers turbulence.

Submitted 10 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 19 pages, 11 figures, 2 tables, submitted to the ApJ

arXiv:2404.08877 [pdf, other]

Aligning LLMs for FL-free Program Repair

Authors: Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of locating and repairing bugs end-to-end when using the related artifacts (e.g., test cases) as input, existing methods regard them as separate tasks and ask LLMs to generate patches at fixed locations. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLMs' APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first performing fault localization. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLMs' pre-trained capability, and (2) replacing the traditional localize-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.17574 [pdf, other]

SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions

Authors: Cheryl Lee, Zhouruixin Zhu, Tianyi Yang, Yintong Huo, Yuxin Su, Pinjia He, Michael R. Lyu

Abstract: As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloading without full invocation pattern exploitation, rendering unsatisfactory optimization of the trade-off between cold start latency and resource waste. To bridge this gap, we propose SPES, the first differentiated scheduler for runtime cold start mitigation by optimizing serverless function provision. Our insight is that the common architecture of serverless systems prompts the concentration of certain invocation patterns, leading to predictable invocation behaviors. This allows us to categorize functions and pre-load/unload proper function instances with finer-grained strategies based on accurate invocation prediction. Experiments demonstrate the success of SPES in optimizing serverless function provision on both sides: reducing the 75th-percentile cold start rates by 49.77% and the wasted memory time by 56.43%, compared to the state-of-the-art. By mitigating the cold start issue, SPES is a promising advancement in facilitating cloud services deployed on serverless architectures.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

arXiv:2403.12389 [pdf, other]

Learning-guided iterated local search for the minmax multiple traveling salesman problem

Authors: Pengfei He, Jin-Kao Hao, Jinhui Xia

Abstract: The minmax multiple traveling salesman problem involves minimizing the longest tour among a set of tours. The problem is of great practical interest because it can be used to formulate several real-life applications. To solve this computationally challenging problem, we propose a learning-driven iterated local search approach that combines an aggressive local search procedure with a probabilistic acceptance criterion to find high-quality local optimal solutions and a multi-armed bandit algorithm to select various removal and insertion operators to escape local optimal traps. Extensive experiments on 77 commonly used benchmark instances show that our algorithm achieves excellent results in terms of solution quality and running time. In particular, it achieves 32 new best-known results and matches the best-known results for 35 other instances. Additional experiments shed light on the understanding of the composing elements of the algorithm.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11773 [pdf, other]

Scaling limit of heavy tailed nearly unstable cumulative INAR($\infty$) processes and rough fractional diffusions

Authors: Yingli Wang, Chunhao Cai, Ping He

Abstract: In this paper, we investigate the scaling limit of heavy-tailed unstable cumulative INAR($\infty$) processes. These processes exhibit a power law tail of the form $n^{-(1+\alpha)}$, with $\alpha \in (\frac{1}{2}, 1)$, where the $\ell^1$ norm of the kernel vector is close to $1$. The result is in contrast to the scaling limit of the continuous-time heavy-tailed unstable Hawkes processes and that of INAR($p$) processes. We show that the discrete-time scaling limit also has the long-memory property and can also be seen as an integrated fractional Cox-Ingersoll-Ross process.

Submitted 16 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:1504.03100 by other authors

MSC Class: 60G22; 60F05

arXiv:2403.09361 [pdf, other]

A Multi-population Integrated Approach for Capacitated Location Routing

Authors: Pengfei He, Jin-Kao Hao, Qinghua Wu

Abstract: The capacitated location-routing problem involves determining the depots from a set of candidate capacitated depot locations and finding the required routes from the selected depots to serve a set of customers while minimizing a cost function that includes the cost of opening the chosen depots, the fixed utilization cost per vehicle used, and the total cost (distance) of the routes. This paper presents a multi-population integrated framework in which a multi-depot edge assembly crossover generates promising offspring solutions from the perspective of both depot location and route edge assembly. The method includes an effective neighborhood-based local search, a feasibility-restoring procedure and a diversification-oriented mutation. Of particular interest is the multi-population scheme which organizes the population into multiple subpopulations based on depot configurations. Extensive experiments on 281 benchmark instances from the literature show that the algorithm performs remarkably well, by improving 101 best-known results (new upper bounds) and matching 84 best-known results. Additional experiments are presented to gain insight into the role of the key elements of the algorithm.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07252 [pdf, ps, other]

Serre functors and complete torsion pairs

Authors: Zhe Han, Ping He

Abstract: Given a torsion pair $(\mathcal{T},\mathcal{F})$ in an abelian category $\mathcal{A}$, there is a t-structure $(\mathcal{U}_\mathcal{T},\mathcal{V}_\mathcal{T})$ determined by $\mathcal{T}$ on the derived category $D^b(\mathcal{A})$. The existence of a derived equivalence between the heart $\mathcal{B}$ of the t-structure and $\mathcal{A}$ which naturally extends the embedding $\mathcal{B}\to D^b(\mathcal{A})$ is determined by the completeness of the torsion pair [6]. When $\mathcal{A}$ is the module category of a finite-dimensional hereditary algebra and $\mathcal{U}_\mathcal{T}$ is closed under the Serre functor, there exists a triangle equivalence $D^b(\mathcal{B})\to D^b(\mathcal{A})$ [21]. In this case, we give a straightforward proof of the fact that the torsion pair $(\mathcal{T},\mathcal{F})$ is complete if and only if $\mathcal{U}_\mathcal{T}$ is closed under the Serre functor.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.06884 [pdf, other]

A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation

Authors: Pan He, Quanyi Li, Xiaoyong Yuan, Bolei Zhou

Abstract: Traffic signal control (TSC) is crucial for reducing traffic congestion that leads to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore the computer vision approach for TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend much less on heuristics and predefined features, bringing promising potential for end-to-end learning and optimization of traffic signals. Thus, we introduce a holistic traffic simulation framework called TrafficDojo towards vision-based TSC and its benchmarking by integrating the microscopic traffic flow provided in SUMO into the driving simulator MetaDrive. This proposed framework offers a versatile traffic environment for in-depth analysis and comprehensive evaluation of traffic signal controllers across diverse traffic conditions and scenarios. We establish and compare baseline algorithms including both traditional and Reinforcement Learning (RL) approaches. This work provides insights into the design and development of vision-based TSC approaches and opens up new research opportunities. All the code and baselines will be made publicly available.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Under review for IEEE publications

arXiv:2403.04861 [pdf, other]

A Survey of Lottery Ticket Hypothesis

Authors: Bohan Liu, Zijie Zhang, Peixiong He, Zhensen Wang, Yang Xiao, Ruimeng Ye, Yang Zhou, Wei-Shinn Ku, Bo Hui

Abstract: The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a highly sparse subnetwork (i.e., winning tickets) that can achieve even better performance than the original model when trained in isolation. While LTH has been proved both empirically and theoretically in many works, there are still some open issues, such as efficiency and scalability, to be addressed. Also, the lack of open-source frameworks and of a consensus experimental setting poses a challenge to future research on LTH. We, for the first time, examine previous research and studies on LTH from different perspectives. We also discuss issues in existing works and list potential directions for further exploration. This survey aims to provide an in-depth look at the state of LTH and develop a duly maintained platform to conduct experiments and compare with the most updated baselines.

Submitted 12 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.16893 [pdf, other]

The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, Jiliang Tang

Abstract: Retrieval-augmented generation (RAG) is a powerful technique to facilitate language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk brought by RAG on the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit both LLM and RAG system builders. Our code is available at https://github.com/phycholosogy/RAG-privacy.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.12958 [pdf, other]

Go Static: Contextualized Logging Statement Generation

Authors: Yichen Li, Yintong Huo, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Pinjia He, Michael R. Lyu

Abstract: Logging practices have been extensively investigated to assist developers in writing appropriate logging statements for documenting software behaviors. Although numerous automatic logging approaches have been proposed, their performance remains unsatisfactory due to the constraint of the single-method input, without informative programming context outside the method. Specifically, we identify three inherent limitations with single-method context: limited static scope of logging statements, inconsistent logging styles, and missing type information of logging variables. To tackle these limitations, we propose SCLogger, the first contextualized logging statement generation approach with inter-method static contexts. First, SCLogger extracts inter-method contexts with static analysis to construct the contextualized prompt for language models to generate a tentative logging statement. The contextualized prompt consists of an extended static scope and sampled similar methods, ordered by the chain-of-thought (COT) strategy. Second, SCLogger refines the access of logging variables by formulating a new refinement prompt for language models, which incorporates detailed type information of variables in the tentative logging statement. The evaluation results show that SCLogger surpasses the state-of-the-art approach by 8.7% in logging position accuracy, 32.1% in level accuracy, 19.6% in variable precision, and 138.4% in text BLEU-4 score. Furthermore, SCLogger consistently boosts the performance of logging statement generation across a range of large language models, thereby showcasing the generalizability of this approach.

Submitted 20 February, 2024; originally announced February 2024.

Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

arXiv:2402.12085 [pdf, other]

doi 10.1016/j.physletb.2024.138546

Principle of multi-critical-points in the ALP-Higgs model and the corresponding phase transition

Authors: Jiyuan Ke, Minxing Li, Ping He

Abstract: The principle of multi-critical-points (PMCP) may be a convincing approach to determine the emerging parameter values in different kinds of beyond-standard-model (BSM) models. This could certainly be applied to solve the problem of undetermined new parameters in the ALP-Higgs interaction models. In this paper, we apply this principle to such a model and investigate whether there are suitable solutions. Then, using the 1-loop effective potential, we study the phase transition property of this model under the PMCP requirement. It is gratifying to find that under the requirement of PMCP, the phase transition can be not only first-order, but also strong enough to serve as a solution for electroweak baryogenesis (EWBG). Finally, we show the parameter space of ALP and provide the parameter range that leads to the first-order phase transition.

Submitted 20 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 6 pages, 3 figures

arXiv:2402.11825 [pdf, other]

Photoelectron Polarization Vortexes in Strong-Field Ionization

Authors: Pei-Lun He, Zhao-Han Zhang, Karen Z. Hatsagortsyan, Christoph H. Keitel

Abstract: The spin polarization of photoelectrons induced by an intense linearly polarized laser field is investigated using numerical solutions of the time-dependent Schrödinger equation in conjunction with our analytic treatment via the spin-resolved strong-field approximation and classical trajectory Monte Carlo simulations. We demonstrate that, even though the total polarization vanishes upon averaging over the photoelectron momentum, momentum-resolved spin polarization is significant, typically exhibiting a vortex structure relative to the laser polarization axis. The polarization arises from the transfer of spin-orbit coupling in the bound state to the spin-correlated quantum orbits in the continuum. The rescattering of photoelectrons at the atomic core plays an important role in forming the polarization vortex structure, while there is no significant effect of the spin-orbit coupling during the continuum dynamics. Furthermore, spin-polarized electron holography is demonstrated, feasible for extracting fine structural information about the atom.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

arXiv:2402.10907 [pdf]

Optically Levitated Nanoparticles as Receiving Antennas for Low Frequency Wireless Communication

Authors: Zhenhai Fu, Jinsheng Xu, Shaochong Zhu, Chaoxiong He, Xunming Zhu, Xiaowen Gao, Han Cai, Peitong He, Zhiming Chen, Yizhou Zhang, Nan Li, Xingfan Chen, Ying Dong, Shiyao Zhu, Cheng Liu, Huizhu Hu

Abstract: Low-frequency (LF) wireless communications play a crucial role in ensuring anti-interference, long-range, and efficient communication across various environments. However, in conventional LF communication systems, their antenna size is required to be inversely proportional to the wavelength, so that their mobility and flexibility are greatly limited. Here we introduce a novel prototype of LF receiving antennas based on optically levitated nanoparticles, which overcomes the size-frequency limitation to reduce the antenna size to the hundred-nanometer scale. These charged particles are extremely sensitive to external electric field as mechanical resonators, and their resonant frequencies are adjustable. The effectiveness of these antennas was experimentally demonstrated by using the frequency shift keying (2FSK) modulation scheme. The experimental results indicate a correlation between error rate and factors such as transmission rate, signal strength, and vacuum degree with a signal strength of approximately 0.1 V/m and a bit error rate below 0.1%. This advancement in leveraging levitated particle mechanical resonators (LPMRs) as LF antennas marks a significant stride in long-distance communication technology.

Submitted 10 January, 2024; originally announced February 2024.

arXiv:2402.02777 [pdf, other]

doi 10.1103/PhysRevA.109.053314

Realizing and detecting Stiefel-Whitney insulators in an optical Raman lattice

Authors: Jian-Te Wang, Jing-Xin Liu, Hai-Tao Ding, Peng He

Abstract: We propose a feasible scheme to realize a four-band Stiefel-Whitney insulator (SWI) with spin-orbit coupled ultracold atoms in an optical Raman lattice. Four selected spin states are coupled by carefully designed Raman lasers, to generate the desired spin-orbit interactions with spacetime inversion symmetry. We map out a phase diagram with respect to the experimental parameters, where a large topological phase region exists. We further present two distinct detection methods to resolve the non-abelian band topology, in both equilibrium and dynamical ways. The detection relies on the spin textures extracted from the time-of-flight imaging, showing the tomographic signatures in the ground states and long-time averaged patterns on certain submanifold via a bulk-surface duality. Our work paves a realistic way to explore novel real topology with quantum matter.

Submitted 5 February, 2024; originally announced February 2024.

Journal ref: Phys. Rev. A 109, 053314 (2024)

arXiv:2402.02333 [pdf, other]

Copyright Protection in Generative AI: A Technical Perspective

Authors: Jie Ren, Han Xu, Pengfei He, Yingqian Cui, Shenglai Zeng, Jiankun Zhang, Hongzhi Wen, Jiayuan Ding, Hui Liu, Yi Chang, Jiliang Tang

Abstract: Generative AI has witnessed rapid advancement in recent years, expanding its capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of contents generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns. There have been various legal debates on how to effectively safeguard copyrights in DGMs. This work delves into this issue by providing a comprehensive overview of copyright protection from a technical perspective. We examine from two distinct viewpoints: the copyrights pertaining to the source data held by the data owners and those of the generative models maintained by the model builders. For data copyright, we delve into methods by which data owners can protect their content and DGMs can be utilized without infringing upon these rights. For model copyright, our discussion extends to strategies for preventing model theft and identifying outputs generated by specific models. Finally, we highlight the limitations of existing techniques and identify areas that remain unexplored. Furthermore, we discuss prospective directions for the future of copyright protection, underscoring its importance for the sustainable and ethical development of Generative AI.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 26 pages

arXiv:2402.02160 [pdf, other]

Data Poisoning for In-context Learning

Authors: Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang

Abstract: In the domain of large language models (LLMs), in-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks, relying on examples rather than retraining or fine-tuning. This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks, an area not yet fully explored. We wonder whether ICL is vulnerable, with adversaries capable of manipulating example data to degrade model performance. To address this, we introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL. Our approach uniquely employs discrete text perturbations to strategically influence the hidden states of LLMs during the ICL process. We outline three representative strategies to implement attacks under our framework, each rigorously evaluated across a variety of models and tasks. Our comprehensive tests, including trials on the sophisticated GPT-4 model, demonstrate that ICL's performance is significantly compromised under our framework. These revelations indicate an urgent need for enhanced defense mechanisms to safeguard the integrity and reliability of LLMs in applications relying on in-context learning.

Submitted 27 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.17426 [pdf, other]

Superiority of Multi-Head Attention in In-Context Linear Regression

Authors: Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing

Abstract: We present a theoretical analysis of the performance of transformers with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention. When the number of in-context examples D increases, the prediction loss using single-/multi-head attention is in O(1/D), and the one for multi-head attention has a smaller multiplicative constant. In addition to the simplest data distribution setting, we consider more scenarios, e.g., noisy labels, local examples, correlated features, and prior knowledge. We observe that, in general, multi-head attention is preferred over single-head attention. Our results verify the effectiveness of the design of multi-head attention in the transformer architecture.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.05986 [pdf, other]

LogPTR: Variable-Aware Log Parsing with Pointer Network

Authors: Yifan Wu, Bingxu Chai, Siyu Yu, Ying Li, Pinjia He, Wei Jiang, Jianguo Li

Abstract: Due to the sheer size of software logs, developers rely on automated log analysis. Log parsing, which parses semi-structured logs into a structured format, is a prerequisite of automated log analysis. However, existing log parsers are unsatisfactory when applied in practice because: 1) they ignore categories of variables, and 2) they have poor generalization ability. To address the limitations of existing approaches, we propose LogPTR, the first end-to-end variable-aware log parser that can extract the static and dynamic parts in logs, and further identify the categories of variables. The key idea of LogPTR is to use a pointer network to copy words from the log message. We have performed extensive experiments on 16 public log datasets and the results show that LogPTR outperforms state-of-the-art log parsers both on general log parsing that extracts the log template and variable-aware log parsing that further identifies the category of variables.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.01912 [pdf, other]

Shrinking Your TimeStep: Towards Low-Latency Neuromorphic Object Recognition with Spiking Neural Network

Authors: Yongqi Ding, Lin Zuo, Mengmeng Jing, Pei He, Yongjun Xiao

Abstract: Neuromorphic object recognition with spiking neural networks (SNNs) is the cornerstone of low-power neuromorphic computing. However, existing SNNs suffer from significant latency, utilizing 10 to 40 timesteps or more, to recognize neuromorphic objects. At low latencies, the performance of existing SNNs is drastically degraded. In this work, we propose the Shrinking SNN (SSNN) to achieve low-latency neuromorphic object recognition without reducing performance. Concretely, we alleviate the temporal redundancy in SNNs by dividing SNNs into multiple stages with progressively shrinking timesteps, which significantly reduces the inference latency. During timestep shrinkage, the temporal transformer smoothly transforms the temporal scale and preserves the information maximally. Moreover, we add multiple early classifiers to the SNN during training to mitigate the mismatch between the surrogate gradient and the true gradient, as well as the gradient vanishing/exploding, thus eliminating the performance degradation at low latency. Extensive experiments on neuromorphic datasets, CIFAR10-DVS, N-Caltech101, and DVS-Gesture have revealed that SSNN is able to improve the baseline accuracy by 6.55% ~ 21.41%. With only 5 average timesteps and without any data augmentation, SSNN is able to achieve an accuracy of 73.63% on CIFAR10-DVS. This work presents a heterogeneous temporal scale SNN and provides valuable insights into the development of high-performance, low-latency SNNs.

Submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.00757 [pdf, other]

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models

Authors: Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

Abstract: Recent advancements in large language models (LLMs) have propelled Artificial Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing assistance, code generation, and machine translation. A significant distinction of advanced LLMs, such as ChatGPT, is their demonstrated ability to "reason." However, evaluating the reasoning ability of LLMs remains a challenge as most existing evaluations focus on their accuracy on the downstream tasks rather than directly assessing their reasoning processes. Efforts have been made to develop benchmarks and metrics to assess reasoning in LLMs, but they suffer from data leakage or limited scope. In this paper, we introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs under a set of atomic reasoning skills based on propositional and predicate logic. The results provide insights into LLMs' reasoning abilities and reveal the logical rules the LLMs did not learn well. We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco. The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs with a rate of 25% - 94%. In addition, the test cases of LogicAsker can be further used to design demonstration examples for in-context learning, which effectively improves the logical reasoning ability of LLMs, e.g., 10% for GPT-4. As far as we know, our work is the first to create prompts based on testing results to improve LLMs' formal reasoning ability effectively. All the code, data, and results will be released for reproduction and future research.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.15352 [pdf]

A Bayesian Basket Trial Design Using Local Power Prior

Authors: Haiming Zhou, Rex Shen, Sutan Wu, Philip He

Abstract: In recent years, basket trials, which enable the evaluation of an experimental therapy across multiple tumor types within a single protocol, have gained prominence in early-phase oncology development. Unlike traditional trials, where each tumor type is evaluated separately with limited sample size, basket trials offer the advantage of borrowing information across various tumor types. However, a key challenge in designing basket trials lies in dynamically determining the extent of information borrowing across tumor types to enhance statistical power while maintaining an acceptable type I error rate. In this paper, we propose a local power prior framework that includes a 3-component borrowing mechanism with explicit model interpretation. Unlike many existing Bayesian methods that require Markov Chain Monte Carlo (MCMC) sampling, the proposed framework offers a closed-form solution, eliminating the time-consuming nature of MCMC in large-scale simulations for evaluating operating characteristics. Extensive simulations have been conducted and demonstrate good performance of the proposed method, comparable to other, more complex methods. The significantly shortened computation time further underscores the practical utility in the context of basket trials.

Submitted 19 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.05578 [pdf]

doi 10.1021/acs.nanolett.3c03948

Large nonlinear Hall effect and Berry curvature in KTaO3 based two-dimensional electron gas

Authors: Jinfeng Zhai, Mattia Trama, Hao Liu, Zhifei Zhu, Yinyan Zhu, Carmine Antonio Perroni, Roberta Citro, Pan He, Jian Shen

Abstract: The two-dimensional electron gas (2DEG) at oxide interfaces exhibits various exotic properties stemming from interfacial inversion symmetry breaking. In this work, we report the emergence of large nonlinear Hall effects (NHE) in the LaAlO3/KTaO3(111) interface 2DEG under zero magnetic field. Skew scattering was identified as the dominant origin based on the cubic scaling of nonlinear Hall conductivity with longitudinal conductivity and the threefold symmetry. Moreover, a gate-tunable NHE with pronounced peak and dip was observed and reproduced by our theoretical calculation. These results indicate the presence of Berry curvature hotspots and thus a large Berry curvature triple at the oxide interface. Our theoretical calculations confirm the existence of large Berry curvatures from the avoided crossing of multiple 5d-orbit bands, orders of magnitude larger than that in transition-metal dichalcogenides. NHE offers a new pathway to probe the Berry curvature at oxide interfaces, and facilitates new applications in oxide nonlinear electronics.

Submitted 9 December, 2023; originally announced December 2023.

Journal ref: Nano Letters 2023

arXiv:2312.04040 [pdf, other]

doi 10.1088/1674-4527/ac9e90

Merging history of massive galaxies at 3<z<6

Authors: Kemeng Li, Zhen Jiang, Ping He, Qi Guo, Jie Wang

Abstract: The observational data of high redshift galaxies become increasingly abundant, especially since the operation of the James Webb Space Telescope (JWST), which allows us to verify and optimize the galaxy formation model at high redshifts. In this work, we investigate the merging history of massive galaxies at $3 < z < 6$ using a well-developed semi-analytic galaxy formation catalogue. We find that the major merger rate increases with redshift up to 3 and then flattens. The fraction of wet mergers, during which the sum of the cold gas mass is higher than the sum of the stellar mass in two merging galaxies, also increases from $\sim$ 34% at $z = 0$ to 96% at $z = 3$. Interestingly, almost all major mergers are wet at $z > 3$. This can be attributed to the high fraction ($> 50\%$) of cold gas at $z > 3$. In addition, we study some special systems of massive merging galaxies at $3 < z < 6$, including the massive gas-rich major merging systems and extremely dense proto-clusters, and investigate the supermassive black hole-dark matter halo mass relation and dual AGNs. We find that the galaxy formation model reproduces the incidence of those observed massive galaxies, but fails to reproduce the relation between the supermassive black hole mass and the dark matter halo mass at $z \sim 6$. The latter requires more careful estimates of the supermassive black hole masses observationally. Otherwise, it could suggest modifications of the modeling of the supermassive black hole growth at high redshifts.

Submitted 6 December, 2023; originally announced December 2023.

Comments: 9 pages, 8 figures

Journal ref: 2023, Research in Astronomy and Astrophysics, 23(1), 015010

arXiv:2312.00409 [pdf, ps, other]

White Paper and Roadmap for Quantum Gravity Phenomenology in the Multi-Messenger Era

Authors: R. Alves Batista, G. Amelino-Camelia, D. Boncioli, J. M. Carmona, A. di Matteo, G. Gubitosi, I. Lobo, N. E. Mavromatos, C. Pfeifer, D. Rubiera-Garcia, E. N. Saridakis, T. Terzić, E. C. Vagenas, P. Vargas Moniz, H. Abdalla, M. Adamo, A. Addazi, F. K. Anagnostopoulos, V. Antonelli, M. Asorey, A. Ballesteros, S. Basilakos, D. Benisty, M. Boettcher, J. Bolmont, et al. (80 additional authors not shown)

Abstract: The unification of quantum mechanics and general relativity has long been elusive. Only recently have empirical predictions of various possible theories of quantum gravity been put to test. The dawn of multi-messenger high-energy astrophysics has been tremendously beneficial, as it allows us to study particles with much higher energies and travelling much longer distances than possible in terrestrial experiments, but more progress is needed on several fronts. A thorough appraisal of current strategies and experimental frameworks, regarding quantum gravity phenomenology, is provided here. Our aim is twofold: a description of tentative multimessenger explorations, plus a focus on future detection experiments. As the outlook of the network of researchers that formed through the COST Action CA18108 "Quantum gravity phenomenology in the multi-messenger approach (QG-MM)", in this work we give an overview of the desiderata that future theoretical frameworks, observational facilities, and data-sharing policies should satisfy in order to advance the cause of quantum gravity phenomenology.

Submitted 12 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Submitted to CQG for the Focus Issue on "Quantum Gravity Phenomenology in the Multi-Messenger Era: Challenges and Perspectives". Please contact us to express interest in endorsing this white paper

arXiv:2310.20278 [pdf, other]

doi 10.1093/mnras/stae229

How do baryonic effects on the cosmic matter distribution vary with scale and local density environment?

Authors: Yun Wang, Ping He

Abstract: In this study, we investigate how the baryonic effects vary with scale and local density environment mainly by utilizing a novel statistic, the environment-dependent wavelet power spectrum (env-WPS). With four state-of-the-art cosmological simulation suites, EAGLE, SIMBA, Illustris, and IllustrisTNG, we compare the env-WPS of the total matter density field between the hydrodynamic and dark matter-only (DMO) runs at $z=0$. We find that the clustering is most strongly suppressed in the emptiest environment of $\rho_\mathrm{m}/\bar{\rho}_\mathrm{m}<0.1$ with maximum amplitudes $\sim67-89$ per cent on scales $\sim1.86-10.96\ h\mathrm{Mpc}^{-1}$, and less suppressed in higher density environments on small scales (except Illustris). In the environments of $\rho_\mathrm{m}/\bar{\rho}_\mathrm{m}\geqslant0.316$ ($\geqslant10$ in EAGLE), feedback also leads to enhancement features at intermediate and large scales, which are most pronounced in the densest environment of $\rho_\mathrm{m}/\bar{\rho}_\mathrm{m}\geqslant100$ and reach a maximum of $\sim 7-15$ per cent on scales $\sim0.87-2.62\ h\mathrm{Mpc}^{-1}$ (except Illustris). The baryon fraction of the local environment decreases with increasing density, denoting the feedback strength, and potentially explaining some differences between simulations. We also measure the volume and mass fractions of local environments, which are affected by $\gtrsim 1$ per cent due to baryon physics. In conclusion, our results show that the baryonic processes can strongly modify the overall cosmic structure on the scales of $k>0.1\ h\mathrm{Mpc}^{-1}$, which encourages further research in this direction.

Submitted 21 January, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 12 pages, 12 figures, and 3 tables; accepted by MNRAS

Journal ref: 2024, MNRAS, Volume 528, Issue 2

arXiv:2310.17304 [pdf, other]

Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

Authors: Yifan Xia, Ping He, Xuhong Zhang, Peiyu Liu, Shouling Ji, Wenhai Wang

Abstract: The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiveness decreases significantly against JWMM. The detection of JWMM is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed at enhancing the static detection of JWMM. JWBinder performs a language-specific data-flow analysis to capture the cross-language interoperations and then characterizes the functionalities of JWMM through a unified high-level structure called Inter-language Program Dependency Graph. The extensive evaluation on one of the most representative real-world anti-virus platforms, VirusTotal, shows that JWBinder effectively enhances anti-virus systems from various vendors and increases the overall successful detection rate against JWMM from 49.1% to 86.2%. Additionally, we assess the side effects and runtime overhead of JWBinder, corroborating its practical viability in real-world applications.

Submitted 19 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to ESORICS 2023

arXiv:2310.11451 [pdf, other]

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

Abstract: Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.

Submitted 8 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2310.08659 [pdf, other]

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Authors: Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Abstract: Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and the quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. The code is available on https://github.com/yxli2123/LoftQ.

Submitted 28 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.06714 [pdf, other]

Exploring Memorization in Fine-tuned Language Models

Authors: Shenglai Zeng, Yaxin Li, Jie Ren, Yiding Liu, Han Xu, Pengfei He, Yue Xing, Shuaiqiang Wang, Jiliang Tang, Dawei Yin

Abstract: Large language models (LLMs) have shown great capabilities in various tasks but also exhibited memorization of training data, raising tremendous privacy and copyright concerns. While prior works have studied memorization during pre-training, the exploration of memorization during fine-tuning is rather limited. Compared to pre-training, fine-tuning typically involves more sensitive data and diverse objectives, and thus may bring distinct privacy risks and unique memorization behaviors. In this work, we conduct the first comprehensive analysis to explore language models' (LMs) memorization during fine-tuning across tasks. Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks. We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.

Submitted 22 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06433 [pdf, other]

Retromorphic Testing: A New Approach to the Test Oracle Problem

Authors: Boxi Yu, Qiuyang Mang, Qingshuo Guo, Pinjia He

Abstract: A test oracle serves as a criterion or mechanism to assess the correspondence between software output and the anticipated behavior for a given input set. In automated testing, black-box techniques, known for their non-intrusive nature in test oracle construction, are widely used, including notable methodologies like differential testing and metamorphic testing. Inspired by the mathematical concept of inverse function, we present Retromorphic Testing, a novel black-box testing methodology. It leverages an auxiliary program in conjunction with the program under test, which establishes a dual-program structure consisting of a forward program and a backward program. The input data is first processed by the forward program and then its program output is reversed to its original input format using the backward program. In particular, the auxiliary program can operate as either the forward or backward program, leading to different testing modes. The process concludes by examining the relationship between the initial input and the transformed output within the input domain. For example, to test the implementation of the sine function $\sin(x)$, we can employ its inverse function, $\arcsin(x)$, and validate the equation $x = \sin(\arcsin(x)+2k\pi), \forall k \in \mathbb{Z}$. In addition to the high-level concept of Retromorphic Testing, this paper presents its three testing modes with illustrative use cases across diverse programs, including algorithms, traditional software, and AI applications.

Submitted 10 October, 2023; originally announced October 2023.

ACM Class: D.3.0; I.2.7; I.4.0
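
Illustrative sketch (not from the paper): the sine example in the Retromorphic Testing abstract above can be written as a small check, assuming Python's `math.sin` plays the program under test and `math.asin` plays the auxiliary program. The function names `retromorphic_check_sin` and `run_checks`, the tolerance, and the sample inputs are hypothetical choices made for this illustration only.

```python
import math


def retromorphic_check_sin(x: float, k: int = 0, tol: float = 1e-9) -> bool:
    """Check x == sin(arcsin(x) + 2*k*pi) up to a floating-point tolerance.

    Assumption: arcsin acts as the forward (auxiliary) program and sin as the
    backward program under test; the abstract notes the roles can be swapped.
    """
    forward_output = math.asin(x) + 2 * k * math.pi  # forward program
    round_trip = math.sin(forward_output)            # backward program
    return math.isclose(round_trip, x, rel_tol=0.0, abs_tol=tol)


def run_checks(samples, ks=(-2, -1, 0, 1, 2)):
    """Return every (x, k) pair for which the round-trip relation fails."""
    return [(x, k) for x in samples for k in ks
            if not retromorphic_check_sin(x, k)]


if __name__ == "__main__":
    # Inputs must lie in [-1, 1], the domain of arcsin.
    failures = run_checks([-1.0, -0.5, 0.0, 0.5, 1.0])
    print("violations:", failures)
```

An empty list of violations means the round-trip relation held on every sampled input; any reported pair would point at a discrepancy in one of the two programs, without needing a separately computed expected value for `sin`.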

arXiv:2310.06389 [pdf, other]

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Authors: Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

Abstract: Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options like U-Net and Vision Transformer often rely on resource-intensive deep networks and lack the flexibility needed for generating images at variable resolutions or with a smaller network than used in training. This study introduces LEGO bricks, which seamlessly integrate Local-feature Enrichment and Global-content Orchestration. These bricks can be stacked to create a test-time reconfigurable diffusion backbone, allowing selective skipping of bricks to reduce sampling costs and generate higher-resolution images than the training data. LEGO bricks enrich local regions with an MLP and transform them using a Transformer block while maintaining a consistent full-resolution image across all bricks. Experimental results demonstrate that LEGO bricks enhance training efficiency, expedite convergence, and facilitate variable-resolution image generation while maintaining strong generative performance. Moreover, LEGO significantly reduces sampling time compared to other methods, establishing it as a valuable enhancement for diffusion models. Our code and project page are available at https://jegzheng.github.io/LEGODiffusion.

Submitted 27 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Showing 1–50 of 406 results for author: He, P