Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic

Authors: Ruochen Jin, Bojian Hou, Jiancong Xiao, Weijie Su, Li Shen

Abstract: Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space by adding the fine-tuned weights of different tasks. Its performance has been further improved by a linear property, illustrated by weight disentanglement. Yet conventional linearization methods (e.g., NTK linearization) not only double the time and training cost but also degrade single-task performance. We propose a simple, effective, and efficient method that fine-tunes only linear layers, improving weight disentanglement and efficiency simultaneously. Specifically, our study reveals that fine-tuning only the linear layers in the attention modules places the whole model in a linear regime, significantly improving weight disentanglement. To further understand how our method improves the disentanglement of task arithmetic, we present a comprehensive study of task arithmetic that differentiates the roles of the representation model and the task-specific model. In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific models such as classification heads can degrade weight disentanglement. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models.

Submitted 9 July, 2024; originally announced July 2024.
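In its common form, task arithmetic computes a task vector tau_t = theta_t - theta_0 (fine-tuned minus pre-trained weights) for each task and adds a scaled sum back to the pre-trained model. The sketch below illustrates that arithmetic under stated assumptions: plain Python lists stand in for tensors, one shared coefficient `lam` scales every task vector, and the `attn.*_proj` names marking attention linear layers are hypothetical. Zeroing every other parameter's update mimics fine-tuning only those layers; this is not the authors' implementation.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flattened weights.
Params = Dict[str, List[float]]

# Hypothetical names for the linear projections inside attention modules.
LINEAR_SUFFIXES = (".q_proj", ".k_proj", ".v_proj", ".out_proj")


def is_attention_linear(name: str) -> bool:
    """True for parameters assumed to belong to attention linear layers."""
    return name.startswith("attn.") and name.endswith(LINEAR_SUFFIXES)


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """tau_t = theta_t - theta_0, computed parameter by parameter."""
    return {
        name: [ft - pt for ft, pt in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def restrict_to_linear(tau: Params) -> Params:
    """Zero the update of every parameter outside attention linear layers,
    mimicking a fine-tuning run in which those parameters were frozen."""
    return {
        name: delta if is_attention_linear(name) else [0.0] * len(delta)
        for name, delta in tau.items()
    }


def task_arithmetic(pretrained: Params, taus: List[Params], lam: float) -> Params:
    """theta = theta_0 + lam * sum_t tau_t."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for tau in taus:
        for name, delta in tau.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


theta0 = {"attn.q_proj": [1.0, 2.0], "ln.weight": [1.0]}
ft_a = {"attn.q_proj": [1.5, 2.0], "ln.weight": [1.2]}
ft_b = {"attn.q_proj": [1.0, 1.0], "ln.weight": [0.8]}
taus = [restrict_to_linear(task_vector(theta0, ft)) for ft in (ft_a, ft_b)]
merged = task_arithmetic(theta0, taus, lam=0.5)
print(merged)  # {'attn.q_proj': [1.25, 1.5], 'ln.weight': [1.0]}
```

A single shared `lam` keeps the merge to one hyperparameter; per-task coefficients are a straightforward variation.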

arXiv:2407.06612 [pdf]

AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review

Authors: Rui Jin, Derun Li, Dehui Xiang, Lei Zhang, Hailing Zhou, Fei Shi, Weifang Zhu, Jing Cai, Tao Peng, Xinjian Chen

Abstract: Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The advent of precision medicine and a significant increase in clinical capacity have spurred the need for various data-driven tasks in medical imaging. Recently, numerous machine learning and data mining tools have been integrated into various medical areas, including image segmentation. This article proposes a new classification scheme that differentiates methods by supervision type, in either number or kind, during the training phase. We then survey artificial intelligence (AI)-based automatic prostate segmentation methods, examining the advantages and limitations of each. Additionally, we introduce variants of evaluation metrics for the verification and performance assessment of segmentation methods and summarize the current challenges. Finally, we discuss future research directions and development trends that reflect the outcomes of our literature survey, suggesting high-precision detection and treatment of prostate cancer as a promising avenue.

Submitted 9 July, 2024; originally announced July 2024.
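Overlap metrics are the usual starting point for segmentation evaluation. As a generic illustration (the review's specific metric variants are not listed in the abstract), the sketch computes the Dice similarity coefficient and intersection-over-union (Jaccard index) for binary masks flattened to 0/1 lists. The small `eps` term is an assumption that makes two empty masks score 1.0 instead of dividing by zero.

```python
from typing import Sequence


def _check(pred: Sequence[int], target: Sequence[int]) -> None:
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """Dice = 2|P & G| / (|P| + |G|) for binary (0/1) masks."""
    _check(pred, target)
    intersection = sum(p & g for p, g in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def iou(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """IoU (Jaccard) = |P & G| / |P | G| for binary (0/1) masks."""
    _check(pred, target)
    intersection = sum(p & g for p, g in zip(pred, target))
    union = sum(p | g for p, g in zip(pred, target))
    return (intersection + eps) / (union + eps)


pred = [1, 1, 0, 0]    # predicted prostate voxels
target = [1, 0, 0, 0]  # ground-truth prostate voxels
print(round(dice(pred, target), 4), round(iou(pred, target), 4))  # 0.6667 0.5
```

The two metrics are monotonically related (Dice = 2 IoU / (1 + IoU)), so they rank methods identically on a single case but weight errors differently when averaged.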

arXiv:2407.04303 [pdf, other]

Transmission spectroscopy of CF$_4$ molecules in intense x-ray fields

Authors: Rui Jin, Adam Fouda, Alexander Magunia, Yeonsig Nam, Marc Rebholz, Alberto De Fanis, Kai Li, Gilles Doumy, Thomas M. Baumann, Michael Straub, Sergey Usenko, Yevheniy Ovcharenko, Tommaso Mazza, Jacobo Montaño, Marcus Agåker, Maria Novella Piancastelli, Marc Simon, Jan-Erik Rubensson, Michael Meyer, Linda Young, Christian Ott, Thomas Pfeifer

Abstract: The nonlinear interaction of x-rays with matter is at the heart of understanding and controlling ultrafast molecular dynamics from an atom-specific viewpoint, providing new scientific and analytical opportunities to explore the structure and dynamics of small quantum systems. At increasingly high x-ray intensity, the sensitivity of ultrashort x-ray pulses to specific electronic states and emerging short-lived transient intermediates is of particular relevance for our understanding of fundamental multi-photon absorption processes. In this work, intense x-ray free-electron laser (XFEL) pulses at the European XFEL (EuXFEL) are combined with a gas cell and a grating spectrometer for a high-intensity transmission spectroscopy study of multiphoton-induced ultrafast molecular fragmentation dynamics in CF$_4$. This approach enables direct intra-pulse observation of transient fragments, including neutral atoms, through their characteristic absorption lines in the transmitted broadband x-ray spectrum. The dynamics with and without the initial production of fluorine K-shell holes are studied by tuning the central photon energy. The absorption spectra are measured at different FEL intensities to observe nonlinear effects. Transient isolated fluorine atoms and ions are spectroscopically recorded within the ultrashort pulse duration of a few tens of femtoseconds. An isosbestic point, which signifies the correlated transition between intact neutral CF$_4$ molecules and charged atomic fragments, is observed near the fluorine K-edge. The dissociation dynamics and the multiphoton absorption-induced dynamics encoded in the spectra are interpreted theoretically. Overall, this study demonstrates the potential of high-intensity x-ray transmission spectroscopy to study ultrafast molecular dynamics with sensitivity to specific intermediate species and their electronic structure.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 30 pages, with 7 figures, submitted to Phys. Rev. X

arXiv:2407.02042 [pdf, other]

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models

Authors: Ruihan Jin, Ruibo Fu, Zhengqi Wen, Shuai Zhang, Yukun Liu, Jianhua Tao

Abstract: Fake news has become a growing threat to information security and public opinion with the rapid spread of media manipulation. Fake news detection has therefore attracted widespread attention from the academic community. Traditional fake news detection models perform remarkably well on binary authenticity classification, but their ability to reason about detailed traces of fakery based on news content remains under-explored. Furthermore, due to the lack of external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical applicability unclear. In this paper, we propose a new multimedia research topic, namely manipulation reasoning, which aims to reason about manipulations based on news content. To support this research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark emphasizes human-centric content and high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented, not only to judge the authenticity of multi-modal news but also to provide analytical reasoning about potential manipulations. At the feature extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from multi-modal inputs. At the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the capacities for identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs such as GPT-4 and LLaVA.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00983 [pdf, other]

FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

Authors: Ruinan Jin, Zikang Xu, Yuan Zhong, Qiongsong Yao, Qi Dou, S. Kevin Zhou, Xiaoxiao Li

Abstract: The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmarks, standardized pipelines, and easily adaptable libraries to evaluate and understand the fairness performance of FMs in medical imaging, which makes it considerably challenging to formulate and implement solutions that ensure equitable outcomes across diverse patient populations. To fill this gap, we introduce FairMedFM, a fairness benchmark for FM research in medical imaging. FairMedFM integrates 17 popular medical imaging datasets, encompassing different modalities, dimensionalities, and sensitive attributes. It explores 20 widely used FMs with various usages, such as zero-shot learning, linear probing, parameter-efficient fine-tuning, and prompting, in downstream classification and segmentation tasks. Our exhaustive analysis evaluates fairness performance over different evaluation metrics from multiple perspectives, revealing the existence of bias, varied utility-fairness trade-offs across FMs, consistent disparities on the same datasets regardless of the FM, and the limited effectiveness of existing unfairness mitigation methods. Check out FairMedFM's project page and open-source codebase, which supports extensible functionalities and applications and is intended to support inclusive studies on FMs in medical imaging over the long term.

Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: 29 pages, 17 figures
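The abstract does not enumerate FairMedFM's fairness metrics. As one simple, commonly used example of the underlying idea, the sketch computes per-group accuracy over a sensitive attribute and reports the gap between the best- and worst-served groups. The function name and the use of accuracy as the utility measure are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def group_accuracy_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Tuple[Dict[Hashable, float], float]:
    """Return per-group accuracy and the max-min accuracy gap across groups."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    acc = {g: correct[g] / total[g] for g in total}
    return acc, max(acc.values()) - min(acc.values())


acc, gap = group_accuracy_gap(
    y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 0], groups=["A", "A", "B", "B"]
)
print(acc, gap)  # {'A': 1.0, 'B': 0.5} 0.5
```

Swapping accuracy for AUC or Dice gives the analogous gap for classification or segmentation utility.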

arXiv:2406.18406 [pdf, other]

IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

Authors: Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong

Abstract: It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on massive data. Recent studies reveal knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons), to capitalize on neurons that are crucial in processing contextual cues. Specifically, IRCAN first identifies neurons that contribute significantly to context processing, using a context-aware attribution score derived from integrated gradients. Subsequently, the identified context-aware neurons are strengthened via reweighting. In doing so, we steer LLMs to generate context-sensitive outputs with respect to the new knowledge provided in the context. Extensive experiments across a variety of models and tasks demonstrate that IRCAN not only achieves remarkable improvements in handling knowledge conflicts but also offers a scalable, plug-and-play solution that can be integrated seamlessly with existing models.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 13 figures, 5 tables
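IRCAN's attribution score is described as derived from integrated gradients. To illustrate that ingredient only, the sketch approximates integrated gradients with a midpoint Riemann sum on a toy function whose gradient is known analytically, then scales the k highest-scoring "neurons" by a constant factor. The toy model, `reweight_top_k`, `k`, and the factor are hypothetical; the paper's context-aware score and reweighting rule may differ.

```python
from typing import Callable, List, Sequence

GradFn = Callable[[List[float]], List[float]]


def integrated_gradients(
    grad_fn: GradFn, x: Sequence[float], baseline: Sequence[float], steps: int = 64
) -> List[float]:
    """Midpoint Riemann approximation of
    IG_i = (x_i - b_i) * integral_0^1 df/dx_i(b + a * (x - b)) da."""
    totals = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def reweight_top_k(
    weights: Sequence[float], scores: Sequence[float], k: int, factor: float
) -> List[float]:
    """Hypothetical rule: multiply the k highest-scoring units by `factor`."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    out = list(weights)
    for i in top:
        out[i] *= factor
    return out


# Toy "model": f(h) = sum_i w_i * h_i**2 over three neuron activations h.
w = [1.0, 0.1, 3.0]


def toy_grad(h: List[float]) -> List[float]:
    return [2.0 * wi * hi for wi, hi in zip(w, h)]


scores = integrated_gradients(toy_grad, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(s, 6) for s in scores])  # [1.0, 0.1, 3.0]
print(reweight_top_k([1.0, 1.0, 1.0], scores, k=1, factor=2.0))  # [1.0, 1.0, 2.0]
```

The scores sum to f(x) - f(baseline) = 4.1, the completeness property that makes integrated gradients attractive for attributing an output to individual units.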

arXiv:2406.14422 [pdf, other]

FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to enhance subsequent forecasting. Additionally, most previous motion forecasting works have focused on predicting independent futures for each agent. However, safe and smooth autonomous driving requires accurately predicting the diverse future behaviors of numerous surrounding agents jointly in complex dynamic environments. Given that all agents occupy certain potential travel spaces and possess lane driving priority, we propose Lane Occupancy Field (LOF), a new representation with lane semantics for motion forecasting in autonomous driving. LOF can simultaneously capture the joint probability distribution of all road participants' future spatial-temporal positions. Due to the high compatibility between lane occupancy field prediction and trajectory prediction, we propose a novel network with future context encoding for the joint prediction of these two tasks. Our approach ranks 1st on two large-scale motion forecasting benchmarks: Argoverse 1 and Argoverse 2.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.08765 [pdf, other]

LLM-based Knowledge Pruning for Time Series Data Analytics on Edge-computing Devices

Authors: Ruibing Jin, Qing Xu, Min Wu, Yuecong Xu, Dan Li, Xiaoli Li, Zhenghua Chen

Abstract: Limited by the scale and diversity of time series data, neural networks trained on time series data often overfit and show unsatisfactory performance. In comparison, large language models (LLMs) have recently exhibited impressive generalization in diverse fields. Although many LLM-based approaches have been proposed for time series tasks, these methods require loading the whole LLM in both training and inference. These high computational demands limit practical applications in resource-constrained settings, such as edge-computing and IoT devices. To address this issue, we propose Knowledge Pruning (KP), a novel paradigm for time series learning. For a specific downstream task, we argue that the world knowledge learned by LLMs is largely redundant and only the related knowledge, termed "pertinent knowledge", is useful. Unlike other methods, KP aims to prune the redundant knowledge and distill only the pertinent knowledge into the target model. This reduces model size and computational costs significantly. Additionally, unlike existing LLM-based approaches, KP does not require loading the LLM during training and testing, further easing computational burdens. With KP, a lightweight network can effectively learn the pertinent knowledge, achieving satisfactory performance at low computational cost. To verify the effectiveness of KP, two fundamental tasks on edge-computing devices are investigated in our experiments, where eight diverse environments or benchmarks with different networks are used to verify its generalization. In these experiments, KP demonstrates effective learning of pertinent knowledge, achieving notable performance improvements in regression (19.7% on average) and classification (up to 13.7%) tasks and state-of-the-art results.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures

arXiv:2406.08764 [pdf]

Numerical Insights into noise amplification of high-energy mid-infrared supercontinuum generation in normal dispersion multimode fibers

Authors: Chaofan Yang, Dian Duan, Fan Zou, Kuo Liu, Ruibo Jin, Zechuan Liu, Haoyu Wu

Abstract: We report on the noise properties of high-energy mid-infrared supercontinuum (MIR-SC) generation in normal dispersion multimode fibers from a numerical perspective. Noise amplification in the multimode regime is primarily due to stimulated Raman scattering (SRS). This leads to the emergence of "incoherent cloud formation" and "incoherent optical wave breaking", similar to those observed in single-mode fibers. Increasing the pump technical noise from 0.1% to 1% significantly shortens the lumped coherence length L_C and exacerbates the influence of incoherent broadening dynamics competing with coherent dynamics, so that the amplitude noise and the phase coherence of the MIR-SC collapse in a strongly consistent manner during evolution. To minimize this noise amplification and achieve high-energy, low-noise MIR-SC in practical applications, it is essential to use short-pulse pumping with low amplitude noise, ensuring that L_C >> L_OWB (where L_OWB denotes the optical wave breaking length).

Submitted 12 June, 2024; originally announced June 2024.
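Phase coherence of simulated supercontinuum ensembles is commonly quantified by the modulus of the complex degree of first-order coherence, |g12(1)(lambda)|, averaged over pairs of independently seeded noisy shots: values near 1 indicate shot-to-shot phase stability, values near 0 indicate incoherent spectra. The abstract does not state the authors' estimator, so the sketch assumes this standard pairwise definition, with plain Python complex lists standing in for simulated spectra.

```python
from typing import List, Sequence


def first_order_coherence(ensemble: Sequence[Sequence[complex]]) -> List[float]:
    """|g12(1)| at each spectral bin from N complex spectra (independent shots).

    Averages conj(E_i) * E_j over all ordered pairs i != j and normalizes by the
    ensemble-mean spectral intensity, using the identity
    sum_{i != j} conj(E_i) E_j = |sum_i E_i|^2 - sum_i |E_i|^2.
    """
    n = len(ensemble)
    if n < 2:
        raise ValueError("need at least two shots")
    result = []
    for k in range(len(ensemble[0])):
        fields = [shot[k] for shot in ensemble]
        total = sum(fields)
        intensity = sum(abs(e) ** 2 for e in fields)
        if intensity == 0:
            result.append(0.0)  # no spectral power in this bin
            continue
        cross = (abs(total) ** 2 - intensity) / (n * (n - 1))
        mean_intensity = intensity / n
        result.append(abs(cross) / mean_intensity)
    return result


# Identical shots are perfectly coherent at every bin.
print([round(g, 6) for g in first_order_coherence([[1 + 1j, 2], [1 + 1j, 2], [1 + 1j, 2]])])  # [1.0, 1.0]
# Four unit-amplitude shots with phases 0, pi/2, pi, 3pi/2.
print(round(first_order_coherence([[1], [1j], [-1], [-1j]])[0], 6))  # 0.333333
```

The second case shows the finite-ensemble bias: with only four shots, uniformly spread phases still yield 1/3 rather than 0, so larger ensembles are needed to resolve low coherence.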

arXiv:2406.08487 [pdf, other]

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Authors: Yi-Fan Zhang, Qingsong Wen, Chaoyou Fu, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin

Abstract: Seeing clearly at high resolution is a foundation of Large Multimodal Models (LMMs) and has been proven vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, the latter being sliced image patches resized to the same resolution as the former. This means that higher resolution requires more local patches, resulting in exorbitant computational expense, while the dominance of local image tokens may diminish the global context. In this paper, we dive into these problems and propose a new framework as well as an elaborate optimization strategy. Specifically, we extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks. For local patches, learnable query embeddings are introduced to reduce the number of image tokens, and the tokens most relevant to the user question are further selected by a similarity-based selector. Our empirical results demonstrate a 'less is more' pattern, where utilizing fewer but more informative local image tokens leads to improved performance. In addition, a significant challenge lies in the training strategy, as simultaneous end-to-end training of the global mining block and the local compression block does not yield optimal results. We thus advocate an alternating training scheme, ensuring balanced learning between global and local aspects. Finally, we also introduce a challenging dataset with high requirements for image detail, enhancing the training of the local compression layer. The proposed method, termed LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME), achieves leading performance across various benchmarks with only 2 million training samples.

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Project page: https://github.com/yfzhang114/SliME
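The similarity-based selection step can be pictured as ranking local image tokens by their similarity to a question embedding and keeping the top k. The sketch assumes cosine similarity, a fixed `k`, and that selected tokens keep their original order; SliME's actual scoring function and embeddings are not specified in the abstract.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_tokens(
    tokens: Sequence[Sequence[float]], question: Sequence[float], k: int
) -> List[int]:
    """Indices of the k tokens most similar to the question, in original order."""
    scores = [cosine(t, question) for t in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)  # preserve the spatial order of the kept tokens


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(select_tokens(tokens, question=[1.0, 0.1], k=2))  # [0, 2]
```

Keeping the original order rather than the ranked order is a design choice here so that any positional information carried by token order survives the pruning.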

arXiv:2406.05419 [pdf, ps, other]

Foundations of iterated star maps and their use in combinatorics

Authors: Mauro Di Nasso, Renling Jin

Abstract: We develop a framework for nonstandard analysis that gives foundations to the interplay between external and internal iterations of the star map, and we present a few examples to show the strength and flexibility of such a nonstandard technique for applications in combinatorial number theory.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.20015 [pdf, other]

Efficient LLM-Jailbreaking by Introducing Visual Modality

Authors: Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong Jin

Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), which elicit objectionable content in response to harmful user queries. Unlike previous LLM jailbreaks that target LLMs directly, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM jailbreak to generate jailbreaking embeddings, embJS. Finally, we convert embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLMs. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19811 [pdf, ps, other]

Approximate Global Convergence of Independent Learning in Multi-Agent Systems

Authors: Ruiyang Jin, Zaiwei Chen, Yiheng Lin, Jie Song, Adam Wierman

Abstract: Independent learning (IL), despite being a popular approach in practice to achieve scalability in large-scale multi-agent systems, usually lacks global convergence guarantees. In this paper, we study two representative algorithms, independent $Q$-learning and independent natural actor-critic, within value-based and policy-based frameworks, and provide the first finite-sample analysis for approximate global convergence. The results imply a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ up to an error term that captures the dependence among agents and characterizes the fundamental limit of IL in achieving global convergence. To establish the result, we develop a novel approach for analyzing IL by constructing a separable Markov decision process (MDP) for convergence analysis and then bounding the gap due to the model difference between the separable MDP and the original one. Moreover, we conduct numerical experiments using a synthetic MDP and an electric vehicle charging example to verify our theoretical findings and to demonstrate the practical applicability of IL.

Submitted 30 May, 2024; originally announced May 2024.
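In independent Q-learning, each agent runs ordinary Q-learning over its own actions and treats every other agent as part of the environment. The sketch below is a textbook-style tabular illustration under assumed step sizes, a single state, and a shared reward; the class name and setup are hypothetical, and it is not the specific variant or step-size schedule analyzed in the paper.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple


class IndependentQLearner:
    """Tabular Q-learning over one agent's own actions.

    Other agents are not modeled; their influence reaches this agent only
    through the rewards and next states it observes.
    """

    def __init__(self, n_actions: int, alpha: float, gamma: float) -> None:
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.q: DefaultDict[Tuple[Hashable, int], float] = defaultdict(float)

    def update(self, state: Hashable, action: int, reward: float, next_state: Hashable) -> None:
        """One temporal-difference step toward r + gamma * max_a' Q(s', a')."""
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

    def greedy(self, state: Hashable) -> int:
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])


# Two agents share a reward, but each updates only its own table.
agents = [IndependentQLearner(n_actions=2, alpha=0.5, gamma=0.9) for _ in range(2)]
for _ in range(2):
    joint_action = (1, 0)
    shared_reward = 1.0
    for agent, action in zip(agents, joint_action):
        agent.update("s", action, shared_reward, "s")
print(round(agents[0].q[("s", 1)], 3), agents[0].greedy("s"))  # 0.975 1
```

Because each table ignores the other agents' actions, the environment each learner sees is non-stationary whenever the others change their policies, which is why global convergence guarantees for IL are hard to obtain.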

arXiv:2405.17929 [pdf, other]

Towards Unified Robustness Against Both Backdoor and Adversarial Attacks

Authors: Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong Jin, Gang Hua

Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct robustness problems and solved separately, since they are training-time and inference-time attacks, respectively. However, this paper reveals an intriguing connection between them: (1) planting a backdoor into a model significantly affects the model's adversarial examples; (2) for an infected model, its adversarial examples have features similar to those of the triggered images. Based on these observations, a novel Progressive Unified Defense (PUD) algorithm is proposed to defend against backdoor and adversarial attacks simultaneously. Specifically, PUD uses a progressive model purification scheme to jointly erase backdoors and enhance the model's adversarial robustness. In the early stage, the adversarial examples of infected models are used to erase backdoors. As the backdoor is gradually erased, model purification naturally turns into a stage that boosts the model's robustness against adversarial attacks. In addition, PUD can effectively identify poisoned images, so the initial extra dataset need not be completely clean. Extensive experimental results show that the connection we discovered between backdoor and adversarial attacks is ubiquitous, regardless of the type of backdoor attack. The proposed PUD outperforms state-of-the-art backdoor defenses, including model-repairing-based and data-filtering-based methods. It is also competitive with the most advanced adversarial defense methods.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.13578 [pdf, other]

ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation

Authors: Weilong Dong, Xinwei Wu, Renren Jin, Shaoyang Xu, Deyi Xiong

Abstract: Ensuring that large language models (LLMs) behave consistently with human goals, values, and intentions is crucial for their safety, yet computationally expensive. To reduce the computational cost of alignment training for LLMs, especially those with a huge number of parameters, and to reuse learned value alignment, we propose ConTrans, a novel framework that enables weak-to-strong alignment transfer via concept transplantation. From the perspective of representation engineering, ConTrans refines concept vectors in value alignment from a source LLM (usually a weak yet aligned LLM). The refined concept vectors are then reformulated to adapt to the target LLM (usually a strong yet unaligned base LLM) via an affine transformation. In the third step, ConTrans transplants the reformulated concept vectors into the residual stream of the target LLM. Experiments demonstrate the successful transplantation of a wide range of aligned concepts from 7B models to 13B and 70B models across multiple LLMs and LLM families. Remarkably, ConTrans even surpasses instruction-tuned models in terms of truthfulness. Experimental results validate the effectiveness of both inter-LLM-family and intra-LLM-family concept transplantation. Our work successfully demonstrates an alternative way to achieve weak-to-strong alignment generalization and control.

Submitted 22 May, 2024; originally announced May 2024.
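The last two steps, reformulating a concept vector through an affine map and adding it to the target model's residual stream, reduce to simple vector arithmetic. In the sketch below, the map parameters `W` and `b`, the dimensions, and the `strength` coefficient are made-up values for illustration; how ConTrans obtains the affine map is not described in the abstract and is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def affine(W: Matrix, b: Sequence[float], v: Sequence[float]) -> List[float]:
    """Map a source-model concept vector to the target hidden size: W v + b."""
    if len(W) != len(b) or any(len(row) != len(v) for row in W):
        raise ValueError("shape mismatch between W, b and v")
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def transplant(hidden: Sequence[float], concept: Sequence[float], strength: float) -> List[float]:
    """Add the reformulated concept vector to a residual-stream activation."""
    if len(hidden) != len(concept):
        raise ValueError("concept vector must match the hidden size")
    return [h + strength * c for h, c in zip(hidden, concept)]


# Hypothetical 2-d source concept mapped into a 3-d target residual stream.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.5]
concept_target = affine(W, b, [2.0, -1.0])
print(concept_target)  # [2.0, -1.0, 1.5]
print(transplant([1.0, 1.0, 1.0], concept_target, strength=0.5))  # [2.0, 0.5, 1.75]
```

Applying the shift additively leaves the target model's weights untouched, which is what makes this style of intervention cheap compared with alignment training.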

arXiv:2405.12794 [pdf, other]

Multiphoton Quantum Imaging using Natural Light

Authors: Fatemeh Mostafavi, Mingyuan Hong, Riley B. Dawkins, Jannatul Ferdous, Rui-Bo Jin, Roberto de J. Leon-Montiel, Chenglong You, Omar S. Magana-Loaiza

Abstract: It is thought that schemes for quantum imaging are fragile in realistic environments, where the background noise is often stronger than the nonclassical signal of the imaging photons. Unfortunately, it is infeasible to produce brighter quantum light sources to alleviate this problem. Here, we overcome this paradigmatic limitation by developing a quantum imaging scheme that relies on natural sources of light. This is achieved by performing conditional detection on the photon number of the thermal light field scattered by a remote object. Specifically, the conditional measurements in our scheme enable us to extract quantum features of the detected thermal photons to produce quantum images with improved signal-to-noise ratios. This technique shows a remarkable exponential enhancement in the contrast of quantum images. Surprisingly, this measurement scheme makes it possible to produce images from the vacuum fluctuations of the light field. This is demonstrated experimentally through the implementation of a single-pixel camera with photon-number-resolving capabilities. As such, we believe that our scheme opens a new paradigm in the field of quantum imaging. It also unveils the potential of combining natural light sources with nonclassical detection schemes for the development of robust quantum technologies.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.11441

EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations

Authors: Chiyu Zhang, Yifei Sun, Minghao Wu, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Rong Jin, Angli Liu, Ji Zhu, Sem Park, Ning Yao, Bo Long

Abstract: Content-based recommendation systems play a crucial role in delivering personalized content to users in the digital world. In this work, we introduce EmbSum, a novel framework that enables offline pre-computation of user and candidate-item representations while capturing the interactions within the user engagement history. By utilizing a pretrained encoder-decoder model and poly-attention layers, EmbSum derives User Poly-Embedding (UPE) and Content Poly-Embedding (CPE) to calculate relevance scores between users and candidate items. EmbSum actively learns from long user engagement histories by generating user-interest summaries under supervision from a large language model (LLM). The effectiveness of EmbSum is validated on two datasets from different domains, where it surpasses state-of-the-art (SoTA) methods with higher accuracy and fewer parameters. Additionally, the model's ability to generate summaries of user interests serves as a valuable by-product, enhancing its usefulness for personalized content recommendations.

Submitted 19 May, 2024; originally announced May 2024.

Comments: Under review

arXiv:2405.10142

GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction

Authors: Rui Jin, Yuman Gao, Yingjian Wang, Haojian Lu, Fei Gao

Abstract: Active reconstruction techniques enable robots to autonomously collect scene data for full coverage, relieving users of the tedious and time-consuming data-capturing process. However, because they are built on unsuitable scene representations, existing methods produce unrealistic reconstruction results or cannot evaluate reconstruction quality online. Owing to recent advances in explicit radiance field technology, online active high-fidelity reconstruction has become achievable. In this paper, we propose GS-Planner, a planning framework for active high-fidelity reconstruction using 3D Gaussian Splatting (3DGS). By improving 3DGS to recognize unobserved regions, we evaluate the reconstruction quality and completeness of the 3DGS map online to guide the robot. We then design a sampling-based active reconstruction strategy to explore the unobserved areas and improve the geometric and textural quality of the reconstruction. To establish a complete robotic active reconstruction system, we choose a quadrotor as the robotic platform for its high agility. We then devise a safety constraint with 3DGS to generate executable trajectories for quadrotor navigation in the 3DGS map. To validate the effectiveness of our method, we conduct extensive experiments and ablation studies in highly realistic simulation scenes.

Submitted 24 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08929

Size and Shape Dependence of Hydrogen-Induced Phase Transformation and Sorption Hysteresis in Palladium Nanoparticles

Authors: Xingsheng Sun, Rong Jin

Abstract: We establish a computational framework to explore the atomic configuration of a metal-hydrogen (M-H) system in equilibrium with an H environment. This approach combines Diffusive Molecular Dynamics with an iteration strategy, aiming to minimize the system's free energy and ensure a uniform chemical potential across the system that matches that of the H environment. Applying this framework, we investigate H chemical potential-composition isotherms during the hydrogenation and dehydrogenation of palladium nanoparticles ranging in size from 3.9 nm to 15.6 nm, with various shapes including cubes, rhombic dodecahedra, octahedra, and spheres. Our findings reveal an abrupt phase transformation in all examined particles during both H loading and unloading, accompanied by a distinct hysteresis gap between the absorption and desorption chemical potentials. Notably, as particle size increases, the absorption chemical potential rises while the desorption chemical potential declines, widening the hysteresis gap for all shapes. Regarding shape effects, we observe that, at a given size, cubic particles exhibit the lowest absorption chemical potentials during H loading, whereas octahedral particles exhibit the highest. Moreover, octahedral particles also exhibit the highest desorption chemical potentials during H unloading. These size and shape effects are explained by statistics of atomic volumetric strains resulting from specific facet orientations and inhomogeneous H distributions.
Prior to phase transformation in absorption, an H-rich surface shell induces lattice expansion in the H-poor core, whereas before phase transformation in desorption, surface stress promotes lattice compression in the H-rich core. The magnitude of the volumetric strains correlates well with the size and shape dependence, underlining their pivotal role in the observed phenomena.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.05741

Can large language models understand uncommon meanings of common words?

Authors: Jinyang Wu, Feihu Che, Xinxin Zheng, Shuai Zhang, Ruihan Jin, Shuai Nie, Pengpeng Shao, Jianhua Tao

Abstract: Large language models (LLMs) like ChatGPT have made significant advances across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, in the absence of widely acknowledged testing mechanisms, whether LLMs are "stochastic parrots" or genuinely comprehend the world remains an open question, fostering numerous studies and sparking heated debates. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding LLMs' unique comprehension mechanisms, aligning them with human cognition, and ultimately enhancing their general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication in psychology, which emphasize an accurate shared understanding of word semantics. Specifically, this paper constructs a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Evaluating open-source and closed-source models of varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models on this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively.
Additionally, we introduce multiple advanced prompting techniques and retrieval-augmented generation to help alleviate this problem, yet limitations persist. By highlighting these critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04434

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference by significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times that of DeepSeek 67B. We pretrain DeepSeek-V2 on a high-quality, multi-source corpus of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.
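The abstract attributes DeepSeek-V2's economical training to sparse computation: only 21B of its 236B parameters (about 8.9%) are activated per token. The sketch below illustrates generic top-k expert routing, the basic mechanism behind that kind of sparsity, under stated assumptions; it is not the DeepSeekMoE architecture itself, and the function names, softmax gating, and weight renormalization are illustrative choices.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the softmax scores of the selected experts, renormalized
    to sum to 1 (one common convention, assumed here).
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_logits, k):
    """Combine the outputs of only the k routed experts for one token.

    `experts` is a list of callables mapping a vector to a vector;
    experts that are not selected are never evaluated (the sparse part).
    """
    out = [0.0] * len(x)
    for idx, w in top_k_route(gate_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

For example, with four toy experts `[lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]` and gate logits `[0.1, 2.0, 0.3, 1.5]`, `k=2` evaluates only experts 1 and 3; the compute per token scales with k rather than with the total number of experts.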

arXiv:2405.00365

Robust Continuous-Time Beam Tracking with Liquid Neural Network

Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng Jin, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology for sixth-generation communication networks due to the large amount of spectrum available at high frequencies. However, the huge overhead associated with beam training poses a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this overhead, we propose a novel solution for robust continuous-time beam tracking with a liquid neural network, which dynamically adjusts the narrow mmWave beams to ensure real-time beam alignment with mobile users. Through extensive simulations, we validate the effectiveness of our proposed method and demonstrate its superiority over existing state-of-the-art deep-learning-based approaches. Specifically, our scheme achieves up to 46.9% higher normalized spectral efficiency than the baselines when the user is moving at 5 m/s, demonstrating the potential of liquid neural networks to enhance mmWave mobile communication performance.

Submitted 1 May, 2024; originally announced May 2024.
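Liquid neural networks are usually built from continuous-time neurons whose effective time constant depends on the input, which is what makes them a natural fit for continuous-time tracking. The abstract does not give the paper's cell, so the sketch below assumes the liquid time-constant (LTC) form dx/dt = -(1/tau + f) x + f A for a single scalar neuron, with a hypothetical sigmoid gate f and parameter names chosen for illustration. It advances the state with a semi-implicit Euler step so that irregular sampling intervals can be handled directly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ltc_step(x, inp, dt, tau=1.0, A=1.0, w=0.5, u=1.0, b=0.0):
    """One semi-implicit Euler step of a single (assumed) LTC neuron.

    Dynamics: dx/dt = -(1/tau + f) * x + f * A, where the gate
    f = sigmoid(w*x + u*inp + b) depends on the state and the input.
    Evaluating the decay term at t + dt gives this closed-form update,
    which stays stable even for large dt.
    """
    f = sigmoid(w * x + u * inp + b)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

def run(samples, x0=0.0, **params):
    """Integrate over (dt, input) pairs; dt may differ between samples."""
    x, trace = x0, []
    for dt, inp in samples:
        x = ltc_step(x, inp, dt, **params)
        trace.append(x)
    return trace
```

With a constant input and `w=0.0`, the state converges to the fixed point f A / (1/tau + f); with `x0` between 0 and A, every update stays between 0 and A regardless of the step sizes.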

arXiv:2404.11070

Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu

Abstract: Accurate, continuous, and reliable positioning is critical to achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of stand-alone sensors and non-line-of-sight (NLOS) reception caused by tall buildings, trees, and elevated structures seriously degrade positioning results. To address these challenges, a sky-view image segmentation algorithm based on a Fully Convolutional Network (FCN) is proposed for GNSS NLOS detection. Building on this, a novel NLOS detection and mitigation algorithm (named S-NDM) is extended to a tightly coupled system of Global Navigation Satellite Systems (GNSS), Inertial Measurement Units (IMU), and visual features, called Sky-GVIO, with the aim of achieving continuous and accurate positioning in urban canyon environments. Furthermore, the system harmonizes Single Point Positioning (SPP) with Real-Time Kinematic (RTK) methodologies to bolster its operational versatility and resilience. In urban canyon environments, the positioning performance of the proposed S-NDM algorithm is evaluated under different tightly coupled SPP-related and RTK-related models. The results show that the Sky-GVIO system achieves meter-level accuracy in SPP mode and sub-decimeter precision with RTK, surpassing the performance of GNSS/INS/Vision frameworks without S-NDM. Additionally, the sky-view image dataset, including training and evaluation subsets, has been made publicly available for scholarly exploration at https://github.com/whuwangjr/sky-view-images.

Submitted 17 April, 2024; originally announced April 2024.
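The abstract uses FCN sky segmentation to detect NLOS satellites. One way to picture that step: project each satellite's azimuth and elevation into the sky-view image and check whether it lands on a sky pixel. The sketch below assumes a zenith-pointing fisheye camera with an equidistant projection, north at image-up, and a ready-made boolean sky mask; the projection model, function names, and LOS/NLOS labels are hypothetical, and the paper's calibration and S-NDM mitigation step are not modeled.

```python
import math

def project_to_fisheye(az_deg, el_deg, cx, cy, radius):
    """Map satellite azimuth/elevation to a (row, col) pixel.

    Assumed equidistant model: the zenith maps to (cx, cy), the horizon
    (el = 0) maps to a circle of `radius` pixels, north is image-up, and
    azimuth increases clockwise (east to the right).
    """
    r = radius * (90.0 - el_deg) / 90.0
    az = math.radians(az_deg)
    col = cx + r * math.sin(az)
    row = cy - r * math.cos(az)
    return int(round(row)), int(round(col))

def classify_satellites(sky_mask, sats, cx, cy, radius):
    """Label each satellite LOS if its pixel is sky (True in the mask), else NLOS.

    sky_mask: 2-D list of booleans, e.g. thresholded segmentation output.
    sats: dict mapping a satellite id to (azimuth_deg, elevation_deg).
    Satellites projected outside the image are treated as NLOS.
    """
    labels = {}
    for sid, (az, el) in sats.items():
        row, col = project_to_fisheye(az, el, cx, cy, radius)
        inside = 0 <= row < len(sky_mask) and 0 <= col < len(sky_mask[0])
        labels[sid] = "LOS" if inside and sky_mask[row][col] else "NLOS"
    return labels
```

In a positioning filter, the NLOS labels would then drive exclusion or down-weighting of those measurements; which of these S-NDM does is not stated in the abstract.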

arXiv:2404.07509

Multiparameter cascaded quantum interferometer

Authors: Baihong Li, Zhuo-zhuo Wang, Qi-qi Li, Changhua Chen, Boxin Yuan, Yiwei Zhai, Rui-Bo Jin, Xiaofei Zhang

Abstract: We theoretically propose a multiparameter cascaded quantum interferometer in which a two-input, two-output setup is obtained by concatenating 50:50 beam splitters with n independent and adjustable time delays. A general method for deriving the coincidence probability of such an interferometer is given, based on the linear transformation of the beam-splitter matrices. As examples, we analyze the interference characteristics of one-, two-, and three-parameter cascaded quantum interferometers with different frequency correlations and input states. Typical interferograms of such interferometers are provided to reveal richer and more complex two-photon interference phenomena. In principle, arbitrary two-input, two-output experimental setups can be designed with this proposal. This work offers a toolbox for designing versatile quantum interferometers and provides a convenient method for deriving the coincidence probabilities involved. Potential applications include the complete spectral characterization of two-photon states, multiparameter estimation, and quantum metrology.

Submitted 8 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: We have found a serious error in this version, which may mislead readers
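The abstract derives coincidence probabilities from the linear transformation of cascaded beam-splitter matrices. A heavily simplified sketch of that idea follows. It assumes monochromatic, perfectly indistinguishable photons, so each time delay reduces to a phase and the frequency correlations analyzed in the paper are ignored. It composes 2x2 unitaries and uses the standard fact that, for one photon in each input, the coincidence amplitude is the permanent of the overall 2x2 matrix. The beam-splitter phase convention and function names are assumptions.

```python
import cmath
import math

def beam_splitter():
    """Symmetric 50:50 beam splitter as a 2x2 unitary (one common convention)."""
    s = 1 / math.sqrt(2)
    return [[s, 1j * s], [1j * s, s]]

def delay(phi):
    """Relative phase phi in the first arm: a monochromatic stand-in for a time delay."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def cascade(phases):
    """U = BS . D(phi_n) . BS ... BS . D(phi_1) . BS: n delays between n + 1 beam splitters."""
    u = beam_splitter()
    for phi in phases:
        u = matmul(beam_splitter(), matmul(delay(phi), u))
    return u

def coincidence_probability(u):
    """P(one photon at each output) for input |1,1>: |perm(U)|^2."""
    perm = u[0][0] * u[1][1] + u[0][1] * u[1][0]
    return abs(perm) ** 2
```

With no delays, a single beam splitter gives zero coincidences (the Hong-Ou-Mandel dip). With one delay phi between two beam splitters, the permanent works out to -(1 + e^{2i phi})/2, so the coincidence probability is cos^2(phi). That is the two-photon fringe at twice the single-photon frequency.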

arXiv:2404.07421

Controllable transitions among phase-matching conditions in a single nonlinear crystal

Authors: Zi-Qi Zeng, Shi-Xin You, Zi-Xiang Yang, Chenzhi Yuan, Chenglong You, Rui-Bo Jin

Abstract: Entangled photon pairs are crucial resources for quantum information processing protocols. Via spontaneous parametric down-conversion (SPDC), these photon pairs can be generated using bulk nonlinear crystals. Traditionally, the crystal is designed to satisfy a specific type of phase-matching condition. Here, we report controllable transitions among different types of phase matching in a single periodically poled potassium titanyl phosphate (PPKTP) crystal. By carefully selecting pump conditions, we can satisfy different phase-matching conditions. This allows us to observe first-order type-II, fifth-order type-I, third-order type-0, and fifth-order type-II SPDC. The temperature-dependent spectra of our source are also analyzed in detail. Finally, we discuss the possibility of observing more than nine SPDC processes in this crystal. Our work not only deepens the understanding of the physics behind phase-matching conditions, but also offers the potential for a highly versatile entangled biphoton source for quantum information research.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures

Journal ref: Chinese Optics Letters, 22(2), 021901(2024)
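As background for the orders and types listed in the abstract, the standard collinear quasi-phase-matching (QPM) condition for SPDC in a crystal with poling period Λ is given below. This is the textbook form rather than the paper's own notation, and the collinear geometry is an assumption.

```latex
\omega_p = \omega_s + \omega_i, \qquad
\Delta k = \frac{2\pi n_p}{\lambda_p} - \frac{2\pi n_s}{\lambda_s} - \frac{2\pi n_i}{\lambda_i} - \frac{2\pi m}{\Lambda} = 0
```

Here m is the QPM order (m = 1, 3, 5 for the first-, third-, and fifth-order processes mentioned above). Each refractive index n_j = n_j(λ_j, T) is evaluated for the polarization of the corresponding field. In type-0, all three fields share one polarization. In type-I, signal and idler share a polarization orthogonal to the pump. In type-II, signal and idler are orthogonally polarized. The temperature dependence of n_j is what shifts the phase-matched spectra as the crystal temperature changes.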

arXiv:2404.07074

Multiscale structure-property discovery via active learning in scanning tunneling microscopy

Authors: Ganesh Narasimha, Dejia Kong, Paras Regmi, Rongying Jin, Zheng Gai, Rama Vasudevan, Maxim Ziatdinov

Abstract: Atomic arrangements and local sub-structures fundamentally influence emergent material functionalities. Local structures are conventionally probed using spatially resolved studies, and the property correlations are usually deciphered by a researcher based on sequential explorations and auxiliary information, which limits throughput. Here we demonstrate a Bayesian deep-learning-based framework that automatically correlates material structure with its electronic properties using scanning tunneling microscopy (STM) measurements in real time. Its predictions are used to autonomously direct exploration toward regions of the sample that optimize a given material property. This autonomous method is deployed on a low-temperature ultra-high-vacuum STM to understand the structure-property relationship in a europium-based semimetal, EuZn2As2, one of the promising candidates for studying magnetism-driven topological properties. The framework employs a sparse sampling approach to efficiently construct the scalar-property space using a minimal number of measurements, about 1-10% of the data required by standard hyperspectral imaging methods. We further demonstrate target-property-guided active learning of structures within a multiscale framework. This is implemented across length scales in a hierarchical fashion for the autonomous discovery of the structural origins of an observed material property.
The framework allows a suitable scalar property to be selected and derived from the spectroscopic data to steer exploration across the sample space. Our findings reveal correlations of the electronic properties unique to surface terminations, local defect density, and point defects.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.19723

HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

Authors: Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min

Abstract: Table understanding (TU) has achieved promising advancements, but it faces two challenges: the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's parametric knowledge through soft prompts and instruction tuning, and it handles complex tables through a multi-task pre-training scheme involving three novel multi-granularity self-supervised HG pre-training objectives. We empirically demonstrate the effectiveness of HGT, showing that it outperforms SOTA methods for few-shot complex TU on several benchmarks.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.14949

Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt

Authors: YiFan Zhang, Weiqi Chen, Zhaoyang Zhu, Dalin Qin, Liang Sun, Xue Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin

Abstract: Online updating of time series forecasting models aims to tackle concept drift by adjusting forecasting models based on streaming data. While numerous algorithms have been developed, most of them focus on model design and updating. In practice, many of these methods suffer from continuous performance regression as concept drifts accumulate over time. To address this limitation, we present a novel approach, Concept Drift Detection and Adaptation (D3A), that first detects drifting concepts and then aggressively adapts the current model to the drifted concepts after detection for rapid adaptation. To best harness the utility of historical data for model adaptation, we propose a data augmentation strategy that introduces Gaussian noise into existing training instances. It helps mitigate the data distribution gap, a critical factor contributing to train-test performance inconsistency. The significance of our data augmentation process is verified by our theoretical analysis. Our empirical studies across six datasets demonstrate the effectiveness of D3A in improving model adaptation capability. Notably, compared to a simple Temporal Convolutional Network (TCN) baseline, D3A reduces the average Mean Squared Error (MSE) by 43.9%. For the state-of-the-art (SOTA) model, the MSE is reduced by 33.3%.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 figures, 14 pages. arXiv admin note: text overlap with arXiv:2309.12659
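The abstract says D3A reuses historical data for adaptation by adding Gaussian noise to existing training instances. A minimal sketch of that augmentation step in plain Python follows. The noise scale `sigma`, the number of noisy `copies`, and the list-of-windows data layout are assumptions, and the drift detector and adaptation schedule are not modeled.

```python
import random

def augment_with_gaussian_noise(windows, sigma, copies=1, seed=None):
    """Return the original windows followed by `copies` noisy versions of each.

    `windows` is a list of equal-length lists of floats (historical
    training instances). Each noisy copy adds i.i.d. N(0, sigma^2) noise
    to every value; sigma and copies are hypothetical tuning knobs.
    """
    rng = random.Random(seed)
    augmented = [list(w) for w in windows]
    for _ in range(copies):
        for w in windows:
            augmented.append([v + rng.gauss(0.0, sigma) for v in w])
    return augmented
```

Keeping the clean instances alongside the perturbed ones preserves the original signal, while the noisy copies widen the neighborhood of each training point. This is one plausible reading of how such augmentation narrows the train-test distribution gap.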

arXiv:2403.12601

LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models

Authors: Chuang Liu, Renren Jin, Yuqi Ren, Deyi Xiong

Abstract: Chinese Large Language Models (LLMs) have recently demonstrated impressive capabilities across various NLP benchmarks and real-world applications. However, the existing benchmarks for comprehensively evaluating these LLMs are still insufficient, particularly in terms of measuring knowledge that LLMs capture. Current datasets collect questions from Chinese examinations across different subjects and educational levels to address this issue. Yet, these benchmarks primarily focus on objective questions such as multiple-choice questions, leading to a lack of diversity in question types. To tackle this problem, we propose LHMKE, a Large-scale, Holistic, and Multi-subject Knowledge Evaluation benchmark in this paper. LHMKE is designed to provide a comprehensive evaluation of the knowledge acquisition capabilities of Chinese LLMs. It encompasses 10,465 questions across 75 tasks covering 30 subjects, ranging from primary school to professional certification exams. Notably, LHMKE includes both objective and subjective questions, offering a more holistic evaluation of the knowledge level of LLMs. We have assessed 11 Chinese LLMs under the zero-shot setting, which aligns with real examinations, and compared their performance across different subjects. We also conduct an in-depth analysis to check whether GPT-4 can automatically score subjective predictions. Our findings suggest that LHMKE is a challenging and advanced testbed for Chinese LLMs.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.12316

OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

Authors: Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong

Abstract: The rapid development of Chinese large language models (LLMs) poses significant challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities, usually overlooking potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that benchmarks Chinese LLMs across capability, alignment, and safety. For capability assessment, we include 12 benchmark datasets to evaluate Chinese LLMs along 4 sub-dimensions: NLP tasks, disciplinary knowledge, commonsense reasoning, and mathematical reasoning. For alignment assessment, OpenEval contains 7 datasets that examine bias, offensiveness, and illegality in the outputs of Chinese LLMs. To evaluate safety, especially the anticipated risks (e.g., power-seeking, self-awareness) of advanced LLMs, we include 6 datasets. In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval keeps pace with the development of Chinese LLMs, or even provides cutting-edge benchmark datasets to guide that development. In our first public evaluation, we tested a range of Chinese LLMs, spanning from 7B to 72B parameters, including both open-source and proprietary models.
Evaluation results indicate that while Chinese LLMs have shown impressive performance in certain tasks, more attention should be directed towards broader aspects such as commonsense reasoning, alignment, and safety.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11693

Beamforming Design for Semantic-Bit Coexisting Communication System

Authors: Maojun Zhang, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Qingjiang Shi, Caijun Zhong, Kaibin Huang

Abstract: Semantic communication (SemCom) is emerging as a key technology for future sixth-generation (6G) systems. Unlike traditional bit-level communication (BitCom), SemCom directly optimizes performance at the semantic level, leading to superior communication efficiency. Nevertheless, the task-oriented nature of SemCom makes it difficult for SemCom to completely replace BitCom. Consequently, it is desirable to consider a semantic-bit coexisting communication system, where a base station (BS) serves SemCom users (sem-users) and BitCom users (bit-users) simultaneously. Such a system faces severe and heterogeneous inter-user interference. In this context, this paper provides a new semantic-bit coexisting communication framework and proposes a spatial beamforming scheme to accommodate both types of users. Specifically, we consider maximizing the semantic rate for sem-users while ensuring the quality-of-service (QoS) requirements of bit-users. Because an exact closed-form expression of the semantic rate is intractable, a data-driven method is first applied to obtain an approximate expression via data fitting. With the resulting complex transcendental function, majorization minimization (MM) is adopted to convert the originally formulated problem into a multiple-ratio problem, which allows fractional programming (FP) to be used to further transform the problem into an inhomogeneous quadratically constrained quadratic programming (QCQP) problem.
Solving this problem leads to a semi-closed-form solution with undetermined Lagrangian factors that can be updated by a fixed-point algorithm. Extensive simulation results demonstrate that the proposed beamforming scheme significantly outperforms conventional beamforming algorithms such as zero-forcing (ZF), maximum ratio transmission (MRT), and weighted minimum mean-square error (WMMSE).

Submitted 22 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE for possible publication
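The multiple-ratio step in the semantic-bit beamforming abstract above can be illustrated with the quadratic transform, one standard fractional-programming device: for $A_i \ge 0$ and $B_i > 0$, $\max_{y_i} (2 y_i \sqrt{A_i} - y_i^2 B_i) = A_i / B_i$, attained at $y_i^\star = \sqrt{A_i}/B_i$. The sketch below only checks this identity numerically on made-up numerators and denominators; whether the paper uses exactly this transform, and how its QCQP and Lagrangian-factor updates are built, is an assumption not reproduced here.

```python
import numpy as np


def quadratic_transform(a, b, y):
    """Surrogate sum_i (2 * y_i * sqrt(a_i) - y_i**2 * b_i) for a sum of ratios a_i / b_i."""
    a, b, y = (np.asarray(v, dtype=float) for v in (a, b, y))
    return float(np.sum(2.0 * y * np.sqrt(a) - y**2 * b))


def optimal_aux(a, b):
    """Closed-form maximizer y_i* = sqrt(a_i) / b_i of the surrogate for fixed a, b."""
    return np.sqrt(np.asarray(a, dtype=float)) / np.asarray(b, dtype=float)


# Hypothetical per-user numerators (useful signal) and denominators (interference plus noise).
a = np.array([2.0, 0.5, 3.0])
b = np.array([1.0, 0.25, 4.0])

ratio_sum = float(np.sum(a / b))          # original multiple-ratio objective
y_star = optimal_aux(a, b)
transformed = quadratic_transform(a, b, y_star)
print(ratio_sum, transformed)             # the two values agree up to rounding
```

In an FP-style algorithm, the surrogate would typically be maximized over the beamformers with `y` fixed, and `y` then refreshed with the closed form, alternating until convergence.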

arXiv:2403.07747

FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models

Authors: Yan Liu, Renren Jin, Lin Shi, Zheng Yao, Deyi Xiong

Abstract: To thoroughly assess the mathematical reasoning abilities of Large Language Models (LLMs), we need to carefully curate evaluation datasets covering diverse mathematical concepts and mathematical problems at different difficulty levels. In pursuit of this objective, we propose FineMath in this paper, a fine-grained mathematical evaluation benchmark dataset for assessing Chinese LLMs. FineMath is created to cover the key mathematical concepts taught in elementary school math, which are further divided into 17 categories of math word problems, enabling in-depth analysis of the mathematical reasoning abilities of LLMs. All 17 categories of math word problems are manually annotated with their difficulty levels according to the number of reasoning steps required to solve them. We conduct extensive experiments on a wide range of LLMs on FineMath and find that there is still considerable room for improvement in the mathematical reasoning capability of Chinese LLMs. We also carry out an in-depth analysis of the evaluation process and methods, which have been overlooked previously. These two factors significantly influence the model results and our understanding of their mathematical reasoning capabilities. The dataset will be publicly available soon.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06104

Debiased Noise Editing on Foundation Models for Fair Medical Image Classification

Authors: Ruinan Jin, Wenlong Deng, Minghui Chen, Xiaoxiao Li

Abstract: As Foundation Models (FMs) rise to prominence in AI, our study addresses the challenge of biases in medical images when the model operates as a black box (e.g., accessed through an FM API), particularly spurious correlations between pixels and sensitive attributes. Traditional bias mitigation methods face limitations due to restricted access to web-hosted FMs and difficulties in addressing the underlying bias encoded within the FM API. We propose a D(ebiased) N(oise) E(diting) strategy, termed DNE, which generates DNE noise to mask such spurious correlations. DNE is capable of mitigating bias both within the FM API embedding and in the images themselves. Furthermore, DNE is suitable for both white-box and black-box FM APIs; for black-box APIs, where the gradient is inaccessible, we introduce Greedy Zeroth-Order (GeZO) optimization. Our whole pipeline enables fairness-aware image editing that can be applied across various medical contexts without requiring direct model manipulation or significant computational resources. Our empirical results demonstrate the method's effectiveness in maintaining fairness and utility across different patient groups and diseases. In the era of AI-driven medicine, this work contributes to making healthcare diagnostics more equitable, showcasing a practical solution for bias mitigation in pre-trained image FMs. Our code is provided at https://github.com/ubc-tea/DNE-foundation-model-fairness.

Submitted 12 July, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 13 pages, 3 figures. Accepted by MICCAI 2024
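The DNE abstract relies on zeroth-order optimization for black-box APIs where gradients are unavailable. As a rough illustration of that general idea only (this is not the authors' GeZO, whose greedy rule is not described here), a two-point estimator approximates the gradient from loss queries alone, $\hat g = \frac{1}{q}\sum_{j=1}^{q} \frac{f(x+\mu u_j) - f(x-\mu u_j)}{2\mu} u_j$ with Gaussian directions $u_j$. The quadratic loss, step size, smoothing radius, and number of directions are illustrative assumptions.

```python
import numpy as np


def zo_gradient(f, x, rng, mu=1e-3, num_dirs=20):
    """Two-point zeroth-order gradient estimate using only evaluations of f."""
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)              # random Gaussian search direction
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_dirs


def zo_minimize(f, x0, lr=0.05, steps=300, seed=0):
    """Plain zeroth-order gradient descent (no greedy acceptance rule)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * zo_gradient(f, x, rng)
    return x


# Hypothetical black-box loss: the optimizer never sees its gradient.
target = np.ones(5)


def black_box_loss(x):
    return float(np.sum((x - target) ** 2))


x0 = np.zeros(5)
x_final = zo_minimize(black_box_loss, x0)
print(black_box_loss(x0), black_box_loss(x_final))
```

Each step costs `2 * num_dirs` queries to the black box, which is the usual price paid for not having gradient access.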

arXiv:2403.05935

Unique reconstruction for discretized inverse problems: a random sketching approach

Authors: Ruhui Jin, Qin Li, Anjali Nair, Samuel Stechmann

Abstract: Inverse problem theory is often studied in the ideal infinite-dimensional setting. Through the lens of PDE-constrained optimization, well-posedness theory for PDEs suggests unique reconstruction of the parameter function that attains the zero-loss property of the mismatch function when an infinite amount of data is provided. Unfortunately, this is not the case in practice, where we are limited to a finite number of measurements for experimental or economic reasons. Consequently, one must relax the inference goal to a discrete approximation of the unknown smooth function. What is the reconstruction power of a fixed number of data observations? How many parameters can one reconstruct? Here we describe a probabilistic approach and spell out the interplay between the observation size $(r)$ and the number of parameters to be uniquely identified $(m)$. The technical pillar is the random sketching strategy, in which matrix concentration inequalities and sampling theory are largely employed. By analyzing the randomly sub-sampled Hessian matrix, we attain a well-conditioned reconstruction problem with high probability. Our main theory is finally validated in numerical experiments. We run tests on both synthetic data and data from an elliptic inverse problem. The empirical performance shows that, given suitable sampling quality, the well-conditioning of the sketched Hessian is certified with high probability.

Submitted 9 March, 2024; originally announced March 2024.

MSC Class: 65M32; 49M41; 65F35
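To make the random-sketching idea in the inverse-problem abstract concrete, the sketch below forms a Gauss-Newton-type Hessian $H = \frac{1}{N} A^\top A$ from hypothetical measurement sensitivities $A \in \mathbb{R}^{N\times m}$, sub-samples $r$ rows uniformly at random, and compares condition numbers. The Gaussian matrix standing in for the sensitivities of an actual PDE model is an assumption, and the paper's sampling distribution and probability bounds are not reproduced.

```python
import numpy as np


def sketched_hessian(A, r, rng):
    """Uniformly sub-sample r rows of A (with replacement); return (1/r) * sum_i a_i a_i^T."""
    idx = rng.integers(0, A.shape[0], size=r)
    rows = A[idx]
    return rows.T @ rows / r


rng = np.random.default_rng(0)
N, m, r = 2000, 10, 200           # hypothetical: candidate measurements, parameters, observations
A = rng.standard_normal((N, m))   # hypothetical sensitivity rows, one per possible measurement
H_full = A.T @ A / N              # Gauss-Newton Hessian using every measurement
H_sketch = sketched_hessian(A, r, rng)
print(np.linalg.cond(H_full), np.linalg.cond(H_sketch))
```

In this toy setting, shrinking `r` toward `m` makes the sketched Hessian increasingly ill-conditioned, which mirrors the trade-off between observation size and identifiable parameters that the abstract describes.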

arXiv:2403.05478

HGIC: A Hand Gesture Based Interactive Control System for Efficient and Scalable Multi-UAV Operations

Authors: Mengsha Hu, Jinzhou Li, Runxiang Jin, Chao Shi, Lei Xu, Rui Liu

Abstract: As technological advancements continue to expand the capabilities of multi-unmanned-aerial-vehicle (mUAV) systems, human operators face challenges in scalability and efficiency due to the complex cognitive load and operations associated with motion adjustments and team coordination. Such cognitive demands limit the feasible size of mUAV teams and necessitate extensive operator training, impeding broader adoption. This paper develops Hand Gesture Based Interactive Control (HGIC), a novel interface system that utilizes computer vision techniques to intuitively translate hand gestures into modular commands for robot teaming. Through learned control models, these commands enable efficient and scalable mUAV motion control and adjustments. HGIC eliminates the need for specialized hardware and offers two key benefits: 1) minimal training requirements through natural gestures; and 2) enhanced scalability and efficiency via adaptable commands. By reducing the cognitive burden on operators, HGIC opens the door to more effective large-scale mUAV applications in complex, dynamic, and uncertain scenarios. HGIC will be open-sourced for the research community after the paper is published online, aiming to drive forward innovations in human-mUAV interactions.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05472

Federated Joint Learning of Robot Networks in Stroke Rehabilitation

Authors: Xinyu Jiang, Yibei Guo, Mengsha Hu, Ruoming Jin, Hai Phan, Jay Alberts, Rui Liu

Abstract: Empowered by rich perception and precise execution, robots possess immense potential to provide professional and customized rehabilitation exercises for patients with mobility impairments caused by strokes. Autonomous robotic rehabilitation significantly reduces human workloads in the long and tedious rehabilitation process. However, training a rehabilitation robot is challenging due to data scarcity. This challenge arises from privacy concerns (e.g., the risk of leaking patients' private disease and identity information) during clinical data access and usage. Data from various patients and hospitals cannot be shared for adequate robot training, further compromising rehabilitation safety and limiting implementation scopes. To address this challenge, this work develops a novel federated joint learning (FJL) method to jointly train robots across hospitals. FJL also adopts a long short-term memory (LSTM)-Transformer learning mechanism to effectively explore the complex spatio-temporal relations among patient mobility conditions and robotic rehabilitation motions. To validate FJL's effectiveness in training a robot network, a combined clinic-simulation experiment was designed. Real rehabilitation exercise data from 200 patients with stroke diseases (upper limb hemiplegia, Parkinson's syndrome, and back pain syndrome) were adopted. Inversely driven by clinical data, 300,000 robotic rehabilitation guidances were simulated. FJL proved effective in joint rehabilitation learning, performing 20% to 30% better than baseline methods.

Submitted 8 March, 2024; originally announced March 2024.
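The FJL abstract does not spell out its aggregation rule. For orientation only, the following sketch shows FedAvg-style weighted averaging, a common federated baseline in which hospitals share model parameters rather than patient data; it is not FJL, and the parameter names and client sizes are hypothetical.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """FedAvg-style aggregation: weight each client's parameters by its local sample count."""
    total = float(sum(client_sizes))
    return {
        name: sum(params[name] * (n / total) for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }


# Hypothetical local models from two hospitals; only parameters leave each site.
hospital_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
hospital_b = {"w": np.array([3.0, 6.0]), "b": np.array([4.0])}
global_model = federated_average([hospital_a, hospital_b], client_sizes=[50, 150])
print(global_model)
```

Weighting by sample count lets the hospital with more patients pull the shared model further toward its local solution.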

arXiv:2403.05262

Debiasing Multimodal Large Language Models

Authors: Yi-Fan Zhang, Weichen Yu, Qingsong Wen, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Abstract: In the realms of computer vision and natural language processing, Large Vision-Language Models (LVLMs) have become indispensable tools, proficient in generating textual descriptions based on visual inputs. Despite their advancements, our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than by the input image. Our empirical experiments underscore the persistence of this bias, as LVLMs often provide confident answers even in the absence of relevant images or when given incongruent visual input. To rectify these biases and redirect the model's focus toward visual information, we introduce two simple, training-free strategies. First, for tasks such as classification or multiple-choice question answering (QA), we propose a "calibration" step through an affine transformation to adjust the output distribution. This "Post-Hoc debias" approach ensures uniform scores for each answer when the image is absent, serving as an effective regularization technique to alleviate the influence of LLM priors. For more intricate open-ended generation tasks, we extend this method to "Debias sampling", drawing inspiration from contrastive decoding methods. Furthermore, our investigation sheds light on the instability of LVLMs across various decoding configurations. Through systematic exploration of different settings, we significantly enhance performance, surpassing reported results and raising concerns about the fairness of existing evaluations.
Comprehensive experiments substantiate the effectiveness of our proposed strategies in mitigating biases. These strategies not only prove beneficial in minimizing hallucinations but also contribute to the generation of more helpful and precise illustrations.

Submitted 27 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 38 pages, 17 figures
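The "Post-Hoc debias" step is described as an affine calibration that makes answer scores uniform when the image is absent. One simple way to satisfy that property, borrowed from contextual calibration and not necessarily the authors' exact parameterization, is a diagonal rescaling $W = \mathrm{diag}(p_{\text{null}})^{-1}$ followed by renormalization, where $p_{\text{null}}$ is the model's answer distribution with the image removed. The probabilities below are hypothetical.

```python
import numpy as np


def fit_debias(p_null):
    """Diagonal weights w = 1 / p_null, so image-absent scores map to a uniform distribution."""
    return 1.0 / np.asarray(p_null, dtype=float)


def apply_debias(w, p):
    """Rescale answer probabilities by w and renormalize them to sum to one."""
    q = w * np.asarray(p, dtype=float)
    return q / q.sum()


# Hypothetical scores over three answer options (A, B, C).
p_null = np.array([0.6, 0.3, 0.1])   # image removed: reflects the language prior only
p_image = np.array([0.5, 0.2, 0.3])  # with the actual image
w = fit_debias(p_null)
print(apply_debias(w, p_null))       # uniform by construction
print(apply_debias(w, p_image))
```

Here the prior favors option A so strongly that the raw prediction is A, while the calibrated scores favor C, the option that gains the most probability once the image is shown.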

arXiv:2403.04911

Fractional stochastic Landau-Lifshitz Navier-Stokes equations in dimension $d \geq 3$: Existence and (non-)triviality

Authors: Ruhong Jin, Nicolas Perkowski

Abstract: We investigate fractional stochastic Navier-Stokes equations in $d\ge 3$, driven by the random force $(-\Delta)^{\frac{\theta}{2}}\xi$ which, as we show, corresponds to a fractional version of the Landau-Lifshitz random force in the physics literature. We obtain the existence and uniqueness of martingale solutions on the torus $\mathbb T^d$ for $\theta > \frac{d}{2}$. For $\theta \le 1$ the equation is supercritical; we regularize the problem by introducing a Galerkin approximation and study the large-scale behavior of the truncated model on $\mathbb{R}^d$. We show that the nonlinear term in the Galerkin approximation vanishes on large scales when $\theta < 1$ and the model converges to the linearized equation. For $\theta = 1$ the nonlinear term gives a nontrivial contribution to the large-scale behavior, and we conjecture that the large-scale behavior is given by a linear model with strictly larger effective diffusivity than the one obtained by simply dropping the nonlinear term. The effective diffusivity is explicitly given in terms of the model parameters.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 24 pages

arXiv:2403.03645

K-Link: Knowledge-Link Graph from LLMs for Enhanced Representation Learning in Multivariate Time-Series Data

Authors: Yucheng Wang, Ruibing Jin, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen

Abstract: Sourced from various sensors and organized chronologically, Multivariate Time-Series (MTS) data involve crucial spatial-temporal dependencies, e.g., correlations among sensors. To capture these dependencies, Graph Neural Networks (GNNs) have emerged as powerful tools, yet their effectiveness is restricted by the quality of graph construction from MTS data. Typically, existing approaches construct graphs solely from MTS signals, which may introduce bias due to small training datasets and may not accurately represent underlying dependencies. To address this challenge, we propose a novel framework named K-Link, which leverages Large Language Models (LLMs) that encode extensive general knowledge, thereby providing an effective way to reduce the bias. Leveraging the knowledge embedded in LLMs, such as physical principles, we extract a Knowledge-Link graph, capturing vast semantic knowledge of sensors and the linkage of the sensor-level knowledge. To harness the potential of the knowledge-link graph in enhancing the graph derived from MTS data, we propose a graph alignment module, facilitating the transfer of semantic knowledge within the knowledge-link graph into the MTS-derived graph. By doing so, we can improve the graph quality, ensuring effective representation learning with GNNs for MTS data. Extensive experiments demonstrate the efficacy of our approach, with superior performance across various MTS-related downstream tasks.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 figures

arXiv:2403.02977

Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

Authors: Qianhao Wang, Zhepei Wang, Mingyang Wang, Jialin Ji, Zhichao Han, Tianyue Wu, Rui Jin, Yuman Gao, Chao Xu, Fei Gao

Abstract: Convex polytopes have compact representations and exhibit convexity, which makes them suitable for abstracting obstacle-free spaces from various environments. Existing methods for generating convex polytopes struggle to strike a balance between two requirements: producing high-quality polytopes and efficiency. Moreover, various tasks impose another crucial requirement, which we refer to as manageability: the convex polytope must accurately contain certain seed point sets, such as a robot or a front-end path. In this paper, we show that we can generate high-quality convex polytopes while ensuring both efficiency and manageability simultaneously, by introducing Fast Iterative Regional Inflation (FIRI). FIRI consists of two iteratively executed submodules: Restrictive Inflation (RsI) and computation of the Maximum Volume Inscribed Ellipsoid (MVIE) of the convex polytope. By explicitly incorporating constraints that include the seed point set, RsI guarantees manageability. Meanwhile, the iterative monotonic optimization of the MVIE, which serves as a lower bound of the volume of the convex polytope, ensures high-quality results of FIRI. In terms of efficiency, we design methods tailored to the low-dimensional and multi-constrained nature of both modules, resulting in orders-of-magnitude improvement compared to generic solvers. Notably, for 2-D MVIE, we present a novel analytical algorithm that achieves linear-time complexity for the first time, further enhancing the efficiency of FIRI in the 2-D scenario.
Extensive benchmarks conducted against state-of-the-art methods validate the superior performance of FIRI in terms of quality, manageability, and efficiency. Furthermore, various real-world applications showcase the generality and practicality of FIRI. The high-performance code of FIRI will be open-sourced for the reference of the community.

Submitted 6 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00997

Thermoelectric Transport in Weyl Semimetal BaMnSb2: a First-Principles Study

Authors: Yubi Chen, Rongying Jin, Bolin Liao, Sai Mu

Abstract: Topological materials are often associated with exceptional thermoelectric properties. Orthorhombic BaMnSb2 is a topological semimetal consisting of alternating layers of Ba, Sb, and MnSb. A recent experiment demonstrates that BaMnSb2 has a low thermal conductivity and modest thermopower, making it a promising thermoelectric material. Through first-principles calculations with Coulomb repulsion and spin-orbit coupling included, we studied the electronic structure, phononic structure, and thermoelectric transport properties of BaMnSb2 in depth. We find that BaMnSb2 exhibits a low lattice thermal conductivity, owing to the scattering of acoustic phonons by low-frequency optical modes. Using linearized Boltzmann transport theory with a constant relaxation time approximation, we further calculate the thermopower and observe an intriguing goniopolar transport behavior, in which n-type and p-type conduction occur simultaneously along separate transport directions. We propose that the figure of merit can be enhanced via doping in which electrical conductivity is decreased while the thermopower remains undiminished. BaMnSb2 is a potential platform for elucidating complex band structure effects and topological phenomena, paving the way to explore rich physics in low-dimensional systems.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.18023

Do Large Language Models Mirror Cognitive Language Processing?

Authors: Yuqi Ren, Renren Jin, Tongxuan Zhang, Deyi Xiong

Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs can facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are typically utilized to study human language processing. Therefore, it is natural to ask how well the text embeddings from LLMs align with brain cognitive processing signals, and how training strategies affect LLM-brain alignment. In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain, in order to evaluate how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on such LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and that alignment training can significantly improve LLM-brain similarity. Explicit prompts contribute to the consistency of LLMs with brain cognitive language processing, while nonsensical noisy prompts may attenuate such alignment. Additionally, performance on a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with LLM-brain similarity.

Submitted 28 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.
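RSA, as used in the LLM-brain abstract above, compares two representations indirectly: each is turned into a representational dissimilarity matrix (RDM) over the same stimuli, and the RDMs are then correlated. The sketch uses 1 minus Pearson correlation for the RDM and Spearman correlation between RDM upper triangles, which are common choices but an assumption about this paper's exact setup; the embedding and "fMRI" arrays are random stand-ins.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation between stimulus rows."""
    return 1.0 - np.corrcoef(features)


def spearman(x, y):
    """Spearman correlation via ranks (assumes no ties, which holds for continuous data)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def rsa_score(feats_a, feats_b):
    """Correlate the upper triangles of the two RDMs built over the same stimuli."""
    iu = np.triu_indices(feats_a.shape[0], k=1)
    return spearman(rdm(feats_a)[iu], rdm(feats_b)[iu])


rng = np.random.default_rng(0)
llm_emb = rng.standard_normal((20, 64))   # hypothetical: 20 stimuli x 64-d LLM embeddings
# Hypothetical voxel responses: a random linear readout of the embeddings plus noise.
fmri = llm_emb @ rng.standard_normal((64, 30)) + 0.5 * rng.standard_normal((20, 30))
score = rsa_score(llm_emb, fmri)
print(score)
```

Because only RDMs are compared, the two representations may have different dimensionalities (64 embedding dimensions versus 30 voxels here).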

arXiv:2402.16775

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Authors: Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Abstract: Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques, which reduce the bits needed for model weights or activations with minimal performance loss, have become popular due to the rise of LLMs. However, most quantization studies use pre-trained LLMs, and the impact of quantization on instruction-tuned LLMs and the relationship between perplexity and benchmark performance of quantized LLMs are not well understood. Evaluation of quantized LLMs is often limited to language modeling and a few classification tasks, leaving their performance on other benchmarks unclear. To address these gaps, we propose a structured evaluation framework consisting of three critical dimensions: (1) knowledge & capacity, (2) alignment, and (3) efficiency, and conduct extensive experiments across ten diverse benchmarks. Our experimental results indicate that LLMs with 4-bit quantization can retain performance comparable to their non-quantized counterparts, and perplexity can serve as a proxy metric for quantized LLMs on most benchmarks. Furthermore, quantized LLMs with larger parameter scales can outperform smaller LLMs. Despite the memory savings achieved through quantization, it can also slow down the inference speed of LLMs.
Consequently, substantial engineering efforts and hardware support are imperative to achieve a balanced optimization of decoding speed and memory consumption in the context of quantized LLMs.

Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: ACL 2024 Findings
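For readers unfamiliar with what "4-bit quantization" of weights means mechanically, the sketch below shows generic symmetric, per-output-channel round-to-nearest quantization. It is a baseline illustration, not any specific method evaluated in the paper; the weight matrix is random, and the 4-bit codes are stored in `int8` containers for readability, whereas practical kernels typically pack two 4-bit values per byte.

```python
import numpy as np


def quantize_int4(w):
    """Symmetric per-output-channel round-to-nearest quantization to signed 4-bit codes."""
    qmax = 7  # use [-7, 7] so the grid is symmetric around zero
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)  # hypothetical weight matrix
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())
```

In this naive form the weights must be dequantized before each matrix multiply, which hints at why, as the abstract notes, memory savings do not automatically translate into faster decoding without dedicated kernel and hardware support.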

arXiv:2402.12869

Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

Authors: Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang

Abstract: Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain-specific data has attracted wide attention. However, domain data often exist in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-text generation is a promising solution, as it transforms hybrid data into a uniformly text-formatted corpus. Although this technique has been widely studied by the NLP community, there is currently no comparative analysis of how corpora generated by different table-to-text methods affect the performance of QA systems. In this paper, we address this research gap in two steps. First, we integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data. Then, we apply this framework to real-world industrial data to conduct extensive experiments on two types of QA systems (DSFT and RAG frameworks) with four representative methods: Markdown format, Template serialization, TPLM-based method, and LLM-based method. Based on the experimental results, we draw several empirical findings and explore the underlying reasons behind the success of some methods. We hope the findings of this work will provide a valuable reference for the academic and industrial communities in developing robust QA systems.

Submitted 9 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted to NAACL 2024 Industry Track Paper
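Two of the four table-to-text methods named above, Markdown format and Template serialization, are rule-based and easy to illustrate. The sketch below is a minimal version of each under stated assumptions: the table contents and the sentence template ("The {column} of {row key} is {value}.") are hypothetical, and the paper's actual templates may differ.

```python
def to_markdown(header, rows):
    """Serialize a table as a Markdown pipe table."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def to_template(header, rows, key_col=0):
    """Template serialization: one sentence per non-key cell, keyed by the row's key column."""
    sentences = []
    for row in rows:
        for col, value in enumerate(row):
            if col != key_col:
                sentences.append(f"The {header[col]} of {row[key_col]} is {value}.")
    return " ".join(sentences)


header = ["Device", "Max power", "Port count"]           # hypothetical domain table
rows = [["Switch-A", "150 W", 24], ["Switch-B", "90 W", 8]]
print(to_markdown(header, rows))
print(to_template(header, rows))
```

The Markdown form keeps the table layout for models that read it well, while the template form produces plain sentences that fit naturally into a text corpus for retrieval or fine-tuning.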

arXiv:2402.10816

TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data

Authors: Richeng Jin, Yujie Gu, Kai Yue, Xiaofan He, Zhaoyang Zhang, Huaiyu Dai

Abstract: Distributed training of deep neural networks faces three critical challenges: privacy preservation, communication efficiency, and robustness to faults and adversarial behaviors. Although significant research efforts have been devoted to addressing these challenges independently, their synthesis remains less explored. In this paper, we propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously. We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm. In particular, in terms of privacy guarantees, compared to the existing sign-based approach StoSign, the proposed method improves the dimension dependence on the gradient size and enjoys privacy amplification by mini-batch sampling while ensuring a comparable convergence rate. We also prove that TernaryVote is robust when less than 50% of workers are blind attackers, matching the robustness of signSGD with majority vote. Extensive experimental results validate the effectiveness of the proposed algorithm.

Submitted 16 February, 2024; originally announced February 2024.
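The compress-then-vote pipeline in the TernaryVote abstract can be sketched as follows. The compressor here is a generic unbiased stochastic ternary quantizer and the adversaries simply flip their messages; both are illustrative assumptions, the paper's own compressor parameters and f-DP calibration are not modeled, and "sign-flipping" need not match the paper's definition of blind attackers.

```python
import numpy as np


def ternary_compress(g, bound, rng):
    """Stochastic ternary compressor: coordinate i becomes sign(g_i) with probability
    |g_i| / bound and 0 otherwise, so bound * E[output] equals g after clipping."""
    g = np.clip(np.asarray(g, dtype=float), -bound, bound)
    keep = rng.random(g.shape) < np.abs(g) / bound
    return (np.sign(g) * keep).astype(np.int8)


def majority_vote(messages):
    """Server step: sum the workers' ternary messages and broadcast the sign of the sum."""
    return np.sign(np.sum(messages, axis=0)).astype(np.int8)


rng = np.random.default_rng(0)
true_grad = np.array([1.0, -1.0, 0.0, 1.0])   # hypothetical gradient, already within [-1, 1]
honest = [ternary_compress(true_grad, 1.0, rng) for _ in range(3)]
# Hypothetical adversaries that flip the sign of their message.
flipped = [-ternary_compress(true_grad, 1.0, rng) for _ in range(2)]
update = majority_vote(honest + flipped)
print(update)
```

With three honest workers against two sign-flippers, the vote still recovers the sign of the true gradient, which illustrates why a strict honest majority matters. Each message needs only about log2(3) bits per coordinate.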

arXiv:2402.10555

SPAR: Personalized Content-Based Recommendation via Long Engagement Attention

Authors: Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long

Abstract: Leveraging users' long engagement histories is essential for personalized content recommendations. The success of pretrained language models (PLMs) in NLP has led to their use in encoding user histories and candidate items, framing content recommendations as textual semantic matching tasks. However, existing works still struggle with processing very long user historical text and insufficient user-item interaction. In this paper, we introduce a content-based recommendation framework, SPAR, which effectively tackles the challenges of holistic user interest extraction from long user engagement histories. It does so by leveraging a PLM, poly-attention layers, and attention sparsity mechanisms to encode the user's history in a session-based manner. The user-side and item-side features are sufficiently fused for engagement prediction while maintaining standalone representations for both sides, which is efficient for practical model deployment. Moreover, we enhance user profiling by exploiting a large language model (LLM) to extract global interests from user engagement history. Extensive experiments on two benchmark datasets demonstrate that our framework outperforms existing state-of-the-art (SoTA) methods.

Submitted 21 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2402.05830

Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting

Authors: Yanjun Zhao, Tian Zhou, Chao Chen, Liang Sun, Yi Qian, Rong Jin

Abstract: Time series analysis is vital for numerous applications, and transformers have become increasingly prominent in this domain. Leading methods customize the transformer architecture from NLP and CV, utilizing a patching technique to convert continuous signals into segments. Yet, time series data are uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges, we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce the impact of noise and capture sufficient statistics for forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the transformer architecture. Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting. In evaluations across ten benchmark datasets, including the newly introduced CAISO dataset, Sparse-VQ surpasses leading models with a 7.84% and 4.17% decrease in MAE for univariate and multivariate time series forecasting, respectively. Moreover, it can be seamlessly integrated with existing transformer-based models to elevate their performance.
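The two ingredients named in the Sparse-VQ abstract, instance normalization that is reversed at the output (RevIN) and vector quantization, can be sketched in a few lines. This is an illustrative assumption rather than the paper's layer: it uses plain nearest-neighbor quantization against a fixed, hypothetical codebook instead of a learned sparse one, and it omits RevIN's learnable affine parameters.

```python
import math

def revin_normalize(series, eps=1e-5):
    """Instance normalization: subtract the series mean and divide by its std.

    Returns the normalized values plus the statistics needed to undo the step.
    """
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in series], (mean, std)

def revin_denormalize(values, stats):
    """Reverse the normalization on model outputs using the stored statistics."""
    mean, std = stats
    return [v * std + mean for v in values]

def quantize(vec, codebook):
    """Replace a vector with its nearest codebook entry (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, c)))

# Usage: normalize, quantize non-overlapping patches of length 2, then denormalize.
series = [10.0, 10.4, 9.8, 12.1, 11.9, 10.2]
normed, stats = revin_normalize(series)
codebook = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
patches = [normed[i:i + 2] for i in range(0, len(normed), 2)]
quantized = [x for p in patches for x in quantize(p, codebook)]
reconstruction = revin_denormalize(quantized, stats)
```

Snapping each patch to a small codebook discards fine-grained fluctuations, while the stored mean and standard deviation restore the series' level and scale at the output.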

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05823

FusionSF: Fuse Heterogeneous Modalities in a Vector Quantized Framework for Robust Solar Power Forecasting

Authors: Ziqing Ma, Wenwei Wang, Tian Zhou, Chao Chen, Bingqing Peng, Liang Sun, Rong Jin

Abstract: Accurate solar power forecasting is crucial for integrating photovoltaic plants into the electric grid and for scheduling and securing the power grid. This problem becomes more demanding for newly installed solar plants, which lack sufficient data. Current research predominantly relies on historical solar power data or numerical weather prediction in a single-modality format, ignoring the complementary information provided by different modalities. In this paper, we propose a multi-modality fusion framework that integrates historical power data, numerical weather prediction, and satellite images, significantly improving forecast performance. We introduce a vector quantized framework that aligns modalities with varying information densities, striking a balance between integrating sufficient information and averting model overfitting. Our framework demonstrates strong zero-shot forecasting capability, which is especially useful for newly installed plants. Moreover, we collect and release a multi-modal solar power (MMSP) dataset from real-world plants to further promote research on multi-modal solar forecasting algorithms. Our extensive experiments show that our model not only operates robustly but also boosts accuracy in both zero-shot forecasting and scenarios rich with training data, surpassing leading models. We have incorporated it into our eForecaster platform and deployed it for more than 300 solar plants with a capacity of over 15 GW.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05370

Attention as Robust Representation for Time Series Forecasting

Authors: PeiSong Niu, Tian Zhou, Xue Wang, Liang Sun, Rong Jin

Abstract: Time series forecasting is essential for many practical applications, and transformer-based models are increasingly adopted for it owing to their impressive performance in NLP and CV. The key feature of transformers, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role. Yet time series data, characterized by noise and non-stationarity, pose significant forecasting challenges. Our approach elevates attention weights to the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy. Our study shows that an attention map, structured using global landmarks and local windows, acts as a robust kernel representation for data points, withstanding noise and shifts in distribution. Our method outperforms state-of-the-art models, reducing mean squared error (MSE) in multivariate time series forecasting by a notable 3.6% without altering the core neural network architecture. It serves as a versatile component that can readily replace recent patching-based embedding schemes in transformer-based models, boosting their performance.
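The abstract describes attention weights, computed against global landmarks and a local window, serving as a kernel representation of each data point. The sketch below is a hypothetical illustration of that idea, not the paper's construction: it scores each time step against a fixed set of landmark indices and its clamped neighbors with a Gaussian-style kernel and uses the softmax weights as the feature vector. The landmark indices, window size, and temperature `tau` are all assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_features(series, landmarks, window, tau=1.0):
    """Represent each time step by its attention weights over landmarks and local neighbors.

    landmarks: indices of global landmark points (hypothetical choice).
    window: number of neighbors on each side; indices are clamped at the boundaries.
    Scores use a Gaussian-style kernel, -(x_i - x_j)^2 / tau.
    """
    n = len(series)
    features = []
    for i, x in enumerate(series):
        local = [min(max(i + o, 0), n - 1) for o in range(-window, window + 1)]
        keys = [series[j] for j in landmarks] + [series[j] for j in local]
        weights = softmax([-((x - k) ** 2) / tau for k in keys])
        features.append(weights)
    return features

# Usage: a short series, plus a copy with a constant level shift.
series = [0.1, 0.5, 0.3, 0.9, 0.7, 0.2, 0.4, 0.8]
feats = attention_features(series, landmarks=[0, 4, 7], window=1)
shifted = attention_features([x + 5.0 for x in series], landmarks=[0, 4, 7], window=1)
```

Because every score depends only on differences between values, adding a constant to the whole series leaves these features unchanged, a small illustration of why kernel-style representations can tolerate level shifts.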

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.02309

Jailbreaking Attack against Multimodal Large Language Model

Authors: Zhenxing Niu, Haodong Ren, Xinbo Gao, Gang Hua, Rong Jin

Abstract: This paper focuses on jailbreaking attacks against multi-modal large language models (MLLMs), seeking to induce MLLMs to generate objectionable responses to harmful user queries. A maximum likelihood-based algorithm is proposed to find an image Jailbreaking Prompt (imgJP), enabling jailbreaks against MLLMs across multiple unseen prompts and images (i.e., a data-universal property). Our approach exhibits strong model transferability, as the generated imgJP can be transferred to jailbreak various models, including MiniGPT-v2, LLaVA, InstructBLIP, and mPLUG-Owl2, in a black-box manner. Moreover, we reveal a connection between MLLM jailbreaks and LLM jailbreaks. As a result, we introduce a construction-based method to harness our approach for LLM jailbreaks, demonstrating greater efficiency than current state-of-the-art methods. The code is available here. Warning: some content generated by language models may be offensive to some readers.

Submitted 3 February, 2024; originally announced February 2024.

Showing 1–50 of 542 results for author: Jin, R