Showing 1–50 of 228 results for author: Qin, B

  1. arXiv:2407.08937  [pdf, other]

    cs.CL cs.AI

    Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

    Authors: Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin

    Abstract: To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential lea…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 MAIN

  2. arXiv:2407.02936  [pdf, other]

    cs.AI cs.CL

    GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models

    Authors: Zike Yuan, Ming Liu, Hui Wang, Bing Qin

    Abstract: Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. G…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.00569  [pdf, other]

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  4. arXiv:2406.19820  [pdf, other]

    cs.CL cs.AI

    BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

    Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-sourc…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  5. arXiv:2406.18227  [pdf, other]

    cs.CV cs.CL

    GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

    Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

    Abstract: There are numerous instructional videos on the Internet, which provide us with tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024

  6. arXiv:2406.18020  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  7. arXiv:2406.15796  [pdf, other]

    cs.CL

    Rethinking Entity-level Unlearning for Large Language Models

    Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

    Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Work in progress

  8. arXiv:2406.11517  [pdf, other]

    cs.LG cs.AI

    Revisiting Spurious Correlation in Domain Generalization

    Authors: Bin Qin, Jiangmeng Li, Yi Li, Xuesong Wu, Yupeng Wang, Wenwen Qiang, Jianwen Cao

    Abstract: Without loss of generality, existing machine learning techniques may learn domain-dependent spurious correlations, which impairs the generalization of models in out-of-distribution (OOD) scenarios. To address this issue, recent works build a structural causal model (SCM) to describe the causality within the data generation process, thereby motivating methods to avoid the learning of spurious…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11501  [pdf, other]

    cs.LG cs.AI stat.ME

    Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality

    Authors: Jiangmeng Li, Bin Qin, Qirui Ji, Yi Li, Wenwen Qiang, Jianwen Cao, Fanjiang Xu

    Abstract: Leveraging the development of structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionism causal analysis encounters theoretical adaptation limitations. Accordingly, cross-world counterfactual approach extends our understanding…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.08124  [pdf, other]

    cs.CL cs.AI

    Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

    Authors: Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei

    Abstract: The success of the reward model in distinguishing between responses with subtle safety differences depends critically on the high-quality preference dataset, which should capture the fine-grained nuances of harmful and harmless responses. This motivates the need to develop a dataset involving preference margins, which accurately quantify how harmless one response is compared to another. In this pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/colfeng/Legend

  11. arXiv:2406.08068  [pdf, other]

    cs.CL

    Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

    Authors: Hao Yang, Yanyan Zhao, Yang Wu, Shilong Wang, Tian Zheng, Hongbo Zhang, Wanxiang Che, Bing Qin

    Abstract: Compared to traditional sentiment analysis, which only considers text, multimodal sentiment analysis needs to consider emotional signals from multimodal sources simultaneously and is therefore more consistent with the way humans process sentiment in real-world scenarios. It involves processing emotional information from various sources such as natural language, images, videos, audio, physiolog…

    Submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.05374  [pdf, other]

    cs.CL

    Planning Like Human: A Dual-process Framework for Dialogue Planning

    Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin

    Abstract: In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or d…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures, ACL 2024 main conference

  13. arXiv:2406.02911  [pdf, other]

    cs.CL

    Improving In-Context Learning with Prediction Feedback for Sentiment Analysis

    Authors: Hongling Xu, Qianlong Wang, Yice Zhang, Min Yang, Xi Zeng, Bing Qin, Ruifeng Xu

    Abstract: Large language models (LLMs) have achieved promising results in sentiment analysis through the in-context learning (ICL) paradigm. However, their ability to distinguish subtle sentiments still remains a challenge. Inspired by the human ability to adjust understanding via feedback, this paper enhances ICL by incorporating prior predictions and feedback, aiming to rectify sentiment misinterpretation…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 (Findings)

  14. arXiv:2406.01983  [pdf, other]

    cs.CL

    RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

    Authors: Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin

    Abstract: With the passage of the Right to Be Forgotten (RTBF) regulations and the scaling up of language model training datasets, research on model unlearning in large language models (LLMs) has become more crucial. Before the era of LLMs, machine unlearning research focused mainly on classification tasks in models with small parameters. In these tasks, the content to be forgotten or retained is clear and…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Work is in progress

  15. arXiv:2406.01549  [pdf, other]

    cs.CL cs.AI

    An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

    Authors: Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin

    Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content, but it achieves only suboptimal noise compression. In this paper, we propose to introduce the information bottlenec…

    Submitted 4 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  16. arXiv:2405.20092  [pdf, other]

    cs.CL cs.SE

    Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

    Authors: Jingchang Chen, Hongxuan Tang, Zheng Chu, Qianglong Chen, Zekun Wang, Ming Liu, Bing Qin

    Abstract: Despite recent progress made by large language models in code generation, they still struggle with programs that meet complex requirements. Recent work utilizes plan-and-solve decomposition to decrease the complexity and leverage self-tests to refine the generated program. Yet, planning deep-inside requirements in advance can be challenging, and the tests need to be accurate to accomplish self-imp…

    Submitted 30 May, 2024; originally announced May 2024.

  17. arXiv:2405.15307  [pdf, other]

    cs.CL

    Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation

    Authors: Ge Qu, Jinyang Li, Bowen Li, Bowen Qin, Nan Huo, Chenhao Ma, Reynold Cheng

    Abstract: Large Language Models (LLMs) driven by In-Context Learning (ICL) have significantly improved the performance of text-to-SQL. Previous methods generally employ a two-stage reasoning framework, namely 1) schema linking and 2) logical synthesis, making the framework not only effective but also interpretable. Despite these advancements, the inherent bad nature of the generalization of LLMs often resul…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL Findings 2024

  18. arXiv:2405.14488  [pdf, other]

    cs.CL

    MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability

    Authors: Yanrui Du, Sendong Zhao, Danyang Zhao, Ming Ma, Yuhan Chen, Liangyu Huo, Qing Yang, Dongliang Xu, Bing Qin

    Abstract: Large Language Models (LLMs) are increasingly deployed in various applications. As their usage grows, concerns regarding their safety are rising, especially in maintaining harmless responses when faced with malicious instructions. Many defense strategies have been developed to enhance the safety of LLMs. However, our research finds that existing defense strategies lead LLMs to predominantly adopt…

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.13820  [pdf, other]

    cs.CL

    Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching

    Authors: Weixiang Zhao, Yulin Hu, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Tat-Seng Chua

    Abstract: Safety alignment of large language models (LLMs) has been gaining increasing attention. However, current safety-aligned LLMs suffer from the fragile and imbalanced safety mechanisms, which can still be induced to generate unsafe responses, exhibit over-safety by rejecting safe user inputs, and fail to preserve general utility after safety alignment. To this end, we propose a novel post safety alig…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 24 pages, 8 figures and 12 tables

  20. arXiv:2405.02933  [pdf, other]

    cs.CL

    Relay Decoding: Concatenating Large Language Models for Machine Translation

    Authors: Chengpeng Fu, Xiaocheng Feng, Yichong Huang, Wenshuai Huo, Baohang Li, Hui Wang, Bin Qin, Ting Liu

    Abstract: Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitiga…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Work in progress

  21. arXiv:2404.16058  [pdf, ps, other]

    math.AP math.FA

    Sign Changing Critical Points for Locally Lipschitz Functionals

    Authors: Xian Xu, Baoxia Qin

    Abstract: In this paper, some existence results for sign-changing critical points of locally Lipschitz functionals in real Banach space are obtained by combining the method of invariant sets of descending flow with a quantitative deformation. First we assume the locally Lipschitz functionals to be outwardly directed on the boundary of some closed convex sets of the real Banach space. By using the…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.13072

  22. arXiv:2404.13072  [pdf, ps, other]

    math.AP math.FA

    The Method of Invariant Sets of Descending Flow for Locally Lipschitz Functionals

    Authors: Xian Xu, Baoxia Qin

    Abstract: In this paper, we extend the method of invariant sets of descending flow proposed by Sun Jingxian for smooth functionals to locally Lipschitz functionals. In this way, we obtain existence results for the positive, negative and sign-changing critical points of the locally Lipschitz functionals, and apply these theoretical results to the study of differential inclusion problems with p-L…

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.12715  [pdf, other]

    cs.CL

    Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

    Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu

    Abstract: Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, posing a great challenge to the generalization on unseen data distributions. Besides, prior methods use textual responses as communication media, ignorin…

    Submitted 30 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 16 pages, 9 figures, 9 tables

  24. arXiv:2404.04932  [pdf, other]

    cs.CL cs.AI

    Towards Understanding the Influence of Reward Margin on Preference Model Performance

    Authors: Bowen Qin, Duanyu Feng, Xi Yang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a widely used framework for the training of language models. However, the process of using RLHF to develop a language model that is well-aligned presents challenges, especially when it comes to optimizing the reward model. Our research has found that existing reward models, when trained using the traditional ranking objective based on human pref…

    Submitted 7 April, 2024; originally announced April 2024.

  25. arXiv:2404.04626  [pdf, ps, other]

    cs.CL cs.AI

    Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

    Authors: Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

    Abstract: Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the SFT's effectiveness and its hindrance to the learning capacity towards human-preferred responses, le…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Draft version

  26. arXiv:2403.18843  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition

    Authors: Chang Sun, Hong Yang, Bo Qin

    Abstract: Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledge distillation approach using a Joint-Embedding Predictive Architecture (JEPA), named JEP-KD, design…

    Submitted 3 March, 2024; originally announced March 2024.

  27. arXiv:2403.16662  [pdf, other]

    cs.CL

    RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict

    Authors: Yirong Zeng, Xiao Ding, Yi Zhao, Xiangyu Li, Jie Zhang, Chao Yao, Ting Liu, Bing Qin

    Abstract: Fact-checking is the task of verifying the factuality of a given claim by examining the available evidence. High-quality evidence plays a vital role in enhancing fact-checking systems and facilitating the generation of explanations that are understandable to humans. However, the provision of both sufficient and relevant evidence for explainable fact-checking systems poses a challenge. To tackle th…

    Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 12 pages, 3 figures, accepted by lrec-coling2024

  28. arXiv:2403.14294  [pdf, other]

    hep-ph nucl-th

    Electron-positron annihilation into $K\bar{K}π$ and their contributions to $(g-2)_μ$

    Authors: Bing-Hai Qin, Wen Qin, Ling-Yun Dai

    Abstract: In this paper, a coherent study of the $e^+e^-$ annihilation into $K^+K^-π^0$, $K^0_SK^0_Lπ^0$ and $K^0_SK^\pmπ^\mp$ is carried out within the framework of resonance chiral theory. The amplitudes are fixed by fitting to the experimental cross-section and invariant mass spectrum. With these amplitudes, one can calculate the hadronic vacuum polarization form factors of these processes. The leading o…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 13 pages, 2 figures

  29. arXiv:2403.09085  [pdf, other]

    cs.CL cs.AI

    Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

    Authors: Kai Xiong, Xiao Ding, Ting Liu, Bing Qin, Dongliang Xu, Qing Yang, Hongtao Liu, Yixin Cao

    Abstract: Large language models (LLMs) have developed impressive performance and strong explainability across various reasoning scenarios, marking a significant stride towards mimicking human-like intelligence. Despite this, when tasked with simple questions supported by a generic fact, LLMs often fail to provide consistent and precise answers, indicating a deficiency in abstract reasoning abilities. This h…

    Submitted 14 March, 2024; originally announced March 2024.

  30. Learning to Describe for Predicting Zero-shot Drug-Drug Interactions

    Authors: Fangqi Zhu, Yongqi Zhang, Lei Chen, Bing Qin, Ruifeng Xu

    Abstract: Adverse drug-drug interactions (DDIs) can compromise the effectiveness of concurrent drug administration, posing a significant challenge in healthcare. As the development of new drugs continues, the potential for unknown adverse effects resulting from DDIs becomes a growing concern. Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack…

    Submitted 13 March, 2024; originally announced March 2024.

  31. arXiv:2403.02661  [pdf, other]

    cs.SE

    How to Save My Gas Fees: Understanding and Detecting Real-world Gas Issues in Solidity Programs

    Authors: Mengting He, Shihao Xia, Boqin Qin, Nobuko Yoshida, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: The execution of smart contracts on Ethereum, a public blockchain system, incurs a fee called gas fee for its computation and data-store consumption. When programmers develop smart contracts (e.g., in the Solidity programming language), they could unknowingly write code snippets that unnecessarily cause more gas fees. These issues, or what we call gas wastes, could lead to significant monetary was…

    Submitted 5 March, 2024; originally announced March 2024.

  32. arXiv:2403.02436  [pdf, other]

    cs.CL

    How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider Transformer Models

    Authors: Xin Lu, Yanyan Zhao, Bing Qin

    Abstract: Pre-trained language models have been proven to possess strong base capabilities, which not only excel in in-distribution language modeling but also show powerful abilities in out-of-distribution language modeling, transfer learning and few-shot learning. Unlike existing work focusing on the influence of scale on base capabilities, our work examines the influence of architecture on those. Specific…

    Submitted 4 March, 2024; originally announced March 2024.

  33. arXiv:2403.01994  [pdf, other]

    cs.CL

    Vanilla Transformers are Transfer Capability Teachers

    Authors: Xin Lu, Yanyan Zhao, Bing Qin

    Abstract: Recently, Mixture of Experts (MoE) Transformers have garnered increasing attention due to their advantages in model capacity and computational efficiency. However, studies have indicated that MoE Transformers underperform vanilla Transformers in many downstream tasks, significantly diminishing the practical value of MoE models. To explain this issue, we propose that the pre-training performance an…

    Submitted 4 March, 2024; originally announced March 2024.

  34. arXiv:2403.01969  [pdf, other]

    cs.CL

    AS-ES Learning: Towards Efficient CoT Learning in Small Models

    Authors: Nuwa Xi, Yuhan Chen, Sendong Zhao, Haochun Wang, Bing Qin, Ting Liu

    Abstract: Chain-of-Thought (CoT) serves as a critical emerging ability in LLMs, especially when it comes to logical reasoning. Attempts have been made to induce such ability in small models as well by distilling from the data with CoT generated by Large Language Models (LLMs). However, existing methods often simply generate and incorporate more data from LLMs and fail to note the importance of efficiently u…

    Submitted 4 March, 2024; originally announced March 2024.

  35. arXiv:2403.01203  [pdf, other]

    cs.LG cs.CL cs.DB

    Pseudo-Label Calibration Semi-supervised Multi-Modal Entity Alignment

    Authors: Luyao Wang, Pengnian Qi, Xigang Bao, Chunlai Zhou, Biao Qin

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs for integration. Unfortunately, prior arts have attempted to improve the interaction and fusion of multi-modal information, which have overlooked the influence of modal-specific noise and the usage of labeled and unlabeled data in semi-supervised settings. In this work, we introduce a…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: accepted by AAAI2024

  36. arXiv:2402.11537  [pdf, other]

    cs.CL cs.AI

    Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

    Authors: Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin

    Abstract: Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major cate…

    Submitted 26 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  37. arXiv:2402.10073  [pdf, other]

    cs.CL

    Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence

    Authors: Weixiang Zhao, Zhuojun Li, Shilong Wang, Yang Wang, Yulin Hu, Yanyan Zhao, Chen Wei, Bing Qin

    Abstract: Emotional Intelligence (EI), consisting of emotion perception, emotion cognition and emotion expression, plays a critical role in improving the user interaction experience of current large language model (LLM) based conversational general AI assistants. Previous works mainly focus on raising their emotion perception ability via naive fine-tuning on EI-related classification or regression…

    Submitted 12 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: To appear at Findings of ACL 2024

  38. arXiv:2402.01349  [pdf, other]

    cs.CL cs.AI

    Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models

    Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, Ting Liu

    Abstract: In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has…

    Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 17 pages, 8 figures

  39. arXiv:2401.16150  [pdf]

    cond-mat.mes-hall

    Sliding ferroelectric memories and synapses

    Authors: Xiuzhen Li, Biao Qin, Yaxian Wang, Yue Xi, Zhiheng Huang, Mengze Zhao, Yalin Peng, Zitao Chen, Zitian Pan, Jundong Zhu, Chenyang Cui, Rong Yang, Wei Yang, Sheng Meng, Dongxia Shi, Xuedong Bai, Can Liu, Na Li, Jianshi Tang, Kaihui Liu, Luojun Du, Guangyu Zhang

    Abstract: Ferroelectric materials with switchable electric polarization hold great promise for a plethora of emergent applications, such as post-Moore's law nanoelectronics, beyond-Boltzmann transistors, non-volatile memories, and above-bandgap photovoltaic devices. Recent advances have uncovered an exotic sliding ferroelectric mechanism, which enables the design of atomically thin ferroelectrics from non-ferroe…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 16 pages, 4 figures

  40. arXiv:2401.16107  [pdf, other]

    cs.CL cs.AI

    Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis

    Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, Ting Liu

    Abstract: Automatic diagnosis is a significant application of AI in healthcare, where diagnoses are generated based on the symptom description of patients. Previous works have approached this task directly by modeling the relationship between the normalized symptoms and all possible diseases. However, in the clinical diagnostic process, patients are initially consulted by a general practitioner and, if nece…

    Submitted 29 January, 2024; originally announced January 2024.

  41. arXiv:2401.13225  [pdf, ps, other]

    hep-ex

    A New Look at the Scalar Meson $f_0(500)$ via $D^+\to π^+π^-\ell^+ν_\ell$ Decays

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, X. Cai, et al. (615 additional authors not shown)

    Abstract: Using $2.93~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773 GeV, we investigate the semileptonic decays $D^+\to π^+π^- \ell^+ν_\ell$ ($\ell=e$ and $μ$). The $D^+\to f_0(500)μ^+ν_μ$ decay is observed for the first time. By analyzing simultaneously the differential decay rates of $D^+\to f_0(500) μ^+ν_μ$ and…

    Submitted 4 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Supplemental Materials added in this version

    Report number: BAM-00660

  42. arXiv:2401.11403  [pdf, other]

    cs.LG cs.CL q-bio.BM

    MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts

    Authors: Haoqiang Guo, Sendong Zhao, Haochun Wang, Yanrui Du, Bing Qin

    Abstract: Deep learning is now widely used in drug discovery, providing significant acceleration and cost reduction. As the most fundamental building block, molecular representation is essential for predicting molecular properties to enable various downstream applications. Most existing methods attempt to incorporate more information to learn better representations. However, not all features are equally imp…

    Submitted 19 April, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  43. arXiv:2401.08438  [pdf, other]

    cs.CL cs.AI cs.LG

    CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models

    Authors: Yaojia Lv, Haojie Pan, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

    Abstract: Cognitive dynamics are pivotal to advance human understanding of the world. Recent advancements in large language models (LLMs) reveal their potential for cognitive simulation. However, these LLM-based cognitive studies primarily focus on static modeling, overlooking the dynamic nature of cognition. To bridge this gap, we propose the concept of the cognitive dynamics of LLMs and present a correspo…

    Submitted 5 January, 2024; originally announced January 2024.

  44. arXiv:2401.08295  [pdf, other]

    cs.CL

    SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models

    Authors: Weixiang Zhao, Shilong Wang, Yulin Hu, Yanyan Zhao, Bing Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: The continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world. Existing methods devise the learning module to acquire task-specific knowledge with parameter-efficient tuning (PET) block and the selection module to pick out the corresponding one for the testing input, aiming at handling the challenges of catastrophic forgetting and knowledge transfer i…

    Submitted 6 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: To appear at ACL 2024

  45. arXiv:2401.05072  [pdf, other]

    cs.CL

    Aligning Translation-Specific Understanding to General Understanding in Large Language Models

    Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Chengpeng Fu, Wenshuai Huo, Ting Liu, Bing Qin

    Abstract: Although large language models (LLMs) have shown surprising language understanding and generation capabilities, they have yet to gain a revolutionary advancement in the field of machine translation. One potential cause of the limited performance is the misalignment between the translation-specific understanding and general understanding inside LLMs. To align the translation-specific understanding…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: work in progress

  46. arXiv:2312.17044  [pdf, other]

    cs.CL

    Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding

    Authors: Liang Zhao, Xiaocheng Feng, Xiachong Feng, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin, Ting Liu

    Abstract: Transformer has taken the field of natural language processing (NLP) by storm since its birth. Further, Large language models (LLMs) built upon it have captured worldwide attention due to its superior abilities. Nevertheless, all Transformer-based models including these powerful LLMs suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones,…

    Submitted 2 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Work in progress

  47. arXiv:2312.14988  [pdf, other]

    cs.CV

    Emage: Non-Autoregressive Text-to-Image Generation

    Authors: Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi

    Abstract: Autoregressive and diffusion models drive the recent breakthroughs on text-to-image generation. Despite their huge success of generating high-realistic images, a common shortcoming of these models is their high inference latency - autoregressive models run more than a thousand times successively to produce image tokens and diffusion models convert Gaussian noise into images with many hundreds of d…

    Submitted 22 December, 2023; originally announced December 2023.

  48. arXiv:2312.04889  [pdf, other]

    cs.AI cs.CL cs.LG

    KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

    Authors: Haojie Pan, Zepeng Zhai, Hao Yuan, Yaojia Lv, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

    Abstract: Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this inquisitiveness. Despite not having the capacity to process and memorize vast amounts of information in their brains, humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the…

    Submitted 10 January, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  49. arXiv:2312.04127  [pdf, other]

    cs.CL

    Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak

    Authors: Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin

    Abstract: Extensive work has been devoted to improving the safety mechanism of Large Language Models (LLMs). However, LLMs still tend to generate harmful responses when faced with malicious instructions, a phenomenon referred to as "Jailbreak Attack". In our research, we introduce a novel automatic jailbreak method RADIAL, which bypasses the security mechanism by amplifying the potential of LLMs to generate…

    Submitted 23 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  50. arXiv:2311.17667  [pdf, other]

    cs.CL cs.AI

    TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Haotian Wang, Ming Liu, Bing Qin

    Abstract: Grasping the concept of time is a fundamental facet of human cognition, indispensable for truly comprehending the intricacies of the world. Previous studies typically focus on specific aspects of time, lacking a comprehensive temporal reasoning benchmark. To address this, we propose TimeBench, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reason…

    Submitted 28 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024