Showing 1–50 of 69 results for author: Bi, W

  1. arXiv:2407.05319  [pdf, other]

    cs.CL

    Rethinking Targeted Adversarial Attacks For Neural Machine Translation

    Authors: Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

    Abstract: Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation systems. Unfortunately, this paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks, where their attacking results are largely overestimated. To this end, this paper presents a new setting for NMT targeted adversarial attacks that could lead to reliabl…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024

  2. arXiv:2405.12689  [pdf, other]

    cs.CL cs.AI

    Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

    Authors: Yafu Li, Zhilin Wang, Leyang Cui, Wei Bi, Shuming Shi, Yue Zhang

    Abstract: AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD),…

    Submitted 29 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings

  3. arXiv:2404.15949  [pdf, other]

    cs.CL cs.AI cs.LG

    CORM: Cache Optimization with Recent Message for Large Language Model Inference

    Authors: Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi

    Abstract: Large Language Models (LLMs), despite their remarkable performance across a wide range of tasks, necessitate substantial GPU memory and consume significant computational resources. Beyond the memory taken up by model weights, the memory used by the KV cache rises linearly with sequence length, becoming a primary bottleneck for inference. In this paper, we introduce an innovative method for optimiz…

    Submitted 21 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2403.01954  [pdf, other]

    cs.CL cs.AI cs.LO

    DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation

    Authors: Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi, Bin Hu

    Abstract: Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference. However, these methods often guide plausible continuations by greedily selecting targets, which, while completing the task, may disrupt the natural patterns of human language generation. In this work, we propose a novel decoding f…

    Submitted 7 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE TKDE (Major Revision), 13 pages, 6 figures

  5. arXiv:2402.19255  [pdf, other]

    cs.CL

    GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

    Authors: Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi

    Abstract: Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. However, there are increasing debates regarding whether these models truly understand and apply mathematical knowledge or merely rely on shortcuts for mathematical reasoning. One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs ca…

    Submitted 1 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  6. arXiv:2402.17532  [pdf, other]

    cs.CL

    Retrieval is Accurate Generation

    Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

    Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retr…

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  7. arXiv:2402.16107  [pdf, other]

    cs.CL

    Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

    Authors: Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

    Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake kno…

    Submitted 28 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Technical Report, work in progress

  8. arXiv:2402.13577  [pdf, other]

    cs.CL

    BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

    Authors: Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

    Abstract: Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively lever…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint

  9. arXiv:2402.07754  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

    Authors: Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

    Abstract: Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language m…

    Submitted 15 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Multiple updates (add boolean logic dataset, add DoT based on SEDD model and add detailed mathematical formulation in Appendix)

  10. arXiv:2401.10768  [pdf, other]

    cs.CL

    Knowledge Verification to Nip Hallucination in the Bud

    Authors: Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external know…

    Submitted 16 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Work in progress

  11. arXiv:2401.10491  [pdf, other]

    cs.CL

    Knowledge Fusion of Large Language Models

    Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weigh…

    Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  12. arXiv:2312.15710  [pdf, other]

    cs.CL cs.AI

    Alleviating Hallucinations of Large Language Models through Induced Hallucinations

    Authors: Yue Zhang, Leyang Cui, Wei Bi, Shuming Shi

    Abstract: Despite their impressive capabilities, large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information, a phenomenon commonly known as "hallucination". In this work, we propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations. We first construct a factually weak LLM by inducing hallucinations from t…

    Submitted 11 March, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: Work in progress

  13. arXiv:2312.06668  [pdf]

    cs.CL cs.SD eess.AS

    Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

    Authors: Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

    Abstract: Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to ASRU 2023

  14. arXiv:2310.19740  [pdf, other]

    cs.CL

    Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation

    Authors: Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

    Abstract: Humans are widely involved in the evaluation of open-ended natural language generation tasks (NLG) that demand creativity, as automatic metrics often exhibit weak correlations with human judgments. Large language models (LLMs) recently have emerged as a scalable and cost-effective alternative to human evaluations. However, both humans and LLMs have limitations, i.e., inherent subjectivity and unre…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: We release our resources at https://github.com/qtli/CoEval

  15. arXiv:2310.15494  [pdf, other]

    cs.CL

    TRAMS: Training-free Memory Selection for Long-range Language Modeling

    Authors: Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi

    Abstract: The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several specific transformer architectures have been designed to tackle issues of long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRA…

    Submitted 20 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  16. On Synthetic Data for Back Translation

    Authors: Jiahao Xu, Yubin Ruan, Wei Bi, Guoping Huang, Shuming Shi, Lihui Chen, Lemao Liu

    Abstract: Back translation (BT) is one of the most significant technologies in NMT research fields. Existing attempts on BT share a common characteristic: they employ either beam search or random sampling to generate synthetic data with a backward model, but little work studies the role of synthetic data in the performance of BT. This motivates us to ask a fundamental question: what kind of synthetic da…

    Submitted 20 October, 2023; originally announced October 2023.

    Journal ref: In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 419–430, Seattle, United States. Association for Computational Linguistics

  17. arXiv:2310.12960  [pdf, other]

    cs.CL

    SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving

    Authors: Xueliang Zhao, Xinting Huang, Wei Bi, Lingpeng Kong

    Abstract: Large Language Models (LLMs) have driven substantial progress in artificial intelligence in recent years, exhibiting impressive capabilities across a wide range of tasks, including mathematical problem-solving. Inspired by the success of subgoal-based methods, we propose a novel framework called SEquential subGoal Optimization (SEGO) to enhance LLMs' ability to solve mat…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Preprint

  18. arXiv:2310.09168  [pdf, other]

    cs.CL

    Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

    Authors: Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a nove…

    Submitted 24 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference)

  19. arXiv:2310.08877  [pdf, other]

    cs.CL

    Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

    Authors: Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

    Abstract: Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality…

    Submitted 20 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Main Conference

  20. arXiv:2310.07299  [pdf, other]

    cs.CL cs.AI

    RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation

    Authors: Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi

    Abstract: Grammatical Error Correction (GEC) systems play a vital role in assisting people with their daily writing tasks. However, users may sometimes come across a GEC system that initially performs well but fails to correct errors when the inputs are slightly modified. To ensure an ideal user experience, a reliable GEC system should have the ability to provide consistent and accurate suggestions when enc…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (main conference, long paper)

  21. arXiv:2309.09198  [pdf, other]

    cs.CL

    A Benchmark for Text Expansion: Datasets, Metrics, and Baselines

    Authors: Yi Chen, Haiyun Jiang, Wei Bi, Rui Wang, Longyue Wang, Shuming Shi, Ruifeng Xu

    Abstract: This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of the plain text to concretize or vivify human writings. Different from existing insertion-based writing assistance tasks, TE requires the model to be more flexible in both locating and generation, and also more cautious in keeping basic semantics. We leverage four complementary…

    Submitted 17 September, 2023; originally announced September 2023.

  22. arXiv:2309.01219  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

    Authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge…

    Submitted 24 September, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: work in progress; 32 pages

  23. Bridged-GNN: Knowledge Bridge Learning for Effective Knowledge Transfer

    Authors: Wendong Bi, Xueqi Cheng, Bingbing Xu, Xiaoqian Sun, Li Xu, Huawei Shen

    Abstract: The data-hungry problem, characterized by insufficiency and low-quality of data, poses obstacles for deep learning models. Transfer learning has been a feasible way to transfer knowledge from high-quality external data of source domains to limited data of target domains, which follows a domain-level knowledge transfer to learn a shared posterior distribution. However, they are usually built on str…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by CIKM2023

  24. arXiv:2306.11485  [pdf, other]

    cs.CL

    Explicit Syntactic Guidance for Neural Text Generation

    Authors: Yafu Li, Leyang Cui, Jianhao Yan, Yongjing Yin, Wei Bi, Shuming Shi, Yue Zhang

    Abstract: Most existing text generation models follow the sequence-to-sequence paradigm. Generative Grammar suggests that humans generate natural language texts by learning language grammar. We propose a syntax-guided generation schema, which generates the sequence guided by a constituency parse tree in a top-down direction. The decoding process can be decomposed into two parts: (1) predicting the infilling…

    Submitted 25 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  25. arXiv:2305.15175  [pdf, other]

    cs.CL

    Pre-training Multi-party Dialogue Models with Latent Discourse Inference

    Authors: Yiyang Li, Xinting Huang, Wei Bi, Hai Zhao

    Abstract: Multi-party dialogues are more difficult for models to understand than one-to-one two-party dialogues, since they involve multiple interlocutors, resulting in interweaving reply-to relations and information flows. To step over these obstacles, an effective way is to pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying. Howe…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023

  26. arXiv:2305.13242  [pdf, other]

    cs.CL

    MAGE: Machine-generated Text Detection in the Wild

    Authors: Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

    Abstract: Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains o…

    Submitted 21 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2024

  27. arXiv:2305.13225  [pdf, other]

    cs.CL

    Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

    Authors: Yue Zhang, Leyang Cui, Deng Cai, Xinting Huang, Tao Fang, Wei Bi

    Abstract: Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks. Recent studies demonstrate that open-sourced smaller foundational models, such as 7B-size LLaMA, can also display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data. In this work, we in…

    Submitted 9 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  28. arXiv:2305.12675  [pdf, other]

    cs.CL

    A Frustratingly Simple Decoding Method for Neural Text Generation

    Authors: Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi

    Abstract: We introduce a frustratingly simple, super efficient and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has been generated. The anti-LM can be implemented as simple as an n-g…

    Submitted 27 February, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: LREC-Coling 2024

  29. arXiv:2305.10149  [pdf, other]

    cs.CL

    Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

    Authors: Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi

    Abstract: Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address th…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

  30. arXiv:2302.06299  [pdf, other]

    cs.SI cs.AI cs.LG

    Homophily-oriented Heterogeneous Graph Rewiring

    Authors: Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, Yan Zhang

    Abstract: With the rapid development of the World Wide Web (WWW), heterogeneous graphs (HG) have explosive growth. Recently, heterogeneous graph neural network (HGNN) has shown great potential in learning on HG. Current studies of HGNN mainly focus on some HGs with strong homophily properties (nodes connected by meta-path tend to have the same labels), while few discussions are made in those that are less h…

    Submitted 23 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted by WWW 2023

  31. Predicting the Silent Majority on Graphs: Knowledge Transferable Graph Neural Network

    Authors: Wendong Bi, Bingbing Xu, Xiaoqian Sun, Li Xu, Huawei Shen, Xueqi Cheng

    Abstract: Graphs consisting of vocal nodes ("the vocal minority") and silent nodes ("the silent majority"), namely VS-Graph, are ubiquitous in the real world. The vocal nodes tend to have abundant features and labels. In contrast, silent nodes only have incomplete features and rare labels, e.g., the description and political tendency of politicians (vocal) are abundant while not for ordinary people (silent)…

    Submitted 8 April, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Paper was accepted by WWW2023

  32. Company-as-Tribe: Company Financial Risk Assessment on Tribe-Style Graph with Hierarchical Graph Neural Networks

    Authors: Wendong Bi, Bingbing Xu, Xiaoqian Sun, Zidong Wang, Huawei Shen, Xueqi Cheng

    Abstract: Company financial risk is ubiquitous and early risk assessment for listed companies can avoid considerable losses. Traditional methods mainly focus on the financial statements of companies and lack the complex relationships among them. However, the financial statements are often biased and lagged, making it difficult to identify risks accurately and timely. To address the challenges, we redefine t…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: accepted by SIGKDD2022

  33. arXiv:2212.09603  [pdf, other]

    cs.CL

    Explanation Regeneration via Information Bottleneck

    Authors: Qintong Li, Zhiyong Wu, Lingpeng Kong, Wei Bi

    Abstract: Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation. These free-text explanations are expected to contain sufficient and carefully-selected evidence to form supportive arguments for predictions. Due to the superior generative capacity of large pretrained language models, recent work built on prompt engineering enab…

    Submitted 11 July, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted in ACL2023 Findings

  34. arXiv:2211.08119  [pdf]

    cs.CV q-bio.NC

    DeepRGVP: A Novel Microstructure-Informed Supervised Contrastive Learning Framework for Automated Identification Of The Retinogeniculate Pathway Using dMRI Tractography

    Authors: Sipei Li, Jianzhong He, Tengfei Xue, Guoqiang Xie, Shun Yao, Yuqian Chen, Erickson F. Torio, Yuanjing Feng, Dhiego CA Bastos, Yogesh Rathi, Nikos Makris, Ron Kikinis, Wenya Linda Bi, Alexandra J Golby, Lauren J O'Donnell, Fan Zhang

    Abstract: The retinogeniculate pathway (RGVP) is responsible for carrying visual information from the retina to the lateral geniculate nucleus. Identification and visualization of the RGVP are important in studying the anatomy of the visual system and can inform treatment of related brain diseases. Diffusion MRI (dMRI) tractography is an advanced imaging method that uniquely enables in vivo mapping of the 3…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 2 tables

  35. arXiv:2209.08264  [pdf, other]

    cs.LG cs.AI

    Make Heterophily Graphs Better Fit GNN: A Graph Rewiring Approach

    Authors: Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

    Abstract: Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data. A lot of GNNs perform well on homophily graphs while having unsatisfactory performance on heterophily graphs. Recently, some researchers turn their attention to designing GNNs for heterophily graphs by adjusting the message passing mechanism or enlarging the receptive field of the message passing. Different…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: 11 pages

  36. MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution

    Authors: Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

    Abstract: Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors. Recently, some studies have discussed the importance of modeling neighborhood distribution on the graph. However, most existing GNNs aggregate neighbors' features through single statistic (e.g., mean, max, sum), which loses the information related to neighbor's…

    Submitted 28 January, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: accepted by WSDM2023

  37. arXiv:2208.01815  [pdf, other]

    cs.CL

    Effidit: Your AI Writing Assistant

    Authors: Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang, Cong Zhou, Yong Dai, Dongyang Ma

    Abstract: In this technical report, we introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that facilitates users to write higher-quality text more efficiently by using artificial intelligence (AI) technologies. Previous writing assistants typically provide the function of error checking (to detect and correct spelling and grammatical errors) and limited text-rewriting functio…

    Submitted 4 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: Technical report for Effidit. arXiv admin note: text overlap with arXiv:2202.06417

  38. BrainCog: A Spiking Neural Network based Brain-inspired Cognitive Intelligence Engine for Brain-inspired AI and Brain Simulation

    Authors: Yi Zeng, Dongcheng Zhao, Feifei Zhao, Guobin Shen, Yiting Dong, Enmeng Lu, Qian Zhang, Yinqian Sun, Qian Liang, Yuxuan Zhao, Zhuoya Zhao, Hongjian Fang, Yuwei Wang, Yang Li, Xin Liu, Chengcheng Du, Qingqun Kong, Zizhe Ruan, Weida Bi

    Abstract: Spiking neural networks (SNNs) have attracted extensive attention in Brain-inspired Artificial Intelligence and computational neuroscience. They can be used to simulate biological information processing in the brain at multiple scales. More importantly, SNNs serve as an appropriate level of abstraction to bring inspirations from brain and cognition to Artificial Intelligence. In this paper, we pr…

    Submitted 11 July, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: This paper was accepted by Patterns. The accepted version can be seen at https://www.cell.com/patterns/fulltext/S2666-3899(23)00144-7

  39. arXiv:2206.04636  [pdf, other]

    cs.CV cs.LG

    Spatial Entropy as an Inductive Bias for Vision Transformers

    Authors: Elia Peruzzo, Enver Sangineto, Yahui Liu, Marco De Nadai, Wei Bi, Bruno Lepri, Nicu Sebe

    Abstract: Recent work on Vision Transformers (VTs) showed that introducing a local inductive bias in the VT architecture helps reduce the number of samples necessary for training. However, the architecture modifications lead to a loss of generality of the Transformer backbone, partially contradicting the push towards the development of uniform architectures, shared, e.g., by both the Computer Vision and t…

    Submitted 14 March, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

  40. arXiv:2205.12551  [pdf, other]

    cs.CV cs.CR

    Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

    Authors: Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang

    Abstract: Position Embeddings (PEs), an arguably indispensable component in Vision Transformers (ViTs), have been shown to improve the performance of ViTs on many vision tasks. However, PEs have a potentially high risk of privacy leakage since the spatial information of the input patches is exposed. This caveat naturally raises a series of interesting questions about the impact of PEs on the accuracy, priva…

    Submitted 26 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR2023

  41. arXiv:2205.01941  [pdf, other]

    cs.CL

    Lexical Knowledge Internalization for Neural Dialog Generation

    Authors: Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao

    Abstract: We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models. Instead of further conditioning the knowledge-grounded dialog (KGD) models on externally retrieved knowledge, we seek to integrate knowledge about each input token internally into the model's parameters. To tackle the challenge due to the large scale of lexical knowledge, we adopt t…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: To appear at ACL 2022 main conference

  42. arXiv:2204.09867  [pdf, other]

    cs.CL cs.AI

    A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

    Authors: Yu Cao, Wei Bi, Meng Fang, Shuming Shi, Dacheng Tao

    Abstract: Towards building intelligent dialogue agents, there has been a growing interest in introducing explicit personas in generation models. However, with limited persona-based dialogue data at hand, it may be difficult to train a dialogue generation model well. We point out that the data challenges of this generation task lie in two aspects: first, it is expensive to scale up current persona-based dial…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: Accepted to ACL 2022 (long paper)

  43. arXiv:2204.09453  [pdf, other]

    cs.CL

    Event Transition Planning for Open-ended Text Generation

    Authors: Qintong Li, Piji Li, Wei Bi, Zhaochun Ren, Yuxuan Lai, Lingpeng Kong

    Abstract: Open-ended text generation tasks, such as dialogue generation and story completion, require models to generate a coherent continuation given limited preceding context. The open-ended nature of these tasks brings new challenges to the neural auto-regressive text generators nowadays. Despite these neural models are good at producing human-like text, it is difficult for them to arrange causalities an…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: Accepted at Findings of ACL 2022

  44. arXiv:2202.12713  [pdf, other]

    cs.SI cs.AI cs.LG

    HTGN-BTW: Heterogeneous Temporal Graph Network with Bi-Time-Window Training Strategy for Temporal Link Prediction

    Authors: Chongjian Yue, Lun Du, Qiang Fu, Wendong Bi, Hengyu Liu, Yu Gu, Di Yao

    Abstract: With the development of temporal networks such as E-commerce networks and social networks, the issue of temporal link prediction has attracted increasing attention in recent years. The Temporal Link Prediction task of WSDM Cup 2022 expects a single model that can work well on two kinds of temporal graphs simultaneously, which have quite different characteristics and data properties, to predict whe…

    Submitted 25 February, 2022; originally announced February 2022.

    Comments: 5 pages, Second Winner Award at Temporal Link Prediction task of WSDM Cup 2022

  45. arXiv:2108.04356  [pdf, other]

    cs.CV

    A Robust Lane Detection Associated with Quaternion Hardy Filter

    Authors: Wenshan Bi, Dong Cheng, Kit Ian Kou

    Abstract: In this article, a robust color-edge feature extraction method based on the Quaternion Hardy filter is proposed. The Quaternion Hardy filter is an emerging edge detection theory. It is along with the Poisson and conjugate Poisson smoothing kernels to handle various types of noise. Combining with the Quaternion Hardy filter, Jin's color gradient operator and Hough transform, the color-edge feature…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2001.01800

  46. arXiv:2106.03746  [pdf, other]

    cs.CV cs.LG

    Efficient Training of Visual Transformers with Small Datasets

    Authors: Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai

    Abstract: Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Differently from CNNs, VTs can capture global relations between image elements and they potentially have a larger representation capacity. However, the lack of the typical convolutional inductive bias makes these models more data-hungry than common CNNs. In fact, some local properties…

    Submitted 14 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS) 2021

  47. arXiv:2105.14488  [pdf, other]

    cs.CL

    REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation

    Authors: Jun Gao, Wei Bi, Ruifeng Xu, Shuming Shi

    Abstract: The lack of reliable automatic evaluation metrics is a major impediment to the development of open-domain dialogue systems. Various reference-based metrics have been proposed to calculate a score between a predicted response and a small set of references. However, these metrics show unsatisfactory correlations with human judgments. For a reference-based metric, its reliability mainly depends on tw…

    Submitted 15 March, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: ACL Findings 2021

  48. arXiv:2105.14462  [pdf, other]

    cs.CL cs.AI

    Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

    Authors: Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao

    Abstract: A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information. Many recent studies report improvements when equipping their models with the multimodal module, despite the controversy of whether such improvements indeed come from the multimodal part. We revisit the contribution o…

    Submitted 30 May, 2021; originally announced May 2021.

    Comments: To appear at ACL 2021 main conference

  49. arXiv:2105.13650  [pdf, other]

    cs.CL cs.AI

    Data Augmentation for Text Generation Without Any Augmented Data

    Authors: Wei Bi, Huayang Li, Jiacheng Huang

    Abstract: Data augmentation is an effective way to improve the performance of many neural text generation models. However, current data augmentation methods need to define or choose proper data mapping functions that map the original samples into the augmented samples. In this work, we derive an objective to formulate the problem of data augmentation on text generation tasks without any use of augmented dat…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted into the main conference of ACL 2021

  50. arXiv:2105.10323  [pdf, other]

    cs.CL

    Learning from My Friends: Few-Shot Personalized Conversation Systems via Social Networks

    Authors: Zhiliang Tian, Wei Bi, Zihan Zhang, Dongkyu Lee, Yiping Song, Nevin L. Zhang

    Abstract: Personalized conversation models (PCMs) generate responses according to speaker preferences. Existing personalized conversation tasks typically require models to extract speaker preferences from user descriptions or their conversation histories, which are scarce for newcomers and inactive users. In this paper, we propose a few-shot personalized conversation task with an auxiliary social network. T…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: Published by AAAI 2021