Showing 1–31 of 31 results for author: Luan, J

  1. arXiv:2407.05690

    cs.CL cs.AI

    Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

    Authors: Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

    Abstract: Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Findings of ACL 2024

  2. arXiv:2407.00993

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.06571

    cs.CL cs.AI

    SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

    Authors: Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang

    Abstract: While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model, an innovative architecture that extends the core decoder-only framework by incorporating subsampling, upsampling, and bypass modules. The sub…

    Submitted 17 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, submitted to ECAI 2024

    ACM Class: I.2.7

  4. arXiv:2404.11474

    cs.CV

    Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt

    Authors: Zhanjie Zhang, Quanwei Zhang, Huaizhong Lin, Wei Xing, Juncheng Mo, Shuaicheng Huang, Jinheng Xie, Guangyuan Li, Junsheng Luan, Lei Zhao, Dalong Zhang, Lixia Chen

    Abstract: Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images. Existing generative adversarial network-based methods fail to generate highly realistic stylized images and always introduce obvious artifacts and disharmonious patterns. Recently, large-scale pre-trained diffusion models opened up a new way for generating highl…

    Submitted 29 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024

  5. arXiv:2403.06551

    cs.IR

    ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval

    Authors: Yuanhang Zheng, Peng Li, Wei Liu, Yang Liu, Jian Luan, Bin Wang

    Abstract: Tool learning aims to extend the capabilities of large language models (LLMs) with external tools. A major challenge in tool learning is how to support a large number of tools, including unseen tools. To address this challenge, previous studies have proposed retrieving suitable tools for the LLM based on the user query. However, previously proposed methods do not consider the differences between s…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: This paper is accepted for LREC-COLING 2024

    Journal ref: In Proceedings of LREC-COLING 2024, pages 16263-16273

  6. arXiv:2402.16775

    cs.CL cs.AI

    A Comprehensive Evaluation of Quantization Strategies for Large Language Models

    Authors: Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

    Abstract: Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques, which reduce the bits needed for model weights or activations with minimal performance loss, have become popular due to the rise of LLMs. However, most quantizatio…

    Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  7. arXiv:2401.05459

    cs.HC cs.AI cs.SE

    Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

    Authors: Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu

    Abstract: Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing de…

    Submitted 8 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: https://github.com/MobileLLM/Personal_LLM_Agents_Survey

  8. arXiv:2401.04283

    eess.AS cs.SD

    FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

    Authors: Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

    Abstract: Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, pioneering a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, fast score-based diffusion AEC framework to save computational demands, making it favorable for edge devices. It stan…

    Submitted 8 January, 2024; originally announced January 2024.

  9. arXiv:2312.06135

    cs.CV

    ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank

    Authors: Zhanjie Zhang, Quanwei Zhang, Guangyuan Li, Wei Xing, Lei Zhao, Jiakai Sun, Zehua Lan, Junsheng Luan, Yiling Huang, Huaizhong Lin

    Abstract: Artistic style transfer aims to repaint the content image with the learned artistic style. Existing artistic style transfer methods can be divided into two categories: small model-based approaches and pre-trained large-scale model-based approaches. Small model-based approaches can preserve the content structure, but fail to produce highly realistic stylized images and introduce artifacts and dish…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  10. arXiv:2311.03672

    cs.CL

    CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training

    Authors: Mengge Liu, Wen Zhang, Xiang Li, Yanzhi Tian, Yuhang Guo, Jian Luan, Bin Wang, Shuoying Chen

    Abstract: Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available. Prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix. However, due to the word order difference between languages, misaligned prefix pairs would make SiMT models suffer from serious ha…

    Submitted 6 November, 2023; originally announced November 2023.

  11. arXiv:2310.18659

    cs.AI cs.CL

    DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy

    Authors: Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, Rui Yan

    Abstract: Recent advances in large language models (LLMs) have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior studies have focused on modeling reasoning steps using various thought structures like chains, trees, or graphs. However, LLM-based reasoning still encounters the following challenges: (1) Limited adaptability of preset structur…

    Submitted 26 May, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at ACL 2024 Main, Code repo: https://github.com/XiaoMi/DetermLR

  12. arXiv:2307.15895

    cs.CR

    Auditing Frameworks Need Resource Isolation: A Systematic Study on the Super Producer Threat to System Auditing and Its Mitigation

    Authors: Peng Jiang, Ruizhe Huang, Ding Li, Yao Guo, Xiangqun Chen, Jianhai Luan, Yuxin Ren, Xinwei Hu

    Abstract: System auditing is a crucial technique for detecting APT attacks. However, attackers may try to compromise the system auditing frameworks to conceal their malicious activities. In this paper, we present a comprehensive and systematic study of the super producer threat in auditing frameworks, which enables attackers to either corrupt the auditing framework or paralyze the entire system. We analyze…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: 18 pages, to appear in the 32nd USENIX Security Symposium (USENIX Security '23)

  13. arXiv:2306.16636

    cs.CL cs.AI cs.LG

    CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?

    Authors: Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, Bin Wang

    Abstract: We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) co…

    Submitted 28 June, 2023; originally announced June 2023.

  14. arXiv:2306.10543

    cs.CL

    UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning

    Authors: Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang

    Abstract: Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC),…

    Submitted 18 June, 2023; originally announced June 2023.

  15. arXiv:2305.17415

    cs.CL cs.AI

    Exploring Better Text Image Translation with Multimodal Codebook

    Authors: Zhibin Lan, Jiawei Yu, Xiang Li, Wen Zhang, Jian Luan, Bin Wang, Degen Huang, Jinsong Su

    Abstract: Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) this task lacks a publicly available TIT dataset, 2) dominant models are constructed in a cascaded manner, which tends to suffer from t…

    Submitted 2 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Main Conference

  16. arXiv:2303.00969

    cs.CL

    Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation

    Authors: Mengge Liu, Wen Zhang, Xiang Li, Jian Luan, Bin Wang, Yuhang Guo, Shuoying Chen

    Abstract: Simultaneous machine translation (SimulMT) models start translation before the end of the source sentence, making the translation monotonically aligned with the source sentence. However, the general full-sentence translation test set is acquired by offline translation of the entire source sentence, which is not designed for SimulMT evaluation, making us rethink whether this will underestimate the…

    Submitted 13 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by 48th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

  17. arXiv:2301.06745

    cs.CL cs.AI

    BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation

    Authors: Xiangyu Qin, Zhiyu Wu, Jinshi Cui, Tingting Zhang, Yanran Li, Jian Luan, Bin Wang, Li Wang

    Abstract: Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm, which can be summarized as first producing context-independent features via fine-tuning pretrained language models (PLMs) and then analyzing contextual information and dialogue structure information among the extracted features. However, we discover that this paradigm has several limitations. Accordingly, we pr…

    Submitted 17 January, 2023; originally announced January 2023.

  18. arXiv:2212.03435

    cs.SD cs.CL eess.AS

    Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

    Authors: Fengyu Yang, Jian Luan, Yujun Wang

    Abstract: In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language. Therefore, it is a big challenge to accurately model the pronunciation a…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP2023

  19. arXiv:2110.09780

    cs.SD eess.AS

    Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation

    Authors: Fengyu Yang, Jian Luan, Yujun Wang

    Abstract: Learning emotion embedding from reference audio is a straightforward approach for multi-emotion speech synthesis in encoder-decoder systems. But how to get better emotion embedding and how to inject it into TTS acoustic model more effectively are still under investigation. In this paper, we propose an innovative constraint to help VAE extract emotion embedding with better cluster cohesion. Besides…

    Submitted 28 January, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: accepted by ICASSP2022

  20. arXiv:2110.04486

    cs.SD cs.AI cs.CL cs.LG

    PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

    Authors: Yunchao He, Jian Luan, Yujun Wang

    Abstract: Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from unstable issues like missing and repeating phonemes, not to mention accurate duration control. Duration-informed methods, on the contrary, seem to easily adjust phoneme duration but show obvious degradation in speech naturalness. This…

    Submitted 18 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022. 5 pages, 4 figures, 3 tables. Audio samples are available at: https://pama-tts.github.io/

  21. arXiv:2107.03065

    cs.SD eess.AS

    Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information

    Authors: Qinghua Wu, Quanbo Shen, Jian Luan, YuJun Wang

    Abstract: In multi-speaker speech synthesis, data from a number of speakers usually tend to have great diversity due to the fact that the speakers may differ largely in ages, speaking styles, emotions, and so on. It is important but challenging to improve the modeling capabilities for multi-speaker speech synthesis. To address the issue, this paper proposes a high-capability speech synthesis system, called…

    Submitted 11 February, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted by ICASSP-2022

  22. arXiv:2009.01776

    eess.AS cs.CL cs.LG cs.SD

    HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis

    Authors: Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu

    Abstract: High-fidelity singing voices usually require higher sampling rate (e.g., 48kHz) to convey expression and emotion. However, higher sampling rate causes the wider frequency band and longer waveform sequences and throws challenges for singing voice synthesis (SVS) in both frequency and time domains. Conventional SVS systems that adopt small sampling rate cannot well address the above challenges. In t…

    Submitted 3 September, 2020; originally announced September 2020.

  23. arXiv:2008.04658

    eess.AS cs.SD

    Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music

    Authors: Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li

    Abstract: Detecting singing-voice in polyphonic instrumental music is critical to music information retrieval. To train a robust vocal detector, a large dataset marked with vocal or non-vocal label at frame-level is essential. However, frame-level labeling is time-consuming and labor expensive, resulting there is little well-labeled dataset available for singing-voice detection (S-VD). Hence, we propose a d…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: Accepted by INTERSPEECH 2020

  24. arXiv:2008.02490

    eess.AS cs.SD

    PPSpeech: Phrase based Parallel End-to-End TTS System

    Authors: Yahuan Cong, Ran Zhang, Jian Luan

    Abstract: Current end-to-end autoregressive TTS systems (e.g. Tacotron 2) have outperformed traditional parallel approaches on the quality of synthesized speech. However, they introduce new problems at the same time. Due to the autoregressive nature, the time cost of inference has to be proportional to the length of text, which pose a great challenge for online serving. On the other hand, the style of synth…

    Submitted 6 August, 2020; originally announced August 2020.

  25. arXiv:2007.04590

    eess.AS cs.CL cs.SD

    DeepSinger: Singing Voice Synthesis with Data Mined From the Web

    Authors: Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

    Abstract: In this paper, we develop DeepSinger, a multi-lingual multi-singer singing voice synthesis (SVS) system, which is built from scratch using singing training data mined from music websites. The pipeline of DeepSinger consists of several steps, including data crawling, singing and accompaniment separation, lyrics-to-singing alignment, data filtration, and singing modeling. Specifically, we design a l…

    Submitted 15 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted by KDD2020 research track

  26. arXiv:2006.10317

    eess.AS cs.LG cs.SD

    Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

    Authors: Jie Wu, Jian Luan

    Abstract: This paper presents a high quality singing synthesizer that is able to model a voice with limited available recordings. Based on the sequence-to-sequence singing model, we design a multi-singer framework to leverage all the existing singing data of different singers. To attenuate the issue of musical score unbalance among singers, we incorporate an adversarial task of singer classification to make…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Submitted to INTERSPEECH2020

  27. arXiv:2006.06261

    eess.AS cs.CL cs.LG cs.SD

    XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

    Authors: Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou

    Abstract: This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling. We follow the main architecture of FastSpeech while proposing some singing-specific design: 1) Besides phoneme ID and position encoding, features from musical score (e.g., note pitch and length) are also added. 2) To attenuate off-key issues, we a…

    Submitted 11 June, 2020; originally announced June 2020.

  28. arXiv:1901.00578

    cs.LG math.ST stat.CO stat.ML

    Prediction of multi-dimensional spatial variation data via Bayesian tensor completion

    Authors: Jiali Luan, Zheng Zhang

    Abstract: This paper presents a multi-dimensional computational method to predict the spatial variation data inside and across multiple dies of a wafer. This technique is based on tensor computation. A tensor is a high-dimensional generalization of a matrix or a vector. By exploiting the hidden low-rank property of a high-dimensional data array, the large amount of unknown variation testing data may be pred…

    Submitted 2 January, 2019; originally announced January 2019.

  29. "Influence Sketching": Finding Influential Samples In Large-Scale Regressions

    Authors: Mike Wojnowicz, Ben Cruz, Xuan Zhao, Brian Wallace, Matt Wolff, Jay Luan, Caleb Crable

    Abstract: There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical sta…

    Submitted 23 March, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

    Comments: fixed additional typos

    Journal ref: Big Data (Big Data), 2016 IEEE International Conference on, pp. 3601 - 3612. IEEE, 2016

  30. arXiv:1311.6542

    cs.LO

    Implementing program extraction from CL1-proofs

    Authors: Meixia Qu, Ke Chen, Daming Zhu, Junfeng Luan

    Abstract: Computability logic (CoL) is a formal theory of interactive computation. It understands computational problems as games played by two players: a machine and its environment, uses logical formalism to describe valid principles of computability and formulas to represent computational problems. Logic CL1 is a deductive system for a fragment of CoL. The logical vocabulary contains all of the operators…

    Submitted 25 November, 2013; originally announced November 2013.

    Comments: 1 figure

  31. arXiv:1207.1188

    cs.LO

    On the toggling-branching recurrence of Computability Logic

    Authors: Meixia Qu, Junfeng Luan, Daming Zhu

    Abstract: We introduce a new, substantially simplified version of the toggling-branching recurrence operation of Computability Logic, prove its equivalence to Japaridze's old, "canonical" version, and also prove that both versions preserve the static property of their arguments.

    Submitted 5 July, 2012; originally announced July 2012.