Showing 1–50 of 225 results for author: Ling, Z

  1. arXiv:2407.10156  [pdf, other]

    astro-ph.HE

    Triggering the Untriggered: The First Einstein Probe-Detected Gamma-Ray Burst 240219A and Its Implications

    Authors: Yi-Han Iris Yin, Bin-Bin Zhang, Jun Yang, Hui Sun, Chen Zhang, Yi-Xuan Shao, You-Dong Hu, Zi-Pei Zhu, Dong Xu, Li An, He Gao, Xue-Feng Wu, Bing Zhang, Alberto Javier Castro-Tirado, Shashi B. Pandey, Arne Rau, Weihua Lei, Wei Xie, Giancarlo Ghirlanda, Luigi Piro, Paul O'Brien, Eleonora Troja, Peter Jonker, Yun-Wei Yu, Jie An, et al. (26 additional authors not shown)

    Abstract: The Einstein Probe (EP) achieved its first detection and localization of a bright X-ray flare, EP240219a, on February 19, 2024, during its commissioning phase. Subsequent targeted searches triggered by the EP240219a alert identified a faint, untriggered gamma-ray burst (GRB) in the archived data of Fermi/GBM, Swift/BAT, Insight-HXMT/HE and INTEGRAL/SPI-ACS. The EP/WXT light curve reveals a long du…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 14 pages, 8 figures, 3 tables

  2. arXiv:2407.09530  [pdf]

    cs.CV cs.AI cs.RO

    Optimization of Autonomous Driving Image Detection Based on RFAConv and Triplet Attention

    Authors: Zhipeng Ling, Qi Xin, Yiyu Lin, Guangze Su, Zuwei Shui

    Abstract: YOLOv8 plays a crucial role in the realm of autonomous driving, owing to its high-speed target detection, precise identification and positioning, and versatile compatibility across multiple platforms. By processing video streams or images in real-time, YOLOv8 rapidly and accurately identifies obstacles such as vehicles and pedestrians on roadways, offering essential visual data for autonomous driv…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 13 pages

  3. arXiv:2406.16062  [pdf, other]

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl…

    Submitted 23 June, 2024; originally announced June 2024.

  4. arXiv:2406.14401  [pdf, other]

    cs.LG cs.AI

    Fair Streaming Feature Selection

    Authors: Zhangling Duan, Tianci Li, Xingyu Wu, Zhaolong Ling, Jingye Yang, Zhaohong Jia

    Abstract: Streaming feature selection techniques have become essential in processing real-time data streams, as they facilitate the identification of the most relevant attributes from continuously updating information. Despite their performance, current algorithms to streaming feature selection frequently fall short in managing biases and avoiding discrimination that could be perpetuated by sensitive attrib…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 30 pages, 10 figures

  5. arXiv:2406.10976  [pdf, other]

    cs.LG cs.CL cs.CR

    Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

    Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers…

    Submitted 16 June, 2024; originally announced June 2024.

  6. arXiv:2406.08266  [pdf, other]

    eess.AS cs.SD

    Refining Self-Supervised Learnt Speech Representation using Brain Activations

    Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling

    Abstract: It was shown in literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with brain activations of human for speech perception and fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work,…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted by Interspeech 2024

  7. arXiv:2406.08200  [pdf, other]

    cs.SD cs.AI eess.AS

    Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding

    Authors: Rui Wang, Liping Chen, Kong AiK Lee, Zhen-Hua Ling

    Abstract: Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We referred to this as the…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted by Interspeech 2024

  8. arXiv:2406.07410  [pdf, other]

    eess.AS

    Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech

    Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

    Abstract: We uncover an underlying bias present in the audio recordings produced from the picture description task of the Pitt corpus, the largest publicly accessible database for Alzheimer's Disease (AD) detection research. Even by solely utilizing the silent segments of these audio recordings, we achieve nearly 100% accuracy in AD detection. However, employing the same methods to other datasets and prepro…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.02250  [pdf, other]

    eess.AS cs.SD

    Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control

    Authors: Ye-Xin Lu, Yang Ai, Zheng-Yan Sheng, Zhen-Hua Ling

    Abstract: The majority of existing speech bandwidth extension (BWE) methods operate under the constraint of fixed source and target sampling rates, which limits their flexibility in practical applications. In this paper, we propose a multi-stage speech BWE model named MS-BWE, which can handle a set of source and target sampling rate pairs and achieve flexible extensions of frequency bandwidth. The proposed…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. arXiv:2406.02162  [pdf, other]

    eess.AS cs.SD

    BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation

    Authors: Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

    Abstract: This paper proposes a novel bidirectional neural vocoder, named BiVocoder, capable both of feature extraction and reverse waveform generation within the short-time Fourier transform (STFT) domain. For feature extraction, the BiVocoder takes amplitude and phase spectra derived from STFT as inputs, transforms them into long-frame-shift and low-dimensional features through convolutional neural networ…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  11. arXiv:2405.16821  [pdf, other]

    cs.CL

    Perturbation-Restrained Sequential Model Editing

    Authors: Jun-Yu Ma, Hong Wang, Hao-Xiang Xu, Zhen-Hua Ling, Jia-Chen Gu

    Abstract: Model editing is an emerging field that focuses on updating the knowledge embedded within large language models (LLMs) without extensive retraining. However, current model editing methods significantly compromise the general abilities of LLMs as the number of edits increases, and this trade-off poses a substantial challenge to the continual learning of LLMs. In this paper, we first theoretically a…

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2405.11541  [pdf, other]

    cs.IT eess.SP

    R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

    Authors: Huiying Yang, Zihan Jin, Chenhao Wu, Rujing Xiong, Robert Caiming Qiu, Zenan Ling

    Abstract: Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors…

    Submitted 19 May, 2024; originally announced May 2024.

  13. arXiv:2404.16425  [pdf, other]

    astro-ph.HE

    Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

    Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

    Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 41 pages, 8 figures, 7 tables

  14. arXiv:2404.16350  [pdf, other]

    astro-ph.HE

    The fast X-ray transient EP240315a: a z ~ 5 gamma-ray burst in a Lyman continuum leaking galaxy

    Authors: Andrew J. Levan, Peter G. Jonker, Andrea Saccardi, Daniele Bjørn Malesani, Nial R. Tanvir, Luca Izzo, Kasper E. Heintz, Daniel Mata Sánchez, Jonathan Quirola-Vásquez, Manuel A. P. Torres, Susanna D. Vergani, Steve Schulze, Andrea Rossi, Paolo D'Avanzo, Benjamin Gompertz, Antonio Martin-Carrillo, Antonio de Ugarte Postigo, Benjamin Schneider, Weimin Yuan, Zhixing Ling, Wenjie Zhang, Xuan Mao, Yuan Liu, Hui Sun, Dong Xu, et al. (51 additional authors not shown)

    Abstract: The nature of the minute-to-hour long Fast X-ray Transients (FXTs) localised by telescopes such as Chandra, Swift, and XMM-Newton remains mysterious, with numerous models suggested for the events. Here, we report multi-wavelength observations of EP240315a, a 1600 s long transient detected by the Einstein Probe, showing it to have a redshift of z=4.859. We measure a low column density of neutral hy…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 41 pages, 7 figures, submitted

  15. arXiv:2404.12886  [pdf, other]

    cs.CV cs.LG

    MCM: Multi-condition Motion Synthesis Framework

    Authors: Zeyu Ling, Bo Han, Yongkang Wongkan, Han Lin, Mohan Kankanhalli, Weidong Geng

    Abstract: Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framewor…

    Submitted 19 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  16. arXiv:2404.08857  [pdf, other]

    cs.SD cs.AI eess.AS

    Voice Attribute Editing with Text Prompt

    Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

    Abstract: Despite recent advancements in speech generation with text prompt providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this t…

    Submitted 12 April, 2024; originally announced April 2024.

  17. arXiv:2403.17378  [pdf, other]

    cs.SD eess.AS

    Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks

    Authors: Yang Ai, Zhen-Hua Ling

    Abstract: This paper presents a novel neural speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is a core module for direct wrapped phase prediction. This architecture consists of two parallel linear convolutional la…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing. arXiv admin note: substantial text overlap with arXiv:2211.15974

  18. arXiv:2403.15764  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    Radiation Effects on Scientific CMOS Detectors for X-ray Astronomy: II. Total Ionizing Dose Irradiation

    Authors: Mengxi Chen, Zhixing Ling, Mingjun Liu, Qinyu Wu, Chen Zhang, Jiaqiang Liu, Zhenlong Zhang, Weimin Yuan, Shuang-Nan Zhang

    Abstract: Complementary metal-oxide-semiconductor (CMOS) detectors are a competitive choice for current and upcoming astronomical missions. To understand the performance variations of CMOS detectors in space environment, we investigate the total ionizing dose effects on custom-made large-format X-ray CMOS detectors. Three CMOS detector samples were irradiated with a Co-60 source with a total dose of 70 krad…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: accepted by JATIS

  19. arXiv:2403.11183  [pdf, other]

    cs.CL

    Decoding Continuous Character-based Language from Non-invasive Brain Recordings

    Authors: Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

    Abstract: Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to…

    Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  20. arXiv:2403.10146  [pdf, other]

    cs.SD cs.IR eess.AS

    Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

    Authors: Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to ICASSP 2024

  21. arXiv:2403.09718  [pdf]

    cs.CL cs.AI

    Comprehensive Implementation of TextCNN for Enhanced Collaboration between Natural Language Processing and System Recommendation

    Authors: Xiaonan Xu, Zheng Xu, Zhipeng Ling, Zhengyu Jin, ShuQian Du

    Abstract: Natural Language Processing (NLP) is an important branch of artificial intelligence that studies how to enable computers to understand, process, and generate human language. Text classification is a fundamental task in NLP, which aims to classify text into different predefined categories. Text classification is the most basic and classic task in natural language processing, and most of the tasks i…

    Submitted 12 March, 2024; originally announced March 2024.

  22. arXiv:2402.15179  [pdf, other]

    cs.LG cs.CL

    Advancing Parameter Efficiency in Fine-tuning via Representation Editing

    Authors: Muling Wu, Wenhao Liu, Xiaohua Wang, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Parameter Efficient Fine-Tuning (PEFT) techniques have drawn significant attention due to their ability to yield competitive results while updating only a small portion of the adjustable parameters. However, existing PEFT methods pose challenges in hyperparameter selection, such as choosing the rank for LoRA or Adapter, or specifying the length of soft prompts. To address these challenges, we prop…

    Submitted 2 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  23. arXiv:2402.10533  [pdf, other]

    cs.SD eess.AS

    APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding

    Authors: Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling

    Abstract: This paper introduces a novel neural audio codec targeting high waveform sampling rates and low bitrates named APCodec, which seamlessly integrates the strengths of parametric codecs and waveform codecs. The APCodec revolutionizes the process of audio encoding and decoding by concurrently handling the amplitude and phase spectra as audio parametric characteristics like parametric codecs. It is com…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  24. arXiv:2402.07501  [pdf, other]

    cs.LG cs.AI

    One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning

    Authors: Haozhen Zhang, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang

    Abstract: As network security receives widespread attention, encrypted traffic classification has become the current research focus. However, existing methods conduct traffic classification without sufficiently considering the common characteristics between data samples, leading to suboptimal performance. Moreover, they train the packet-level and flow-level classification tasks independently, which is redun…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: The code is available at https://github.com/ViktorAxelsen/CLE-TFE

  25. arXiv:2402.05926  [pdf, other]

    cs.LG cs.CL

    On the Convergence of Zeroth-Order Federated Tuning for Large Language Models

    Authors: Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen

    Abstract: The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements for fine-tuning LLMs pose significant challenges, especially when deploying on clients with limited computational resources. To circumvent this, we explore the novel integration of Memory-efficient Zeroth-O…

    Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: accepted by KDD'24 research track. 21 pages, 10 figures, 8 tables

  26. arXiv:2402.02697  [pdf, ps, other]

    cs.LG stat.ML

    Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

    Authors: Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao

    Abstract: Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of…

    Submitted 19 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  27. arXiv:2401.17623  [pdf, other]

    cs.CL

    Neighboring Perturbations of Knowledge Editing on Large Language Models

    Authors: Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu

    Abstract: Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper…

    Submitted 26 May, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by ICML 2024

  28. arXiv:2401.15884  [pdf, other]

    cs.CL

    Corrective Retrieval Augmented Generation

    Authors: Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

    Abstract: Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we…

    Submitted 16 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  29. Adversarial speech for voice privacy protection from Personalized Speech generation

    Authors: Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai

    Abstract: The rapid progress in personalized speech generation technology, including personalized text-to-speech (TTS) and voice conversion (VC), poses a challenge in distinguishing between generated and real speech for human listeners, resulting in an urgent demand in protecting speakers' voices from malicious misuse. In this regard, we propose a speaker protection method based on adversarial attacks. The…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  30. arXiv:2401.06387  [pdf, other]

    eess.AS cs.SD eess.SP

    Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

    Authors: Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

    Abstract: Speech bandwidth extension (BWE) refers to widening the frequency bandwidth range of speech signals, enhancing the speech quality towards brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of Amplitude and Phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The propose…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  31. arXiv:2401.04700  [pdf, other]

    cs.CL

    Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

    Authors: Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

    Abstract: Model editing is a technique that edits the large language models (LLMs) with updated knowledge to alleviate hallucinations without resource-intensive retraining. While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural langu…

    Submitted 16 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Propose a new regularization method

  32. arXiv:2312.15997  [pdf, other]

    cs.CL

    Aligning Large Language Models with Human Preferences through Representation Engineering

    Authors: Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often involve employing reinforcement learning from human feedback (RLHF) to fine-tune LLMs based on human labels assessing the relative quality of model responses. Nevert…

    Submitted 3 July, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  33. arXiv:2312.15946  [pdf, other]

    cs.SD cs.GR eess.AS

    EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

    Authors: Bo Han, Yi Ren, Hao Peng, Teng Zhang, Zeyu Ling, Xiang Yin, Feilin Han

    Abstract: The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make…

    Submitted 26 December, 2023; originally announced December 2023.

  34. arXiv:2312.08749  [pdf, other]

    cs.LG cs.CY

    Mitigating Label Bias in Machine Learning: Fairness through Confident Learning

    Authors: Yixuan Zhang, Boyu Li, Zenan Ling, Feng Zhou

    Abstract: Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias, resulting in biased datasets that unfairly harm specific groups and cause classifiers to inherit these biases. In this paper, we demonstrate that despite only having access to the biased labels, it is possible to eliminate bias by filtering the fairest instances within the framework of con…

    Submitted 24 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  35. arXiv:2312.06964  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    Ground Calibration Result of the Lobster Eye Imager for Astronomy

    Authors: Huaqing Cheng, Zhixing Ling, Chen Zhang, Xiaojin Sun, Shengli Sun, Yuan Liu, Yanfeng Dai, Zhenqing Jia, Haiwu Pan, Wenxin Wang, Donghua Zhao, Yifan Chen, Zhiwei Cheng, Wei Fu, Yixiao Han, Junfei Li, Zhengda Li, Xiaohao Ma, Yulong Xue, Ailiang Yan, Qiang Zhang, Yusa Wang, Xiongtao Yang, Zijian Zhao, Weimin Yuan

    Abstract: We report on results of the on-ground X-ray calibration of the Lobster Eye Imager for Astronomy (LEIA), an experimental space wide-field (18.6*18.6 square degrees) X-ray telescope built from novel lobster eye micro-pore optics. LEIA was successfully launched on July 27, 2022 onboard the SATech-01 satellite. To achieve full characterisation of its performance before launch, a series of tests and ca…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 24 pages, 13 figures. Submitted to Experimental Astronomy

  36. arXiv:2312.04817  [pdf, other]

    cs.CV

    MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

    Authors: Hongjie Zhang, Yi Liu, Lu Dong, Yifei Huang, Zhen-Hua Ling, Yali Wang, Limin Wang, Yu Qiao

    Abstract: While several long-form VideoQA datasets have been introduced, the length of both videos used to curate questions and sub-clips of clues leveraged to answer those questions have not yet reached the criteria for genuine long-form video understanding. Moreover, their QAs are unduly narrow and modality-biased, lacking a wider view of understanding long-term video content with rich dynamics and comple…

    Submitted 7 December, 2023; originally announced December 2023.

  37. arXiv:2312.01851  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    Radiation effects on scientific CMOS sensors for X-ray astronomy: I. proton irradiation

    Authors: Mingjun Liu, Zhixing Ling, Qinyu Wu, Chen Zhang, Jiaqiang Liu, Zhenlong Zhang, Weimin Yuan, Shuang-Nan Zhang

    Abstract: Complementary metal-oxide-semiconductor (CMOS) sensors are a competitive choice for future X-ray astronomy missions. Typically, CMOS sensors on space astronomical telescopes are exposed to a high dose of irradiation. We investigate the impact of irradiation on the performance of two scientific CMOS (sCMOS) sensors between -30 to 20 degree at high gain mode (7.5 times), including the bias map, read…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: accepted by JATIS

  38. arXiv:2311.13436  [pdf, other]

    eess.AS

    Sparsity-Driven EEG Channel Selection for Brain-Assisted Speech Enhancement

    Authors: Jie Zhang, Qing-Tian Xu, Zhen-Hua Ling, Haizhou Li

    Abstract: Speech enhancement is widely used as a front-end to improve the speech quality in many audio systems, while it is hard to extract the target speech in multi-talker conditions without prior information on the speaker identity. It was shown that the auditory attention on the target speaker can be decoded from the electroencephalogram (EEG) of the listener implicitly. In this work, we therefore propo…

    Submitted 25 June, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2305.09994

  39. arXiv:2311.11545  [pdf, other]

    eess.AS

    APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

    Authors: Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

    Abstract: In our previous work, we proposed a neural vocoder called APNet, which directly predicts speech amplitude and phase spectra with a 5 ms frame shift in parallel from the input acoustic features, and then reconstructs the 16 kHz speech waveform using inverse short-time Fourier transform (ISTFT). APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vo…

    Submitted 20 November, 2023; originally announced November 2023.

  40. arXiv:2311.00694  [pdf, other]

    cs.AI cs.CL

    Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

    Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su

    Abstract: Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space.…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  41. arXiv:2310.16582  [pdf, other

    cs.CL

    Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

    Authors: Tianlong Li, Shihan Dou, Changze Lv, Wenhao Liu, Jianhan Xu, Muling Wu, Zixuan Ling, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is ineffic…

    Submitted 6 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Work in progress

  42. arXiv:2310.16301  [pdf, other

    cs.CL

    Is ChatGPT a Good Multi-Party Conversation Solver?

    Authors: Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023

  43. arXiv:2310.14887  [pdf, other

    astro-ph.IM hep-ex physics.ins-det

    An Aluminum-coated sCMOS sensor for X-Ray Astronomy

    Authors: Qinyu Wu, Zhixing Ling, Chen Zhang, Shuang-Nan Zhang, Weimin Yuan

    Abstract: In recent years, tremendous progress has been made on scientific Complementary Metal Oxide Semiconductor (sCMOS) sensors, making them a promising device for future space X-ray missions. We have customized a large-format sCMOS sensor, G1516BI, dedicated for X-ray applications. In this work, a 200 nm thick aluminum layer is successfully sputtered on the surface of this sensor. This Al-coated sensor,…

    Submitted 4 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Published in PASP

  44. arXiv:2310.11595  [pdf, other

    cs.CV cs.AI

    WaveAttack: Asymmetric Frequency Obfuscation-based Backdoor Attacks Against Deep Neural Networks

    Authors: Jun Xia, Zhihao Yue, Yingbo Zhou, Zhiwei Ling, Xian Wei, Mingsong Chen

    Abstract: Due to the popularity of Artificial Intelligence (AI) technology, numerous backdoor attacks are designed by adversaries to mislead deep neural network predictions by manipulating training samples and training processes. Although backdoor attacks are effective in various real scenarios, they still suffer from the problems of both low fidelity of poisoned samples and non-negligible transfer in laten…

    Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  45. arXiv:2310.10379  [pdf, other

    cs.LG stat.ML

    Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

    Authors: Tianjun Ke, Haoqun Cao, Zenan Ling, Feng Zhou

    Abstract: Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classificat…

    Submitted 16 October, 2023; originally announced October 2023.

  46. arXiv:2310.10322  [pdf, other

    cs.CL

    Untying the Reversal Curse via Bidirectional Language Model Editing

    Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu

    Abstract: Recent studies have demonstrated that large language models (LLMs) store massive factual knowledge within their parameters. But existing LLMs are prone to hallucinate unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been a growing interest in the concept of model editing. Despite the emergence of benchmarks and approaches, these unidirectio…

    Submitted 16 October, 2023; originally announced October 2023.

  47. arXiv:2310.04185  [pdf, other

    cs.NI

    Cross-Edge Orchestration of Serverless Functions with Probabilistic Caching

    Authors: Chen Chen, Manuel Herrera, Ge Zheng, Liqiao Xia, Zhengyang Ling, Jiangtao Wang

    Abstract: Serverless edge computing adopts an event-based paradigm that provides back-end services on an as-used basis, resulting in efficient resource utilization. To improve the end-to-end latency and revenue, service providers need to optimize the number and placement of serverless containers while considering the system cost incurred by the provisioning. The particular reason for this circumstance is th…

    Submitted 6 October, 2023; originally announced October 2023.

  48. arXiv:2309.10455  [pdf, other

    eess.AS cs.SD

    Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

    Authors: Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

    Abstract: Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes the incorporation of ultrasound tongue images to improve the performance of lip-based AV-SE systems further. To address the challenge of acquiring ultrasound tongue images duri…

    Submitted 20 November, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2305.14933

  49. arXiv:2309.09470  [pdf, other

    cs.SD cs.LG cs.MM eess.AS

    Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

    Authors: Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

    Abstract: This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single face image of the target speaker. To address this task, we propose a face-voice memory-based zero-shot FaceVC method. This method leverages a memo…

    Submitted 18 September, 2023; originally announced September 2023.

  50. arXiv:2309.03031  [pdf, other

    cs.CV

    MCM: Multi-condition Motion Synthesis Framework for Multi-scenario

    Authors: Zeyu Ling, Bo Han, Yongkang Wong, Mohan Kangkanhalli, Weidong Geng

    Abstract: The objective of the multi-condition human motion synthesis task is to incorporate diverse conditional inputs, encompassing various forms like text, music, speech, and more. This endows the task with the capability to adapt across multiple scenarios, ranging from text-to-motion and music-to-dance, among others. While existing research has primarily focused on single conditions, the multi-condition…

    Submitted 6 September, 2023; originally announced September 2023.