Showing 1–50 of 338 results for author: Cai, D

  1. arXiv:2407.10701

    cs.CL

    DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

    Authors: Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

    Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  2. arXiv:2407.09787

    cs.CV

    Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

    Authors: Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Semi-supervised learning aims to leverage abundant unlabeled data to improve model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of the pseudo labels is essential for the final performance. In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by AAAI 2024

  3. arXiv:2407.09751

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  4. arXiv:2407.07457

    cs.LG cs.CL

    GLBench: A Comprehensive Benchmark for Graph with Large Language Models

    Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

    Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehen…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

  5. arXiv:2407.00132

    cs.SE cs.AI

    ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

    Authors: Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma

    Abstract: Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have gained significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, dive…

    Submitted 28 June, 2024; originally announced July 2024.

  6. arXiv:2406.17312

    cs.CL

    Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

    Authors: Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam

    Abstract: Iterative preference learning, though yielding superior performances, requires online annotated preference labels. In this work, we study strategies to select worth-annotating response pairs for cost-efficient annotation while achieving competitive or even better performances compared with the random selection baseline for iterative preference learning. Built on assumptions regarding uncertainty a…

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.16377

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.10248

    cs.CL cs.AI

    On the Worst Prompt Performance of Large Language Models

    Authors: Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam

    Abstract: The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fail…

    Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  9. arXiv:2406.09961

    cs.SE cs.CL cs.CV

    ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

    Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

    Abstract: We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which repres…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data and code are available at https://github.com/ChartMimic/ChartMimic

  10. arXiv:2406.04594

    cs.DC cs.AI cs.LG

    Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

    Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

    Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2405.14507

    cs.CL cs.LG

    Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

    Authors: Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng

    Abstract: Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency. In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism. However, the unchosen experts in MoE models do not contribute to the output, potentially leading to underutilization of the model's capacity. In this wo…

    Submitted 23 May, 2024; originally announced May 2024.

  12. arXiv:2405.13432

    cs.CL cs.AI

    Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

    Authors: Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

    Abstract: Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward a hypothesis that the data biases a…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024

  13. arXiv:2405.11735

    q-bio.GN

    Accurate and efficient protein embedding using multi-teacher distillation learning

    Authors: Jiayu Shang, Cheng Peng, Yongxin Ji, Jiaojiao Guan, Dehan Cai, Xubo Tang, Yanni Sun

    Abstract: Motivation: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and protein structure prediction. However, existing protein embedding methods are often computationally expensive due to their large number of parameters, wh…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 3 pages; 1 figure

  14. arXiv:2405.03173

    quant-ph

    Performance Upper Bound of Grover-Mixer Quantum Alternating Operator Ansatz

    Authors: Ningyi Xie, Jiahua Xu, Tiejin Chen, Xinwei Lee, Yoshiyuki Saito, Nobuyoshi Asai, Dongsheng Cai

    Abstract: The Quantum Alternating Operator Ansatz (QAOA) represents a branch of quantum algorithms for solving combinatorial optimization problems. A specific variant, the Grover-Mixer Quantum Alternating Operator Ansatz (GM-QAOA), ensures uniform amplitude across states that share equivalent objective values. This property makes the algorithm independent of the problem structure, focusing instead on the di…

    Submitted 24 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures, 1 table

  15. arXiv:2405.01833

    quant-ph

    Feed-Forward Probabilistic Error Cancellation with Noisy Recovery Gates

    Authors: Leo Kurosawa, Yoshiyuki Saito, Xinwei Lee, Xinjian Yan, Ningyi Xie, Dongsheng Cai, Nobuyoshi Asai

    Abstract: Probabilistic Error Cancellation (PEC) aims to improve the accuracy of expectation values for observables. This is accomplished using the probabilistic insertion of recovery gates, which correspond to the inverse of errors. However, the inserted recovery gates also induce errors. Thus, it is difficult to obtain accurate expectation values with PEC since the estimator of PEC has a bias due to noise i…

    Submitted 2 May, 2024; originally announced May 2024.

  16. arXiv:2404.19497

    quant-ph

    Light Cone Cancellation for Variational Quantum Eigensolver Ansatz

    Authors: Xinjian Yan, Xinwei Lee, Ningyi Xie, Yoshiyuki Saito, Leo Kurosawa, Nobuyoshi Asai, Dongsheng Cai, HoongChuin Lau

    Abstract: Variational Quantum Algorithms (VQAs) represent a class of algorithms that utilize a hybrid approach, combining classical and quantum computing techniques. In this approach, classical computers serve as optimizers that update circuit parameters to find approximate solutions to complex problems. In this study, we apply a method known as Light Cone Cancellation (LCC) to optimize variational circuits…

    Submitted 30 April, 2024; originally announced April 2024.

  17. arXiv:2404.19384

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  18. arXiv:2404.19330

    cs.CV cs.AI

    G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction

    Authors: Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang

    Abstract: Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in k…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  19. arXiv:2404.18292

    physics.chem-ph

    Excimer Formation in Zinc-phthalocyanine Revealed using Ultrafast Electron Diffraction

    Authors: Sebastian Hammer, Tristan L. Britt, Laurenz Kremeyer, Maximilian Rödel, David Cai, Jens Pflaum, Bradley Siwick

    Abstract: The formation of excited dimer states, so-called excimers, is an important phenomenon in many organic molecular semiconductors. In contrast to Frenkel exciton-polaron excited states, an excimer is long-lived and energetically low-lying due to stabilization resulting from a substantial reorganization of the inter-molecular geometry. In this letter, we show that ultrafast electron diffraction can fo…

    Submitted 28 April, 2024; originally announced April 2024.

  20. arXiv:2404.15949

    cs.CL cs.AI cs.LG

    CORM: Cache Optimization with Recent Message for Large Language Model Inference

    Authors: Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi

    Abstract: Large Language Models (LLMs), despite their remarkable performance across a wide range of tasks, necessitate substantial GPU memory and consume significant computational resources. Beyond the memory taken up by model weights, the memory used by the KV cache rises linearly with sequence length, becoming a primary bottleneck for inference. In this paper, we introduce an innovative method for optimiz…

    Submitted 21 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  21. arXiv:2404.15284

    eess.SP cs.AI

    Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

    Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu Jin

    Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th…

    Submitted 12 March, 2024; originally announced April 2024.

  22. arXiv:2404.09463

    cs.LG

    PRIME: A CyberGIS Platform for Resilience Inference Measurement and Enhancement

    Authors: Debayan Mandal, Dr. Lei Zou, Rohan Singh Wilkho, Joynal Abedin, Bing Zhou, Dr. Heng Cai, Dr. Furqan Baig, Dr. Nasir Gharaibeh, Dr. Nina Lam

    Abstract: In an era of increased climatic disasters, there is an urgent need to develop reliable frameworks and tools for evaluating and improving community resilience to climatic hazards at multiple geographical and temporal scales. Defining and quantifying resilience in the social domain is relatively subjective due to the intricate interplay of socioeconomic factors with disaster resilience. Meanwhile, t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 28 pages, 6 figures

  23. arXiv:2403.16820

    cs.CL

    Cross-lingual Contextualized Phrase Retrieval

    Authors: Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe

    Abstract: Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: preprint

  24. arXiv:2403.14103

    cs.CV

    MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

    Authors: Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

    Abstract: Segment Anything Model (SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM does not work when directly applied to medical image segmentation tasks, since SAM lacks the functionality to predict semantic labels for predicted masks and needs to provide extra prompts, such as points or boxes, to segment target regions.…

    Submitted 20 March, 2024; originally announced March 2024.

  25. arXiv:2403.11627

    cs.CV

    LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

    Authors: Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

    Abstract: Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify this straightforward method f…

    Submitted 10 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: project page: https://github.com/Young98CN/LoRA_Composer

  26. arXiv:2403.10683

    cs.CV

    GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation

    Authors: Dingding Cai, Janne Heikkilä, Esa Rahtu

    Abstract: This paper introduces GS-Pose, an end-to-end framework for locating and estimating the 6D pose of objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and ref…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Project Page: https://dingdingcai.github.io/gs-pose

  27. arXiv:2403.06251

    q-bio.NC cs.CV cs.LG

    Online Multi-spectral Neuron Tracing

    Authors: Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan

    Abstract: In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training is required. Our method is trained online to update our enhanced discriminative correlation filter to conglutinate the tracing process. This distinctive offline-training-free schema differentiates us from other training-dependent tracing approaches like deep learning…

    Submitted 10 March, 2024; originally announced March 2024.

  28. arXiv:2403.05330

    cs.CL

    Consecutive Model Editing with Batch alongside HooK Layers

    Authors: Shuaiyi Li, Yang Deng, Deng Cai, Hongyuan Lu, Liang Chen, Wai Lam

    Abstract: As the typical retraining paradigm is unacceptably time- and resource-consuming, researchers are turning to model editing in order to seek an effective, consecutive, and batch-supportive way to edit the model behavior directly. Despite all these practical expectations, existing model editing methods fail to realize all of them. Furthermore, the memory demands for such succession-supportive model e…

    Submitted 17 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Under review

  29. arXiv:2403.04309

    cs.CV cs.AI

    AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

    Authors: Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen

    Abstract: Prohibited item detection in X-ray images is one of the most essential and highly effective methods widely employed in various security inspection scenarios. Considering the significant overlapping phenomenon in X-ray prohibited item images, we propose an Anti-Overlapping DETR (AO-DETR) based on one of the state-of-the-art general object detectors, DINO. Specifically, to address the feature coupli…

    Submitted 7 March, 2024; originally announced March 2024.

  30. arXiv:2403.00881

    cs.LG cs.DC cs.NI

    FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

    Authors: Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, Ao Zhou

    Abstract: Communication overhead is a significant bottleneck in federated learning (FL), which has been exaggerated with the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: under review

  31. arXiv:2402.17532

    cs.CL

    Retrieval is Accurate Generation

    Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

    Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retr…

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  32. arXiv:2402.15582

    cond-mat.mtrl-sci

    Probabilistic Prediction of Material Stability: Integrating Convex Hulls into Active Learning

    Authors: Andrew Novick, Diana Cai, Quan Nguyen, Roman Garnett, Ryan Adams, Eric Toberer

    Abstract: Active learning is a valuable tool for efficiently exploring complex spaces, finding a variety of uses in materials science. However, the determination of convex hulls for phase diagrams does not neatly fit into traditional active learning approaches due to their global nature. Specifically, the thermodynamic stability of a material is not simply a function of its own energy, but rather requires e…

    Submitted 23 February, 2024; originally announced February 2024.

  33. arXiv:2402.14758

    stat.ML cs.AI cs.LG stat.CO

    Batch and match: black-box variational inference with a score-based divergence

    Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Not…

    Submitted 12 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 49 pages, 14 figures. To appear in the Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  34. arXiv:2402.14464

    cs.CV

    NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

    Authors: Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

    Abstract: NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by innovatively utilizing NeRF to enhance representation learning. Despite its notable performance, we uncover three decisive shortcomings in its current design, including semantic ambiguity, inappropriate sampling, and insufficient utilization of depth supervision. To combat the aforementioned problems, we present thre…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures

  35. arXiv:2402.09748

    cs.CL cs.AI cs.LG cs.PF

    Model Compression and Efficient Inference for Large Language Models: A Survey

    Authors: Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

    Abstract: Transformer-based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, sim…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 47 pages, reviews 380 papers. The work is ongoing

  36. arXiv:2402.06930

    cs.CL

    LiFi: Lightweight Controlled Text Generation with Fine-Grained Control Codes

    Authors: Chufan Shi, Deng Cai, Yujiu Yang

    Abstract: In the rapidly evolving field of text generation, the demand for more precise control mechanisms has become increasingly apparent. To address this need, we present a novel methodology, LIFI, which offers a lightweight approach with fine-grained control for controlled text generation. Unlike previous studies that train pre-trained language models to follow discrete, categorical, and exclusive contr…

    Submitted 10 February, 2024; originally announced February 2024.

  37. arXiv:2402.06925

    cs.CL

    A Thorough Examination of Decoding Methods in the Era of LLMs

    Authors: Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

    Abstract: Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provi…

    Submitted 17 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  38. arXiv:2401.15871

    quant-ph

    Enhancing the expressivity of quantum neural networks with residual connections

    Authors: Jingwei Wen, Zhiguo Huang, Dunbo Cai, Ling Qian

    Abstract: In the recent noisy intermediate-scale quantum era, the research on the combination of artificial intelligence and quantum computing has been greatly developed. Inspired by neural networks, developing quantum neural networks with specific structures is one of the most promising directions for improving network performance. In this work, we propose a quantum circuit-based algorithm to implement qua…

    Submitted 28 January, 2024; originally announced January 2024.

  39. arXiv:2401.12596

    cs.CV

    UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

    Authors: Hengjia Li, Yang Liu, Yuqi Lin, Zhanwei Zhang, Yibo Zhao, Weihang Pan, Tu Zheng, Zheng Yang, Yuchun Jiang, Boxi Wu, Deng Cai

    Abstract: Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain consistency with the source domain well, which impedes the inheritance of…

    Submitted 15 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  40. arXiv:2401.11983   

    cs.SD cs.CR eess.AS

    Lightweight Protection for Privacy in Offloaded Speech Understanding

    Authors: Dongqi Cai

    Abstract: Speech is a common input method for mobile embedded devices, but cloud-based speech recognition systems pose privacy risks. Disentanglement-based encoders, designed to safeguard user privacy by filtering sensitive information from speech signals, unfortunately require substantial memory and computational resources, which limits their use in less powerful devices. To overcome this, we introduce a n…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  41. arXiv:2401.11504

    cs.CL cs.AI

    With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation

    Authors: Y. Wang, D. Ma, D. Cai

    Abstract: Long text generation, such as novel writing and discourse-level translation with extremely long contexts, presents significant challenges to current language models. Existing methods mainly focus on extending the model's context window through strategies like length extrapolation. However, these approaches demand substantial hardware resources during the training and/or inference phases. Our propo…

    Submitted 25 March, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  42. arXiv:2401.10491

    cs.CL

    Knowledge Fusion of Large Language Models

    Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weigh…

    Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  43. arXiv:2401.08294  [pdf, other

    cs.CL

    Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

    Authors: Shuming Shi, Enbo Zhao, Deng Cai, Leyang Cui, Xinting Huang, Huayang Li

    Abstract: We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs). With Inferflow, users can serve most of the common transformer models by simply modifying some lines in corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has some key features. First, by implementing a…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Technical report of Inferflow

  44. arXiv:2401.08092  [pdf, other

    cs.LG cs.AI cs.DC

    A Survey of Resource-efficient LLM and Multimodal Foundation Models

    Authors: Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

    Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of the…

    Submitted 15 January, 2024; originally announced January 2024.

  45. arXiv:2401.06786  [pdf, other

    cs.DC cs.AI

    CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

    Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

    Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de fac…

    Submitted 9 November, 2023; originally announced January 2024.

  46. arXiv:2401.01473  [pdf, other

    eess.AS cs.SD

    Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

    Authors: Danwei Cai, Zexin Cai, Ming Li

    Abstract: Speaker representation learning is critical for modern voice recognition systems. While supervised learning techniques require extensive labeled data, unsupervised methodologies can leverage vast unlabeled corpora, offering a scalable solution. This paper introduces self-supervised reflective learning (SSRL), a novel paradigm that streamlines existing iterative unsupervised frameworks. SSRL integr…

    Submitted 15 July, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  47. arXiv:2312.15853  [pdf, other

    cs.LG cs.AI

    Curricular and Cyclical Loss for Time Series Learning Strategy

    Authors: Chenxi Sun, Hongyan Li, Moxian Song, Derun Cai, Shenda Hong

    Abstract: Time series widely exists in real-world applications and many deep learning models have performed well on it. Current research has shown the importance of learning strategy for models, suggesting that the benefit is the order and size of learning samples. However, no effective strategy has been proposed for time series due to its abstract and dynamic construction. Meanwhile, the existing one-shot…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 23 pages, 5 figures

  48. arXiv:2312.14591  [pdf, other

    cs.CL

    Reasons to Reject? Aligning Language Models with Judgments

    Authors: Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

    Abstract: As humans, we consistently interact with our peers and receive feedback in the form of natural language. This language feedback allows us to maintain appropriate behavior, and rectify potential errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with scalar rewards, we present the first systema…

    Submitted 6 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted at ACL 2024 Findings. Our source codes and models are publicly available at https://github.com/wwxu21/CUT

  49. arXiv:2312.12828  [pdf, other

    cs.CV cs.AI

    TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training

    Authors: Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai

    Abstract: Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification. The class token in the image encoder is trained to capture the global features to distinguish different text descriptions supervised by contrastive loss, making it highly effective for single-label classification. However, it shows poor performance on multi-label datasets beca…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  50. arXiv:2312.11837  [pdf, other

    cs.CV

    Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

    Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng, Wei Qian, Wenxiao Wang, Deng Cai

    Abstract: Multi-camera perception tasks have gained significant attention in the field of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of vo…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024