Skip to main content

Showing 1–50 of 256 results for author: Cai, D

  1. arXiv:2407.00132  [pdf, other

    cs.SE cs.AI

    ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

    Authors: Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma

    Abstract: Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have gained significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, dive… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.17312  [pdf, other

    cs.CL

    Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

    Authors: Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam

    Abstract: Iterative preference learning, though yielding superior performances, requires online annotated preference labels. In this work, we study strategies to select worth-annotating response pairs for cost-efficient annotation while achieving competitive or even better performances compared with the random selection baseline for iterative preference learning. Built on assumptions regarding uncertainty a… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.16377  [pdf, other

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.10248  [pdf, other

    cs.CL cs.AI

    On the Worst Prompt Performance of Large Language Models

    Authors: Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam

    Abstract: The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in tasks-level instructions. However, this setup fail… ▽ More

    Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  5. arXiv:2406.09961  [pdf, other

    cs.SE cs.CL cs.CV

    ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

    Authors: Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

    Abstract: We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which repres… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Data and code are available at https://github.com/ChartMimic/ChartMimic

  6. arXiv:2406.04594  [pdf, other

    cs.DC cs.AI cs.LG

    Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

    Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

    Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2405.14507  [pdf, other

    cs.CL cs.LG

    Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

    Authors: Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng

    Abstract: Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency. In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism. However, the unchosen experts in MoE models do not contribute to the output, potentially leading to underutilization of the model's capacity. In this wo… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.13432  [pdf, other

    cs.CL cs.AI

    Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

    Authors: Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

    Abstract: Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put a hypothesis that the data biases a… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to the findings of ACL2024

  9. arXiv:2404.19384  [pdf, other

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  10. arXiv:2404.19330  [pdf, other

    cs.CV cs.AI

    G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction

    Authors: Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang

    Abstract: Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in k… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  11. arXiv:2404.15949  [pdf, other

    cs.CL cs.AI cs.LG

    CORM: Cache Optimization with Recent Message for Large Language Model Inference

    Authors: Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi

    Abstract: Large Language Models (LLMs), despite their remarkable performance across a wide range of tasks, necessitate substantial GPU memory and consume significant computational resources. Beyond the memory taken up by model weights, the memory used by the KV cache rises linearly with sequence length, becoming a primary bottleneck for inference. In this paper, we introduce an innovative method for optimiz… ▽ More

    Submitted 21 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  12. arXiv:2404.15284  [pdf, other

    eess.SP cs.AI

    Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

    Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu Jin

    Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th… ▽ More

    Submitted 12 March, 2024; originally announced April 2024.

  13. arXiv:2404.09463  [pdf

    cs.LG

    PRIME: A CyberGIS Platform for Resilience Inference Measurement and Enhancement

    Authors: Debayan Mandal, Dr. Lei Zou, Rohan Singh Wilkho, Joynal Abedin, Bing Zhou, Dr. Heng Cai, Dr. Furqan Baig, Dr. Nasir Gharaibeh, Dr. Nina Lam

    Abstract: In an era of increased climatic disasters, there is an urgent need to develop reliable frameworks and tools for evaluating and improving community resilience to climatic hazards at multiple geographical and temporal scales. Defining and quantifying resilience in the social domain is relatively subjective due to the intricate interplay of socioeconomic factors with disaster resilience. Meanwhile, t… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 28 pages, 6 figures

  14. arXiv:2403.16820  [pdf, other

    cs.CL

    Cross-lingual Contextualized Phrase Retrieval

    Authors: Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe

    Abstract: Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: preprint

  15. arXiv:2403.14103  [pdf, other

    cs.CV

    MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

    Authors: Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

    Abstract: Segment Anything Model~(SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM does not work when directly applied to medical image segmentation tasks, since SAM lacks the functionality to predict semantic labels for predicted masks and needs to provide extra prompts, such as points or boxes, to segment target regions.… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  16. arXiv:2403.11627  [pdf, other

    cs.CV

    LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

    Authors: Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

    Abstract: Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a Low-Rank Adaptations (LoRA) fusion matrix of multiple LoRA to merge various concepts into a single image. However, we identify this straightforward meth… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.10683  [pdf, other

    cs.CV

    GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation

    Authors: Dingding Cai, Janne Heikkilä, Esa Rahtu

    Abstract: This paper introduces GS-Pose, an end-to-end framework for locating and estimating the 6D pose of objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and ref… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Project Page: https://dingdingcai.github.io/gs-pose

  18. arXiv:2403.06251  [pdf, other

    q-bio.NC cs.CV cs.LG

    Online Multi-spectral Neuron Tracing

    Authors: Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan

    Abstract: In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training are required. Our method is trained online to update our enhanced discriminative correlation filter to conglutinate the tracing process. This distinctive offline-training-free schema differentiates us from other training-dependent tracing approaches like deep learning… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  19. arXiv:2403.05330  [pdf, other

    cs.CL

    Consecutive Model Editing with Batch alongside HooK Layers

    Authors: Shuaiyi Li, Yang Deng, Deng Cai, Hongyuan Lu, Liang Chen, Wai Lam

    Abstract: As the typical retraining paradigm is unacceptably time- and resource-consuming, researchers are turning to model editing in order to seek an effective, consecutive, and batch-supportive way to edit the model behavior directly. Despite all these practical expectations, existing model editing methods fail to realize all of them. Furthermore, the memory demands for such succession-supportive model e… ▽ More

    Submitted 17 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Under review

  20. arXiv:2403.04309  [pdf, other

    cs.CV cs.AI

    AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

    Authors: Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen

    Abstract: Prohibited item detection in X-ray images is one of the most essential and highly effective methods widely employed in various security inspection scenarios. Considering the significant overlapping phenomenon in X-ray prohibited item images, we propose an Anti-Overlapping DETR (AO-DETR) based on one of the state-of-the-art general object detectors, DINO. Specifically, to address the feature coupli… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

  21. arXiv:2403.00881  [pdf, other

    cs.LG cs.DC cs.NI

    FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

    Authors: Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, Ao Zhou

    Abstract: Communication overhead is a significant bottleneck in federated learning (FL), which has been exaggerated with the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: under review

  22. arXiv:2402.17532  [pdf, other

    cs.CL

    Retrieval is Accurate Generation

    Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

    Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retr… ▽ More

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  23. arXiv:2402.14758  [pdf, other

    stat.ML cs.AI cs.LG stat.CO

    Batch and match: black-box variational inference with a score-based divergence

    Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Not… ▽ More

    Submitted 12 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 49 pages, 14 figures. To appear in the Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  24. arXiv:2402.14464  [pdf, other

    cs.CV

    NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

    Authors: Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

    Abstract: NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by innovatively utilizing NeRF to enhance representation learning. Despite its notable performance, we uncover three decisive shortcomings in its current design, including semantic ambiguity, inappropriate sampling, and insufficient utilization of depth supervision. To combat the aforementioned problems, we present thre… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures

  25. arXiv:2402.09748  [pdf, other

    cs.CL cs.AI cs.LG cs.PF

    Model Compression and Efficient Inference for Large Language Models: A Survey

    Authors: Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

    Abstract: Transformer based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, sim… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 47 pages, review 380 papers. The work is ongoing

  26. arXiv:2402.06930  [pdf, other

    cs.CL

    LiFi: Lightweight Controlled Text Generation with Fine-Grained Control Codes

    Authors: Chufan Shi, Deng Cai, Yujiu Yang

    Abstract: In the rapidly evolving field of text generation, the demand for more precise control mechanisms has become increasingly apparent. To address this need, we present a novel methodology, LIFI, which offers a lightweight approach with fine-grained control for controlled text generation. Unlike previous studies that train pre-trained language models to follow discrete, categorical, and exclusive contr… ▽ More

    Submitted 10 February, 2024; originally announced February 2024.

  27. arXiv:2402.06925  [pdf, other

    cs.CL

    A Thorough Examination of Decoding Methods in the Era of LLMs

    Authors: Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

    Abstract: Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provi… ▽ More

    Submitted 17 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  28. arXiv:2401.12596  [pdf, other

    cs.CV

    UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

    Authors: Hengjia Li, Yang Liu, Yuqi Lin, Zhanwei Zhang, Yibo Zhao, weihang Pan, Tu Zheng, Zheng Yang, Yuchun Jiang, Boxi Wu, Deng Cai

    Abstract: Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain well consistency with the source domain, which impedes the inheritance of… ▽ More

    Submitted 15 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  29. arXiv:2401.11983   

    cs.SD cs.CR eess.AS

    Lightweight Protection for Privacy in Offloaded Speech Understanding

    Authors: Dongqi Cai

    Abstract: Speech is a common input method for mobile embedded devices, but cloud-based speech recognition systems pose privacy risks. Disentanglement-based encoders, designed to safeguard user privacy by filtering sensitive information from speech signals, unfortunately require substantial memory and computational resources, which limits their use in less powerful devices. To overcome this, we introduce a n… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  30. arXiv:2401.11504  [pdf, other

    cs.CL cs.AI

    With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation

    Authors: Y. Wang, D. Ma, D. Cai

    Abstract: Long text generation, such as novel writing and discourse-level translation with extremely long contexts, presents significant challenges to current language models. Existing methods mainly focus on extending the model's context window through strategies like length extrapolation. However, these approaches demand substantial hardware resources during the training and/or inference phases. Our propo… ▽ More

    Submitted 25 March, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  31. arXiv:2401.10491  [pdf, other

    cs.CL

    Knowledge Fusion of Large Language Models

    Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weigh… ▽ More

    Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  32. arXiv:2401.08294  [pdf, other

    cs.CL

    Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models

    Authors: Shuming Shi, Enbo Zhao, Deng Cai, Leyang Cui, Xinting Huang, Huayang Li

    Abstract: We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs). With Inferflow, users can serve most of the common transformer models by simply modifying some lines in corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has some key features. First, by implementing a… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Technical report of Inferflow

  33. arXiv:2401.08092  [pdf, other

    cs.LG cs.AI cs.DC

    A Survey of Resource-efficient LLM and Multimodal Foundation Models

    Authors: Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

    Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of the… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

  34. arXiv:2401.06786  [pdf, other

    cs.DC cs.AI

    CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

    Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

    Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de fac… ▽ More

    Submitted 9 November, 2023; originally announced January 2024.

  35. arXiv:2401.01473  [pdf, other

    eess.AS cs.SD

    Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

    Authors: Danwei Cai, Zexin Cai, Ming Li

    Abstract: Speaker representation learning is critical for modern voice recognition systems. While supervised learning techniques require extensive labeled data, unsupervised methodologies can leverage vast unlabeled corpora, offering a scalable solution. This paper introduces self-supervised reflective learning (SSRL), a novel paradigm that streamlines existing iterative unsupervised frameworks. SSRL integr… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

  36. arXiv:2312.15853  [pdf, other

    cs.LG cs.AI

    Curricular and Cyclical Loss for Time Series Learning Strategy

    Authors: Chenxi Sun, Hongyan Li, Moxian Song, Derun Cai, Shenda Hong

    Abstract: Time series widely exists in real-world applications and many deep learning models have performed well on it. Current research has shown the importance of learning strategy for models, suggesting that the benefit is the order and size of learning samples. However, no effective strategy has been proposed for time series due to its abstract and dynamic construction. Meanwhile, the existing one-shot… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 23 pages, 5 figures

  37. arXiv:2312.14591  [pdf, other

    cs.CL

    Reasons to Reject? Aligning Language Models with Judgments

    Authors: Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

    Abstract: As humans, we consistently interact with our peers and receive feedback in the form of natural language. This language feedback allows us to maintain appropriate behavior, and rectify potential errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with scalar rewards, we present the first systema… ▽ More

    Submitted 6 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted at ACL 2024 Findings. Our source codes and models are publicly available at https://github.com/wwxu21/CUT

  38. arXiv:2312.12828  [pdf, other

    cs.CV cs.AI

    TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training

    Authors: Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai

    Abstract: Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification. The class token in the image encoder is trained to capture the global features to distinguish different text descriptions supervised by contrastive loss, making it highly effective for single-label classification. However, it shows poor performance on multi-label datasets beca… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  39. arXiv:2312.11837  [pdf, other

    cs.CV

    Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

    Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng, Wei Qian, Wenxiao Wang, Deng Cai

    Abstract: Multi-camera perception tasks have gained significant attention in the field of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of vo… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  40. arXiv:2311.16511  [pdf, other

    cs.CV

    GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation

    Authors: Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu

    Abstract: While the recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to the realm of input-side multimodal comprehension, lacking the capacity for multimodal content generation. To fill this gap, we present GPT4Video, a unified multi-model framework that empowers Large Language Models (LLMs) with the capab… ▽ More

    Submitted 24 November, 2023; originally announced November 2023.

  41. arXiv:2311.09832  [pdf, other

    cs.CL

    WatME: Towards Lossless Watermarking Through Lexical Redundancy

    Authors: Liang Chen, Yatao Bian, Yang Deng, Deng Cai, Shuaiyi Li, Peilin Zhao, Kam-fai Wong

    Abstract: Text watermarking has emerged as a pivotal technique for identifying machine-generated text. However, existing methods often rely on arbitrary vocabulary partitioning during decoding to embed watermarks, which compromises the availability of suitable tokens and significantly degrades the quality of responses. This study assesses the impact of watermarking on different capabilities of large languag… ▽ More

    Submitted 6 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to ACL 2024 main conference

  42. arXiv:2311.08803  [pdf, other

    cs.CL

    StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

    Authors: Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam

    Abstract: Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving gener… ▽ More

    Submitted 24 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  43. arXiv:2311.02778  [pdf, other

    cs.CV

    MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis

    Authors: Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu

    Abstract: Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there exists a knowledge gap in how to apply geometric reconstruction and photorealism modeling (novel view synthesis) in… ▽ More

    Submitted 19 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  44. arXiv:2311.02130  [pdf, ps, other

    cs.LG cs.AI

    Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning

    Authors: Bibo Wu, Fang Fang, Xianbin Wang, Donghong Cai, Shu Fu, Zhiguo Ding

    Abstract: Hierarchical federated learning (HFL) shows great advantages over conventional two-layer federated learning (FL) in reducing network overhead and interaction latency while still retaining the data privacy of distributed FL clients. However, the communication and energy overhead still pose a bottleneck for HFL performance, especially as the number of clients raises dramatically. To tackle this issu… ▽ More

    Submitted 3 November, 2023; originally announced November 2023.

  45. arXiv:2310.18257  [pdf, other

    cs.LG

    MIM-GAN-based Anomaly Detection for Multivariate Time Series Data

    Authors: Shan Lu, Zhicheng Dong, Donghong Cai, Fang Fang, Dongcai Zhao

    Abstract: The loss function of Generative adversarial network(GAN) is an important factor that affects the quality and diversity of the generated samples for anomaly detection. In this paper, we propose an unsupervised multiple time series anomaly detection algorithm based on the GAN with message importance measure(MIM-GAN). In particular, the time series data is divided into subsequences using a sliding wi… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 7 pages,6 figures

  46. arXiv:2310.15326  [pdf, other

    cs.CL

    Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

    Authors: Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai

    Abstract: The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper,… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  47. arXiv:2310.10226  [pdf, other

    cs.CL

    Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

    Authors: Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su

    Abstract: There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the dege… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  48. DxPU: Large Scale Disaggregated GPU Pools in the Datacenter

    Authors: Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang

    Abstract: The rapid adoption of AI and convenience offered by cloud services have resulted in the growing demands for GPUs in the cloud. Generally, GPUs are physically attached to host servers as PCIe devices. However, the fixed assembly combination of host servers and GPUs is extremely inefficient in resource utilization, upgrade, and maintenance. Due to these issues, the GPU disaggregation technique has b… ▽ More

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 23 pages, 6 figures, published in ACM Transactions on Architecture and Code Optimization

  49. arXiv:2309.08637  [pdf, other

    cs.CL cs.AI

    TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild

    Authors: Huayang Li, Siheng Li, Deng Cai, Longyue Wang, Lemao Liu, Taro Watanabe, Yujiu Yang, Shuming Shi

    Abstract: Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability to tackle various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it com… ▽ More

    Submitted 3 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Findings of ACL 2024

  50. arXiv:2309.01219  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

    Authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge… ▽ More

    Submitted 24 September, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: work in progress; 32 pages