Showing 1–50 of 2,139 results for author: Chen, D

  1. arXiv:2407.11890  [pdf, other]

    cs.CV

    DepGAN: Leveraging Depth Maps for Handling Occlusions and Transparency in Image Composition

    Authors: Amr Ghoneim, Jiju Poovvancheri, Yasushi Akiyama, Dong Chen

    Abstract: Image composition is a complex task which requires a lot of information about the scene for an accurate and realistic composition, such as perspective, lighting, shadows, occlusions, and object interactions. Previous methods have predominantly used 2D information for image composition, neglecting the potentials of 3D spatial information. In this work, we propose DepGAN, a Generative Adversarial Ne…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  2. arXiv:2407.11784  [pdf, other]

    cs.AI cs.CV cs.LG

    Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

    Authors: Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource utilization. In response, we pre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 26 pages, 9 figures, 5 tables

  3. arXiv:2407.10949  [pdf, other]

    cs.CL cs.AI cs.LG

    Representing Rule-based Chatbots with Transformers

    Authors: Dan Friedman, Abhishek Panigrahi, Danqi Chen

    Abstract: Transformer-based chatbots can conduct fluent, natural-sounding conversations, but we have limited understanding of the mechanisms underlying their behavior. Prior work has taken a bottom-up approach to understanding Transformers by constructing Transformers for various synthetic and formal language tasks, such as regular expressions and Dyck languages. However, it is not obvious how to extend thi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/princeton-nlp/ELIZA-Transformer

  4. arXiv:2407.10310  [pdf, other]

    cs.CY eess.SY

    Impact of Different Infrastructures and Traffic Scenarios on Behavioral and Physiological Responses of E-scooter Users

    Authors: Dong Chen, Arman Hosseini, Arik Smith, David Xiang, Arsalan Heydarian, Omid Shoghli, Bradford Campbell

    Abstract: As micromobility devices such as e-scooters gain global popularity, emergency departments around the world have observed a rising trend in related injuries. However, the majority of current research on e-scooter safety relies heavily on surveys, news reports, and data from vendors, with a noticeable scarcity of naturalistic studies examining the effects of riders' behaviors and physiological respo…

    Submitted 5 May, 2024; originally announced July 2024.

    Comments: 6 pages, 8 figures

  5. arXiv:2407.10031  [pdf, other]

    cs.RO cs.MA

    Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

    Authors: Siddharth Nayak, Adelmo Morrison Orozco, Marina Ten Have, Vittal Thirumalai, Jackson Zhang, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, James Harrison, Brian Ichter, Anuj Mahajan, Hamsa Balakrishnan

    Abstract: The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. However, LMs in t…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 27 pages, 4 figures, 5 tables

  6. arXiv:2407.09790  [pdf, other]

    cs.LG

    Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs

    Authors: Jiahuan Yan, Jintai Chen, Qianxing Wang, Danny Z. Chen, Jian Wu

    Abstract: Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective mo…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024 Research Track, codes will be available at https://github.com/jyansir/tmlp

  7. arXiv:2407.08953  [pdf, ps, other]

    q-fin.CP cs.LG

    Attribution Methods in Asset Pricing: Do They Account for Risk?

    Authors: Dangxing Chen, Yuan Gao

    Abstract: Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Cons…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)

  8. arXiv:2407.08586  [pdf, other]

    nucl-ex

    Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions

    Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, B. Bannier, K. N. Barish, B. Bassalleck, S. Bathe, et al. (377 additional authors not shown)

    Abstract: The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 401 authors from 75 institutions, 20 pages, 15 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

  9. arXiv:2407.08583  [pdf, other]

    cs.AI cs.CV cs.LG

    The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

    Authors: Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

    Abstract: The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the impo…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Ongoing work. 31 pages. Related materials are continually maintained and available at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md

  10. arXiv:2407.08233  [pdf, other]

    cs.LG

    Differentially Private Neural Network Training under Hidden State Assumption

    Authors: Ding Chen, Chen Liu

    Abstract: We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific la…

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.07681  [pdf]

    physics.optics physics.bio-ph

    Localizing axial dense emitters based on single-helix point spread function and compressed sensing

    Authors: Hanzhe Wu, Danni Chen, YiHong Ji, Gan Xiang, Heng Li, Bin Yu, JunLe Qu

    Abstract: Among the approaches in three-dimensional (3D) single molecule localization microscopy, there are several point spread function (PSF) engineering approaches, in which depth information of molecules is encoded in 2D images. Usually, the molecules are excited sparsely in each raw image. The consequence is that the temporal resolution has to be sacrificed. In order to improve temporal resolution and e…

    Submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.07107  [pdf, ps, other]

    math.NT math.CO

    Congruences modulo powers of $5$ and $7$ for the crank and rank parity functions and related mock theta functions

    Authors: Dandan Chen, Rong Chen, Frank Garvan

    Abstract: It is well known that Ramanujan conjectured congruences modulo powers of $5$, $7$ and $11$ for the partition function. These were subsequently proved by Watson (1938) and Atkin (1967). In 2009 Choi, Kang, and Lovejoy proved congruences modulo powers of $5$ for the crank parity function. The generating function for the analogous rank parity function is $f(q)$, the first example of a mock theta…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: 44 pages

    MSC Class: 05A17; 11F33; 11F37; 11P83; 33D15

  13. arXiv:2407.06938  [pdf, other]

    cs.CV

    RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

    Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo

    Abstract: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a nov…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://rodinhd.github.io/

  14. arXiv:2407.06573  [pdf, other]

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ…

    Submitted 9 July, 2024; originally announced July 2024.

  15. arXiv:2407.05598  [pdf, ps, other]

    hep-ph nucl-th

    Probing the nature of the anticharmed-strange pentaquark states: mass spectra, decays, and magnetic moments

    Authors: Xuejie Liu, Yue Tan, Xiaoyun Chen, Dianyong Chen, Hongxia Huang, Jialun Ping

    Abstract: Within the framework of the quark delocalization color screening model, a systematic investigation of the anticharmed-strange pentaquark system is performed using the resonance group method. The current estimations predict three bound states with estimated masses of 2886 MeV, 3039 MeV, and 3153 MeV, respectively. Additionally, three resonance states are identified in various scattering phase…

    Submitted 8 July, 2024; originally announced July 2024.

  16. arXiv:2407.04681  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

    Authors: Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr, Lu Yuan

    Abstract: In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs, limiting their ability to answer questions requiring…

    Submitted 5 July, 2024; originally announced July 2024.

  17. arXiv:2407.03282  [pdf, other]

    cs.CL

    LLM Internal States Reveal Hallucination Risk Faced With a Query

    Authors: Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

    Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadl…

    Submitted 3 July, 2024; originally announced July 2024.

  18. arXiv:2407.02139  [pdf, ps, other]

    physics.flu-dyn

    Kinetics of Rayleigh-Taylor instability in van der Waals fluid: the influence of compressibility

    Authors: Jie Chen, Aiguo Xu, Yudong Zhang, Dawei Chen, Zhihua Chen

    Abstract: Early studies on Rayleigh-Taylor instability (RTI) primarily relied on the Navier-Stokes (NS) model. As research progresses, it becomes increasingly evident that the kinetic information that the NS model failed to capture is of great value for identifying and even controlling the RTI process; simultaneously, the lack of analysis techniques for complex physical fields results in a significant waste…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  19. arXiv:2407.01906  [pdf, other]

    cs.CL cs.AI cs.LG

    Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

    Authors: Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu

    Abstract: Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture and the contents of this work are mainly threefol…

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  20. arXiv:2407.01875  [pdf, ps, other]

    cs.AI

    Spatio-Temporal Graphical Counterfactuals: An Overview

    Authors: Mingyu Kang, Duxin Chen, Ziyuan Pu, Jianxi Gao, Wenwu Yu

    Abstract: Counterfactual thinking is a critical yet challenging topic for artificial intelligence to learn knowledge from data and ultimately improve their performances for new scenarios. Many research works, including Potential Outcome Model and Structural Causal Model, have been proposed to realize it. However, their modelings, theoretical foundations and application approaches are usually different. More…

    Submitted 1 July, 2024; originally announced July 2024.

  21. arXiv:2407.01505  [pdf, other]

    cs.CL cs.AI

    Self-Cognition in Large Language Models: An Exploratory Study

    Authors: Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Large Language Models and Cognition Workshop

  22. arXiv:2407.01436  [pdf, other]

    cs.CV cs.RO

    AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction

    Authors: Dubing Chen, Wencheng Han, Jin Fang, Jianbing Shen

    Abstract: In this technical report, we present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ Dataset Challenge at CVPR 2024. Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling. Initially, we independently train the occupancy model,…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 2nd Place in the 3D Occupancy and Flow Prediction Challenge (CVPR24)

  23. arXiv:2407.00668  [pdf, other]

    cs.CL

    HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

    Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

    Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-so…

    Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  24. arXiv:2407.00530  [pdf, other]

    cond-mat.dis-nn physics.comp-ph

    Solving combinatorial optimization problems through stochastic Landau-Lifshitz-Gilbert dynamical systems

    Authors: Dairong Chen, Andrew D. Kent, Dries Sels, Flaviano Morone

    Abstract: We present a method to approximately solve general instances of combinatorial optimization problems using the physical dynamics of 3d rotors obeying Landau-Lifshitz-Gilbert dynamics. Conventional techniques to solve discrete optimization problems that use simple continuous relaxation of the objective function followed by gradient descent minimization are inherently unable to avoid local optima, th…

    Submitted 29 June, 2024; originally announced July 2024.

  25. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  26. arXiv:2406.19485  [pdf, other]

    eess.IV cs.CV

    GAPNet: Granularity Attention Network with Anatomy-Prior-Constraint for Carotid Artery Segmentation

    Authors: Lin Zhang, Chenggang Lu, Xin-yang Shi, Caifeng Shan, Jiong Zhang, Da Chen, Laurent D. Cohen

    Abstract: Atherosclerosis is a chronic, progressive disease that primarily affects the arterial walls. It is one of the major causes of cardiovascular disease. Magnetic Resonance (MR) black-blood vessel wall imaging (BB-VWI) offers crucial insights into vascular disease diagnosis by clearly visualizing vascular structures. However, the complex anatomy of the neck poses challenges in distinguishing the carot…

    Submitted 27 June, 2024; originally announced June 2024.

  27. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  28. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  29. arXiv:2406.16778  [pdf, other]

    cs.CL

    Finding Transformer Circuits with Edge Pruning

    Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen

    Abstract: The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/princeton-nlp/Edge-Pruning

  30. arXiv:2406.15480  [pdf, other]

    cs.CL cs.AI cs.LG

    On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

    Authors: Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit under review

  31. arXiv:2406.15479  [pdf, other]

    cs.CL cs.AI cs.LG

    Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

    Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

    Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these i…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: submit in review

  32. arXiv:2406.15471  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Large Models with Small models: Lower Costs and Better Performance

    Authors: Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu

    Abstract: Pretrained large models (PLMs), such as ChatGPT, have demonstrated remarkable performance across diverse tasks. However, the significant computational requirements of PLMs have discouraged most product teams from running or fine-tuning them. In such cases, to harness the exceptional performance of PLMs, one must rely on expensive APIs, thereby exacerbating the economic burden. Despite the overall…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 11 pages

  33. arXiv:2406.14598  [pdf, other]

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics…

    Submitted 20 June, 2024; originally announced June 2024.

  34. arXiv:2406.14526  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Fantastic Copyrighted Beasts and How (Not) to Generate Them

    Authors: Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

    Abstract: Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns around copyright infringement. Copyrighted characters, in particular, pose a difficult challenge for image generation services, with at least one lawsuit already awarding damages based on the generation of these characters. Yet, little…

    Submitted 20 June, 2024; originally announced June 2024.

  35. arXiv:2406.14123  [pdf]

    cs.CY

    Mapping AI Ethics Narratives: Evidence from Twitter Discourse Between 2015 and 2022

    Authors: Mengyi Wei, Puzhen Zhang, Chuan Chen, Dongsheng Chen, Chenyu Zuo, Liqiu Meng

    Abstract: Public participation is indispensable for an insightful understanding of the ethics issues raised by AI technologies. Twitter is selected in this paper to serve as an online public sphere for exploring discourse on AI ethics, facilitating broad and equitable public engagement in the development of AI technology. A research framework is proposed to demonstrate how to transform AI ethics-related dis…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures

  36. arXiv:2406.13662  [pdf, other]

    cs.CL

    ObscurePrompt: Jailbreaking Large Language Models via Obscure Input

    Authors: Yue Huang, Jingyu Tang, Dongping Chen, Bingda Tang, Yao Wan, Lichao Sun, Xiangliang Zhang

    Abstract: Recently, Large Language Models (LLMs) have garnered significant attention for their exceptional natural language processing capabilities. However, concerns about their trustworthiness remain unresolved, particularly in addressing "jailbreaking" attacks on aligned LLMs. Previous research predominantly relies on scenarios with white-box LLMs or specific and fixed prompt templates, which are often i…

    Submitted 19 June, 2024; originally announced June 2024.

  37. arXiv:2406.13340  [pdf, other]

    cs.CL cs.SD eess.AS

    SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

    Authors: Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu

    Abstract: Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, includin…

    Submitted 19 June, 2024; originally announced June 2024.

  38. arXiv:2406.12125  [pdf, other]

    cs.LG cs.CL

    Efficient Sequential Decision Making with Large Language Models

    Authors: Dingyang Chen, Qi Zhang, Yinglun Zhu

    Abstract: This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former approach suffers from the computational burden of gradient updates, and the latter approach does not show promising results. In this paper, we propose a new approa…

    Submitted 17 June, 2024; originally announced June 2024.

  39. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.11837  [pdf, other]

    cs.CV

    Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

    Authors: Lei Zhu, Fangyun Wei, Yanye Lu, Dong Chen

    Abstract: In the realm of image quantization exemplified by VQGAN, the process encodes images into discrete tokens drawn from a codebook with a predefined size. Recent advancements, particularly with LLAMA 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, continue to grapple with challenges r…

    Submitted 17 June, 2024; originally announced June 2024.

  41. arXiv:2406.11653  [pdf, other]

    eess.SY

    Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs

    Authors: Min Hua, Dong Chen, Kun Jiang, Fanggang Zhang, Jinhai Wang, Bo Wang, Quan Zhou, Hongming Xu

    Abstract: Cooperative adaptive cruise control (CACC) has been recognized as a fundamental function of autonomous driving, in which platoon stability and energy efficiency are outstanding challenges that are difficult to accommodate in real-world operations. This paper studied the CACC of connected and autonomous vehicles (CAVs) based on the multi-agent reinforcement learning algorithm (MARL) to optimize pla…

    Submitted 17 June, 2024; originally announced June 2024.

  42. arXiv:2406.11026  [pdf, other]

    cs.CV cs.AI

    Boosting Medical Image Classification with Segmentation Foundation Model

    Authors: Pengfei Gu, Zihan Zhao, Hongxiao Wang, Yaopeng Peng, Yizhe Zhang, Nishchal Sapkota, Chaoli Wang, Danny Z. Chen

    Abstract: The Segment Anything Model (SAM) exhibits impressive capabilities in zero-shot segmentation for natural images. Recently, SAM has gained a great deal of attention for its applications in medical image segmentation. However, to our best knowledge, no studies have shown how to harness the power of SAM for medical image classification. To fill this gap and make SAM a true ``foundation model'' for med…

    Submitted 16 June, 2024; originally announced June 2024.

  43. arXiv:2406.10961  [pdf, other]

    cs.CV cs.AI cs.CY

    Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP

    Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen

    Abstract: X-ray prohibited item detection is an essential component of security check and categories of prohibited item are continuously increasing in accordance with the latest laws. Previous works all focus on close-set scenarios, which can only recognize known categories used for training and often require time-consuming as well as labor-intensive annotations when learning novel categories, resulting in…

    Submitted 16 June, 2024; originally announced June 2024.

  44. arXiv:2406.10903  [pdf, other]

    cs.LG cs.CL cs.SE

    New Solutions on LLM Acceleration, Optimization, and Application

    Authors: Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

    Abstract: Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: This is an expanded and more comprehensive study based on our invited DAC-24 paper with the same title and co-authors

  45. arXiv:2406.10819  [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  46. arXiv:2406.10519  [pdf, other]

    cs.CV cs.AI

    Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation

    Authors: Pengfei Gu, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Z. Chen

    Abstract: Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack…

    Submitted 15 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  47. arXiv:2406.10358  [pdf, other]

    cs.CR eess.SY

    I Still See You: Why Existing IoT Traffic Reshaping Fails

    Authors: Su Wang, Keyang Yu, Qi Li, Dong Chen

    Abstract: The Internet traffic data produced by the Internet of Things (IoT) devices are collected by Internet Service Providers (ISPs) and device manufacturers, and often shared with their third parties to maintain and enhance user services. Unfortunately, on-path adversaries could infer and fingerprint users' sensitive privacy information such as occupancy and user activities by analyzing these network tr…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at EWSN'24, to appear

  48. arXiv:2406.10263  [pdf, other]

    cs.SE

    A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

    Authors: Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang

    Abstract: Recent advancements in Retrieval-Augmented Generation have significantly enhanced code completion at the repository level. Various RAG-based code completion systems are proposed based on different design choices. For instance, gaining more effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods reveals…

    Submitted 10 June, 2024; originally announced June 2024.

  49. arXiv:2406.09166  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Domain Generalization with Feature Structuralization

    Authors: Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu

    Abstract: Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distributi…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  50. arXiv:2406.08794  [pdf, other]

    astro-ph.EP

    Constraints on the formation history and composition of Kepler planets from their distribution of orbital period ratios

    Authors: Di-Chang Chen, Christoph Mordasini, Ji-Wei Xie, Ji-Lin Zhou, Alexandre Emsenhuber

    Abstract: The Kepler high-precision planetary sample has revealed a radius valley, separating compact super-Earths from sub-Neptunes with lower density. Super-Earths are generally assumed to be rocky planets that were probably born in-situ, while the composition and origin of sub-Neptunes remains debated. To provide more constraints on the formation history and composition, based on the planetary sample of…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in A&A; 14 pages, 6 figures in the main text, 7 figures in Appendix