Skip to main content

Showing 1–50 of 1,401 results for author: Li, G

  1. arXiv:2407.02813  [pdf, other

    cs.CV cs.AI cs.LG

    Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

    Authors: Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma

    Abstract: Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  2. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, this paper has been accepted by IEEE Communications Letters

  3. arXiv:2407.01917  [pdf, other

    cs.NI cs.CR cs.DC

    Securing Distributed Network Digital Twin Systems Against Model Poisoning Attacks

    Authors: Zifan Zhang, Minghong Fang, Mingzhe Chen, Gaolei Li, Xi Lin, Yuchen Liu

    Abstract: In the era of 5G and beyond, the increasing complexity of wireless networks necessitates innovative frameworks for efficient management and deployment. Digital twins (DTs), embodying real-time monitoring, predictive configurations, and enhanced decision-making capabilities, stand out as a promising solution in this context. Within a time-series data-driven framework that effectively maps wireless… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by Internet of Things Journal (IoT-J). arXiv admin note: substantial text overlap with arXiv:2404.14389

  4. arXiv:2407.01640  [pdf, other

    cs.LG

    BADM: Batch ADMM for Deep Learning

    Authors: Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

    Abstract: Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM). The fundamental idea of the proposed algorithm is to split the training data into batch… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  5. arXiv:2407.01511  [pdf, other

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01003  [pdf, other

    cs.CV cs.AI

    Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

    Authors: Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma

    Abstract: Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures. arXiv admin note: text overlap with arXiv:2306.09579, arXiv:2203.12119 by other authors

  7. arXiv:2407.00896  [pdf, other

    eess.SP cs.AI

    Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

    Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

    Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2407.00295  [pdf, other

    cs.CV

    A deep neural network framework for dynamic multi-valued mapping estimation and its applications

    Authors: Geng Li, Di Qiu, Lok Ming Lui

    Abstract: This paper addresses the problem of modeling and estimating dynamic multi-valued mappings. While most mathematical models provide a unique solution for a given input, real-world applications often lack deterministic solutions. In such scenarios, estimating dynamic multi-valued mappings is necessary to suggest different reasonable solutions for each input. This paper introduces a deep neural networ… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  9. arXiv:2406.19811  [pdf, ps, other

    cs.CV

    EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

    Authors: Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

    Abstract: Human activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object inter… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  10. arXiv:2406.18559  [pdf, other

    cs.HC cs.AI cs.CV cs.LG

    Revision Matters: Generative Design Guided by Revision Edits

    Authors: Tao Li, Chin-Yi Cheng, Amber Xie, Gang Li, Yang Li

    Abstract: Layout design, such as user interface or graphical layout in general, is fundamentally an iterative revision process. Through revising a design repeatedly, the designer converges on an ideal layout. In this paper, we investigate how revision edits from human designer can benefit a multimodal generative model. To do so, we curate an expert dataset that traces how human designers iteratively edit an… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

  11. arXiv:2406.18512  [pdf, other

    cs.CL

    "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline

    Authors: Grace Li, Milad Alshomary, Smaranda Muresan

    Abstract: Explanations form the foundation of knowledge sharing and build upon communication principles, social dynamics, and learning theories. We focus specifically on conversational approaches for explanations because the context is highly adaptive and interactive. Our research leverages previous work on explanatory acts, a framework for understanding the different strategies that explainers and explaine… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 6 figures, 5 pages

  12. arXiv:2406.18351  [pdf, other

    cs.LG cs.AI

    Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control

    Authors: Zifan Liu, Xinran Li, Shibo Chen, Gen Li, Jiashuo Jiang, Jun Zhang

    Abstract: Reinforcement learning (RL) has proven to be well-performed and general-purpose in the inventory control (IC). However, further improvement of RL algorithms in the IC domain is impeded due to two limitations of online experience. First, online experience is expensive to acquire in real-world applications. With the low sample efficiency nature of RL algorithms, it would take extensive time to train… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2406.17172  [pdf, other

    cs.CR cs.DC cs.LG

    Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework

    Authors: Shiva Raj Pokhrel, Luxing Yang, Sutharshan Rajasegarar, Gang Li

    Abstract: This paper introduces a robust zero-trust architecture (ZTA) tailored for the decentralized system that empowers efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learn… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Journal ref: ACM SIGCOMM 2024 Sydney

  14. arXiv:2406.16807  [pdf, other

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16150  [pdf, other

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within br… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  16. arXiv:2406.16137  [pdf, other

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.15906  [pdf, other

    cs.NI cs.AI

    OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization

    Authors: Siyuan Li, Xi Lin, Yaju Liu, Gaolei Li, Jianhua Li

    Abstract: Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated pol… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM SIGCOMM 2024 Workshop on Hot Topics in Optical Technologies and Applications in Networking

  18. arXiv:2406.13275  [pdf, other

    cs.SD cs.CL eess.AS

    Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

    Authors: Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

    Abstract: Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED)… ▽ More

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  19. arXiv:2406.13223  [pdf, other

    cs.RO

    Act Better by Timing: A timing-Aware Reinforcement Learning for Autonomous Driving

    Authors: Guanzhou Li, Jianping Wu, Yujing He

    Abstract: Coping with intensively interactive scenarios is one of the significant challenges in the development of autonomous driving. Reinforcement learning (RL) offers an ideal solution for such scenarios through its self-evolution mechanism via interaction with the environment. However, the lack of sufficient safety mechanisms in common RL leads to the fact that agent often find it difficult to interact… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  20. arXiv:2406.12539  [pdf, other

    cs.LG cs.AI

    The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

    Authors: Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, Yuxuan Liang

    Abstract: Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, \textbf{fo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  21. arXiv:2406.12538  [pdf, other

    cs.LG cs.AI cs.RO

    Variational Distillation of Diffusion Policies into Mixture of Experts

    Authors: Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov

    Abstract: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to r… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  22. arXiv:2406.11496  [pdf, other

    cs.MA

    Decentralized Collaborative Pricing and Shunting for Multiple EV Charging Stations Based on Multi-Agent Reinforcement Learning

    Authors: Tianhao Bu, Hang Li, Guojie Li

    Abstract: The extraordinary electric vehicle (EV) popularization in the recent years has facilitated research studies in alleviating EV energy charging demand. Previous studies primarily focused on the optimizations over charging stations (CS) profit and EV users cost savings through charge/discharge scheduling events. In this work, the random behaviors of EVs are considered, with EV users preferences over… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at PSGEC: PSGEC2024

  23. arXiv:2406.11231  [pdf, other

    cs.RO cs.AI cs.CL cs.LG

    Enabling robots to follow abstract instructions and complete complex dynamic tasks

    Authors: Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Chris Lucas

    Abstract: Completing complex tasks in unpredictable settings like home kitchens challenges robotic systems. These challenges include interpreting high-level human commands, such as "make me a hot beverage" and performing actions like pouring a precise amount of water into a moving mug. To address these challenges, we present a novel framework that combines Large Language Models (LLMs), a curated Knowledge B… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11033  [pdf, other

    cs.DB cs.AI

    HAIChart: Human and AI Paired Visualization System

    Authors: Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang

    Abstract: The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets. Existing tools fall into two main categories: human-powered tools (e.g., Tableau and PowerBI), which require intensive expert involvement, and AI-powered automated tools (e.g., Draco and Table2Charts), whic… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures, 7 tables

  25. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  26. arXiv:2406.10591  [pdf, other

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  27. arXiv:2406.10573  [pdf, other

    cs.LG cs.AI cs.CR

    Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions

    Authors: Xiao Yang, Gaolei Li, Jianhua Li

    Abstract: Graph Neural Networks (GNNs) have significantly advanced various downstream graph-relevant tasks, encompassing recommender systems, molecular structure prediction, social media analysis, etc. Despite the boosts of GNN, recent research has empirically demonstrated its potential vulnerability to backdoor attacks, wherein adversaries employ triggers to poison input samples, inducing GNN to adversary-… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  28. arXiv:2406.10152  [pdf, other

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  29. arXiv:2406.10034  [pdf, other

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  30. arXiv:2406.09238  [pdf, other

    cs.IT eess.SP

    Near-Field Multiuser Communications based on Sparse Arrays

    Authors: Kangjian Chen, Chenhao Qi, Geoffrey Ye Li, Octavia A. Dobre

    Abstract: This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitiga… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  31. arXiv:2406.09198  [pdf, other

    cs.CV

    CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

    Authors: Shuang Li, Jiaxu Leng, Guozhang Li, Ji Gan, Haosheng chen, Xinbo Gao

    Abstract: Contrastive Language-Image Pre-Training (CLIP) has shown impressive performance in short-term Person Re-Identification (ReID) due to its ability to extract high-level semantic features of pedestrians, yet its direct application to Cloth-Changing Person Re-Identification (CC-ReID) faces challenges due to CLIP's image encoder overly focusing on clothes clues. To address this, we propose a novel fram… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  32. arXiv:2406.08234  [pdf, other

    cs.LG cs.RO

    MaIL: Improving Imitation Learning with Mamba

    Authors: Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

    Abstract: This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability in handling human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  33. arXiv:2406.06959  [pdf, other

    cs.LG cs.AI

    Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

    Authors: Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

    Abstract: The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primari… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  34. arXiv:2406.06007  [pdf, other

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  35. arXiv:2406.05940  [pdf, other

    cs.SE

    M2CVD: Multi-Model Collaboration for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yingfei Xiong, Jia Li, Zhi Jin

    Abstract: Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization; conversely, code models such CodeBERT are easy to fine-tune, but it is often difficult to learn vulnerability semantics from complex code languages. To address these challenges, this paper introduces the Multi-Model Collaborativ… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  36. arXiv:2406.05639  [pdf, other

    cs.SE

    A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair

    Authors: Guochang Li, Chen Zhi, Jialiang Chen, Junxiao Han, Shuiguang Deng

    Abstract: Automated Program Repair (APR) aims to fix bugs by generating patches. And existing work has demonstrated that "pre-training and fine-tuning" paradigm enables Large Language Models (LLMs) improve fixing capabilities on APR. However, existing work mainly focuses on Full-Model Fine-Tuning (FMFT) for APR and limited research has been conducted on the execution-based evaluation of Parameter-Efficient… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  37. arXiv:2406.05422  [pdf, other

    cs.AI cs.RO

    Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses

    Authors: Yongju Tong, Jiawen Kang, Junlong Chen, Minrui Xu, Gaolei Li, Weiting Zhang, Xincheng Yan

    Abstract: Air-ground integrated networks can relieve communication pressure on ground transportation networks and provide 6G-enabled vehicular Metaverses services offloading in remote areas with sparse RoadSide Units (RSUs) coverage and downtown areas where users have a high demand for vehicular services. Vehicle Twins (VTs) are the digital twins of physical vehicles to enable more immersive and realistic v… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  38. arXiv:2406.03287  [pdf, other

    cs.NE cs.CL cs.LG

    SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

    Authors: Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li

    Abstract: Towards energy-efficient artificial intelligence similar to the human brain, the bio-inspired spiking neural networks (SNNs) have advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models exhibit promising generalization capability, making it a valuable issue to explore more general spike-driven models. However, the binary spikes in… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  39. arXiv:2406.01265  [pdf, other

    cs.DB

    The Dawn of Natural Language to SQL: Are We Fully Ready?

    Authors: Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang

    Abstract: Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases. The emergence of Large Language Models has introduced a novel paradigm in NL2SQL tasks, enhancing capabilities dramatically. However, this raises a critical question: Are we fully prepared to deploy NL2SQL models in production? To address the posed qu… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 14 pages, 15 figures, 7 tables

  40. arXiv:2406.00510  [pdf, other

    cs.CV

    Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

    Authors: Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

    Abstract: Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories. Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection, significantly generalizing the powerful capabilities of the detector to identify mor… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  41. arXiv:2406.00487  [pdf, other

    cs.LG cs.AI stat.ML

    Optimistic Rates for Learning from Label Proportions

    Authors: Gene Li, Lin Chen, Adel Javanmard, Vahab Mirrokni

    Abstract: We consider a weakly supervised learning problem called Learning from Label Proportions (LLP), where examples are grouped into ``bags'' and only the average label within each bag is revealed to the learner. We study various learning rules for LLP that achieve PAC learning guarantees for classification loss. We establish that the classical Empirical Proportional Risk Minimization (EPRM) learning ru… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to COLT 2024. Comments welcome

  42. arXiv:2406.00480  [pdf, other

    cs.CV

    AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

    Authors: Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li

    Abstract: Powered by massive curated training data, Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts. However, the vanilla SAM is class agnostic and heavily relies on user-provided prompts to segment objects of interest. Adapting this method to diverse tasks is crucial for accurate target identification and to avoid… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  43. arXiv:2405.20073  [pdf, other

    cs.IT eess.SP

    Power Allocation for Cell-Free Massive MIMO ISAC Systems with OTFS Signal

    Authors: Yifei Fan, Shaochuan Wu, Xixi Bi, Guoyu Li

    Abstract: Applying integrated sensing and communication (ISAC) to a cell-free massive multiple-input multiple-output (CF mMIMO) architecture has attracted increasing attention. This approach equips CF mMIMO networks with sensing capabilities and resolves the problem of unreliable service at cell edges in conventional cellular networks. However, existing studies on CF-ISAC systems have focused on the applica… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This work is submitted to IEEE for possible publication

  44. arXiv:2405.19856  [pdf, other

    cs.CL cs.SE

    DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

    Authors: Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li

    Abstract: How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs. To address the knowledge gap, we propose a new benchmark named DevEval, which has three advances. (1) DevEval aligns with real-world repositories in multi… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). arXiv admin note: substantial text overlap with arXiv:2404.00599, arXiv:2401.06401

  45. arXiv:2405.19465  [pdf, other

    cs.CV

    RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

    Authors: Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li

    Abstract: Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning based on large-scale pre-trained visionlanguage models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 Findings

  46. arXiv:2405.19360  [pdf, other

    cs.CR cs.AI

    ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users

    Authors: Guanlin Li, Kangjie Chen, Shudong Zhang, Jie Zhang, Tianwei Zhang

    Abstract: Large-scale pre-trained generative models are taking the world by storm, due to their abilities in generating creative content. Meanwhile, safeguards for these generative models are developed, to protect users' rights and safety, most of which are designed for large language models. Existing methods primarily focus on jailbreak and adversarial attacks, which mainly evaluate the model's safety unde… ▽ More

    Submitted 17 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  47. arXiv:2405.19266  [pdf, other

    cs.CL

    PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

    Authors: Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang

    Abstract: Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the… ▽ More

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: A Technical Report on a Chinese Medical Large Language Model

  48. arXiv:2405.17229  [pdf, other

    cs.HC

    InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning

    Authors: Guozheng Li, Peng He, Xinyu Wang, Runfei Li, Chi Harold Liu, Chuangxin Ou, Dong He, Guoren Wang

    Abstract: Embedding visual representations within original hierarchical tables can mitigate additional cognitive load stemming from the division of users' attention. The created hierarchical table visualizations can help users understand and explore complex data with multi-level attributes. However, because of many options available for transforming hierarchical tables and selecting subsets for embedding, t… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  49. arXiv:2405.16466  [pdf, other

    cs.NE

    High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

    Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

    Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boost memory requirements during training and increase inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the for… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  50. arXiv:2405.14861  [pdf, ps, other

    cs.LG cs.AI math.ST stat.ML

    Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models

    Authors: Gen Li, Yuling Yan

    Abstract: This paper investigates score-based diffusion models when the underlying target distribution is concentrated on or near low-dimensional manifolds within the higher-dimensional space in which they formally reside, a common characteristic of natural image distributions. Despite previous efforts to understand the data generation process of diffusion models, existing theoretical support remains highly… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.