Showing 1–50 of 38,638 results for author: Chen

  1. arXiv:2407.11965  [pdf, other]

    cs.CV

    UrbanWorld: An Urban World Model for 3D City Generation

    Authors: Yu Shang, Jiansheng Chen, Hangyu Fan, Jingtao Ding, Jie Feng, Yong Li

    Abstract: Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnection. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban en…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages

  2. arXiv:2407.11963  [pdf, other]

    cs.CL

    NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

    Authors: Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen

    Abstract: In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user's query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple l…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.11890  [pdf, other]

    cs.CV

    DepGAN: Leveraging Depth Maps for Handling Occlusions and Transparency in Image Composition

    Authors: Amr Ghoneim, Jiju Poovvancheri, Yasushi Akiyama, Dong Chen

    Abstract: Image composition is a complex task which requires a lot of information about the scene for an accurate and realistic composition, such as perspective, lighting, shadows, occlusions, and object interactions. Previous methods have predominantly used 2D information for image composition, neglecting the potentials of 3D spatial information. In this work, we propose DepGAN, a Generative Adversarial Ne…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  4. arXiv:2407.11784  [pdf, other]

    cs.AI cs.CV cs.LG

    Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

    Authors: Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource utilization. In response, we pre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 26 pages, 9 figures, 5 tables

  5. arXiv:2407.11781  [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in…

    Submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2407.11766  [pdf, ps, other]

    cs.CL cs.AI

    Vectoring Languages

    Authors: Joseph Chen

    Abstract: Recent breakthroughs in large language models (LLM) have stirred up global attention, and the research has been accelerating non-stop since then. Philosophers and psychologists have also been researching the structure of language for decades, but they are having a hard time finding a theory that directly benefits from the breakthroughs of LLMs. In this article, we propose a novel structure of lang…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages including references

  7. arXiv:2407.11750  [pdf, other]

    cs.CV

    Cycle Contrastive Adversarial Learning for Unsupervised image Deraining

    Authors: Chen Zhao, Weiling Cai, ChengWei Hu, Zheng Yuan

    Abstract: To tackle the difficulties in fitting paired real-world data for single image deraining (SID), recent unsupervised methods have achieved notable success. However, these methods often struggle to generate high-quality, rain-free images due to a lack of attention to semantic representation and image content, resulting in ineffective separation of content from the rain layer. In this paper, we propos…

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.11730  [pdf, other]

    cs.CV

    Monocular Occupancy Prediction for Scalable Indoor Scenes

    Authors: Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

    Abstract: Camera-based 3D occupancy prediction has recently garnered increasing attention in outdoor driving scenes. However, research in indoor scenes remains relatively unexplored. The core differences in indoor scenes lie in the complexity of scene scale and the variance in object size. In this paper, we propose a novel method, named ISO, for predicting indoor scene occupancy using monocular images. ISO…

    Submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.11717  [pdf, other]

    cs.CV

    Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

    Authors: Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng

    Abstract: Vision-Language Large Models (VLMs) have recently become the primary backbone of AI, due to their impressive performance. However, their expensive computation costs, i.e., throughput and delay, impede their potential in real-world scenarios. To achieve acceleration for VLMs, most existing methods focus on the model perspective: pruning, distillation, quantization, but completely overlook the data-perspective…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. The first two authors share the same contribution. arXiv admin note: substantial text overlap with arXiv:2312.07408

  10. arXiv:2407.11705  [pdf, other]

    cs.RO eess.SP

    Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems

    Authors: Jianzhu Huai, Binliang Wang, Yuan Zhuang, Yiwen Chen, Qipeng Li, Yulong Han, Charles Toth

    Abstract: 4D radars are increasingly favored for odometry and mapping of autonomous systems due to their robustness in harsh weather and dynamic environments. Existing datasets, however, often cover limited areas and are typically captured using a single platform. To address this gap, we present a diverse large-scale dataset specifically designed for 4D radar-based localization and mapping. This dataset was…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 4 figures, 5 tables

  11. arXiv:2407.11700  [pdf, other]

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  12. arXiv:2407.11699  [pdf, other]

    cs.CV

    Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

    Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment o…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  13. arXiv:2407.11691  [pdf, other]

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary…

    Submitted 16 July, 2024; originally announced July 2024.

  14. arXiv:2407.11615  [pdf, other]

    cs.LG cs.AI

    Graph Dimension Attention Networks for Enterprise Credit Assessment

    Authors: Shaopeng Wei, Beni Egressy, Xingyan Chen, Yu Zhao, Fuzhen Zhuang, Roger Wattenhofer, Gang Kou

    Abstract: Enterprise credit assessment is critical for evaluating financial risk, and Graph Neural Networks (GNNs), with their advanced capability to model inter-entity relationships, are a natural tool to get a deeper understanding of these financial networks. However, existing GNN-based methodologies predominantly emphasize entity-level attention mechanisms for contagion risk aggregation, often overlookin…

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.11585  [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe…

    Submitted 16 July, 2024; originally announced July 2024.

  16. arXiv:2407.11569  [pdf, other]

    cs.CV

    SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

    Authors: Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

    Abstract: Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate var…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  17. arXiv:2407.11556  [pdf, other]

    cs.DB

    LITS: An Optimized Learned Index for Strings (An Extended Version)

    Authors: Yifan Yang, Shimin Chen

    Abstract: Index is an important component in database systems. Learned indexes have been shown to outperform traditional tree-based index structures for fixed-sized integer or floating point keys. However, the application of the learned solution to variable-length string keys is under-researched. Our experiments show that existing learned indexes for strings fail to outperform traditional string indexes, su…

    Submitted 16 July, 2024; originally announced July 2024.

  18. arXiv:2407.11536  [pdf, other]

    cs.CL cs.AI

    Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

    Authors: Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan

    Abstract: Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain specific question and answer datasets, the professional domain knowledge and Q&A abilities of these models have significantly improved, for example, medical professional LLMs that use fine-tuning of doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure. Accepted by the Workshop on Long-Context Foundation Models (LCFM) at ICML 2024

  19. arXiv:2407.11505  [pdf, other]

    cs.CV

    Haze-Aware Attention Network for Single-Image Dehazing

    Authors: Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, Erkang Chen

    Abstract: Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current attention-based solutions, we propose a new dehazing network combining an innovative Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhanc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures

    Report number: applsci-3022856

    MSC Class: 68I1C; 68I8P

    ACM Class: I.4.3; I.4.9

  20. arXiv:2407.11484  [pdf, other]

    cs.AI cs.CL

    The Oscars of AI Theater: A Survey on Role-Playing with Language Models

    Authors: Nuo Chen, Y. Wang, Yang Deng, Jia Li

    Abstract: This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving c…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 28 pages

  21. arXiv:2407.11481  [pdf, other]

    cs.LG cs.AI eess.SP

    Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG

    Authors: Jiarong Chen, Wanqing Wu, Tong Liu, Shenda Hong

    Abstract: In the context of cardiovascular diseases (CVD) that exhibit an elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool for doctors, commonly utilizing a 12-lead configuration in clinical practice. However, the 10 electrodes placed on the surface would cause a lot of inconvenience and discomfort, while the rapidly advancing wearable devices adopt th…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD-AIDSH 2024

  22. arXiv:2407.11477  [pdf, other]

    cs.LG cs.AI

    XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

    Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

    Abstract: Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,9…

    Submitted 16 July, 2024; originally announced July 2024.

  23. arXiv:2407.11459  [pdf, other]

    eess.SP cs.LG

    RIMformer: An End-to-End Transformer for FMCW Radar Interference Mitigation

    Authors: Ziang Zhang, Guangzhi Chen, Youlong Weng, Shunchuan Yang, Zhiyu Jia, Jingxuan Chen

    Abstract: Frequency-modulated continuous-wave (FMCW) radar plays a pivotal role in the field of remote sensing. The increasing degree of FMCW radar deployment has increased the mutual interference, which weakens the detection capabilities of radars and threatens reliability and safety of systems. In this paper, a novel FMCW radar interference mitigation (RIM) method, termed as RIMformer, is proposed by usin…

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.11448  [pdf, other]

    cs.CV

    cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process

    Authors: Yihang Chen, Tsai Hor Chan, Guosheng Yin, Yuming Jiang, Lequan Yu

    Abstract: Multiple instance learning (MIL) has been extensively applied to whole slide histopathology image (WSI) analysis. The existing aggregation strategy in MIL, which primarily relies on the first-order distance (e.g., mean difference) between instances, fails to accurately approximate the true feature distribution of each instance, leading to biased slide-level representations. Moreover, the scarcity…

    Submitted 16 July, 2024; originally announced July 2024.

  25. Incremental high average-utility itemset mining: survey and challenges

    Authors: Jing Chen, Shengyi Yang, Weiping Ding, Peng Li, Aijun Liu, Hongjun Zhang, Tian Li

    Abstract: The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 25 pages, 23 figures

  26. arXiv:2407.11421  [pdf, other]

    cs.CL

    States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

    Authors: Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output t…

    Submitted 16 July, 2024; originally announced July 2024.

  27. arXiv:2407.11420  [pdf, other]

    cs.RO

    iKalibr: Unified Targetless Spatiotemporal Calibration for Resilient Integrated Inertial Systems

    Authors: Shuolong Chen, Xingxing Li, Shengyu Li, Yuxuan Zhou, Xiaoteng Yang

    Abstract: The integrated inertial system, typically integrating an IMU and an exteroceptive sensor such as radar, LiDAR, and camera, has been widely accepted and applied in modern robotic applications for ego-motion estimation, motion control, or autonomous exploration. To improve system accuracy, robustness, and further usability, both multiple and various sensors are generally resiliently integrated, whic…

    Submitted 16 July, 2024; originally announced July 2024.

  28. arXiv:2407.11387  [pdf, other]

    cs.HC

    A Framework for Evaluating Appropriateness, Trustworthiness, and Safety in Mental Wellness AI Chatbots

    Authors: Lucia Chen, David A. Preece, Pilleriin Sikka, James J. Gross, Ben Krause

    Abstract: Large language model (LLM) chatbots are susceptible to biases and hallucinations, but current evaluations of mental wellness technologies lack comprehensive case studies to evaluate their practical applications. Here, we address this gap by introducing the MHealth-EVAL framework, a new role-play based interactive evaluation method designed specifically for evaluating the appropriateness, trustwort…

    Submitted 16 July, 2024; originally announced July 2024.

  29. arXiv:2407.11380  [pdf, other]

    cs.CV cs.LG

    NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

    Authors: Chenyu Liu, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu

    Abstract: Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall languag…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  30. arXiv:2407.11364  [pdf, ps, other]

    cs.DS cs.LG

    Learning-augmented Maximum Independent Set

    Authors: Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, Chen Wang

    Abstract: We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-\delta}$ for any $\delta>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membersh…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: APPROX 2024

  31. arXiv:2407.11321  [pdf, other]

    cs.CV

    TCFormer: Visual Recognition via Token Clustering Transformer

    Authors: Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

    Abstract: Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Tran…

    Submitted 15 July, 2024; originally announced July 2024.

  32. arXiv:2407.11300  [pdf, other]

    cs.CV cs.AI

    Large Vision-Language Models as Emotion Recognizers in Context Awareness

    Authors: Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang

    Abstract: Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore…

    Submitted 15 July, 2024; originally announced July 2024.

  33. arXiv:2407.11280  [pdf, other]

    cs.AI cs.CE cs.DB cs.LG

    Intelligent Cross-Organizational Process Mining: A Survey and New Perspectives

    Authors: Yiyuan Yang, Zheshun Wu, Yong Chu, Zhenghua Chen, Zenglin Xu, Qingsong Wen

    Abstract: Process mining, as a high-level field in data mining, plays a crucial role in enhancing operational efficiency and decision-making across organizations. In this survey paper, we delve into the growing significance and ongoing trends in the field of process mining, advocating a specific viewpoint on its contents, application, and development in modern businesses and process management, particularly…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review; 13 pages, 7 figures, 2 tables

  34. arXiv:2407.11277  [pdf, other]

    cs.CL eess.AS

    Target conversation extraction: Source separation using turn-taking dynamics

    Authors: Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Emre Sefik Eskimez, Takuya Yoshioka, Shyamnath Gollakota

    Abstract: Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in h…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  35. arXiv:2407.11268  [pdf, other]

    stat.ML cs.CE cs.LG

    Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process

    Authors: Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, Wei Chen

    Abstract: Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 20 Pages, 9 Figures, Data is available per request

  36. arXiv:2407.11177  [pdf, ps, other]

    cs.DS

    Trace reconstruction from local statistical queries

    Authors: Xi Chen, Anindya De, Chin Ho Lee, Rocco A. Servedio

    Abstract: The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: RANDOM 2024

  37. arXiv:2407.11162  [pdf, other]

    cs.CV

    Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images

    Authors: Yifei Wang, Weimin Bai, Weijian Luo, Wenzheng Chen, He Sun

    Abstract: Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems, offering a good approximation of prior distributions of real-world image data. Typically, diffusion models rely on large-scale clean signals to accurately learn the score functions of ground truth clean image distributions. However, such a requirement for large amounts of clean data is often impractical…

    Submitted 15 July, 2024; originally announced July 2024.

  38. arXiv:2407.11098  [pdf, other]

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce Fusion-LLM, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion (ICF). Our approach offers several key contributions: Firstly, we propose the…

    Submitted 15 July, 2024; originally announced July 2024.

  39. arXiv:2407.11096  [pdf, other]

    cs.LG cs.AI

    Static and multivariate-temporal attentive fusion transformer for readmission risk prediction

    Authors: Zhe Sun, Runzhi Li, Jing Wang, Gang Chen, Siyu Yan, Lihong Ma

    Abstract: Background: Accurate short-term readmission prediction of ICU patients is significant in improving the efficiency of resource assignment by assisting physicians in making discharge decisions. Clinically, both individual static and multivariate temporal data collected from ICU monitors play critical roles in short-term readmission prediction. Informative static and multivariate temporal feat…

    Submitted 14 July, 2024; originally announced July 2024.

  40. arXiv:2407.11085  [pdf, other]

    cs.LG cs.AI

    SpreadFGL: Edge-Client Collaborative Federated Graph Learning with Adaptive Neighbor Generation

    Authors: Luying Zhong, Yueyang Pi, Zheyi Chen, Zhengxin Yu, Wang Miao, Xing Chen, Geyong Min

    Abstract: Federated Graph Learning (FGL) has garnered widespread attention by enabling collaborative training on multiple clients for semi-supervised classification tasks. However, most existing FGL studies do not well consider the missing inter-client topology information in real-world scenarios, causing insufficient feature aggregation of multi-hop neighbor clients during model training. Moreover, the cla…

    Submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.11083  [pdf, other]

    cs.LG

    Empowering Graph Invariance Learning with Deep Spurious Infomax

    Authors: Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

    Abstract: Recently, there has been a surge of interest in developing graph neural networks that utilize the invariance principle on graphs to generalize the out-of-distribution (OOD) data. Due to the limited knowledge about OOD data, existing approaches often pose assumptions about the correlation strengths of the underlying spurious features and the target labels. However, this prior is often unavailable a…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: ICML2024 camera-ready version

    ACM Class: I.2.6

  42. arXiv:2407.11079  [pdf, ps, other]

    eess.SP cs.IT

    One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

    Authors: Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

    Abstract: As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attention. This paper focuses on one-bit multiple-in…

    Submitted 13 July, 2024; originally announced July 2024.

  43. arXiv:2407.11073  [pdf, other]

    cs.CR cs.CV cs.LG

    SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

    Authors: Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

    Abstract: Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following bre…

    Submitted 12 July, 2024; originally announced July 2024.

  44. arXiv:2407.11071  [pdf, other]

    cs.LG cs.AI cs.AR

    MonoSparse-CAM: Harnessing Monotonicity and Sparsity for Enhanced Tree Model Processing on CAMs

    Authors: Tergel Molom-Ochir, Brady Taylor, Hai Li, Yiran Chen

    Abstract: Despite significant advancements in AI driven by neural networks, tree-based machine learning (TBML) models excel on tabular data. These models exhibit promising energy efficiency, and high performance, particularly when accelerated on analog content-addressable memory (aCAM) arrays. However, optimizing their hardware deployment, especially in leveraging TBML model structure and aCAM circuitry, re…

    Submitted 12 July, 2024; originally announced July 2024.

  45. arXiv:2407.11062  [pdf, other]

    cs.LG cs.AI cs.CL

    EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

    Authors: Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Large language models (LLMs) are integral to modern natural language processing and artificial intelligence. However, they face challenges in managing their significant memory requirements. Although quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss, it demands substantial training resources to optimize mode…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: An efficient and effective quantization technique to improve the performance of low-bit LMMs and LVLMs

  46. arXiv:2407.11055  [pdf, other]

    cs.LG cs.SD eess.AS

    Knowledge boosting during low-latency inference

    Authors: Vidya Srinivas, Malek Itani, Tuochao Chen, Emre Sefik Eskimez, Takuya Yoshioka, Shyamnath Gollakota

    Abstract: Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not gu…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  47. Sampling and active learning methods for network reliability estimation using K-terminal spanning tree

    Authors: Chen Ding, Pengfei Wei, Yan Shi, Jinxing Liu, Matteo Broggi, Michael Beer

    Abstract: Network reliability analysis remains a challenge due to the increasing size and complexity of networks. This paper presents a novel sampling method and an active learning method for efficient and accurate network reliability estimation under node failure and edge failure scenarios. The proposed sampling method adopts the Monte Carlo technique to sample component lifetimes and the K-terminal spanning t…

    Submitted 9 July, 2024; originally announced July 2024.

    Journal ref: Reliability Engineering & System Safety (2024) 110309

  48. arXiv:2407.11033  [pdf, other]

    cs.LG cs.CL

    Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models

    Authors: Yuyan Chen, Qiang Fu, Ge Fan, Lun Du, Jian-Guang Lou, Shi Han, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: In recent years, pre-trained language models (PLMs) have swept into various fields of artificial intelligence and achieved great success. However, most PLMs, such as T5 and GPT3, have a huge number of parameters; fine-tuning them is often expensive and time consuming, and storing them takes up a lot of space. Therefore, it is necessary to adopt a parameter-efficient approach to reduce parameters of P…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  49. arXiv:2407.11030  [pdf, other]

    cs.LG cs.AI cs.CL

    DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

    Authors: Zhen Tan, Daize Dong, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen

    Abstract: In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model…

    Submitted 3 July, 2024; originally announced July 2024.

  50. arXiv:2407.11018  [pdf, other]

    cs.NI eess.SP

    Online Multi-Task Offloading for Semantic-Aware Edge Computing Systems

    Authors: Xuyang Chen, Qu Luo, Gaojie Chen, Daquan Feng, Yao Sun

    Abstract: Mobile edge computing (MEC) provides low-latency offloading solutions for computationally intensive tasks, effectively improving the computing efficiency and battery life of mobile devices. However, for data-intensive tasks or scenarios with limited uplink bandwidth, network congestion might occur due to massive simultaneous offloading nodes, increasing transmission latency and affecting task perf…

    Submitted 28 June, 2024; originally announced July 2024.