Showing 1–50 of 517 results for author: Zheng, X

  1. arXiv:2407.02517  [pdf, other]

    cs.RO cs.AI

    CAV-AHDV-CAV: Mitigating Traffic Oscillations for CAVs through a Novel Car-Following Structure and Reinforcement Learning

    Authors: Xianda Chen, PakHin Tiu, Yihuai Zhang, Xinhu Zheng, Meixin Zhu

    Abstract: Connected and Automated Vehicles (CAVs) offer a promising solution to the challenges of mixed traffic with both CAVs and Human-Driven Vehicles (HDVs). A significant hurdle in such scenarios is traffic oscillation, or the "stop-and-go" pattern, during car-following situations. While HDVs rely on limited information, CAVs can leverage data from other CAVs for better decision-making. This allows CAVs…

    Submitted 23 June, 2024; originally announced July 2024.

  2. arXiv:2407.02516  [pdf, other]

    cs.RO cs.AI

    EditFollower: Tunable Car Following Models for Customizable Adaptive Cruise Control Systems

    Authors: Xianda Chen, Xu Han, Meixin Zhu, Xiaowen Chu, PakHin Tiu, Xinhu Zheng, Yinhai Wang

    Abstract: In the realm of driving technologies, fully autonomous vehicles have not been widely adopted yet, making advanced driver assistance systems (ADAS) crucial for enhancing driving experiences. Adaptive Cruise Control (ACC) emerges as a pivotal component of ADAS. However, current ACC systems often employ fixed settings, failing to intuitively capture drivers' social preferences and leading to potentia…

    Submitted 23 June, 2024; originally announced July 2024.

  3. arXiv:2407.02510  [pdf, other]

    cs.SE cs.LG

    Detecting Stimuli with Novel Temporal Patterns to Accelerate Functional Coverage Closure

    Authors: Xuan Zheng, Tim Blackmore, James Buckingham, Kerstin Eder

    Abstract: Novel test selectors have demonstrated their effectiveness in accelerating the closure of functional coverage for various industrial digital designs in simulation-based verification. The primary advantages of these test selectors include performance that is not impacted by coverage holes, straightforward implementation, and relatively low computational expense. However, the detection of stimuli wi…

    Submitted 19 June, 2024; originally announced July 2024.

  4. arXiv:2407.01884  [pdf, other]

    cs.CV cs.HC

    EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

    Authors: Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli…

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2407.01492  [pdf, other]

    cs.CL cs.AI

    RegMix: Data Mixture as Regression for Language Model Pre-training

    Authors: Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin

    Abstract: The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance gi…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01461  [pdf, other]

    cs.CL

    Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

    Authors: Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.01219  [pdf, other]

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2406.19640  [pdf, other]

    cs.CV

    Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

    Authors: Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

    Abstract: Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates po…

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  9. arXiv:2406.19247  [pdf, other]

    cs.CV

    Local Manifold Learning for No-Reference Image Quality Assessment

    Authors: Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

    Abstract: Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often negl…

    Submitted 27 June, 2024; originally announced June 2024.

  10. arXiv:2406.19230  [pdf, other]

    cs.NE cs.CL

    Spiking Convolutional Neural Networks for Text Classification

    Authors: Changze Lv, Jianhan Xu, Xiaoqing Zheng

    Abstract: Spiking neural networks (SNNs) offer a promising pathway to implement deep neural networks (DNNs) in a more energy-efficient manner since their neurons are sparsely activated and inferences are event-driven. However, there have been very few works that have demonstrated the efficacy of SNNs in language tasks partially because it is non-trivial to represent words in the forms of spikes and to deal…

    Submitted 27 June, 2024; originally announced June 2024.

  11. arXiv:2406.17413  [pdf, other]

    cs.CV

    Depth-Guided Semi-Supervised Instance Segmentation

    Authors: Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framewo…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 4 tables

  12. arXiv:2406.16062  [pdf, other]

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl…

    Submitted 23 June, 2024; originally announced June 2024.

  13. arXiv:2406.10976  [pdf, other]

    cs.LG cs.CL cs.CR

    Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

    Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers…

    Submitted 16 June, 2024; originally announced June 2024.

  14. arXiv:2406.10347  [pdf, other]

    cs.NI

    A Near-Optimal Category Information Sampling in RFID Systems

    Authors: Xiujun Wang, Zhi Liu, Xiaokang Zhou, Yong Liao, Han Hu, Xiao Zheng, Jie Li

    Abstract: In many RFID-enabled applications, objects are classified into different categories, and the information associated with each object's category (called category information) is written into the attached tag, allowing the reader to access it later. The category information sampling in such RFID systems, which is to randomly choose (sample) a few tags from each category and collect their category in…

    Submitted 18 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 37 pages, 11 figures

  15. arXiv:2406.10228  [pdf, other]

    cs.CV cs.AI cs.CL

    VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

    Authors: Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

    Abstract: The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts. These models often fall short when faced with complex comprehension tasks, which involve navigating through a plethora of irrelevant and potentially…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://zhourax.github.io/VEGA/

  16. arXiv:2406.03011  [pdf, other]

    cs.IT eess.SP

    Huygens-Fresnel Model Based Position-Aided Phase Configuration for 1-Bit RIS Assisted Wireless Communication

    Authors: Xiao Zheng, Wenchi Cheng, Jiangzhou Wang

    Abstract: Reconfigurable intelligent surface (RIS), composed of nearly passive elements, is regarded as one of the potential paradigms to support multi-gigabit data in real-time. However, in traditional CSI (channel state information) driven frame, the training overhead of channel estimation greatly increases as the number of RIS elements increases to intelligently manipulate the reflected signals. To conve…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, accepted by IEEE TCOM (early access)

    ACM Class: H.1.1

  17. arXiv:2406.02898  [pdf, other]

    cs.IT

    Location-Driven Beamforming for RIS-Assisted Near-Field Communications

    Authors: Xiao Zheng, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Future wireless communications are promising to support ubiquitous connections and high data rates with cost-effective devices. Benefiting from the energy-efficient elements with low cost, reconfigurable intelligent surface (RIS) emerges as a potential solution to fulfill such demands, which has the capability to flexibly manipulate the wireless signals with a tunable phase. Recently, as the opera…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures, accepted by IEEE Communication Magazine

    MSC Class: 14J60 (Primary) 14F05 ACM Class: H.1.1

  18. arXiv:2406.02240  [pdf, other]

    cs.NI

    Quantum Computing in Wireless Communications and Networking: A Tutorial-cum-Survey

    Authors: Wei Zhao, Tangjie Weng, Yue Ruan, Zhi Liu, Xuangou Wu, Xiao Zheng, Nei Kato

    Abstract: Owing to its outstanding parallel computing capabilities, quantum computing (QC) has been a subject of continuous attention. With the gradual maturation of QC platforms, it has increasingly played a significant role in various fields such as transportation, pharmaceuticals, and industrial manufacturing, achieving unprecedented milestones. In modern society, wireless communication stands as an indis…

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.01288  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

    Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin

    Abstract: Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting specia…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.01168  [pdf]

    econ.GN cs.AI cs.CY cs.ET cs.HC

    How Ethical Should AI Be? How AI Alignment Shapes the Risk Preferences of LLMs

    Authors: Shumiao Ouyang, Hayong Yun, Xingjian Zheng

    Abstract: This study explores the risk preferences of Large Language Models (LLMs) and how the process of aligning them with human ethical standards influences their economic decision-making. By analyzing 30 LLMs, we uncover a broad range of inherent risk profiles ranging from risk-averse to risk-seeking. We then explore how different types of AI alignment, a process that ensures models act according to hum…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.00036  [pdf, other]

    cs.CL cs.AI cs.LG

    EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling

    Authors: Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, Chengwei Pan

    Abstract: The integration of multimodal Electronic Health Records (EHR) data has notably advanced clinical predictive capabilities. However, current models that utilize clinical notes and multivariate time-series EHR data often lack the necessary medical context for precise clinical tasks. Previous methods using knowledge graphs (KGs) primarily focus on structured knowledge extraction. To address this, we p…

    Submitted 27 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.07016

  22. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality…

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  23. arXiv:2405.20786  [pdf, other]

    cs.CV cs.HC

    Stratified Avatar Generation from Sparse Observations

    Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu

    Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  24. arXiv:2405.18744  [pdf, other]

    cs.CR

    PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN

    Authors: Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng

    Abstract: The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed t…

    Submitted 29 May, 2024; originally announced May 2024.

  25. arXiv:2405.17743  [pdf, other]

    cs.CL cs.AI cs.CE cs.LG

    ORLM: Training Large Language Models for Optimization Modeling

    Authors: Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang

    Abstract: Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problems by providing the capacity in automating optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Work in progress

  26. arXiv:2405.16108  [pdf, other]

    cs.CV

    OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

    Authors: Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang

    Abstract: Research on multi-modal learning dominantly aligns the modalities in a unified space at training, and only a single one is taken for prediction at inference. However, for a real machine, e.g., a robot, sensors could be added or removed at any time. Thus, it is crucial to enable the machine to tackle the mismatch and unequal-scale problems of modality combinations between training and inference. In…

    Submitted 25 May, 2024; originally announced May 2024.

  27. arXiv:2405.14362  [pdf, other]

    cs.NE

    Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators

    Authors: Changze Lv, Dongqi Han, Yansen Wang, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

    Abstract: Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing i…

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.10037  [pdf, other]

    cs.CV

    Bilateral Event Mining and Complementary for Event Stream Super-Resolution

    Authors: Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang

    Abstract: Event Stream Super-Resolution (ESR) aims to address the challenge of insufficient spatial resolution in event streams, which holds great significance for the application of event cameras in complex scenarios. Previous works for ESR often process positive and negative events in a mixed paradigm. This paradigm limits their ability to effectively model the unique characteristics of each event and mut…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR2024

  29. arXiv:2405.09308  [pdf, other]

    cs.LG cs.AI

    TimeX++: Learning Time-Series Explanations with Information Bottleneck

    Authors: Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo

    Abstract: Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To add…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  30. arXiv:2405.08748  [pdf, other]

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  31. arXiv:2405.05741  [pdf, ps, other]

    cs.CL cs.AI

    Can large language models understand uncommon meanings of common words?

    Authors: Jinyang Wu, Feihu Che, Xinxin Zheng, Shuai Zhang, Ruihan Jin, Shuai Nie, Pengpeng Shao, Jianhua Tao

    Abstract: Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, lacking widely acknowledged testing mechanisms, answering "whether LLMs are stochastic parrots or genuinely comprehend the world" remains unclear, fostering numerous studies and sparking heated debates. P…

    Submitted 9 May, 2024; originally announced May 2024.

  32. arXiv:2405.03327  [pdf, other]

    cs.LG

    Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes

    Authors: Xiaochen Zheng, Manuel Schürch, Xingyu Chen, Maria Angeliki Komninou, Reto Schüpbach, Ahmed Allam, Jan Bartussek, Michael Krauthammer

    Abstract: The identification of phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to adapt healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several disti…

    Submitted 6 May, 2024; originally announced May 2024.

  33. arXiv:2405.03252  [pdf, other]

    cs.IT

    A Universal List Decoding Algorithm with Application to Decoding of Polar Codes

    Authors: Xiangping Zheng, Xiao Ma

    Abstract: This paper is concerned with a guessing codeword decoding (GCD) of linear block codes. Compared with the guessing noise decoding (GND), which is only efficient for high-rate codes, the GCD is efficient for not only high-rate codes but also low-rate codes. We prove that the GCD typically requires a fewer number of queries than the GND. Compared with the ordered statistics decoding (OSD), the GCD do…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 47 pages, 24 figures

  34. arXiv:2405.02764  [pdf, other]

    cs.CL cs.LG

    Assessing Adversarial Robustness of Large Language Models: An Empirical Study

    Authors: Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversar…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures, 10 tables

  35. arXiv:2405.00587  [pdf, other]

    cs.CV

    GraCo: Granularity-Controllable Interactive Segmentation

    Authors: Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen

    Abstract: Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter aims to alleviate the spatial ambiguity present in the former. However, the multi-granularity output pipeline suffers from limited interaction flexibility and produces redundant resul…

    Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: CVPR2024 Highlight, Project: https://zhao-yian.github.io/GraCo

  36. arXiv:2404.19242  [pdf, other]

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  37. arXiv:2404.17164  [pdf, other]

    cs.LG

    DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

    Authors: Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

    Abstract: Missing data imputation poses a paramount challenge when dealing with graph data. Prior works typically are based on feature propagation or graph autoencoders to address this issue. However, these methods usually encounter the over-smoothing issue when dealing with missing data, as the graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a n…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  38. arXiv:2404.16501  [pdf, other]

    cs.CV

    360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

    Authors: Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang

    Abstract: In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2)…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.12505

  39. arXiv:2404.16033  [pdf, other]

    cs.CV cs.CL

    Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

    Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

    Abstract: With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, a visual reasoning problem is usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The project page is available at https://ggg0919.github.io/cantor/

  40. arXiv:2404.15660  [pdf, other]

    cs.CL

    KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

    Authors: Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, Jianhua Tao

    Abstract: Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noise i…

    Submitted 24 April, 2024; originally announced April 2024.

  41. arXiv:2404.14949  [pdf, other]

    cs.CV

    Multi-Modal Prompt Learning on Blind Image Quality Assessment

    Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

    Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semant…

    Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  42. arXiv:2404.14047  [pdf, other]

    cs.LG

    How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

    Authors: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

    Abstract: Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various tasks with super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when qua…

    Submitted 22 April, 2024; originally announced April 2024.

  43. arXiv:2404.13534

    cs.CV

    Motion-aware Latent Diffusion Models for Video Frame Interpolation

    Authors: Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang

    Abstract: With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecut…

    Submitted 4 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2303.09508 by other authors

  44. arXiv:2404.13425

    cs.CV cs.AI

    AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

    Authors: Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng

    Abstract: Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem has become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the siz…

    Submitted 20 April, 2024; originally announced April 2024.

  45. arXiv:2404.11093

    quant-ph cs.LG

    Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems

    Authors: Long Cao, Liwei Ge, Daochi Zhang, Xiang Li, Yao Wang, Rui-Xue Xu, YiJing Yan, Xiao Zheng

    Abstract: Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs. We present an artificial intelligence strategy to overcome this obstacle by integrating the neural quantum states approach into the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). Our approach utilizes…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  46. arXiv:2404.11064

    cs.CV cs.AI

    Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

    Authors: Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

    Abstract: 3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal perform…

    Submitted 17 April, 2024; originally announced April 2024.

  47. arXiv:2404.09498

    cs.CV

    FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

    Authors: Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu

    Abstract: Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confro…

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  48. arXiv:2404.09204

    cs.CV cs.AI

    TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

    Authors: Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

    Abstract: Multimodal Large Language Models (MLLMs) have shown impressive results on various multimodal tasks. However, most existing MLLMs are not well suited for document-oriented tasks, which require fine-grained image perception and information compression. In this paper, we present TextHawk, an MLLM that is specifically designed for document-oriented tasks, while preserving the general capabilities of ML…

    Submitted 14 April, 2024; originally announced April 2024.

  49. arXiv:2404.08939

    cs.RO cs.AI cs.HC

    NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

    Authors: Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu

    Abstract: Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance the tracking accuracy for indoo…

    Submitted 13 April, 2024; originally announced April 2024.

  50. arXiv:2404.05662

    cs.CV

    Towards Accurate Binarization of Diffusion Model

    Authors: Xingyu Zheng, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Xianglong Liu

    Abstract: With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel quantization-aware tr…

    Submitted 28 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: The code is available at https://github.com/Xingyu-Zheng/BinaryDM