Skip to main content

Showing 1–50 of 4,726 results for author: li, Z

  1. arXiv:2407.11840  [pdf, other

    cs.CV

    MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

    Authors: Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu

    Abstract: In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and th… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: https://mvgsplatting.github.io

  2. arXiv:2407.11624  [pdf, other

    cs.LG cs.AI cs.CY

    Rethinking Fair Graph Neural Networks from Re-balancing

    Authors: Zhixun Li, Yushun Dong, Qiang Liu, Jeffrey Xu Yu

    Abstract: Driven by the powerful representation ability of Graph Neural Networks (GNNs), plentiful GNN models have been widely deployed in many real-world applications. Nevertheless, due to distribution disparities between different demographic groups, fairness in high-stake decision-making systems is receiving increasing attention. Although lots of recent works devoted to improving the fairness of GNNs and… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by SIGKDD 2024, research track

  3. arXiv:2407.11610  [pdf, other

    cs.CV

    MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

    Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

    Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.11551  [pdf

    cs.RO

    Human-Machine Shared Control Approach for the Takeover of Cooperative Adaptive Cruise Control

    Authors: Haoran Wang, Zhenning Li, Arno Eichberger, Jia Hu

    Abstract: Cooperative Adaptive Cruise Control (CACC) often requires human takeover for tasks such as exiting a freeway. Direct human takeover can pose significant risks, especially given the close-following strategy employed by CACC, which might cause drivers to feel unsafe and execute hard braking, potentially leading to collisions. This research aims to develop a CACC takeover controller that ensures a sm… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  5. arXiv:2407.11541  [pdf, other

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  6. arXiv:2407.11483  [pdf, other

    cs.NI cs.DC

    Performance Analysis of Internet of Vehicles Mesh Networks Based on Actual Switch Models

    Authors: Jialin Hu, Zhiyuan Ren, Wenchi Cheng, Zhiliang Shuai, Zhao Li

    Abstract: The rapid growth of the automotive industry has exacerbated the conflict between the complex traffic environment, increasing communication demands, and limited resources. Given the imperative to mitigate traffic and network congestion, analyzing the performance of Internet of Vehicles (IoV) mesh networks is of great practical significance. Most studies focus solely on individual performance metric… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.11477  [pdf, other

    cs.LG cs.AI

    XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

    Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

    Abstract: Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,9… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.11337  [pdf, other

    cs.CV

    Continuity Preserving Online CenterLine Graph Learning

    Authors: Yunhui Han, Kun Yu, Zhiwei Li

    Abstract: Lane topology, which is usually modeled by a centerline graph, is essential for high-level autonomous driving. For a high-quality graph, both topology connectivity and spatial continuity of centerline segments are critical. However, most of existing approaches pay more attention to connectivity while neglect the continuity. Such kind of centerline graph usually cause problem to planning of autonom… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.11180  [pdf, other

    cs.LG

    Transformer-based Drum-level Prediction in a Boiler Plant with Delayed Relations among Multivariates

    Authors: Gang Su, Sun Yang, Zhishuai Li

    Abstract: The steam drum water level is a critical parameter that directly impacts the safety and efficiency of power plant operations. However, predicting the drum water level in boilers is challenging due to complex non-linear process dynamics originating from long-time delays and interrelations, as well as measurement noise. This paper investigates the application of Transformer-based models for predicti… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: under review of IEEE PES ISGT Europe 2024 conference

  10. arXiv:2407.11033  [pdf, other

    cs.LG cs.CL

    Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models

    Authors: Yuyan Chen, Qiang Fu, Ge Fan, Lun Du, Jian-Guang Lou, Shi Han, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: Recent years, Pre-trained Language models (PLMs) have swept into various fields of artificial intelligence and achieved great success. However, most PLMs, such as T5 and GPT3, have a huge amount of parameters, fine-tuning them is often expensive and time consuming, and storing them takes up a lot of space. Therefore, it is necessary to adopt a parameter-efficient approach to reduce parameters of P… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  11. arXiv:2407.10996  [pdf, other

    cs.CL cs.AI cs.HC

    Visualization Literacy of Multimodal Large Language Models: A Comparative Study

    Authors: Zhimin Li, Haichao Miao, Valerio Pascucci, Shusen Liu

    Abstract: The recent introduction of multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualizatio… ▽ More

    Submitted 24 June, 2024; originally announced July 2024.

  12. arXiv:2407.10926  [pdf, other

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  13. arXiv:2407.10906  [pdf, other

    cs.SE

    An Exploratory Study on Just-in-Time Multi-Programming-Language Bug Prediction

    Authors: Zengyang Li, Jiabao Ji, Peng Liang, Ran Mo, Hui Liu

    Abstract: Context: An increasing number of software systems are written in multiple programming languages (PLs), which are called multi-programming-language (MPL) systems. MPL bugs (MPLBs) refers to the bugs whose resolution involves multiple PLs. Despite high complexity of MPLB resolution, there lacks MPLB prediction methods. Objective: This work aims to construct just-in-time (JIT) MPLB prediction models… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Preprint accepted for publication in Information and Software Technology, 2024

  14. arXiv:2407.10878  [pdf, other

    cs.LG cs.AI

    Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market

    Authors: Philipp Kai Peter, Yulin Li, Ziyue Li, Wolfgang Ketter

    Abstract: Natural gas demand is a crucial factor for predicting natural gas prices and thus has a direct influence on the power system. However, existing methods face challenges in assessing the impact of shocks, such as the outbreak of the Russian-Ukrainian war. In this context, we apply deep neural network-based Granger causality to identify important drivers of natural gas demand. Furthermore, the result… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE PES ISGT Europe 2024 conference

  15. arXiv:2407.10811  [pdf, other

    cs.MA cs.AI cs.LG

    GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

    Authors: Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

    Abstract: Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The industry's observable input is much more limited than simulation-based RL methods. For real-world solutions, only flow can be… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review of IEEE Transactions on Intelligent Transportation Systems

  16. arXiv:2407.10377  [pdf

    eess.IV cs.AI cs.CV

    Enhanced Self-supervised Learning for Multi-modality MRI Segmentation and Classification: A Novel Approach Avoiding Model Collapse

    Authors: Linxuan Han, Sa Xiao, Zimeng Li, Haidong Li, Xiuchao Zhao, Fumin Guo, Yeqing Han, Xin Zhou

    Abstract: Multi-modality magnetic resonance imaging (MRI) can provide complementary information for computer-aided diagnosis. Traditional deep learning algorithms are suitable for identifying specific anatomical structures segmenting lesions and classifying diseases with magnetic resonance images. However, manual labels are limited due to high expense, which hinders further improvement of model accuracy. Se… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  17. arXiv:2407.10340  [pdf

    cs.CY cs.AI cs.HC cs.IT cs.SI

    Mapping the Scholarship of Dark Pattern Regulation: A Systematic Review of Concepts, Regulatory Paradigms, and Solutions from an Interdisciplinary Perspective

    Authors: Weiwei Yi, Zihao Li

    Abstract: Dark patterns, design tricks used on online interfaces to manipulate users decision-making process, have raised public concerns. However, research on regulation of dark pattern remains underdeveloped and scattered, particularly regarding scholars views on the concept, regulatory paradigms, and solutions. Following PRISMA guidelines, this paper systematically reviews the formats and content of regu… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  18. arXiv:2407.10267  [pdf, other

    cs.CV

    RS-NeRF: Neural Radiance Fields from Rolling Shutter Images

    Authors: Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng

    Abstract: Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical mo… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 ; Codes and data: https://github.com/MyNiuuu/RS-NeRF

  19. arXiv:2407.10255  [pdf, other

    cs.SD eess.AS

    CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

    Authors: Wenbo Zhao, Ziwei Li, Chuan Yu, Zhijian Ou

    Abstract: Streaming automatic speech recognition (ASR) is very important for many real-world ASR applications. However, a notable challenge for streaming ASR systems lies in balancing operational performance against latency constraint. Recently, a method of chunking, simulating future context and decoding, called CUSIDE, has been proposed for connectionist temporal classification (CTC) based streaming ASR,… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  20. arXiv:2407.09842  [pdf, other

    cs.CV

    Eliminating Feature Ambiguity for Few-Shot Segmentation

    Authors: Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao

    Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activate query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are in… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV'24

  21. arXiv:2407.09618  [pdf, other

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: Homophily principle, \ie{} nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at sitao.luan@mail.mcgill.ca!

  22. arXiv:2407.08994  [pdf, other

    cs.CV

    Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

    Authors: Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul

    Abstract: Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Atten… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  23. arXiv:2407.08414  [pdf, other

    cs.CV cs.GR

    MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

    Authors: Yushuo Chen, Zerong Zheng, Zhe Li, Chao Xu, Yebin Liu

    Abstract: We present a novel pipeline for learning high-quality triangular human avatars from multi-view videos. Recent methods for avatar learning are typically based on neural radiance fields (NeRF), which is not compatible with traditional graphics pipeline and poses great challenges for operations like editing or synthesizing under different environments. To overcome these limitations, our method repres… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://shad0wta9.github.io/meshavatar-page/

  24. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  25. arXiv:2407.08165  [pdf, other

    eess.IV cs.CV

    Explicit_NeRF_QA: A Quality Assessment Database for Explicit NeRF Model Compression

    Authors: Yuke Xing, Qi Yang, Kaifa Yang, Yilin Xu, Zhu Li

    Abstract: In recent years, Neural Radiance Fields (NeRF) have demonstrated significant advantages in representing and synthesizing 3D scenes. Explicit NeRF models facilitate the practical NeRF applications with faster rendering speed, and also attract considerable attention in NeRF compression due to its huge storage cost. To address the challenge of the NeRF compression study, in this paper, we construct a… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, 2 tables, conference

  26. arXiv:2407.07723  [pdf, other

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to… ▽ More

    Submitted 23 June, 2024; originally announced July 2024.

  27. arXiv:2407.07337  [pdf, other

    cs.NI eess.SP

    In-Orbit Processing or Not? Sunlight-Aware Task Scheduling for Energy-Efficient Space Edge Computing Networks

    Authors: Weisen Liu, Zeqi Lai, Qian Wu, Hewu Li, Qi Zhang, Zonglun Li, Yuanjie Li, Jun Liu

    Abstract: With the rapid evolution of space-borne capabilities, space edge computing (SEC) is becoming a new computation paradigm for future integrated space and terrestrial networks. Satellite edges adopt advanced on-board hardware, which not only enables new opportunities to perform complex intelligent tasks in orbit, but also involves new challenges due to the additional energy consumption in power-const… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE INFOCOM 2024

  28. arXiv:2407.07327  [pdf, other

    cs.AI

    Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

    Authors: Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

    Abstract: Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained st… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: under review by journal

  29. arXiv:2407.07020  [pdf, other

    cs.AI cs.RO

    Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

    Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Chunlin Tian, Yuming Huang, Zilin Bian, Kaiqun Zhu, Guofa Li, Ziyuan Pu, Jia Hu, Zhiyong Cui, Chengzhong Xu

    Abstract: Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model equipped with an… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.19251

  30. arXiv:2407.06904  [pdf, other

    cs.AI

    Hypergraph based Understanding for Document Semantic Entity Recognition

    Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

    Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  31. arXiv:2407.06886  [pdf, other

    cs.CV cs.AI cs.LG cs.MA cs.RO

    Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

    Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

    Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit… ▽ More

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

  32. arXiv:2407.06846  [pdf, other

    cs.HC

    SilverCycling: Exploring the Impact of Bike-Based Locomotion on Spatial Orientation for Older Adults in VR

    Authors: Qiongyan Chen, Zhiqing Wu, Yucheng Liu, Lei Han, Zisu Li, Ge Lin Kan, Mingming Fan

    Abstract: Spatial orientation is essential for people to effectively navigate and interact with the environment in everyday life. With age-related cognitive decline, providing VR locomotion techniques with better spatial orientation performance for older adults becomes important. Such advancements not only make VR more accessible to older adults but also enable them to reap the potential health benefits of… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

  33. arXiv:2407.06833  [pdf, other

    q-bio.QM cs.CV eess.IV

    Training-free CryoET Tomogram Segmentation

    Authors: Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu

    Abstract: Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in MICCAI 2024

  34. arXiv:2407.06767  [pdf, other

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  35. arXiv:2407.06642  [pdf, other

    cs.CV

    Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

    Authors: Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

    Abstract: Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusio… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  36. arXiv:2407.06584  [pdf, other

    cs.RO

    HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

    Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

    Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel des… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: IROS 2024

  37. arXiv:2407.06334  [pdf, other

    cs.AI q-bio.QM

    Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

    Authors: Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley

    Abstract: Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages main, 4 figures

  38. arXiv:2407.06310  [pdf, other

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  39. arXiv:2407.05709  [pdf, other

    eess.IV cs.CV

    Heterogeneous window transformer for image denoising

    Authors: Chunwei Tian, Menghua Zheng, Chia-Wen Lin, Zhiwu Li, David Zhang

    Abstract: Deep networks can usually depend on extracting more structural information to improve denoising results. However, they may ignore correlation between pixels from an image to pursue better denoising performance. Window transformer can use long- and short-distance modeling to interact pixels to address mentioned problem. To make a tradeoff between distance modeling and denoising time, we propose a h… ▽ More

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.05600  [pdf, other

    cs.CV

    GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

    Authors: Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu

    Abstract: Despite the success achieved by existing image generation and editing methods, current models still struggle with complex problems including intricate text prompts, and the absence of verification and self-correction mechanisms makes the generated images unreliable. Meanwhile, a single model tends to specialize in particular tasks and possess the corresponding capabilities, making it inadequate fo… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  41. arXiv:2407.05562  [pdf, other

    cs.CV

    Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

    Authors: Bangbang Zhou, Yadong Qu, Zixiao Wang, Zicheng Li, Boqiang Zhang, Hongtao Xie

    Abstract: Recently, scene text recognition (STR) models have shown significant performance improvements. However, existing models still encounter difficulties in recognizing challenging texts that involve factors such as severely distorted and perspective characters. These challenging texts mainly cause two problems: (1) Large Intra-Class Variance. (2) Small Inter-Class Variance. An extremely distorted char… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted to IJCAI2024

  42. arXiv:2407.05370  [pdf, other

    cs.LG

    Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

    Authors: Zeju Li, Ying-Qiu Zheng, Chen Chen, Saad Jbabdi

    Abstract: Semi-supervised learning (SSL) algorithms struggle to perform well when exposed to imbalanced training data. In this scenario, the generated pseudo-labels can exhibit a bias towards the majority class, and models that employ these pseudo-labels can further amplify this bias. Here we investigate pseudo-labeling strategies for imbalanced SSL including pseudo-label refinement and threshold adjustment… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  43. arXiv:2407.04999  [pdf, other

    cs.LG

    Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

    Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang

    Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In respo… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  44. arXiv:2407.04989  [pdf, ps, other

    cs.DS

    FPTAS for Holant Problems with Log-Concave Signatures

    Authors: Kun He, Zhidan Li, Guoliang Qiu, Chihao Zhang

    Abstract: For an integer $b\ge 0$, a $b$-matching in a graph $G=(V,E)$ is a set $S\subseteq E$ such that each vertex $v\in V$ is incident to at most $b$ edges in $S$. We design a fully polynomial-time approximation scheme (FPTAS) for counting the number of $b$-matchings in graphs with bounded degrees. Our FPTAS also applies to a broader family of counting problems, namely Holant problems with log-concave si… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  45. arXiv:2407.04903  [pdf, other

    cs.CL cs.AI cs.CV

    MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

    Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks pr… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/Leezekun/MMSci

  46. arXiv:2407.04711  [pdf, other

    cs.CV cs.AI eess.IV

    MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models

    Authors: Jiajia Li, Kyle Lammers, Xunyuan Yin, Xiang Yin, Long He, Renfu Lu, Zhaojian Li

    Abstract: Fruit harvesting poses a significant labor and financial burden for the industry, highlighting the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit detection has been recognized as a crucial component for robust identification of fruits to guide robotic manipulation. Despite considerable progress in leveraging deep learning and machine learning techniques… ▽ More

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures, 7 tables

  47. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  48. arXiv:2407.04656  [pdf, other

    cs.DC cs.LG

    Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

    Authors: Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

    Abstract: Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as training scales. The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing con… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  49. arXiv:2407.04467  [pdf, other

    cs.AI cs.CL cs.GT

    Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

    Authors: Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

    Abstract: Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or p… ▽ More

    Submitted 16 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 8 pages (19 with appendix), 6 figures in the main body (4 in the appendix), 4 tables in the main body

  50. arXiv:2407.04206  [pdf, other

    math.NA cs.CE

    Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

    Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

    Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parame… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.