Skip to main content

Showing 1–50 of 869 results for author: Mao, L

  1. arXiv:2407.11096  [pdf, other

    cs.LG cs.AI

    Static and multivariate-temporal attentive fusion transformer for readmission risk prediction

    Authors: Zhe Sun, Runzhi Li, Jing Wang, Gang Chen, Siyu Yan, Lihong Ma

    Abstract: Background: Accurate short-term readmission prediction of ICU patients is significant in improving the efficiency of resource assignment by assisting physicians in making discharge decisions. Clinically, both individual static static and multivariate temporal data collected from ICU monitors play critical roles in short-term readmission prediction. Informative static and multivariate temporal feat… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  2. arXiv:2407.09826  [pdf, other

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  3. arXiv:2407.09618  [pdf, other

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: Homophily principle, \ie{} nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at sitao.luan@mail.mcgill.ca!

  4. arXiv:2407.08872  [pdf, other

    cs.CV

    Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

    Authors: Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

    Abstract: This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. However, in practice, existing numerical approximations cause rea… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.08801  [pdf, other

    cs.CV

    DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

    Authors: Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

    Abstract: Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  6. arXiv:2407.08554  [pdf, other

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages

  7. arXiv:2407.08374  [pdf, other

    cs.CV

    Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

    Authors: Jinlong Li, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe

    Abstract: Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt the CLIP into a variety of downstream tasks, however, suffering from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretra… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.07844  [pdf, other

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  9. arXiv:2407.07728  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis

    Authors: Zihao Wang, Le Ma, Yan Liu, Kejun Zhang

    Abstract: Singing voice conversion (SVC) aims to convert a singer's voice in a given music piece to another singer while keeping the original content. We propose an end-to-end feature disentanglement-based model, which we named SaMoye, to enable zero-shot many-to-many singing voice conversion. SaMoye disentangles the features of the singing voice into content features, timbre features, and pitch features re… ▽ More

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures

    MSC Class: 68Txx(Primary)14F05; 91Fxx(Secondary) ACM Class: I.2.7; J.5

  10. arXiv:2407.07342  [pdf, other

    cs.CL

    Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

    Authors: Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma

    Abstract: As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and aligning LLM behaviors with human preferences and ethical standards. LLMs, trained on extensive multilingual corpora, exhibit powerful generalization abilities across diverse languages and domains. However,… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  11. arXiv:2407.07093  [pdf, other

    cs.CL cs.AI cs.LG

    FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

    Authors: Liqun Ma, Mingjie Sun, Zhiqiang Shen

    Abstract: This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary LLM like BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Github at https://github.com/LiqunMa/FBI-LLM

  12. arXiv:2407.06951  [pdf, other

    cs.RO

    RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios

    Authors: Liming Zheng, Feng Yan, Fanfan Liu, Chengjian Feng, Zhuoliang Kang, Lin Ma

    Abstract: Foundation models hold significant potential for enabling robots to perform long-horizon general manipulation tasks. However, the simplicity of tasks and the uniformity of environments in existing benchmarks restrict their effective deployment in complex scenarios. To address this limitation, this paper introduces the \textit{RoboCAS} benchmark, the first benchmark specifically designed for comple… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2407.06698  [pdf, ps, other

    cs.CV cs.LG

    PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision

    Authors: Chengjie Wang, Chengming Xu, Zhenye Gan, Jianlong Hu, Wenbing Zhu, Lizhuag Ma

    Abstract: Positive and Unlabeled (PU) learning, a binary classification model trained with only positive and unlabeled data, generally suffers from overfitted risk estimation due to inconsistent data distributions. To address this, we introduce a pseudo-supervised PU learning framework (PSPU), in which we train the PU model first, use it to gather confident samples for the pseudo supervision, and then apply… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: accepted by ICME2024

  14. arXiv:2407.05138  [pdf, other

    cs.SE cs.AI

    Vortex under Ripplet: An Empirical Study of RAG-enabled Applications

    Authors: Yuchen Shao, Yuheng Huang, Jiawei Shen, Lei Ma, Ting Su, Chengcheng Wan

    Abstract: Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. However, developers face challenges in integrating RAG-enhanced LLMs into software systems, due to lack of interface specification, requirements from software context, and complicated system management. In this paper, we manually studied 100 open-source applic… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  15. arXiv:2407.04618  [pdf, ps, other

    cs.CC

    Encoding of algebraic geometry codes with quasi-linear complexity $O(N\log N)$

    Authors: Songsong Li, Shu Liu, Liming Ma, Yunqi Wan, Chaoping Xing

    Abstract: Fast encoding and decoding of codes have been always an important topic in code theory as well as complexity theory. Although encoding is easier than decoding in general, designing an encoding algorithm of codes of length $N$ with quasi-linear complexity $O(N\log N)$ is not an easy task. Despite the fact that algebraic geometry codes were discovered in the early of 1980s, encoding algorithms of al… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  16. arXiv:2407.04292  [pdf, other

    cs.AR cs.RO

    Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design

    Authors: Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan

    Abstract: Embodied AI robots have the potential to fundamentally improve the way human beings live and manufacture. Continued progress in the burgeoning field of using large language models to control robots depends critically on an efficient computing substrate. In particular, today's computing systems for embodied AI robots are designed purely based on the interest of algorithm developers, where robot act… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  17. arXiv:2407.03771  [pdf, other

    cs.CV

    SpikeGS: Reconstruct 3D scene via fast-moving bio-inspired sensors

    Authors: Yijia Guo, Liwen Hu, Lei Ma, Tiejun Huang

    Abstract: 3D Gaussian Splatting (3DGS) demonstrates unparalleled superior performance in 3D scene reconstruction. However, 3DGS heavily relies on the sharp images. Fulfilling this requirement can be challenging in real-world scenarios especially when the camera moves fast, which severely limits the application of 3DGS. To address these challenges, we proposed Spike Gausian Splatting (SpikeGS), the first fra… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  18. arXiv:2407.03374  [pdf

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  19. arXiv:2407.02842  [pdf, other

    cs.CV cs.AI cs.CL

    MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

    Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

    Abstract: Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which n… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: technical report

  20. arXiv:2407.01003  [pdf, other

    cs.CV cs.AI

    Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

    Authors: Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma

    Abstract: Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures. arXiv admin note: text overlap with arXiv:2306.09579, arXiv:2203.12119 by other authors

  21. arXiv:2407.00088  [pdf, other

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup… ▽ More

    Submitted 25 June, 2024; originally announced July 2024.

  22. arXiv:2406.18977  [pdf, other

    cs.RO cs.CL cs.CV

    RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulaiton

    Authors: Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma

    Abstract: Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniVie… ▽ More

    Submitted 12 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  23. arXiv:2406.18817  [pdf, other

    cs.CV cs.AI

    Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

    Authors: Mingyang Zhao, Jingen Jiang, Lei Ma, Shiqing Xin, Gaofeng Meng, Dong-Ming Yan

    Abstract: This paper presents a novel non-rigid point set registration method that is inspired by unsupervised clustering analysis. Unlike previous approaches that treat the source and target point sets as separate entities, we develop a holistic framework where they are formulated as clustering centroids and clustering members, separately. We then adopt Tikhonov regularization with an $\ell_1$-induced Lapl… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: [CVPR 2024 Highlight] Project and code at: https://github.com/zikai1/CVPR24_PointSetReg

  24. arXiv:2406.18102  [pdf

    eess.IV cs.CV

    A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

    Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi

    Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  25. arXiv:2406.17287  [pdf, other

    cs.CL cs.AI

    Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

    Authors: Yang Yan, Lizhi Ma, Anqi Li, Jingsong Ma, Zhenzhong Lan

    Abstract: Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study exams whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play an… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  26. arXiv:2406.16710  [pdf, other

    cs.CV

    Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

    Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

    Abstract: While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framew… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://jinkun-hao.github.io/Portrait3D/

  27. arXiv:2406.15501  [pdf

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.13870  [pdf, other

    cs.CV

    Splatter a Video: Video Gaussian Representation for Versatile Processing

    Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Video representation is a long-standing problem that is crucial for various down-stream tasks, such as tracking,depth prediction,segmentation,view synthesis,and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a no… ▽ More

    Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  29. arXiv:2406.13173  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Biomedical Visual Instruction Tuning with Clinician Preference Alignment

    Authors: Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang

    Abstract: Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the result… ▽ More

    Submitted 16 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    MSC Class: 68T50; 68T45; 68T37; 68T05; 68T07; 68T09; ACM Class: I.2.7; I.2.6; I.2.10

  30. arXiv:2406.13145  [pdf, other

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  31. GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

    Authors: Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue

    Abstract: Time series forecasts of different temporal granularity are widely used in real-world applications, e.g., sales prediction in days and weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g.,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  32. PIG: Prompt Images Guidance for Night-Time Scene Parsing

    Authors: Zhifeng Xie, Rui Qiu, Sen Wang, Xin Tan, Yuan Xie, Lizhuang Ma

    Abstract: Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night image pairs to guide adaptation, but this approach hamper… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by IEEE TIP. Code: https://github.com/qiurui4shu/PIG

  33. arXiv:2406.10236  [pdf, other

    eess.IV cs.AI

    Lightening Anything in Medical Images

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, Jingyi Xu, Lipeng Ma, Yatian Yang, Pinghong Zhou

    Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

  34. arXiv:2406.09905  [pdf, other

    cs.CV cs.GR

    Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

    Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

    Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" dev… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  35. arXiv:2406.09844  [pdf, other

    cs.SD eess.AS

    Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

    Authors: Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well as training-inference mismatch still hinder conversion performance. In this paper, we propose Vec-Tok-VC+, a novel prompt-based zero-shot VC model im… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  36. arXiv:2406.09485  [pdf, other

    cs.SE

    Integrated Modeling, Verification, and Code Generation for Unmanned Aerial Systems

    Authors: Jianyu Zhang, Long Zhang, Yixuan Wu, Linru Ma, Feng Yang

    Abstract: Unmanned Aerial Systems (UAS) are currently widely used in safety-critical fields such as industrial production, military operations, and disaster relief. Due to the diversity and complexity of application scenarios, UAS have become increasingly intricate. The challenge of designing and implementing highly reliable UAS while effectively controlling development costs and enhancing efficiency is a p… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  37. arXiv:2406.08845  [pdf, other

    cs.CV

    Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

    Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang

    Abstract: Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. H… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  38. arXiv:2406.08731  [pdf, other

    cs.SE

    Where Do Large Language Models Fail When Generating Code?

    Authors: Zhijie Wang, Zijie Zhou, Da Song, Yuheng Huang, Shengmai Chen, Lei Ma, Tianyi Zhang

    Abstract: Large Language Models (LLMs) have shown great potential in code generation. However, current LLMs still cannot reliably generate correct code. Moreover, it is unclear what kinds of code generation errors LLMs can make. To address this, we conducted an empirical study to analyze incorrect code snippets generated by six popular LLMs on the HumanEval dataset. We analyzed these errors alongside two di… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Extended from our MAPS 2023 paper. Our data is available at https://llm-code-errors.cs.purdue.edu

  39. arXiv:2406.08024  [pdf, other

    cs.CV cs.AI

    Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

    Authors: Shimin Chen, Yitian Yuan, Shaoxiang Chen, Zequn Jie, Lin Ma

    Abstract: Amidst the advancements in image-based Large Vision-Language Models (image-LVLM), the transition to video-based models (video-LVLM) is hindered by the limited availability of quality video data. This paper addresses the challenge by leveraging the visual commonalities between images and videos to efficiently evolve image-LVLMs into video-LVLMs. We present a cost-effective video-LVLM that enhances… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.07362  [pdf, other

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f… ▽ More

    Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  41. arXiv:2406.05955  [pdf, other

    cs.LG cs.CL

    Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

    Authors: Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen

    Abstract: Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreove… ▽ More

    Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  42. arXiv:2406.05746  [pdf

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificaial Intelligence Review, (2024) 57:151

  43. arXiv:2406.05260  [pdf, other

    stat.ML cs.LG

    Generative modeling of density regression through tree flows

    Authors: Zhuoqun Wang, Naoki Awaya, Li Ma

    Abstract: A common objective in the analysis of tabular data is estimating the conditional distribution (in contrast to only producing predictions) of a set of "outcome" variables given a set of "covariates", which is sometimes referred to as the "density regression" problem. Beyond estimation on the conditional distribution, the generative ability of drawing synthetic samples from the learned conditional d… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 24 pages, 9 figures

  44. arXiv:2406.04531  [pdf, other

    cs.SE

    TESTEVAL: Benchmarking Large Language Models for Test Case Generation

    Authors: Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma

    Abstract: Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program under test. Recently, researchers have recognized the potential of large language models (LLMs) in software testing. However, there remains a lack of fair compariso… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  45. arXiv:2406.04484  [pdf, ps, other

    cs.CV

    Step Out and Seek Around: On Warm-Start Training with Incremental Data

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez

    Abstract: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to significant training costs. Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning. How… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.03912  [pdf, other

    cs.AI cs.LG cs.RO eess.SY

    GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

    Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

    Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in their real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed,… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  47. arXiv:2406.02263  [pdf, other

    cs.CV

    M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising

    Authors: Chengjie Wang, Haokun Zhu, Jinlong Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang

    Abstract: Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address above challenges, this paper initially delves into the RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  48. arXiv:2406.01916  [pdf, other

    cs.CV

    FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping

    Authors: Yuzhou Ji, He Zhu, Junshu Tang, Wuyi Liu, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan

    Abstract: The semantically interactive radiance field has always been an appealing task for its potential to facilitate user-friendly and automated real-world 3D scene understanding applications. However, it is a challenging task to achieve high quality, efficiency and zero-shot ability at the same time with semantics in radiance fields. In this work, we present FastLGS, an approach that supports real-time… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  49. arXiv:2406.00480  [pdf, other

    cs.CV

    AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

    Authors: Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li

    Abstract: Powered by massive curated training data, Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts. However, the vanilla SAM is class agnostic and heavily relies on user-provided prompts to segment objects of interest. Adapting this method to diverse tasks is crucial for accurate target identification and to avoid… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  50. arXiv:2406.00036  [pdf, other

    cs.CL cs.AI cs.LG

    EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling

    Authors: Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, Chengwei Pan

    Abstract: The integration of multimodal Electronic Health Records (EHR) data has notably advanced clinical predictive capabilities. However, current models that utilize clinical notes and multivariate time-series EHR data often lack the necessary medical context for precise clinical tasks. Previous methods using knowledge graphs (KGs) primarily focus on structured knowledge extraction. To address this, we p… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.07016