
Showing 1–50 of 323 results for author: xie, W

  1. arXiv:2407.04651  [pdf, other]

    cs.CV

    SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

    Authors: Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar

    Abstract: We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything Model (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 9 pages, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024

    ACM Class: I.4.6; I.5.4; I.5.1

  2. arXiv:2407.04638  [pdf, other]

    cs.CV

    Semi-Supervised Segmentation via Embedding Matching

    Authors: Weiyi Xie, Nathalie Willems, Nikolas Lessmann, Tom Gibbons, Daniele De Massari

    Abstract: Deep convolutional neural networks are widely used in medical image segmentation but require many labeled images for training. Annotating three-dimensional medical images is a time-consuming and costly process. To overcome this limitation, we propose a novel semi-supervised segmentation method that leverages mostly unlabeled images and a small set of labeled images in training. Our approach involv…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 13 pages, MIDL2024 oral

    ACM Class: I.5.4; I.4.6; I.2.10

  3. arXiv:2407.04557  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

    Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

    Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 512 pages total, 4 main figures + 218 supplementary figures

  4. arXiv:2406.19435  [pdf, other]

    cs.CV

    A Sanity Check for AI-generated Image Detection

    Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

    Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify th…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

  5. arXiv:2406.18530  [pdf, other]

    cs.CV

    MatchTime: Towards Automatic Soccer Game Commentary Generation

    Authors: Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie

    Abstract: Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Technical Report; Project Page: https://haoningwu3639.github.io/MatchTime/

  6. arXiv:2406.18034  [pdf, other]

    cs.CL

    LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them

    Authors: Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang

    Abstract: The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, due to a lack of professional medical knowledge, patients are easily misled by erroneous information generated by LLMs, which may result in serious medical problems. To address this issue, we focus on tuning th…

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.16845  [pdf, other]

    cs.CL

    RaTEScore: A Metric for Radiology Report Generation

    Authors: Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Technically, we developed a comprehens…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.07001  [pdf, other]

    cs.CL cs.AI

    Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

    Authors: Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng Xie, Dangyang Chen

    Abstract: Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent bia…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL2024 findings

  9. arXiv:2406.06580  [pdf, other]

    cs.CL cs.AI

    Break the Chain: Large Language Models Can be Shortcut Reasoners

    Authors: Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integratio…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2406.06521  [pdf, other]

    cs.CV

    PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction

    Authors: Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on sur…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: project page: https://zju3dv.github.io/pgsr/

  11. arXiv:2406.02002  [pdf, other]

    cs.CL cs.AI

    Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

    Authors: Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen

    Abstract: The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, the dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to their powerful capability in generating utterances. However, there is a natural deficiency for such models, that is, inherent position bias, which may lead them to…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  12. arXiv:2406.00956  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  13. arXiv:2406.00606  [pdf, other]

    cs.CL

    LLMs Could Autonomously Learn Without External Supervision

    Authors: Ke Ji, Junying Chen, Anningzhe Gao, Wenya Xie, Xiang Wan, Benyou Wang

    Abstract: In the quest for super-human performance, Large Language Models (LLMs) have traditionally been tethered to human-annotated datasets and predefined training objectives, a process that is both labor-intensive and inherently limited. This paper presents a transformative approach: Autonomous Learning for LLMs, a self-sufficient learning paradigm that frees models from the constraints of human supervisi…

    Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  14. arXiv:2405.20568  [pdf, other]

    cs.LG cs.NI

    Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Fang Mei, Jiawen Kang, Hongyang Du, Shiwen Mao

    Abstract: As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, we present how to leverage generative AI (GAI) to address these issues and en…

    Submitted 30 May, 2024; originally announced May 2024.

  15. arXiv:2405.19694  [pdf, other]

    cs.AI

    Grade Like a Human: Rethinking Automated Assessment with Large Language Models

    Authors: Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan

    Abstract: While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps,…

    Submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.07481  [pdf, other]

    cs.CV

    Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

    Authors: Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng

    Abstract: Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treated text detection and grouping using separate models, or trained a model from scratch while using a unified one. None of them has yet made full use of the already…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  17. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  18. arXiv:2405.03913  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

    Authors: Fuqiang Cheng, Wei Xie, Hua Zheng

    Abstract: Biomanufacturing innovation relies on an efficient Design of Experiments (DoEs) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider…

    Submitted 28 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  19. arXiv:2405.03101  [pdf, ps, other]

    cs.IT

    Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications

    Authors: Ji Wang, Suhong Luo, Yixuan Li, Wenwu Xie, Xingwang Li, Arumugam Nallanathan

    Abstract: A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple-input multiple-output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuits to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meet…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  20. arXiv:2405.02783  [pdf, other]

    stat.ML cs.LG

    Linear Noise Approximation Assisted Bayesian Inference on Mechanistic Model of Partially Observed Stochastic Reaction Network

    Authors: Wandi Xu, Wei Xie

    Abstract: To support mechanism online learning and facilitate digital twin development for biomanufacturing processes, this paper develops an efficient Bayesian inference approach for a partially observed enzymatic stochastic reaction network (SRN), a fundamental building block of the multi-scale bioprocess mechanistic model. To tackle the critical challenges brought by the nonlinear stochastic differential equat…

    Submitted 28 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 11 pages, 2 figures

  21. arXiv:2404.17774  [pdf, other]

    cs.CV cs.GR

    High-quality Surface Reconstruction using Gaussian Surfels

    Authors: Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, Weiwei Xu

    Abstract: We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer.…

    Submitted 29 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Results added and improved

  22. arXiv:2404.16828  [pdf, other]

    cs.CV cs.LG

    Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

    Authors: Charig Yang, Weidi Xie, Andrew Zisserman

    Abstract: Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with 'time' serving as a supervisory signal since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a flexible transformer-based model for general-purpose ordering of ima…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Project page: https://charigyang.github.io/order/

  23. arXiv:2404.16754  [pdf, other]

    cs.CV

    RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

    Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the need for developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In thi…

    Submitted 25 April, 2024; originally announced April 2024.

  24. arXiv:2404.14412  [pdf, other]

    cs.CV

    AutoAD III: The Prequel -- Back to the Pixels

    Authors: Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

    Abstract: Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is hampered by using performance measures not specialized to the AD domain. In this paper, we make three c…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: CVPR2024. Project page: https://www.robots.ox.ac.uk/~vgg/research/autoad/

  25. arXiv:2404.13342  [pdf, other]

    cs.CV cs.LG

    Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior

    Authors: Yidan Liu, Weiying Xie, Kai Jiang, Jiaqing Zhang, Yunsong Li, Leyuan Fang

    Abstract: The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely depen…

    Submitted 20 April, 2024; originally announced April 2024.

  26. arXiv:2404.12389  [pdf, other]

    cs.CV

    Moving Object Segmentation: All You Need Is SAM (and Flow)

    Authors: Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

    Abstract: The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. This is a much studied area with numerous careful, and sometimes complex, approaches and training schemes including: self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://www.robots.ox.ac.uk/~vgg/research/flowsam/

  27. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of ChatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.

  28. arXiv:2404.09942  [pdf, other]

    cs.CV

    Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

    Authors: Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

    Abstract: In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology. Specifically, we make the following contributions: (i) We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring…

    Submitted 15 April, 2024; originally announced April 2024.

  29. arXiv:2404.08926  [pdf, other]

    cs.CV

    Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives

    Authors: Yidan Liu, Jun Yue, Shaobo Xia, Pedram Ghamisi, Weiying Xie, Leyuan Fang

    Abstract: As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language processing, and molecule design. The remote sensing community has also noticed the powerful ability of diffusion models and quickly applied them to a variety of tasks for image processing. Given the rapid increase in research on…

    Submitted 17 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  30. arXiv:2404.06443  [pdf, other]

    cs.CV

    Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

    Authors: Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen

    Abstract: Human facial action units (AUs) are mutually related in a hierarchical manner, as not only are they associated with each other in both spatial and temporal domains but also AUs located in the same/close facial regions show stronger relationships than those of different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper propos…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  31. arXiv:2403.18762  [pdf, other]

    cs.CV cs.AI cs.RO

    ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

    Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

    Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 11 figures, conference

  32. arXiv:2403.15027  [pdf, other]

    cs.LG cs.AI

    Grey-informed neural network for time-series forecasting

    Authors: Wanli Xie, Ruibin Zhao, Zhenguo Xu, Tingting Liang

    Abstract: Neural network models have shown outstanding performance and successful resolutions to complex problems in various fields. However, the majority of these models are viewed as black boxes, requiring a significant amount of data for development. Consequently, in situations with limited data, constructing appropriate models becomes challenging due to the lack of transparency and scarcity of data. To ta…

    Submitted 3 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  33. arXiv:2403.11558  [pdf, other]

    cs.CL cs.AI

    Reinforcement Learning with Token-level Feedback for Controllable Text Generation

    Authors: Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

    Abstract: To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 Findings

  34. arXiv:2403.09323  [pdf, other]

    cs.CV

    E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

    Authors: Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li

    Abstract: Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high perfor…

    Submitted 23 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  35. arXiv:2403.07832  [pdf, other]

    cs.RO

    DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies

    Authors: William Xie, Jensen Lavering, Nikolaus Correll

    Abstract: Large language models (LLMs) can provide rich physical descriptions of most worldly objects, allowing robots to achieve more informed and capable grasping. We leverage LLMs' common sense physical reasoning and code-writing abilities to infer an object's physical characteristics--mass $m$, friction coefficient $μ$, and spring constant $k$--from a semantic description, and then translate those chara…

    Submitted 30 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  36. arXiv:2403.04697  [pdf, other]

    cs.CV cs.AI

    AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

    Authors: Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

    Abstract: Facial Action Units (AUs) are a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a prom…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages, 6 figures

  37. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models; we then extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  38. arXiv:2403.00841  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Offline Fictitious Self-Play for Competitive Games

    Authors: Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong Yu, Ying Wen

    Abstract: Offline Reinforcement Learning (RL) has received significant interest due to its ability to improve policies in previously collected datasets without online interactions. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a m…

    Submitted 29 February, 2024; originally announced March 2024.

  39. arXiv:2402.15690  [pdf, other]

    cs.CL cs.AI

    Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

    Authors: Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

    Abstract: Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-makin…

    Submitted 23 February, 2024; originally announced February 2024.

  40. arXiv:2402.13963  [pdf, other]

    cs.CL

    Towards Building Multilingual Language Model for Medicine

    Authors: Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions. To promote this domain, we make the following contributions: First, we construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages, termed MMedC, enabling auto-regressive domain adaptation for ge…

    Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  41. arXiv:2402.13088  [pdf, other]

    cs.CV

    Slot-VLM: SlowFast Slots for Video-Language Modeling

    Authors: Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

    Abstract: Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video token…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 16 pages, 10 figures

  42. arXiv:2402.05937  [pdf, other]

    cs.CV

    InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

    Authors: Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

    Abstract: In this paper, we present a novel paradigm to enhance the ability of an object detector, e.g., expanding categories or improving detection performance, by training on a synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated image…

    Submitted 8 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  43. arXiv:2402.03951  [pdf, other]

    cs.CV cs.AI

    Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

    Authors: Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song

    Abstract: Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they show poor performances in attacking systems having different model genera from the surrogate model. In this paper, we propos…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: AAAI 2024

  44. arXiv:2402.00740  [pdf, other]

    cs.CV

    DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras

    Authors: Weixing Xie, Xiao Dong, Yong Yang, Qiqin Lin, Jingze Chen, Junfeng Yao, Xiaohu Guo

    Abstract: With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology. In contrast to scene reconstructions that exploit multi-view observations, the problem of modeling a dynamic scene from a single view is significantly more under-constrained…

    Submitted 1 February, 2024; originally announced February 2024.

  45. arXiv:2401.16423  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Synchformer: Efficient Synchronization from Sparse Cues

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Extended version of the ICASSP 24 paper. Project page: https://www.robots.ox.ac.uk/~vgg/research/synchformer/ Code: https://github.com/v-iashin/Synchformer

  46. arXiv:2401.11141  [pdf, other]

    cs.IT eess.SP

    Wideband Beamforming for RIS Assisted Near-Field Communications

    Authors: Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, Yuanwei Liu

    Abstract: A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with frequency-dependent hybrid…

    Submitted 3 July, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  47. arXiv:2401.08695  [pdf, other]

    cs.AI cs.CV cs.HC

    Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence

    Authors: Zhengqing Fang, Shuowen Zhou, Zhouhang Yuan, Yuxuan Si, Mengze Li, Jinxu Li, Yesheng Xu, Wenjia Xie, Kun Kuang, Yingming Li, Fei Wu, Yu-Feng Yao

    Abstract: Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 33 pages

  48. arXiv:2401.08687  [pdf, other]

    cs.CV

    DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first dom…

    Submitted 12 January, 2024; originally announced January 2024.

  49. arXiv:2401.06969  [pdf, other]

    cs.CV

    Domain Adaptation for Large-Vocabulary Object Detectors

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CL…

    Submitted 10 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  50. arXiv:2401.05093  [pdf, other]

    cs.CV

    SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

    Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li

    Abstract: With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL o…

    Submitted 10 January, 2024; originally announced January 2024.