
Showing 1–50 of 337 results for author: Du, B

  1. arXiv:2407.03695 [pdf, other]

    cs.CV

    M^3: Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask

    Authors: Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou

    Abstract: In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have always been major issues. A dataset containing various types of manipulations will greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using various techniques, and creating a dataset from these images will…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.00341 [pdf, other]

    cs.CL

    Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis

    Authors: Haiyun Li, Qihuang Zhong, Ke Zhu, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. However, current DA methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diver…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Work in progress

  3. arXiv:2406.18937 [pdf, other]

    cs.LG cs.AI

    Federated Graph Semantic and Structural Learning

    Authors: Wenke Huang, Guancheng Wan, Mang Ye, Bo Du

    Abstract: Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most relative arts focus on traditional distributed tasks like images and voices, incapable of graph structures. This paper firstly reveals that local client distortion is brought by both node-level sem…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2023

  4. arXiv:2406.18610 [pdf, other]

    cs.CV

    Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling

    Authors: Haoran Li, Xingjian Li, Jiahua Shi, Huaming Chen, Bo Du, Daisuke Kihara, Johan Barthelemy, Jun Shen, Min Xu

    Abstract: Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology facilitating the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in the biological sector. However, existing methods heavily rely on manually labeled data, which requires highly professional skills, thereby hindering the adoption of full…

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 11 pages

  5. arXiv:2406.16442 [pdf, other]

    cs.CV

    EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

    Authors: Qu Yang, Mang Ye, Bo Du

    Abstract: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. Thus, it impedes their ability to effectively understand and react to the intricate emotions expressed by humans through multimodal media. To bridge this gap, we introdu…

    Submitted 29 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages

  6. arXiv:2406.12757 [pdf, other]

    cs.CV

    MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

    Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' n…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures

  7. arXiv:2406.11519 [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  8. arXiv:2406.10580 [pdf, other]

    cs.CV

    IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization

    Authors: Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou

    Abstract: A comprehensive benchmark is yet to be established in the Image Manipulation Detection & Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Technical report

  9. arXiv:2406.10576 [pdf, other]

    cs.LG cs.CL stat.ML

    Optimization-based Structural Pruning for Large Language Models without Back-Propagation

    Authors: Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

    Abstract: Compared to the moderate size of neural network models, structural weight pruning on the Large-Language Models (LLMs) imposes a novel challenge on the efficiency of the pruning algorithms, due to the heavy computation/memory demands of the LLMs. Recent efficient LLM pruning methods typically operate at the post-training phase without the expensive weight finetuning; however, their pruning criteria…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages

  10. arXiv:2406.09770 [pdf, other]

    cs.LG cs.AI

    Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

    Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for l…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: code is available at https://github.com/tanganke/pareto_set_learning

  11. arXiv:2406.06498 [pdf, other]

    cs.RO cs.HC

    Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace

    Authors: Chenxu Wang, Boyuan Du, Jiaxin Xu, Peiyan Li, Di Guo, Huaping Liu

    Abstract: Human-robot collaboration (HRC) in a shared workspace has become a common pattern in real-world robot applications and has garnered significant research interest. However, most existing studies for human-in-the-loop (HITL) collaboration with robots in a shared workspace evaluate in either simplified game environments or physical platforms, falling short in limited realistic significance or limited…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: In RSS 2024

  12. arXiv:2406.03280 [pdf, other]

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations…

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  13. arXiv:2406.02987 [pdf, other]

    cs.CV

    Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

    Authors: Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang

    Abstract: Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are simplified Multi-instance Learning methods without considering instance heterogeneity/correlation. We then propose…

    Submitted 5 June, 2024; originally announced June 2024.

  14. arXiv:2406.00403 [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  15. arXiv:2405.18786 [pdf, other]

    cs.LG cs.CV

    MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

    Authors: Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, Bo Han

    Abstract: In cross-domain few-shot classification, nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other…

    Submitted 29 May, 2024; originally announced May 2024.

  16. arXiv:2405.17495 [pdf, other]

    cs.LG cs.CR

    Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

    Authors: Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

    Abstract: Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm where different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the cor…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 31 pages, 9 figures, 10 tables

  17. arXiv:2405.17473 [pdf, other]

    cs.LG cs.AI cs.SI

    Repeat-Aware Neighbor Sampling for Dynamic Graph Learning

    Authors: Tao Zou, Yuhao Mao, Junchen Ye, Bowen Du

    Abstract: Dynamic graph learning equips the edges with time attributes and allows multiple links between two nodes, which is a crucial technology for understanding evolving data scenarios like traffic prediction and recommendation systems. Existing works obtain the evolving patterns mainly depending on the most recent neighbor sequences. However, we argue that whether two nodes will have interaction with ea…

    Submitted 20 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024, Research Track

  18. arXiv:2405.14545 [pdf, other]

    q-bio.BM cs.LG

    A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction

    Authors: Hongzhi Zhang, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial. However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to…

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.14536 [pdf, other]

    q-bio.MN cs.AI cs.LG

    Regressor-free Molecule Generation to Support Drug Response Prediction

    Authors: Kun Li, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug response prediction (DRP) is a crucial phase in drug discovery, and the most important metric for its evaluation is the IC50 score. DRP results are heavily dependent on the quality of the generated molecules. Existing molecule generation methods typically employ classifier-based guidance, enabling sampling within the IC50 classification range. However, these methods fail to ensure the samplin…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 7 figures, 9 tables

  20. arXiv:2405.12872 [pdf, other]

    eess.IV cs.CV

    Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image

    Authors: Zerui Zhang, Zhichao Sun, Zelong Liu, Bo Du, Rui Yu, Zhou Zhao, Yongchao Xu

    Abstract: Medical anomaly detection is a critical research area aimed at recognizing abnormal images to aid in diagnosis. Most existing methods adopt synthetic anomalies and image restoration on normal samples to detect anomalies. The unlabeled data consisting of both normal and abnormal data is not well explored. We introduce a novel Spatial-aware Attention Generative Adversarial Network (SAGAN) for one-class…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Early Accept by MICCAI 2024

  21. arXiv:2405.10642 [pdf, other]

    cs.LG

    Hi-GMAE: Hierarchical Graph Masked Autoencoders

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

    Abstract: Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance,…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, 3 tables

  22. arXiv:2405.09789 [pdf, other]

    cs.CV

    LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation

    Authors: Wentao Jiang, Jing Zhang, Di Wang, Qiming Zhang, Zengmao Wang, Bo Du

    Abstract: Due to spatial redundancy in remote sensing images, sparse tokens containing rich information are usually involved in self-attention (SA) to reduce the overall token numbers within the calculation, avoiding the high computational cost issue in Vision Transformers. However, such methods usually obtain sparse tokens by hand-crafted or parallel-unfriendly designs, posing a challenge to reach a better…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI'2024. The code is available at https://github.com/ViTAE-Transformer/LeMeViT

  23. arXiv:2405.07226 [pdf, other]

    quant-ph cs.AI cs.LG

    Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem

    Authors: Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao

    Abstract: The No-Free-Lunch (NFL) theorem, which quantifies problem- and data-independent generalization errors regardless of the optimization process, provides a foundational framework for comprehending diverse learning protocols' potential. Despite its significance, the establishment of the NFL theorem for quantum machine learning models remains largely unexplored, thereby overlooking broader insights int…

    Submitted 12 May, 2024; originally announced May 2024.

  24. arXiv:2405.05769 [pdf, other]

    cs.CV

    Exploring Text-Guided Single Image Editing for Remote Sensing Images

    Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Lamei Zhang, Hao Chen, Bo Du

    Abstract: Artificial Intelligence Generative Content (AIGC) technologies have significantly influenced the remote sensing domain, particularly in the realm of image generation. However, remote sensing image editing, an equally vital research area, has not garnered sufficient attention. Different from text-guided editing in natural images, which relies on extensive text-image paired data for semantic correla…

    Submitted 9 May, 2024; originally announced May 2024.

  25. arXiv:2405.04741 [pdf, other]

    cs.CV

    All in One Framework for Multimodal Re-identification in the Wild

    Authors: He Li, Mang Ye, Ming Zhang, Bo Du

    Abstract: In Re-identification (ReID), recent advancements yield noteworthy progress in both unimodal and cross-modal retrieval tasks. However, the challenge persists in developing a unified framework that could effectively handle varying multimodal data, including RGB, infrared, sketches, and textual information. Additionally, the emergence of large-scale models shows promising performance in various visio…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, CVPR 2024

  26. arXiv:2405.01649 [pdf, other]

    cs.CL

    Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

    Authors: Tianle Xia, Liang Ding, Guojia Wan, Yibing Zhan, Bo Du, Dacheng Tao

    Abstract: Answering complex queries over incomplete knowledge graphs (KGs) is a challenging job. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propo…

    Submitted 8 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  27. arXiv:2404.18861 [pdf, other]

    cs.CV

    Visual Mamba: A Survey and New Outlooks

    Authors: Rui Xu, Shu Yang, Yihui Wang, Yu Cai, Bo Du, Hao Chen

    Abstract: Mamba, a recent selective structured state space model, excels in long sequence modeling, which is vital in the large model era. Long sequence modeling poses significant challenges, including capturing long-range dependencies within the data and handling the computational demands caused by their extensive length. Mamba addresses these challenges by overcoming the local perception limitations of co…

    Submitted 6 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Under Review

  28. arXiv:2404.17765 [pdf]

    cs.CV

    RFL-CDNet: Towards Accurate Change Detection via Richer Feature Learning

    Authors: Yuhang Gan, Wenjie Xuan, Hang Chen, Juhua Liu, Bo Du

    Abstract: Change Detection is a crucial but extremely challenging task of remote sensing image analysis, and much progress has been made with the rapid development of deep learning. However, most existing deep learning-based change detection methods mainly focus on intricate feature extraction and multi-scale feature fusion, while ignoring the insufficient utilization of features in the intermediate stages,…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by PR, volume 153

  29. arXiv:2404.15598 [pdf, other]

    cs.LG cs.CR

    Federated Learning with Only Positive Labels by Exploring Label Correlations

    Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where a trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue ca…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

  30. arXiv:2404.14963 [pdf, other]

    cs.CL cs.AI

    Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

    Authors: Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao

    Abstract: Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing error…

    Submitted 29 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Work in progress

  31. arXiv:2404.10353 [pdf, other]

    cs.LG cs.SI

    Rethinking the Graph Polynomial Filter via Positive and Negative Coupling Analysis

    Authors: Haodong Wen, Bodong Du, Ruixun Liu, Deyu Meng, Xiangyong Cao

    Abstract: Recently, the optimization of polynomial filters within Spectral Graph Neural Networks (GNNs) has emerged as a prominent research focus. Existing spectral GNNs mainly emphasize polynomial properties in filter design, introducing computational overhead and neglecting the integration of crucial graph structure information. We argue that incorporating graph information into basis construction can enh…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 13 pages, 8 figures, 6 tables

  32. arXiv:2404.07498 [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Interactive Prompt Debugging with Sequence Salience

    Authors: Ian Tenney, Ryan Mullins, Bin Du, Shree Pandya, Minsuk Kahng, Lucas Dixon

    Abstract: We present Sequence Salience, a visual tool for interactive prompt debugging with input salience methods. Sequence Salience builds on widely used salience methods for text classification and single-token prediction, and extends this to a system tailored for debugging complex LLM prompts. Our system is well-suited for long texts, and expands on previous work by 1) providing controllable aggregation…

    Submitted 11 April, 2024; originally announced April 2024.

  33. arXiv:2404.04538 [pdf, other]

    cs.AI cs.CL

    Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning

    Authors: Juncheng Yang, Zuchao Li, Shuai Xie, Wei Yu, Shijun Li, Bo Du

    Abstract: The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear, as they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: This paper is accepted to LREC-COLING 2024

  34. arXiv:2404.01925 [pdf, other]

    cs.CV cs.AI

    Improving Bird's Eye View Semantic Segmentation by Task Decomposition

    Authors: Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin

    Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, the challenge arises when the RGB inputs and BEV targets come from distinct perspectives, making the direct point-to-point predicting hard to optimize. In this paper, we decompo…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  35. arXiv:2404.01673 [pdf, other]

    cs.CV

    A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification

    Authors: Quanwei Liu, Yanni Dong, Tao Huang, Lefei Zhang, Bo Du

    Abstract: Hyperspectral image (HSI) classification techniques have been intensively studied and a variety of models have been developed. However, these HSI classification models are confined to pocket models and unrealistic ways of dataset partitioning. The former limits the generalization performance of the model and the latter is partitioned leading to inflated model evaluation metrics, which results in p…

    Submitted 27 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  36. arXiv:2404.01273 [pdf, other]

    cs.LG cs.CL stat.ME

    TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

    Authors: Yue Wang, Tianfan Fu, Yinlong Xu, Zihan Ma, Hongxia Xu, Yingzhou Lu, Bang Du, Honghao Gao, Jian Wu

    Abstract: Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significa…

    Submitted 28 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  37. arXiv:2403.13430 [pdf, other]

    cs.CV

    MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

    Authors: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as i…

    Submitted 29 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE JSTARS Special issue on "Large-Scale Pretraining for Interpretation Promotion in Remote Sensing Domain". The codes and pretrained models are available at https://github.com/ViTAE-Transformer/MTP

  38. arXiv:2403.11967 [pdf, other]

    quant-ph cond-mat.quant-gas

    Probing Site-Resolved Current in Strongly Interacting Superconducting Circuit Lattices

    Authors: Botao Du, Ramya Suresh, Santiago López, Jeremy Cadiente, Ruichao Ma

    Abstract: Transport measurements are fundamental for understanding condensed matter phenomena, from superconductivity to the fractional quantum Hall effect. Analogously, they can be powerful tools for probing synthetic quantum matter in quantum simulators. Here we demonstrate the measurement of in-situ particle current in a superconducting circuit lattice and apply it to study transport in both coherent and…

    Submitted 8 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Revised version to be published in Phys. Rev. Lett.

  39. arXiv:2403.11672 [pdf, other]

    eess.IV cs.CV

    WIA-LD2ND: Wavelet-based Image Alignment for Self-supervised Low-Dose CT Denoising

    Authors: Haoyu Zhao, Yuliang Gu, Zhou Zhao, Bo Du, Yongchao Xu, Rui Yu

    Abstract: In clinical examinations and diagnoses, low-dose computed tomography (LDCT) is crucial for minimizing health risks compared with normal-dose computed tomography (NDCT). However, reducing the radiation dose compromises the signal-to-noise ratio, leading to degraded quality of CT images. To address this, we analyze LDCT denoising task based on experimental results from the frequency perspective, and…

    Submitted 1 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: MICCAI 2024

  40. arXiv:2403.09953 [pdf, other]

    cs.LG

    Online GNN Evaluation Under Test-time Graph Distribution Shifts

    Authors: Xin Zheng, Dongjin Song, Qingsong Wen, Bo Du, Shirui Pan

    Abstract: Evaluating the performance of a well-trained GNN model on real-world graphs is a pivotal step for reliable GNN online deployment and serving. Due to a lack of test node labels and unknown potential training-test graph data distribution shifts, conventional model evaluation encounters limitations in calculating performance metrics (e.g., test error) and measuring graph data-level discrepancies, par…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR-2024

  41. arXiv:2403.07720 [pdf, other]

    cs.CV cs.AI

    Multi-modal Auto-regressive Modeling via Visual Words

    Authors: Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du

    Abstract: Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated text corpora, demonstrate powerful perceptual and reasoning capabilities. However, as for extending auto-regressive modelling to multi-modal scenarios to build Large Multi-modal Models (LMMs), there lies a great difficulty that the image information is processed in the LMM as con…

    Submitted 12 March, 2024; originally announced March 2024.

  42. arXiv:2403.04184 [pdf, other]

    cs.SI cs.CY

    Exploring the Impact of Opinion Polarization on Short Video Consumption

    Authors: Bangde Du, Ziyi Ye, Zhijing Wu, Qingyao Ai, Yiqun Liu

    Abstract: Investigating the increasingly popular domain of short video consumption, this study focuses on the impact of Opinion Polarization (OP), a significant factor in the digital landscape influencing public opinions and social interactions. We analyze OP's effect on viewers' perceptions and behaviors, finding that traditional feedback metrics like likes and watch time fail to fully capture and measure…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 9 pages, 8 figures

    MSC Class: 92C55

    ACM Class: H.5.2; K.4.2; J.4

  43. arXiv:2403.00467  [pdf, other]

    cs.CV

    When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

    Authors: Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: ControlNet excels at creating content that closely matches precise contours in user-provided masks. However, when these masks contain noise, as frequently happens with non-expert users, the output includes unwanted artifacts. This paper first highlights the crucial role of controlling the impact of these inexplicit masks with diverse deterioration levels through in-depth analysis. Subseque…

    Submitted 1 March, 2024; originally announced March 2024.

  44. Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching

    Authors: Boxuan Zhang, Zengmao Wang, Bo Du

    Abstract: The lack of object-level annotations poses a significant challenge for object detection in remote sensing images (RSIs). To address this issue, active learning (AL) and semi-supervised learning (SSL) techniques have been proposed to enhance the quality and quantity of annotations. AL focuses on selecting the most informative samples for annotation, while SSL leverages the knowledge from unlabeled…

    Submitted 29 February, 2024; originally announced February 2024.

    Journal ref: in IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5, 2024

  45. arXiv:2402.17464  [pdf, other]

    cs.CV

    Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing

    Authors: Bi'an Du, Xiang Gao, Wei Hu, Renjie Liao

    Abstract: Generative 3D part assembly involves understanding part relationships and predicting their 6-DoF poses for assembling a realistic 3D shape. Prior work often focuses on the geometry of individual parts, neglecting the part-whole hierarchies of objects. Leveraging two key observations: 1) super-part poses provide strong hints about part poses, and 2) predicting super-part poses is easier due to fewer supe…

    Submitted 26 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  46. arXiv:2402.15253  [pdf, other]

    cs.DC

    PICO: Accelerating All k-Core Paradigms on GPU

    Authors: Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao

    Abstract: Core decomposition, a well-established graph mining problem with various applications, involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decompo…

    Submitted 23 February, 2024; originally announced February 2024.

  47. arXiv:2402.11890  [pdf, other]

    cs.CL

    Revisiting Knowledge Distillation for Autoregressive Language Models

    Authors: Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Knowledge distillation (KD) is a common approach to compressing a teacher model, reducing its inference cost and memory footprint by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might result in a dramatically poorer student. In response to this problem, we conduct a series of analyses and reveal that di…

    Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL2024 Main Conference

  48. arXiv:2402.11889  [pdf, other]

    cs.CL

    ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

    Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: With the development of instruction-tuned large language models (LLMs), improving the safety of LLMs has become more critical. However, current approaches for aligning LLM outputs with expected safety usually require substantial training effort, e.g., high-quality safety data and expensive computational resources, which is costly and inefficient. To this end, we present reverse prompt co…

    Submitted 16 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL2024 Findings

  49. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts in this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  50. arXiv:2402.02724  [pdf, other]

    eess.IV cs.CV cs.LG

    FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells

    Authors: Haoran Li, Jiahua Shi, Huaming Chen, Bo Du, Simon Maksour, Gabrielle Phillips, Mirella Dottori, Jun Shen

    Abstract: Induced pluripotent stem cells (iPSCs), artificially generated from somatic cells, play an important role in disease modeling and drug screening for neurodegenerative diseases. Astrocytes differentiated from iPSCs are important targets for investigating neuronal metabolism. The astrocyte differentiation progress can be monitored through the variations of morphology observed from microscopy images at di…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by The IEEE International Symposium on Biomedical Imaging (ISBI) 2024