Showing 1–50 of 156 results for author: Xing, Y

  1. arXiv:2406.14855  [pdf, other]

    cs.CV cs.CR

    Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

    Authors: Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

    Abstract: Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate context…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.14773  [pdf, other]

    cs.CR

    Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data

    Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, Jiliang Tang

    Abstract: Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. However, when the retrieval process involves private data, RAG systems may face severe privacy risks, potentially leading to the leakage of sensitive information. To address this issue, we propose using synthetic data as a privacy-preserving al…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.13933  [pdf, other]

    cs.CR

    EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations

    Authors: Jie Ren, Yingqian Cui, Chen Chen, Vikash Sehwag, Yue Xing, Jiliang Tang, Lingjuan Lyu

    Abstract: Generative models, especially text-to-image diffusion models, have significantly advanced in their ability to generate images, benefiting from enhanced architectures, increased computational power, and large-scale datasets. While the datasets play an important role, their protection has remained as an unsolved issue. Current protection strategies, such as watermarks and membership inference, are e…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.13487  [pdf, other]

    cs.LG

    An evidential time-to-event prediction model based on Gaussian random fuzzy numbers

    Authors: Ling Huang, Yucheng Xing, Thierry Denoeux, Mengling Feng

    Abstract: We introduce an evidential model for time-to-event prediction with censored data. In this model, uncertainty on event time is quantified by Gaussian random fuzzy numbers, a newly introduced family of random fuzzy subsets of the real line with associated belief functions, generalizing both Gaussian random variables and Gaussian possibility distributions. Our approach makes minimal assumptions about…

    Submitted 19 June, 2024; originally announced June 2024.

    Journal ref: BELIEF2024

  5. arXiv:2406.10794  [pdf, other]

    cs.CL

    Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis

    Authors: Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang

    Abstract: Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs to output harmful contents. Although there are diverse jailbreak attack strategies, there is no unified understanding on why some methods succeed and others fail. This paper explores the behavior of harmful and harmless prompts in the LLM's representation space to investigate the intrinsic p…

    Submitted 26 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  6. arXiv:2406.05658  [pdf, other]

    cs.CV cs.AI

    Visual Prompt Tuning in Null Space for Continual Learning

    Authors: Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

    Abstract: Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models. On the contrary, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference on tasks that have been learned to…

    Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures

  7. arXiv:2406.05535  [pdf, other]

    cs.LG cs.AI cs.CR

    Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

    Authors: Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

    Abstract: The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class…

    Submitted 8 June, 2024; originally announced June 2024.

    Journal ref: Advances in Neural Information Processing Systems 36, 2023

  8. arXiv:2405.19334  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM cs.SD

    LLMs Meet Multimodal Generation and Editing: A Survey

    Authors: Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

    Abstract: With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable a…

    Submitted 9 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 52 Pages with 16 Figures, 12 Tables, and 545 References. GitHub Repository at: https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

  9. Defect Category Prediction Based on Multi-Source Domain Adaptation

    Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wenjin Li, Jiawei Gu, Jun Yuan

    Abstract: In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categor…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 17 pages, in Chinese language, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full content of abstract.)

    Journal ref: Journal of Software [2024]

  10. arXiv:2405.08284  [pdf]

    econ.EM cs.LG stat.AP

    Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

    Authors: Yiluan Xing, Chao Yan, Cathy Chang Xie

    Abstract: Forecasting stock prices remains a considerable challenge in financial markets, bearing significant implications for investors, traders, and financial institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key player driving innovation across various sectors. Given its prominence, we chose NVIDIA as the subject of our study.

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures, 2 tables, conference paper

  11. arXiv:2405.05526  [pdf, other]

    cs.RO

    Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview

    Authors: Yuhang Ming, Xingrui Yang, Weihan Wang, Zheng Chen, Jinglun Feng, Yifan Xing, Guofeng Zhang

    Abstract: Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception and understanding of the environment are pivotal, NeRF holds immense promise for improving performance. In this paper, we present a comprehensiv…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 32 pages, 5 figures, 8 tables

  12. arXiv:2405.02583  [pdf, other]

    cs.AI

    Explainable Interface for Human-Autonomy Teaming: A Survey

    Authors: Xiangqi Kong, Yang Xing, Antonios Tsourdos, Ziyue Wang, Weisi Guo, Adolfo Perrusquia, Andreas Wikander

    Abstract: Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 45 pages, 9 figures

  13. arXiv:2405.01102  [pdf, other]

    cs.LG cs.AI

    Less is More: on the Over-Globalizing Problem in Graph Transformers

    Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi

    Abstract: Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property a…

    Submitted 24 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024 (Camera-Ready)

  14. arXiv:2404.07620  [pdf, other]

    eess.IV cs.CV

    Diffusion Probabilistic Multi-cue Level Set for Reducing Edge Uncertainty in Pancreas Segmentation

    Authors: Yue Gou, Yuming Xing, Shengzhu Shi, Zhichang Guo

    Abstract: Accurately segmenting the pancreas remains a huge challenge. Traditional methods encounter difficulties in semantic localization due to the small volume and distorted structure of the pancreas, while deep learning methods encounter challenges in obtaining accurate edges because of low contrast and organ overlapping. To overcome these issues, we propose a multi-cue level set method based on the dif…

    Submitted 11 April, 2024; originally announced April 2024.

  15. arXiv:2404.02933  [pdf, other]

    cs.DB cs.AI cs.CL

    NL2KQL: From Natural Language to Kusto Query

    Authors: Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing

    Abstract: Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetries, and time-series for big data ana…

    Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  16. arXiv:2404.02517  [pdf, other]

    cs.CV

    HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

    Authors: Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

    Abstract: Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation. To improve perception precision, large image encoders, high-resolution images, and long-term temporal inputs have been adopted in recent 3D perception models, bringing remarkable performanc…

    Submitted 20 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  17. arXiv:2404.01789  [pdf]

    cs.SE

    A Feature Dataset of Microservices-based Systems

    Authors: Weipan Yang, Yongchao Xing, Yiming Lyu, Zhihao Liang, Zhiying Tu

    Abstract: Microservice architecture has become a dominant architectural style in the service-oriented software industry. Poor practices in the design and development of microservices are called microservice bad smells. In microservice bad smells research, the detection of these bad smells relies on feature data from microservices. However, there is a lack of an appropriate open-source microservice feature d…

    Submitted 2 April, 2024; originally announced April 2024.

  18. arXiv:2403.15492  [pdf, other]

    cs.CL

    Visual Analytics for Fine-grained Text Classification Models and Datasets

    Authors: Munkhtulga Battogtokh, Yiwen Xing, Cosmin Davidescu, Alfie Abdul-Rahman, Michael Luck, Rita Borgo

    Abstract: In natural language processing (NLP), text classification tasks are increasingly fine-grained, as datasets are fragmented into a larger number of classes that are more difficult to differentiate from one another. As a consequence, the semantic structures of datasets have become more complex, and model decisions more difficult to explain. Existing tools, suited for coarse-grained classification, fa…

    Submitted 21 March, 2024; originally announced March 2024.

  19. arXiv:2403.13728  [pdf, ps, other]

    cs.LG cs.AI

    M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling

    Authors: Xudong Sun, Nutan Chen, Alexej Gossmann, Yu Xing, Carla Feistner, Emilio Dorigatt, Felix Drost, Daniele Scarcella, Lisa Beer, Carsten Marr

    Abstract: We address the online combinatorial choice of weight multipliers for multi-objective optimization of many loss terms parameterized by neural networks via a probabilistic graphical model (PGM) for the joint model parameter and multiplier evolution process, with a hypervolume based likelihood promoting multi-objective descent. The corresponding parameter and multiplier estimation as a sequential decisi…

    Submitted 10 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  20. arXiv:2403.11662  [pdf, other]

    cs.RO

    FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

    Authors: Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu

    Abstract: Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, they have limited performance in practical applications due to their inherent noise in event data. This…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 7 pages, Accepted by ICRA 2024

  21. arXiv:2403.11052  [pdf, other]

    cs.CV cs.CR

    Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

    Authors: Jie Ren, Yaxin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, Jiliang Tang

    Abstract: Recent advancements in text-to-image diffusion models have demonstrated their remarkable capability to generate high-quality images from textual prompts. However, increasing research indicates that these models memorize and replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks. In our study, we provide a novel perspective to…

    Submitted 16 March, 2024; originally announced March 2024.

  22. arXiv:2403.05394  [pdf, other]

    cs.CV

    A Deep Learning Method for Classification of Biophilic Artworks

    Authors: Purna Kar, Jordan J. Bird, Yangang Xing, Alexander Sumich, Andrew Knight, Ahmad Lotfi, Benedict Carpenter van Barthold

    Abstract: Biophilia is an innate love for living things and nature itself that has been associated with a positive impact on mental health and well-being. This study explores the application of deep learning methods for the classification of Biophilic artwork, in order to learn and explain the different Biophilic characteristics present in a visual representation of a painting. Using the concept of Biophili…

    Submitted 8 March, 2024; originally announced March 2024.

  23. arXiv:2403.03967  [pdf, other]

    cs.LG cs.CR stat.ML

    Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

    Authors: Rajdeep Haldar, Yue Xing, Qifan Song

    Abstract: The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a nat…

    Submitted 23 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: AISTATS 2024

  24. arXiv:2403.02784  [pdf, other]

    cs.CV

    DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation

    Authors: Lingyan Ran, Lushuang Wang, Tao Zhuo, Yinghui Xing

    Abstract: Semantic segmentation of remote sensing images is a challenging and hot issue due to the large amount of unlabeled data. Unsupervised domain adaptation (UDA) has proven to be advantageous in incorporating unclassified information from the target domain. However, independently fine-tuning UDA models on the source and target domains has a limited effect on the outcome. This paper proposes a hybrid t…

    Submitted 5 March, 2024; originally announced March 2024.

  25. arXiv:2402.17723  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

    Authors: Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

    Abstract: Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to industry. In this work, we aim at filling the gap, with a carefully designed optimization-based framework for cross-visual-audio and joint-visual-au…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Project website: https://yzxing87.github.io/Seeing-and-Hearing/

  26. arXiv:2402.16893  [pdf, other]

    cs.CR cs.AI cs.CL

    The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

    Authors: Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, Jiliang Tang

    Abstract: Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-ex…

    Submitted 23 February, 2024; originally announced February 2024.

  27. arXiv:2402.14586  [pdf, other]

    cs.CV cs.GR

    FrameNeRF: A Simple and Efficient Framework for Few-shot Novel View Synthesis

    Authors: Yan Xing, Pan Wang, Ligang Liu, Daolun Li, Li Zhang

    Abstract: We present a novel framework, called FrameNeRF, designed to apply off-the-shelf fast high-fidelity NeRF models with fast training speed and high rendering quality for few-shot novel view synthesis tasks. The training stability of fast high-fidelity models is typically constrained to dense views, making them unsuitable for few-shot novel view synthesis tasks. To address this limitation, we utilize…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  28. arXiv:2402.03631  [pdf, other]

    cs.CV

    Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model

    Authors: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just f…

    Submitted 21 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Project page: https://xiaoaoran.github.io/projects/CAT-SAM

  29. arXiv:2402.02160  [pdf, other]

    cs.CR

    Data Poisoning for In-context Learning

    Authors: Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang

    Abstract: In the domain of large language models (LLMs), in-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks, relying on examples rather than retraining or fine-tuning. This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks, an area not yet fully explored. We wonder whether ICL is vulnerable, with adversaries capable of manipula…

    Submitted 27 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2402.00743  [pdf, other]

    cs.LG cs.CL stat.ML

    Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

    Authors: Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng

    Abstract: Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usua…

    Submitted 18 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  31. arXiv:2401.17426  [pdf, other]

    cs.LG cs.AI stat.ML

    Superiority of Multi-Head Attention in In-Context Linear Regression

    Authors: Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing

    Abstract: We present a theoretical analysis of the performance of transformer with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention…

    Submitted 30 January, 2024; originally announced January 2024.

  32. arXiv:2401.16784  [pdf, other]

    cs.LG cs.AI cs.SI

    Graph Fairness Learning under Distribution Shifts

    Authors: Yibo Li, Xiao Wang, Yujie Xing, Shaohua Fan, Ruijia Wang, Yaoqi Liu, Chuan Shi

    Abstract: Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been an increasing interest in ensuring fairness on GNNs, but all of them are under the assumption that the training and testing data are…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by WWW 2024

  33. arXiv:2401.15248  [pdf, other]

    cs.LG stat.ML

    Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

    Authors: Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

    Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoreti…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in AISTATS2024

  34. arXiv:2401.11914  [pdf, other]

    cs.CV

    A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network

    Authors: Rui Huang, Qingyi Zhao, Yan Xing, Sihua Gao, Weifeng Xu, Yuxiang Zhang, Wei Fan

    Abstract: Multiscale convolutional neural network (CNN) has demonstrated remarkable capabilities in solving various vision problems. However, fusing features of different scales always results in large model sizes, impeding the application of multiscale CNNs in RGB-D saliency detection. In this paper, we propose a customized feature fusion module, called Saliency Enhanced Feature Fusion (SEFF), for RGB-D sal…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  35. arXiv:2401.08407  [pdf, other]

    cs.CV

    Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

    Authors: Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu

    Abstract: Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) the necessity of a fine-tuning stage to effectively transfer the learned meta-knowledge across domains, and (ii) the overfitting risk during the naïve fin…

    Submitted 13 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024

  36. arXiv:2401.05153  [pdf, other]

    cs.CV eess.IV

    CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

    Authors: Yinghui Xing, Litao Qu, Shizhou Zhang, Kai Zhang, Yanning Zhang

    Abstract: Fusion of a panchromatic (PAN) image and corresponding multispectral (MS) image is also known as pansharpening, which aims to combine abundant spatial details of PAN and spectral information of MS. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When…

    Submitted 13 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  37. arXiv:2312.01027  [pdf, other]

    cs.CV

    LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models

    Authors: Qiang Wen, Yazhou Xing, Zhefan Rao, Qifeng Chen

    Abstract: Enhancing a low-light noisy RAW image into a well-exposed and clean sRGB image is a significant challenge for modern digital cameras. Prior approaches have difficulties in recovering fine-grained details and true colors of the scene under extremely low-light environments due to near-to-zero SNR. Meanwhile, diffusion models have shown significant progress towards general domain image generation. In…

    Submitted 19 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  38. arXiv:2310.06873  [pdf, other]

    eess.IV cs.CV

    A review of uncertainty quantification in medical image analysis: probabilistic and non-probabilistic methods

    Authors: Ling Huang, Su Ruan, Yucheng Xing, Mengling Feng

    Abstract: The comprehensive integration of machine learning healthcare models within clinical practice remains suboptimal, notwithstanding the proliferation of high-performing solutions reported in the literature. A predominant factor hindering widespread adoption pertains to an insufficiency of evidence affirming the reliability of the aforementioned models. Recently, uncertainty quantification methods hav…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.03736 by other authors

  39. arXiv:2310.06714  [pdf, other]

    cs.AI cs.CL cs.LG

    Exploring Memorization in Fine-tuned Language Models

    Authors: Shenglai Zeng, Yaxin Li, Jie Ren, Yiding Liu, Han Xu, Pengfei He, Yue Xing, Shuaiqiang Wang, Jiliang Tang, Dawei Yin

    Abstract: Large language models (LLMs) have shown great capabilities in various tasks but also exhibited memorization of training data, raising tremendous privacy and copyright concerns. While prior works have studied memorization during pre-training, the exploration of memorization during fine-tuning is rather limited. Compared to pre-training, fine-tuning typically involves more sensitive data and diverse…

    Submitted 22 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  40. arXiv:2310.05263  [pdf, other]

    cs.CR

    Confidence-driven Sampling for Backdoor Attacks

    Authors: Pengfei He, Han Xu, Yue Xing, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang, Makoto Yamada, Mohammad Sabokrou

    Abstract: Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples. Our research highlights the overlooked drawbacks of random sampling, which make that attack detectab…

    Submitted 8 October, 2023; originally announced October 2023.

  41. arXiv:2310.02401  [pdf, other]

    cs.CV cs.CR

    FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models

    Authors: Yingqian Cui, Jie Ren, Yuping Lin, Han Xu, Pengfei He, Yue Xing, Lingjuan Lyu, Wenqi Fan, Hui Liu, Jiliang Tang

    Abstract: Text-to-image generative models, especially those based on latent diffusion models (LDMs), have demonstrated outstanding ability in generating high-quality and high-resolution images from textual prompts. With this advancement, various fine-tuning methods have been developed to personalize text-to-image models for specific applications such as artistic style adaptation and human face transfer. How…

    Submitted 3 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  42. arXiv:2309.13505  [pdf, other]

    cs.CV

    Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

    Authors: Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu

    Abstract: Vision-Language Pre-training has demonstrated its remarkable zero-shot recognition ability and potential to learn generalizable visual representations from language supervision. Taking a step ahead, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state-of-the-art suffers from clear s…

    Submitted 4 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023. Code is available at https://github.com/xing0047/rewrite

  43. arXiv:2308.16325  [pdf]

    cs.CV

    Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports

    Authors: İrem Üstek, Jay Desai, Iván López Torrecillas, Sofiane Abadou, Jinjie Wang, Quentin Fever, Sandhya Rani Kasthuri, Yang Xing, Weisi Guo, Antonios Tsourdos

    Abstract: This study introduces an innovative violence detection framework tailored to the unique requirements of smart airports, where prompt responses to violent situations are crucial. The proposed framework harnesses the power of ViTPose for human pose estimation. It employs a CNN-BiLSTM network to analyse spatial and temporal information within keypoint sequences, enabling the accurate classificatio…

    Submitted 30 August, 2023; originally announced August 2023.

  44. arXiv:2308.15462  [pdf, other]

    cs.CV

    Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

    Authors: Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio

    Abstract: Low dynamic range (LDR) cameras cannot deal with wide dynamic range inputs, frequently leading to local overexposure issues. We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging. We propose a transformer-based deep neural network (DNN) to…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: The demo video can be found at https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view

  45. arXiv:2308.14460  [pdf, other]

    cs.SE

    STEAM: Simulating the InTeractive BEhavior of ProgrAMmers for Automatic Bug Fixing

    Authors: Yuwei Zhang, Zhi Jin, Ying Xing, Ge Li

    Abstract: Bug fixing holds significant importance in software development and maintenance. Recent research has made notable progress in exploring the potential of large language models (LLMs) for automatic bug fixing. However, existing studies often overlook the collaborative nature of bug resolution, treating it as a single-stage process. To overcome this limitation, we introduce a novel stage-wise framewo…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 13 pages, 8 figures

  46. Ground-to-Aerial Person Search: Benchmark Dataset and Approach

    Authors: Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang

    Abstract: In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images of 260,559 annotated bounding boxes for 2,644 identities appearing in both of the UAVs and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

    ACM Class: I.5.4; I.4.8

  47. arXiv:2308.10795  [pdf, other]

    cs.HC

    Visualizing Historical Book Trade Data: An Iterative Design Study with Close Collaboration with Domain Experts

    Authors: Yiwen Xing, Cristina Dondi, Rita Borgo, Alfie Abdul-Rahman

    Abstract: The circulation of historical books has always been an area of interest for historians. However, the data used to represent the journey of a book across different places and times can be difficult for domain experts to digest due to buried geographical and chronological features within text-based presentations. This situation provides an opportunity for collaboration between visualization research…

    Submitted 21 August, 2023; originally announced August 2023.

  48. arXiv:2307.11853  [pdf, other]

    cs.CR cs.SE

    Exploring Security Commits in Python

    Authors: Shiyu Sun, Shu Wang, Xinda Wang, Yunlong Xing, Elisa Zhang, Kun Sun

    Abstract: Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by 'silent' security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commi…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted to 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME)

  49. arXiv:2307.10685  [pdf, other]

    cs.CV

    Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection

    Authors: Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang

    Abstract: Camouflaged object detection (COD), aiming to segment camouflaged objects which exhibit similar patterns with the background, is a challenging task. Most existing works are dedicated to establishing specialized modules to identify camouflaged objects with complete and fine details, while the boundary can not be well located for the lack of object-related semantics. In this paper, we propose a nove…

    Submitted 22 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

  50. arXiv:2307.04047  [pdf, other]

    cs.CV

    Threshold-Consistent Margin Loss for Open-World Deep Metric Learning

    Authors: Qin Zhang, Linghan Xu, Qingming Tang, Jun Fang, Ying Nian Wu, Joe Tighe, Yifan Xing

    Abstract: Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures across test classes and data distributions. When combined with the common practice of using a fixed threshold to declare a match, this gives rise to significant performance variations in terms of false accept rate (FAR) and false reject rate…

    Submitted 12 March, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted to ICLR'24