Showing 1–50 of 220 results for author: Ye, M

  1. arXiv:2406.18937  [pdf, other]

    cs.LG cs.AI

    Federated Graph Semantic and Structural Learning

    Authors: Wenke Huang, Guancheng Wan, Mang Ye, Bo Du

    Abstract: Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most relative arts focus on traditional distributed tasks like images and voices, incapable of graph structures. This paper firstly reveals that local client distortion is brought by both node-level sem…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2023

  2. arXiv:2406.18074  [pdf, other]

    cs.CV cs.AI

    Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

    Authors: Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with considerably distinct objects and not highly complex backgrounds, e.g., natural images. This makes such models suboptimal fo…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17963  [pdf, other]

    cs.LG cs.HC cs.SI

    Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

    Authors: Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar

    Abstract: We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs,…

    Submitted 28 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 27 pages, 11 figures

  4. arXiv:2406.16442  [pdf, other]

    cs.CV

    EmoLLM: Multimodal Emotional Understanding Meets Large Language Models

    Authors: Qu Yang, Mang Ye, Bo Du

    Abstract: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. Thus, it impedes their ability to effectively understand and react to the intricate emotions expressed by humans through multimodal media. To bridge this gap, we introdu…

    Submitted 29 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages

  5. arXiv:2406.06949  [pdf, other]

    cs.CV cs.AI

    Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection

    Authors: Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye

    Abstract: Moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily focus on extracting target features only from the spatial-temporal domain. For further enhancing feature representation, more information domains such as frequency are believed to be potentially valuable. To extend target feature…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE TGRS and is under review

  6. arXiv:2406.05773  [pdf, other]

    cs.CV

    CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder

    Authors: Tangfei Liao, Xiaoqin Zhang, Guobao Xiao, Min Li, Tao Wang, Mang Ye

    Abstract: Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked corres…

    Submitted 9 June, 2024; originally announced June 2024.

  7. arXiv:2406.01658  [pdf, other]

    cs.CV

    Proxy Denoising for Source-Free Domain Adaptation

    Authors: Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that ViL's…

    Submitted 3 June, 2024; originally announced June 2024.

  8. arXiv:2405.19358  [pdf, other]

    cs.CR cs.AI

    Robustifying Safety-Aligned Large Language Models through Clean Data Curation

    Authors: Xiaoqun Liu, Jiacheng Liang, Muchao Ye, Zhaohan Xi

    Abstract: Large language models (LLMs) are vulnerable when trained on datasets containing harmful content, which leads to potential jailbreaking attacks in two scenarios: the integration of harmful texts within crowdsourced data used for pre-training and direct tampering with LLMs through fine-tuning. In both scenarios, adversaries can compromise the safety alignment of LLMs, exacerbating malfunctions. Moti…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.17495  [pdf, other]

    cs.LG cs.CR

    Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

    Authors: Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

    Abstract: Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm where different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the cor…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 31 pages, 9 figures, 10 tables

  10. arXiv:2405.16585  [pdf, other]

    cs.LG cs.AI

    Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

    Authors: Yuhang Chen, Wenke Huang, Mang Ye

    Abstract: Federated learning (FL) has emerged as a new paradigm for privacy-preserving collaborative training. Under domain skew, the current FL approaches are biased and face two fairness problems. 1) Parameter Update Conflict: data disparity among clients leads to varying parameter importance and inconsistent update directions. These two disparities cause important parameters to potentially be overwhelmed…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  11. arXiv:2405.04741  [pdf, other]

    cs.CV

    All in One Framework for Multimodal Re-identification in the Wild

    Authors: He Li, Mang Ye, Ming Zhang, Bo Du

    Abstract: In Re-identification (ReID), recent advancements yield noteworthy progress in both unimodal and cross-modal retrieval tasks. However, the challenge persists in developing a unified framework that could effectively handle varying multimodal data, including RGB, infrared, sketches, and textual information. Additionally, the emergence of large-scale models shows promising performance in various visio…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, CVPR 2024

  12. arXiv:2404.18106  [pdf, other]

    cs.CV

    Semi-supervised Text-based Person Search

    Authors: Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang

    Abstract: Text-based person search (TBPS) aims to retrieve images of a specific person from a large image gallery based on a natural language description. Existing methods rely on massive annotated image-text data to achieve satisfactory performance in fully-supervised learning. It poses a significant challenge in practice, as acquiring person images from surveillance videos is relatively easy, while obtain…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 13 pages

  13. arXiv:2404.02718  [pdf, other]

    cs.HC

    Evolving Agents: Interactive Simulation of Dynamic and Diverse Human Personalities

    Authors: Jiale Li, Jiayang Li, Jiahao Chen, Yifan Li, Shijie Wang, Hugo Zhou, Minjun Ye, Yunsheng Su

    Abstract: Human-like Agents with diverse and dynamic personalities could serve as an essential design probe in the process of user-centered design, thereby enabling designers to enhance the user experience of interactive applications. In this article, we introduce Evolving Agents, a novel agent architecture that consists of two systems: Personality and Behavior. The Personality system includes Cognition, Em…

    Submitted 17 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  14. arXiv:2403.07601  [pdf, other]

    cs.CV

    Unified Source-Free Domain Adaptation

    Authors: Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: In the pursuit of transferring a source model to a target domain without access to the source training data, Source-Free Domain Adaptation (SFDA) has been extensively explored across various scenarios, including closed-set, open-set, partial-set, and generalized settings. Existing methods, focusing on specific scenarios, not only address only a subset of challenges but also necessitate prior knowl…

    Submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2403.06166  [pdf, other]

    cs.CV cs.RO

    Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving

    Authors: Zhili Chen, Kien T. Pham, Maosheng Ye, Zhiqiang Shen, Qifeng Chen

    Abstract: We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving. Traditional point-based 3D object detectors often employ architectures that rely on a progressive downsampling of points. While this method effectively reduces computational demands and increases receptive fields, it will compromise the preservation of crucial non-local informati…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: ICRA2024

  16. arXiv:2402.15750  [pdf, other]

    cs.CV math.NA physics.med-ph

    Design, Implementation and Analysis of a Compressed Sensing Photoacoustic Projection Imaging System

    Authors: Markus Haltmeier, Matthias Ye, Karoline Felbermayer, Florian Hinterleitner, Peter Burgholzer

    Abstract: Significance: Compressed sensing (CS) uses special measurement designs combined with powerful mathematical algorithms to reduce the amount of data to be collected while maintaining image quality. This is relevant to almost any imaging modality, and in this paper we focus on CS in photoacoustic projection imaging (PAPI) with integrating line detectors (ILDs). Aim: Our previous research involved r…

    Submitted 24 February, 2024; originally announced February 2024.

  17. arXiv:2402.11083  [pdf, other]

    cs.CV

    VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

    Authors: Ziyi Yin, Muchao Ye, Tianrong Zhang, Jiaqi Wang, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma

    Abstract: Visual Question Answering (VQA) is a fundamental task in the computer vision and natural language processing fields. Although the ``pre-training & finetuning'' learning paradigm significantly improves the VQA performance, the adversarial robustness of such a learning paradigm has not been explored. In this paper, we delve into a new problem: using a pre-trained multimodal source model to create adversari…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: AAAI 2024, 11 pages

  18. arXiv:2402.01077  [pdf, ps, other]

    cs.LG cs.AI

    Recent Advances in Predictive Modeling with Electronic Health Records

    Authors: Jiaqi Wang, Junyu Luo, Muchao Ye, Xiaochen Wang, Yuan Zhong, Aofei Chang, Guanjie Huang, Ziyi Yin, Cao Xiao, Jimeng Sun, Fenglong Ma

    Abstract: The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This su…

    Submitted 1 February, 2024; originally announced February 2024.

  19. arXiv:2401.17904  [pdf, other]

    cs.CV

    Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

    Authors: Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

    Abstract: The Segment Anything Model (SAM), a profound vision foundation model pre-trained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in text segmentation across four hierarchies, including stroke, word, text-line, and paragra…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: GitHub repository: https://github.com/ymy-k/Hi-SAM

  20. arXiv:2401.07080  [pdf, other]

    cs.CV

    GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

    Authors: Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Dacheng Tao

    Abstract: Beyond the text detection and recognition tasks in image text spotting, video text spotting presents an augmented challenge with the inclusion of tracking. While advanced end-to-end trainable methods have shown commendable performance, the pursuit of multi-task optimization may pose the risk of producing sub-optimal outcomes for individual tasks. In this paper, we highlight a main bottleneck in th…

    Submitted 13 January, 2024; originally announced January 2024.

  21. arXiv:2401.06960  [pdf, other]

    cs.CV cs.AI

    Transformer for Object Re-Identification: A Survey

    Authors: Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du

    Abstract: Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from varying viewpoints. For a prolonged period, this field has been predominantly driven by deep convolutional neural networks. In recent years, the Transformer has witnessed remarkable advancements in computer vision, prompting an increasing body of research to delve into the application of Transformer in Re-ID. This…

    Submitted 12 January, 2024; originally announced January 2024.

  22. arXiv:2312.05777  [pdf, other]

    cs.CV

    Negative Pre-aware for Noisy Cross-modal Matching

    Authors: Xu Zhang, Hao Li, Mang Ye

    Abstract: Cross-modal noise-robust learning is a challenging task since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain a stable performance when the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for large visual-language model fine-tu…

    Submitted 14 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: 9 pages, 5 figures, conference

  23. arXiv:2312.01080  [pdf, other]

    cs.CR

    A Novel Residual-guided Learning Method for Image Steganography

    Authors: Miaoxin Ye, Dongxia Huang, Kangkang Wei, Weiqi Luo

    Abstract: Traditional steganographic techniques have often relied on manually crafted attributes related to image residuals. These methods demand a significant level of expertise and face challenges in integrating diverse image residual characteristics. In this paper, we introduce an innovative deep learning-based methodology that seamlessly integrates image residuals, residual distances, and image local va…

    Submitted 2 December, 2023; originally announced December 2023.

  24. arXiv:2312.00732  [pdf, other]

    cs.CV cs.AI

    Gaussian Grouping: Segment and Edit Anything in 3D Scenes

    Authors: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke

    Abstract: The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: We propose Gaussian Grouping, which extends Gaussian Splatting to fine-grained open-world 3D scene understanding. Github: https://github.com/lkeab/gaussian-grouping

  25. arXiv:2312.00115  [pdf, other]

    cs.CV cs.CL

    A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

    Authors: Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

    Abstract: Existing long video retrieval systems are trained and tested in the paragraph-to-video retrieval regime, where every long video is described by a single long paragraph. This neglects the richness and variety of possible valid descriptions of a video, which could be described in moment-by-moment detail, or in a single phrase summary, or anything in between. To provide a more thorough evaluation of…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 13 pages, 15 tables, 5 figures

  26. arXiv:2311.16510  [pdf, other]

    cs.CV

    Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

    Authors: Song Tang, Wenxin Su, Mao Ye, Xiatian Zhu

    Abstract: Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a target domain, with only access to unlabeled target training data and the source model pre-trained on a supervised source domain. Relying on pseudo labeling and/or auxiliary supervision, conventional methods are inevitably error-prone. To mitigate this limitation, in this work we for the first time explore the potentials of of…

    Submitted 13 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024

  27. arXiv:2311.15776  [pdf, other]

    cs.CV

    Stable Segment Anything Model

    Authors: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

    Abstract: The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts which, however, often require good skills to specify. To make SAM robust to casual prompts, this paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities, notably imprecise bounding boxes and insufficient points. Our key findin…

    Submitted 5 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Smaller file size for the easy access. Codes will be released upon acceptance. https://github.com/fanq15/Stable-SAM

  28. arXiv:2311.10372  [pdf, other]

    cs.SE

    A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

    Authors: Zibin Zheng, Kaiwen Ning, Yanlin Wang, Jingwen Zhang, Dewu Zheng, Mingxi Ye, Jiachi Chen

    Abstract: General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in tasks such as code generation in software engineering. This has led to the development of specialized LLMs for software engineering, known as Code LLMs. A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning. As a result, Code LLMs are often updated frequentl…

    Submitted 8 January, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  29. arXiv:2311.08100  [pdf, other]

    cs.CV cs.RO

    PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

    Authors: Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

    Abstract: We present a new interaction mechanism of prediction and planning for end-to-end autonomous driving, called PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving), which considers the timestep-wise interaction to better integrate prediction and planning. An ego vehicle performs motion planning at each timestep based on the trajectory prediction of surrounding agents (e.g., vehi…

    Submitted 27 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  30. arXiv:2311.06750  [pdf, other]

    cs.LG cs.AI

    Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

    Authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang

    Abstract: Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches have delivered towards different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 22 pages, 4 figures

  31. arXiv:2311.02559  [pdf, other]

    cs.CV

    Rotation Invariant Transformer for Recognizing Object in UAVs

    Authors: Shuoyi Chen, Mang Ye, Bo Du

    Abstract: Recognizing a target of interest from the UAVs is much more challenging than the existing object re-identification tasks across multiple city cameras. The images taken by the UAVs usually suffer from significant size difference when generating the object bounding boxes and uncertain rotation variations. Existing methods are usually designed for city cameras, incapable of handling the rotation issue…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: ACM MM2022

  32. arXiv:2310.15888  [pdf, other]

    cs.LG

    State Sequences Prediction via Fourier Transform for Representation Learning

    Authors: Mingxuan Ye, Yufei Kuang, Jie Wang, Rui Yang, Wengang Zhou, Houqiang Li, Feng Wu

    Abstract: While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However,…

    Submitted 24 October, 2023; originally announced October 2023.

  33. arXiv:2310.14868  [pdf, other]

    cs.CL

    Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism

    Authors: Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Goro Kobayashi, Hiroaki Funayama

    Abstract: Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. Building on this, their ability to perform CoT-style reasoning robustly is of interest from a probing perspective. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation, which is a core linguistic phenomenon that is difficult to process…

    Submitted 23 October, 2023; originally announced October 2023.

  34. arXiv:2310.05564  [pdf]

    cs.NI

    A Novel Node Selection Method in Wireless Distributed Edge Storage Based on SDN and Multi-attribute Decision Model

    Authors: Yejin Yang, Miao Ye, Qiuxiang Jiang, Peng Wen

    Abstract: The distributed edge storage system can store data collected at the edge of the network in a decentralised manner, with low latency, high security, and flexibility. Traditional edge-distributed storage systems only consider one single factor, such as node capacity, when storing data, ignoring network and storage node load conditions that affect the system's read/write performance. At the same t…

    Submitted 9 October, 2023; originally announced October 2023.

  35. arXiv:2310.04655  [pdf, other]

    cs.CR cs.CV

    VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

    Authors: Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma

    Abstract: Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations u…

    Submitted 5 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023, 21 pages

  36. arXiv:2309.16286  [pdf, other]

    cs.LG cs.AI

    Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning

    Authors: Wenke Huang, Mang Ye, Zekun Shi, Bo Du

    Abstract: Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents a novel FCCL+, federated correlation and similarity learning with non-target d…

    Submitted 28 September, 2023; originally announced September 2023.

  37. arXiv:2309.13839  [pdf, other]

    eess.IV cs.CV

    Fill the K-Space and Refine the Image: Prompting for Dynamic and Multi-Contrast MRI Reconstruction

    Authors: Bingyu Xin, Meng Ye, Leon Axel, Dimitris N. Metaxas

    Abstract: The key to dynamic or multi-contrast magnetic resonance imaging (MRI) reconstruction lies in exploring inter-frame or inter-contrast information. Currently, the unrolled model, an approach combining iterative MRI reconstruction steps with learnable neural network layers, stands as the best-performing method for MRI reconstruction. However, there are two main limitations to overcome: firstly, the u…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: STACOM 2023; Code is available at https://github.com/hellopipu/PromptMR

  38. arXiv:2309.12594  [pdf, other]

    cs.CV

    DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image

    Authors: Di Liu, Xiang Yu, Meng Ye, Qilong Zhangli, Zhuowei Li, Zhixing Zhang, Dimitris N. Metaxas

    Abstract: Accurate 3D shape abstraction from a single 2D image is a long-standing problem in computer vision and graphics. By leveraging a set of primitives to represent the target shape, recent methods have achieved promising results. However, these methods either use a relatively large number of primitives or lack geometric flexibility due to the limited expressibility of the primitives. In this paper, we…

    Submitted 3 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  39. arXiv:2309.11913  [pdf, other]

    eess.IV cs.CV cs.MM

    Spatial-Temporal Transformer based Video Compression Framework

    Authors: Yanbo Gao, Wenjia Huang, Shuai Li, Hui Yuan, Mao Ye, Siwei Ma

    Abstract: Learned video compression (LVC) has witnessed remarkable advancements in recent years. Similar to traditional video coding, LVC inherits motion estimation/compensation, residual coding and other modules, all of which are implemented with neural networks (NNs). However, within the framework of NNs and its training mechanism using gradient backpropagation, most existing works often struggle to c…

    Submitted 21 September, 2023; originally announced September 2023.

  40. arXiv:2309.05135  [pdf, ps, other]

    cs.DS

    Streaming Semidefinite Programs: $O(\sqrt{n})$ Passes, Small Space and Fast Runtime

    Authors: Zhao Song, Mingquan Ye, Lichen Zhang

    Abstract: We study the problem of solving semidefinite programs (SDP) in the streaming model. Specifically, $m$ constraint matrices and a target matrix $C$, all of size $n\times n$ together with a vector $b\in \mathbb{R}^m$ are streamed to us one-by-one. The goal is to find a matrix $X\in \mathbb{R}^{n\times n}$ such that $\langle C, X\rangle$ is maximized, subject to $\langle A_i, X\rangle=b_i$ for all…

    Submitted 10 September, 2023; originally announced September 2023.

  41. arXiv:2308.12617  [pdf, ps, other]

    eess.SY cs.MA

    Quantized distributed Nash equilibrium seeking under DoS attacks: A quantized consensus based approach

    Authors: Shuai Feng, Maojiao Ye, Lihua Xie, Shengyuan Xu

    Abstract: This paper studies distributed Nash equilibrium (NE) seeking under Denial-of-Service (DoS) attacks and quantization. The players can only exchange information with their own direct neighbors. The transmitted information is subject to quantization and packet losses induced by malicious DoS attacks. We propose a quantized distributed NE seeking strategy based on the approach of dynamic quantized con…

    Submitted 24 August, 2023; originally announced August 2023.

  42. arXiv:2308.10162  [pdf, other]

    cs.LG cs.AI

    Rethinking Client Drift in Federated Learning: A Logit Perspective

    Authors: Yunlu Yan, Chun-Mei Feng, Mang Ye, Wangmeng Zuo, Ping Li, Rick Siow Mong Goh, Lei Zhu, C. L. Philip Chen

    Abstract: Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, the real-world non-IID data will lead to client drift which degrades the performance of FL. Interestingly, we find that the difference in logits between the local and global models increases as the model is continuously updated, thus seriously deteriorating FL p…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 11 pages, 7 figures

  43. arXiv:2308.10045  [pdf, other]

    cs.CV cs.CL

    An Empirical Study of CLIP for Text-based Person Search

    Authors: Min Cao, Yang Bai, Ziyin Zeng, Mang Ye, Min Zhang

    Abstract: Text-based Person Search (TBPS) aims to retrieve the person images using natural language descriptions. Recently, Contrastive Language Image Pretraining (CLIP), a universal large cross-modal vision-language pre-training model, has remarkably performed over various cross-modal downstream tasks due to its powerful cross-modal semantic learning capacity. TBPS, as a fine-grained cross-modal retrieval…

    Submitted 20 December, 2023; v1 submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted by AAAI 2024. Code is available at https://github.com/Flame-Chasers/TBPS-CLIP

  44. arXiv:2307.11035  [pdf, other]

    cs.CV cs.AI

    Cascade-DETR: Delving into High-Quality Universal Object Detection

    Authors: Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

    Abstract: Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to estimate object bounding boxes with high accuracy in complex environments. We introduce Cascade-DETR for high-quality universal object detection. W…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted in ICCV 2023. Our code and models will be released at https://github.com/SysCV/cascade-detr

  45. arXiv:2307.10616  [pdf, other]

    cs.LG cs.AI cs.CV

    Heterogeneous Federated Learning: State-of-the-art and Research Challenges

    Authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao

    Abstract: Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous…

    Submitted 8 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 42 pages, 11 figures, and 4 tables

  46. arXiv:2307.07720  [pdf]

    cs.CV

    Spatial-Spectral Hyperspectral Classification based on Learnable 3D Group Convolution

    Authors: Guandong Li, Mengxia Ye

    Abstract: Deep neural networks have faced many problems in hyperspectral image classification, including the ineffective utilization of spectral-spatial joint information and the problems of gradient vanishing and overfitting that arise with increasing depth. In order to accelerate the deployment of models on edge devices with strict latency requirements and limited computing power, this paper proposes a le…

    Submitted 15 July, 2023; originally announced July 2023.

  47. arXiv:2307.07693  [pdf, other]

    cs.CV

    Neural Deformable Models for 3D Bi-Ventricular Heart Shape Reconstruction and Modeling from 2D Sparse Cardiac Magnetic Resonance Imaging

    Authors: Meng Ye, Dong Yang, Mikael Kanski, Leon Axel, Dimitris Metaxas

    Abstract: We propose a novel neural deformable model (NDM) targeting the reconstruction and modeling of the 3D bi-ventricular shape of the heart from 2D sparse cardiac magnetic resonance (CMR) imaging data. We model the bi-ventricular shape using blended deformable superquadrics, which are parameterized by a set of geometric parameter functions and are capable of deforming globally and locally. While global…

    Submitted 12 August, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  48. arXiv:2306.08325  [pdf, other]

    cs.LG

    GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting

    Authors: YanJun Zhao, Ziqing Ma, Tian Zhou, Liang Sun, Mengni Ye, Yi Qian

    Abstract: Transformer-based models have emerged as promising tools for time series forecasting. However, these models cannot make accurate predictions for long input time series. On the one hand, they fail to capture global dependencies within time series data. On the other hand, a long input sequence usually leads to large model size and high time complexity. To address these limitations, we present…

    Submitted 14 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  49. arXiv:2306.06683  [pdf, other]

    cs.SI

    To be a pro-vax or not, the COVID-19 vaccine conundrum on Twitter

    Authors: Zainab Zaidi, Mengbin Ye, Shanika Karunasekera, Yoshihisa Kashima

    Abstract: The most surprising observation reported by the study in (arXiv:2208.13523), involving stance detection of COVID-19 vaccine-related tweets during the first year of the pandemic, is the presence of a significant number of users (~2 million) who posted tweets with both anti-vax and pro-vax stances. This is a sizable cohort even when the stance detection noise is considered. In this paper, we tried to ge…

    Submitted 11 June, 2023; originally announced June 2023.

  50. arXiv:2306.04169  [pdf, ps, other]

    cs.LG math.OC

    Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

    Authors: Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang

    Abstract: Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a weight matrix $W \in \mathbb{R}_{\geq 0}^{n \times n}$, and a parameter $k$, the goal is to output two matrices $U, V \in \mathbb{R}^{n \times k}$ such that $\| W \circ (M - U V^\top) \|_F$ is minimized, where $\circ$…

    Submitted 27 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.