Showing 1–50 of 969 results for author: Yao, J

  1. arXiv:2407.12648  [pdf, ps, other]

    cs.IT eess.SP

    Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

    Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namel…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 17 pages

  2. arXiv:2407.12258  [pdf, other]

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.12257  [pdf, other]

    cs.CV

    Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge

    Authors: Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun

    Abstract: Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.12572 by other authors

  4. arXiv:2407.11629  [pdf, other]

    eess.AS

    MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution designed to conceal the speaker's identity while preserving the linguistic content and para-linguistic information of the original speech. While most prior studies focus solely on a single language, an ideal speaker anonymization system should be capable of handling multiple languages. This paper proposes MUSA, a Multi-lingual Speak…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Submitted to TASLP

  5. arXiv:2407.11307  [pdf, ps, other]

    eess.SP

    Fluid Antenna-Assisted Simultaneous Wireless Information and Power Transfer Systems

    Authors: Liaoshi Zhou, Junteng Yao, Tuo Wu, Ming Jin, Chau Yuen, Fumiyuki Adachi

    Abstract: This paper examines a fluid antenna (FA)-assisted simultaneous wireless information and power transfer (SWIPT) system. Unlike traditional SWIPT systems with fixed-position antennas (FPAs), our FA-assisted system enables dynamic reconfiguration of the radio propagation environment by adjusting the positions of FAs. This capability enhances both energy harvesting and communication performance. The s…

    Submitted 15 July, 2024; originally announced July 2024.

  6. arXiv:2407.10725  [pdf, other]

    cs.CL cs.AI

    CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

    Authors: Jing Yao, Xiaoyuan Yi, Xing Xie

    Abstract: The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but relies on reference-free evaluators, e.g., fine-tuned LLMs or close-source ones like GPT-4, to identify values reflected in generated responses. Nevertheless, these evaluators face two challenges in open-ended value evaluation…

    Submitted 15 July, 2024; originally announced July 2024.

  7. arXiv:2407.08141  [pdf, ps, other]

    eess.SP

    A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

    Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

    Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  8. arXiv:2407.06504  [pdf, other]

    cs.CV

    Reprogramming Distillation for Medical Foundation Models

    Authors: Yuhang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Medical foundation models pre-trained on large-scale datasets have demonstrated powerful versatile capabilities for various tasks. However, due to the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), the real-world computation and speed constraints, it might not be straightforward to apply medical foundation models in the downstream scenarios. Previous methods,…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  9. arXiv:2407.05577  [pdf, other]

    cs.CV

    Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

    Authors: Jiacheng Su, Kunhong Liu, Liyan Chen, Junfeng Yao, Qingsong Liu, Dongdong Lv

    Abstract: The existing methods for audio-driven talking head video editing have the limitations of poor visual effects. This paper tries to tackle this problem through editing talking face images seamless with different emotions based on two modules: (1) an audio-to-landmark module, consisting of the CrossReconstructed Emotion Disentanglement and an alignment network module. It bridges the gap between speec…

    Submitted 7 July, 2024; originally announced July 2024.

  10. arXiv:2407.05023  [pdf, other]

    cs.CV

    SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

    Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

    Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-tim…

    Submitted 6 July, 2024; originally announced July 2024.

  11. arXiv:2407.00718  [pdf, other]

    eess.IV cs.CV

    ASPS: Augmented Segment Anything Model for Polyp Segmentation

    Authors: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han

    Abstract: Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performan…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  12. arXiv:2407.00336  [pdf, other]

    cs.CR cs.LG

    Dual-view Aware Smart Contract Vulnerability Detection for Ethereum

    Authors: Jiacheng Yao, Maolin Wang, Wanqi Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan

    Abstract: The wide application of Ethereum technology has brought technological innovation to traditional industries. As one of Ethereum's core applications, smart contracts utilize diverse contract codes to meet various functional needs and have gained widespread use. However, the non-tamperability of smart contracts, coupled with vulnerabilities caused by natural flaws or human errors, has brought unprece…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

  13. arXiv:2406.18169  [pdf, ps, other]

    astro-ph.HE hep-ph

    Timing and Scintillation Studies of Pulsars in Globular Cluster M3 (NGC 5272) with FAST

    Authors: Baoda Li, Li-yun Zhang, Jumei Yao, Dejiang Yin, Ralph P. Eatough, Minghui Li, Yifeng Li, Yujie Lian, Yu Pan, Yinfeng Dai, Yaowei Li, Xingnan Zhang, Tianhao Su, Yuxiao Wu, Tong Liu, Kuo Liu, Lin Wang, Lei Qian, Zhichen Pan

    Abstract: We present the phase-connected timing solutions of all the five pulsars in globular cluster (GC) M3 (NGC 5272), namely PSRs M3A to F (PSRs J1342+2822A to F), with the exception of PSR M3C, from FAST archival data. In these timing solutions, those of PSRs M3E, and F are obtained for the first time. We find that PSRs M3E and F have low mass companions, and are in circular orbits with periods of 7.1…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, accepted for publication in The Astrophysical Journal

  14. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  15. arXiv:2406.16529  [pdf, other]

    cs.CL

    Towards Better Graph-based Cross-document Relation Extraction via Non-bridge Entity Enhancement and Prediction Debiasing

    Authors: Hao Yue, Shaopeng Lai, Chengyi Yang, Liang Zhang, Junfeng Yao, Jinsong Su

    Abstract: Cross-document Relation Extraction aims to predict the relation between target entities located in different documents. In this regard, the dominant models commonly retain useful information for relation prediction via bridge entities, which allows the model to elaborately capture the intrinsic interdependence between target entities. However, these studies ignore the non-bridge entities, each of…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  16. arXiv:2406.16295  [pdf, other]

    cs.LG cs.AI

    Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning

    Authors: Zinan Zheng, Yang Liu, Jia Li, Jianhua Yao, Yu Rong

    Abstract: Incorporating Euclidean symmetries (e.g. rotation equivariance) as inductive biases into graph neural networks has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of dynamics are frequently discrete due to the boundary conditions. Thus, existing GNNs either overlook necess…

    Submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.15047  [pdf, other]

    cs.IT eess.SP

    Optimal Transmit Signal Design for Multi-Target MIMO Sensing Exploiting Prior Information

    Authors: Jiayi Yao, Shuowen Zhang

    Abstract: In this paper, we study the transmit signal optimization in a multiple-input multiple-output (MIMO) radar system for sensing the angle information of multiple targets via their reflected echo signals. We consider a challenging and practical scenario where the angles to be sensed are unknown and random, while their probability information is known a priori for exploitation. First, we establish an a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: submitted for possible publication

  18. arXiv:2406.14519  [pdf, other]

    astro-ph.CO astro-ph.IM

    ForSE+: Simulating non-Gaussian CMB foregrounds at 3 arcminutes in a stochastic way based on a generative adversarial network

    Authors: Jian Yao, Nicoletta Krachmalnicoff, Marianna Foschi, Giuseppe Puglisi, Carlo Baccigalupi

    Abstract: We present ForSE+, a Python package that produces non-Gaussian diffuse Galactic thermal dust emission maps at arcminute angular scales and that has the capacity to generate random realizations of small scales. This represents an extension of the ForSE (Foreground Scale Extender) package, which was recently proposed to simulate non-Gaussian small scales of thermal dust emission using generative adv…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Published in A&A. Comments welcome

  19. arXiv:2406.09679  [pdf, other]

    cs.CV

    Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

    Authors: Yuhang Zhou, Zihua Zhao, Haolin Li, Siyuan Du, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate the training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore to leverage Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML2024

  20. arXiv:2406.09098  [pdf, other]

    cs.CL

    SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

    Authors: Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen

    Abstract: The burgeoning utilization of Large Language Models (LLMs) in scientific research necessitates advanced benchmarks capable of evaluating their understanding and application of scientific knowledge comprehensively. To address this need, we introduce the SciKnowEval benchmark, a novel framework that systematically evaluates LLMs across five progressive levels of scientific knowledge: studying extens…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 48 pages, 2 figures

  21. arXiv:2406.08760  [pdf, other]

    physics.flu-dyn

    Boundary sources of velocity gradient tensor and its invariants

    Authors: Tao Chen, Jie-Zhi Wu, Tianshu Liu, Jie Yao

    Abstract: The present work elucidates the boundary behaviors of the velocity gradient tensor ($\bm{A}\equiv\bm{\nabla}\bm{u}$) and its principal invariants ($P,Q,R$) for compressible flow interacting with a stationary rigid wall. Firstly, it is found that the well-known Caswell formula exhibits an inherent physical structure being compatible with the normal-nilpotent decomposition, where both the strain-rat…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.08288  [pdf, other]

    cs.LG

    Decoupling the Class Label and the Target Concept in Machine Unlearning

    Authors: Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, Masashi Sugiyama

    Abstract: Machine unlearning as an emerging research topic for data regulations, aims to adjust a trained model to approximate a retrained one that excludes a portion of training data. Previous studies showed that class-wise unlearning is successful in forgetting the knowledge of a target class, through gradient ascent on the forgetting data or fine-tuning with the remaining data. However, while these metho…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  23. arXiv:2406.08192  [pdf, other]

    cs.CV

    2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

    Authors: Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu

    Abstract: Complex video object segmentation serves as a fundamental task for a wide range of downstream applications such as video editing and automatic data annotation. Here we present the 2nd place solution in the MOSE track of PVUW 2024. To mitigate problems caused by tiny objects, similar objects and fast movements in MOSE. We use instance segmentation to generate extra pretraining data from the valid a…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, technical report for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

  24. arXiv:2406.07846  [pdf, other]

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  25. arXiv:2406.06693  [pdf, other]

    astro-ph.CO

    The measurement of the splashback radius of dark matter halo

    Authors: Weiwei Xu, Huanyuan Shan, Ran Li, Ji Yao, Chunxiang Wang, Nan Li, Chaoli Zhang

    Abstract: In the hierarchical evolution framework of cosmology, larger halos grow through matter accretion and halo mergers. To clarify the halo evolution, we need to define the halo mass and radius physically. However, the pseudo-evolution problem makes the process difficult. Thus, we aim to measure the splashback radius, a physically defined halo radius for a large number of halos with various mass and re…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures, submitted to ApJ

  26. arXiv:2406.04872  [pdf, other]

    cs.LG

    Diversified Batch Selection for Training Acceleration

    Authors: Feng Hong, Yueming Lyu, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Yanfeng Wang

    Abstract: The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance o…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  27. arXiv:2406.03450  [pdf, other]

    cs.CL cs.AI

    What is the Best Way for ChatGPT to Translate Poetry?

    Authors: Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

    Abstract: Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to asce…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 19 pages, 1 figure. The paper has been accepted by ACL 2024(Main Conference)

  28. arXiv:2406.02517  [pdf, other]

    cs.CL

    Deterministic Reversible Data Augmentation for Neural Machine Translation

    Authors: Jiashu Yao, Heyan Huang, Zeming Liu, Yuhang Guo

    Abstract: Data augmentation is an effective way to diversify corpora in machine translation, but previous methods may introduce semantic inconsistency between original and augmented data because of irreversible operations and random subword sampling procedures. To generate both symbolically diverse and semantically consistent augmentation data, we propose Deterministic Reversible Data Augmentation (DRDA), a…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  29. arXiv:2406.02233  [pdf, other]

    eess.AS

    Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction

    Authors: Renmingyue Du, Jixun Yao, Qiuqiang Kong, Yin Cao

    Abstract: Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to its important role in deepfake algorithm recognition. However, most of the current approaches for detecting OOD in deepfake algorithm recognition rely…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  30. arXiv:2405.20626  [pdf, other]

    cs.IR cs.IT

    Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

    Authors: Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

    Abstract: Recommendation performance usually exhibits a long-tail distribution over users -- a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). As addressing this pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: TKDE 2023

  31. arXiv:2405.19919  [pdf, other]

    cs.LG cs.SI

    Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning

    Authors: Yuhao Wu, Jiangchao Yao, Bo Han, Lina Yao, Tongliang Liu

    Abstract: While Positive-Unlabeled (PU) learning is vital in many real-world scenarios, its application to graph data still remains under-explored. We unveil that a critical challenge for PU learning on graph lies on the edge heterophily, which directly violates the irreducibility assumption for Class-Prior Estimation (class prior is essential for building PU learning algorithms) and degenerates the latent…

    Submitted 1 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  32. arXiv:2405.18983  [pdf, other]

    cs.LG cs.DC

    Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping

    Authors: Ziqing Fan, Jiangchao Yao, Ruipeng Zhang, Lingjuan Lyu, Ya Zhang, Yanfeng Wang

    Abstract: Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several explorations e.g., FedProx, MOON and FedDyn, to alleviate this problem. Despite effectiveness, their considered scenario generally requires samples from almost all classes during the local training of each client, although some covariate shifts may exist among clients. In fact, the natural case…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  33. arXiv:2405.18972  [pdf, other]

    cs.LG cs.DC

    Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

    Authors: Ziqing Fan, Ruipeng Zhang, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

    Abstract: Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes a part of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms. Without full classes, the local objective will contradict the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem f…

    Submitted 29 May, 2024; originally announced May 2024.

  34. arXiv:2405.18890  [pdf, other]

    cs.LG cs.DC

    Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

    Authors: Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang

    Abstract: In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degenerating the performance of the resulted global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of…

    Submitted 29 May, 2024; originally announced May 2024.

  35. arXiv:2405.18861  [pdf, other]

    cs.CV cs.LG

    Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

    Authors: Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpnes…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  36. arXiv:2405.17759  [pdf, ps, other]

    cs.IT

    Wireless Federated Learning over Resource-Constrained Networks: Digital versus Analog Transmissions

    Authors: Jiacheng Yao, Wei Xu, Zhaohui Yang, Xiaohu You, Mehdi Bennis, H. Vincent Poor

    Abstract: To enable wireless federated learning (FL) in communication resource-constrained networks, two communication schemes, i.e., digital and analog ones, are effective solutions. In this paper, we quantitatively compare these two techniques, highlighting their essential differences as well as respectively suitable scenarios. We first examine both digital and analog transmission schemes, together with a…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TWC. arXiv admin note: text overlap with arXiv:2402.09657

  37. arXiv:2405.17385  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.str-el

    Thermalization and Criticality on an Analog-Digital Quantum Simulator

    Authors: Trond I. Andersen, Nikita Astrakhantsev, Amir H. Karamlou, Julia Berndtsson, Johannes Motruk, Aaron Szasz, Jonathan A. Gross, Alexander Schuckert, Tom Westerhout, Yaxing Zhang, Ebrahim Forati, Dario Rossi, Bryce Kobrin, Agustin Di Paolo, Andrey R. Klots, Ilya Drozdov, Vladislav D. Kurilovich, Andre Petukhov, Lev B. Ioffe, Andreas Elben, Aniket Rath, Vittorio Vitale, Benoit Vermersch, Rajeev Acharya, Laleh Aghababaie Beni, et al. (202 additional authors not shown)

    Abstract: Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal qua…

    Submitted 8 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  38. arXiv:2405.16996  [pdf, other]

    cs.CV

    Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

    Authors: Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang

    Abstract: Noisy correspondence that refers to mismatches in cross-modal data pairs, is prevalent on human-annotated or web-crawled datasets. Prior approaches to leverage such data mainly consider the application of uni-modal noisy label learning without amending the impact on both cross-modal and intra-modal geometrical structures in multimodal learning. Actually, we find that both structures are effective…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures, received by IEEE/CVF Computer Science and Pattern Recognition

  39. arXiv:2405.16444  [pdf, other]

    cs.LG

    CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

    Authors: Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang

    Abstract: Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of the long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomput…

    Submitted 3 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  40. arXiv:2405.16283  [pdf, other]

    cs.DC

    TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload

    Authors: Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido Botran, Christopher Jermaine, Yuxin Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois

    Abstract: An obvious way to alleviate memory difficulties in GPU-based AI computing is via CPU offload, where data are moved between GPU and CPU RAM, so inexpensive CPU RAM is used to increase the amount of storage available. While CPU offload is an obvious idea, it can greatly slow down a computation, due to the relatively slow transfer rate between CPU RAM and GPU RAM. Thus, any system for CPU offload nee…

    Submitted 27 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  41. arXiv:2405.16265  [pdf, other]

    cs.LG

    MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

    Authors: Jikun Kang, Xin Zhe Li, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao

    Abstract: Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datase…

    Submitted 26 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  42. arXiv:2405.15271  [pdf]

    eess.SY physics.ins-det physics.optics

    Seamless Integration and Implementation of Distributed Contact and Contactless Vital Sign Monitoring

    Authors: Dingding Liang, Yang Chen, Jiawei Gao, Taixia Shi, Jianping Yao

    Abstract: Real-time vital sign monitoring is gaining immense significance not only in the medical field but also in personal health management. Facing the needs of different application scenarios of the smart and healthy city in the future, the low-cost, large-scale, scalable, and distributed vital sign monitoring system is of great significance. In this work, a seamlessly integrated contact and contactless…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages, 9 figures

  43. arXiv:2405.14230  [pdf, other]

    cs.CV cs.AI cs.CL

    Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

    Authors: Guangyu Guo, Jiawen Yao, Yingda Xia, Tony C. W. Mok, Zhilin Zheng, Junwei Han, Le Lu, Dingwen Zhang, Jian Zhou, Ling Zhang

    Abstract: The absence of adequately sufficient expert-level tumor annotations hinders the effectiveness of supervised learning based opportunistic cancer screening on medical imaging. Clinical reports (that are rich in descriptive textual details) can offer a "free lunch" supervision information and provide tumor location as a type of weak label to cope with screening tasks, thus saving human labeling work…

    Submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.12478  [pdf, other]

    eess.SY

    Efficient Economic Model Predictive Control of Water Treatment Process with Learning-based Koopman Operator

    Authors: Minghao Han, Jingshi Yao, Adrian Wing-Keung Law, Xunyuan Yin

    Abstract: Used water treatment plays a pivotal role in advancing environmental sustainability. Economic model predictive control holds the promise of enhancing the overall operational performance of the water treatment facilities. In this study, we propose a data-driven economic predictive control approach within the Koopman modeling framework. First, we propose a deep learning-enabled input-output Koopman…

    Submitted 14 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  45. arXiv:2405.10786  [pdf, other]

    eess.AS

    Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  46. arXiv:2405.07704  [pdf, other]

    nucl-th nucl-ex

    Shell structure and shape transition in odd-$Z$ superheavy nuclei with proton numbers $Z=117, 119$: insights from deformed relativistic Hartree-Bogoliubov in continuum

    Authors: Y. X. Zhang, B. R. Liu, K. Y. Zhang, J. M. Yao

    Abstract: We present a systematic study on the structural properties of odd-$Z$ superheavy nuclei with proton numbers $Z=117, 119$, and neutron numbers $N$ increasing from $N=170$ to the neutron dripline within the framework of axially deformed relativistic Hartree-Bogoliubov theory in continuum (DRHBc). The results are compared with those of even-even superheavy nuclei with proton numbers $Z=118$ and…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 15 pages with 18 figures

  47. Array SAR 3D Sparse Imaging Based on Regularization by Denoising Under Few Observed Data

    Authors: Yangyang Wang, Xu Zhan, Jing Gao, Jinjie Yao, Shunjun Wei, JianSheng Bai

    Abstract: Array synthetic aperture radar (SAR) three-dimensional (3D) imaging can obtain 3D information of the target region, which is widely used in environmental monitoring and scattering information measurement. In recent years, with the development of compressed sensing (CS) theory, sparse signal processing is used in array SAR 3D imaging. Compared with matched filter (MF), sparse SAR imaging can effect…

    Submitted 26 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  48. arXiv:2405.05237  [pdf, other]

    cs.CV

    EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning

    Authors: Jingfeng Yao, Xinggang Wang, Yuehao Song, Huangxuan Zhao, Jun Ma, Yajie Chen, Wenyu Liu, Bo Wang

    Abstract: The diagnosis and treatment of chest diseases play a crucial role in maintaining human health. X-ray examination has become the most common clinical examination means due to its efficiency and cost-effectiveness. Artificial intelligence analysis methods for chest X-ray images are limited by insufficient annotation data and varying levels of annotation, resulting in weak generalization ability and…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: code available at: https://github.com/hustvl/EVA-X

  49. arXiv:2405.03135  [pdf, other]

    astro-ph.CO

    CURLING - I. The Influence of Point-like Image Approximation on the Outcomes of Cluster Strong Lens Modeling

    Authors: Yushan Xie, Huanyuan Shan, Nan Li, Ran Li, Eric Jullo, Chen Su, Xiaoyue Cao, Jean-Paul Kneib, Ana Acebron, Mengfan He, Ji Yao, Chunxiang Wang, Jiadong Li, Yin Li

    Abstract: Cluster-scale strong lensing is a powerful tool for exploring the properties of dark matter and constraining cosmological models. However, due to the complex parameter space, pixelized strong lens modeling in galaxy clusters is computationally expensive, leading to the point-source approximation of strongly lensed extended images, potentially introducing systematic biases. Herein, as the first pap…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 12 pages, 8 figures

  50. arXiv:2405.02673  [pdf, other]

    cs.CL

    On the Information Redundancy in Non-Autoregressive Translation

    Authors: Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu

    Abstract: Token repetition is a typical form of multi-modal problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modal problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric - the continuous repetition ratio. By manually annotat…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 tables