Skip to main content

Showing 1–50 of 636 results for author: Qian, Y

  1. arXiv:2407.11298  [pdf, other

    cs.RO

    ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

    Authors: Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt

    Abstract: Robotic grasping in cluttered environments remains a significant challenge due to occlusions and complex object arrangements. We have developed ThinkGrasp, a plug-and-play vision-language grasping system that makes use of GPT-4o's advanced contextual reasoning for heavy clutter environment grasping strategies. ThinkGrasp can effectively identify and generate grasp poses for target objects, even wh… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Project Website:(https://h-freax.github.io/thinkgrasp_page/)

  2. arXiv:2407.09868  [pdf

    physics.med-ph

    Separation of Sodium Signals Between Mono- and Bi-Exponential T2 Decays via Multi-TE Single-Quantum Sodium (23Na) MRI

    Authors: Yongxian Qian, Ying-Chia Lin, Xingye Chen, Tiejun Zhao, Karthik Lakshmanan, Yulin Ge, Yvonne W. Lui, Fernando E. Boada

    Abstract: Purpose. It is a long standing pursuit in sodium (23Na) MRI to separate signals between mono and bi exponential T2 decays in the human brain, due to lack of clinically translational solutions under the restriction of intrinsically low signal to noise ratio (SNR). Here we propose a new technique called multi TE single quantum (MSQ) sodium MRI to address the challenge. Methods. We exploit an intrins… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 37 pages and 14 figures

  3. arXiv:2407.09613  [pdf, other

    astro-ph.EP

    Shadows Wreak Havocs in Transition Disks

    Authors: Yansong Qian, Yanqin Wu

    Abstract: We demonstrate that shadows cast on a proto-planetary disk can drive it eccentric. Stellar irradiation dominates heating across much of these disks, so an uneven illumination can have interesting dynamical effects. Here, we focus on transition disks. We carry out 3D Athena++ simulations, using a constant thermal relaxation time to describe the disk's response to changing stellar illumination. We f… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: submitted to AAS journal; 11 pages, 9 figures; feedbacks welcome

  4. arXiv:2407.07844  [pdf, other

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  5. arXiv:2407.02477  [pdf, other

    cs.CV cs.CL

    Understanding Alignment in Multimodal LLMs: A Comprehensive Study

    Authors: Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch

    Abstract: Preference alignment has become a crucial component in enhancing the performance of Large Language Models (LLMs), yet its impact in Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Similar to language models, MLLMs for image understanding tasks encounter challenges like hallucination. In MLLMs, hallucination can occur not only by stating incorrect facts but also by pro… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2407.01509  [pdf, other

    cs.CV cs.CL

    MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

    Authors: Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

    Abstract: We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results fro… ▽ More

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.00707  [pdf, other

    physics.chem-ph cond-mat.str-el physics.comp-ph

    Deep learning quantum Monte Carlo for solids

    Authors: Yubing Qian, Xiang Li, Zhe Li, Weiluo Ren, Ji Chen

    Abstract: Deep learning has deeply changed the paradigms of many research fields. At the heart of chemical and physical sciences is the accurate ab initio calculation of many-body wavefunction, which has become one of the most notable examples to demonstrate the power of deep learning in science. In particular, the introduction of deep learning into quantum Monte Carlo (QMC) has significantly advanced the f… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2407.00105  [pdf, other

    cs.LG cs.AI

    Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction

    Authors: Yuqing Qian, Ziyu Zheng, Prayag Tiwari, Yijie Ding, Quan Zou

    Abstract: Drug-side effect prediction has become an essential area of research in the field of pharmacology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the rela… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Transactions on Machine Learning Research (TMLR 2024)

  9. arXiv:2406.13471  [pdf, other

    eess.AS cs.SD

    Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

    Authors: Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian

    Abstract: Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement research (SE) as previous works showed a remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require many iterations in the reverse diffusion process (RDP), making them impractical for streaming SE systems. In this paper, we propose to use discriminat… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.11740  [pdf, other

    cs.RO cs.AI cs.LG

    Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies

    Authors: Haojie Huang, Karl Schmeckpeper, Dian Wang, Ondrej Biza, Yaoyao Qian, Haotian Liu, Mingxi Jia, Robert Platt, Robin Walters

    Abstract: Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11364  [pdf, other

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  12. arXiv:2406.11134  [pdf, other

    physics.comp-ph cond-mat.dis-nn cond-mat.str-el physics.chem-ph

    Emergent Wigner phases in moiré superlattice from deep learning

    Authors: Xiang Li, Yubing Qian, Weiluo Ren, Yang Xu, Ji Chen

    Abstract: Moiré superlattice designed in stacked van der Waals material provides a dynamic platform for hosting exotic and emergent condensed matter phenomena. However, the relevance of strong correlation effects and the large size of moiré unit cells pose significant challenges for traditional computational techniques. To overcome these challenges, we develop an unsupervised deep learning approach to uncov… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  13. arXiv:2406.08812  [pdf, other

    cs.SD eess.AS

    Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems

    Authors: Zhengyang Chen, Xuechen Liu, Erica Cooper, Junichi Yamagishi, Yanmin Qian

    Abstract: This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker's traits of synthesized speech. Unlike previous approaches, our method utilizes listener impressions to construct prompts, which are easier to collect and align more naturally with everyday descriptions of speaker traits. We ado… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024 (with more analysis in the final Appendix part)

  14. arXiv:2406.07855  [pdf, other

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  15. arXiv:2406.07198  [pdf, other

    eess.AS cs.MM

    Target Speech Diarization with Multimodal Prompts

    Authors: Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li

    Abstract: Traditional speaker diarization seeks to detect ``who spoke when'' according to speaker characteristics. Extending to target speech diarization, we detect ``when target event occurs'' according to the semantic characteristics of speech. We propose a novel Multimodal Target Speech Diarization (MM-TSD) framework, which accommodates diverse and multi-modal prompts to specify target events in a flexib… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  16. arXiv:2406.06350  [pdf, other

    math.NA cs.LG physics.comp-ph

    Error Analysis and Numerical Algorithm for PDE Approximation with Hidden-Layer Concatenated Physics Informed Neural Networks

    Authors: Yianxia Qian, Yongchao Zhang, Suchuan Dong

    Abstract: We present the hidden-layer concatenated physics informed neural network (HLConcPINN) method, which combines hidden-layer concatenated feed-forward neural networks, a modified block time marching strategy, and a physics informed approach for approximating partial differential equations (PDEs). We analyze the convergence properties and establish the error bounds of this method for two types of PDEs… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 40 pages, 10 tables, 18 figures

  17. arXiv:2406.05740  [pdf, ps, other

    math.OC

    Convergence of ZH-type nonmonotone descent method for Kurdyka-Łojasiewicz optimization problems

    Authors: Yitian Qian, Ting Tao, Shaohua Pan, Houduo Qi

    Abstract: This note concerns a class of nonmonotone descent methods for minimizing a proper lower semicontinuous Kurdyka-Ł$\ddot{o}$jasiewicz (KL) function $Φ$, whose iterate sequence obeys the ZH-type nonmonotone decrease condition and a relative error condition. We prove that the iterate sequence converges to a critical point of $Φ$, and if $Φ$ has the KL property of exponent $θ\in(0,1)$ at this critical… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  18. arXiv:2406.05370  [pdf, other

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in… ▽ More

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  19. arXiv:2406.05359  [pdf, other

    eess.AS cs.SD

    Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

    Authors: Bei Liu, Haoyu Wang, Yanmin Qian

    Abstract: Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. Firstly, we propose a novel adaptive uniform precision quantization method which enables the dynamic generation of quantization centroids custom… ▽ More

    Submitted 18 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing (Under Review)

  20. arXiv:2406.04660  [pdf, other

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  21. arXiv:2406.04588  [pdf, ps, other

    math.OC stat.ML

    A majorized PAM method with subspace correction for low-rank composite factorization model

    Authors: Ting Tao, Yitian Qian, Shaohua Pan

    Abstract: This paper concerns a class of low-rank composite factorization models arising from matrix completion. For this nonconvex and nonsmooth optimization problem, we propose a proximal alternating minimization algorithm (PAMA) with subspace correction, in which a subspace correction step is imposed on every proximal subproblem so as to guarantee that the corrected proximal subproblem has a closed-form… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 28 pages, 3 figures

    MSC Class: 49J52; 90C26; 49M27

  22. arXiv:2406.04269  [pdf, other

    eess.AS cs.SD

    Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

    Authors: Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian

    Abstract: Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement.… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024

  23. arXiv:2405.17809  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  24. arXiv:2405.17233  [pdf, other

    cs.LG

    CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

    Authors: Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Parameter quantization for Large Language Models (LLMs) has attracted increasing attentions recently in reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantizatio… ▽ More

    Submitted 2 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  25. arXiv:2405.15580  [pdf, other

    cs.CV

    Open-Vocabulary SAM3D: Understand Any 3D Scene

    Authors: Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Yabiao Wang, Yong Liu

    Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent advancements have sought to transfer knowledge embedded in vision language models from the 2D domain to 3D domain. However, these approaches often require learning prior knowledge from specific 3D scene datasets, which limits their applicability in open-world scenarios. The Segment Anything Model (SAM) has… ▽ More

    Submitted 21 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://hithqd.github.io/projects/OV-SAM3D

  26. arXiv:2405.13080  [pdf, other

    cs.CR cs.LG

    EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

    Authors: Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo

    Abstract: Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 18 pages, 12 figures

  27. arXiv:2405.12944  [pdf, other

    cs.CV

    AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection

    Authors: Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang

    Abstract: Multispectral pedestrian detection has been shown to be effective in improving performance within complex illumination scenarios. However, prevalent double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data, leading to nearly double the inference time compared to single-stream networks utilizing only one feature extraction branch. This i… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.12914  [pdf, other

    cs.CV

    An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation

    Authors: Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li

    Abstract: One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode English with a maximum token length of 77. Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Langu… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Technical report. Project page: https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion

  29. arXiv:2405.06510  [pdf, other

    cs.AI

    UniDM: A Unified Framework for Data Manipulation with Large Language Models

    Authors: Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou

    Abstract: Designing effective data manipulation methods is a long standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They exhibit bright benefits in terms of performance but still requir… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: MLSys24

  30. arXiv:2404.19040  [pdf, other

    cs.CV

    GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

    Authors: Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen

    Abstract: We present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3$\sim$5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Ga… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  31. arXiv:2404.16383  [pdf, other

    cond-mat.mes-hall

    Euler band topology in spin-orbit coupled magnetic systems

    Authors: Seung Hun Lee, Yuting Qian, Bohm-Jung Yang

    Abstract: The Euler class characterizes the topology of two real bands isolated from other bands in two-dimensions. Despite various intriguing topological properties predicted up to now, the candidate real materials hosting electronic Euler bands are extremely rare. Here, we show that in a quantum spin Hall insulator with two-fold rotation $C_{2z}$ about the $z$-axis, a pair of bands with nontrivial… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  32. arXiv:2404.14485  [pdf, other

    hep-ph astro-ph.HE

    Enhanced Muonization by Active-Sterile Neutrino Mixing in Protoneutron Stars

    Authors: Anupam Ray, Yong-Zhong Qian

    Abstract: We study $ν_μ$-$ν_s$ and $\barν_μ$-$\barν_s$ mixing in the protoneutron star (PNS) created in a core-collapse supernova (CCSN). We point out the importance of the feedback on the general composition of the PNS in addition to the obvious feedback on the $ν_μ$ lepton number. We show that for our adopted mixing parameters $δm^2\sim 10^2\,$keV$^2$ and $\sin^2 2θ$ consistent with the current constraint… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 12 pages, 7 figures. Comments are welcome

    Report number: N3AS-24-013

  33. arXiv:2404.13279  [pdf, other

    cs.CR eess.IV eess.SP

    Backdoor Attacks and Defenses on Semantic-Symbol Reconstruction in Semantic Communications

    Authors: Yuan Zhou, Rose Qingyang Hu, Yi Qian

    Abstract: Semantic communication is of crucial importance for the next-generation wireless communication networks. The existing works have developed semantic communication frameworks based on deep learning. However, systems powered by deep learning are vulnerable to threats such as backdoor attacks and adversarial attacks. This paper delves into backdoor attacks targeting deep learning-enabled semantic comm… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE ICC 2024

  34. arXiv:2404.11035  [pdf, other

    cs.IT cs.DC cs.NI

    Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning

    Authors: Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian

    Abstract: Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency. Instead of sharing raw data, FL exchanges intermediate learning parameters, such as gradients, among clients. This article presents an efficient wireless communication approach tailored for FL parameter transmission, espec… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: submitted to IEEE journals for publication

  35. arXiv:2404.06690  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou… ▽ More

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  36. arXiv:2404.01462  [pdf, other

    cs.LG cs.CL cs.IR

    OpenChemIE: An Information Extraction Toolkit For Chemistry Literature

    Authors: Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, Regina Barzilay

    Abstract: Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extractio… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: To be submitted to the Journal of Chemical Information and Modeling

  37. arXiv:2404.00878  [pdf, other

    cs.CV

    TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

    Authors: Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang

    Abstract: Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the wide… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2404.00875  [pdf, other

    cs.CV

    DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly

    Authors: Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

    Abstract: We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for col… ▽ More

    Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 14 pages

  39. arXiv:2404.00589   

    cs.LG cs.CL

    Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing

    Authors: Zhenyu Qian, Yiming Qian, Yuting Song, Fei Gao, Hai Jin, Chen Yu, Xia Xie

    Abstract: Handling graph data is one of the most difficult tasks. Traditional techniques, such as those based on geometry and matrix factorization, rely on assumptions about the data relations that become inadequate when handling large and complex graph data. On the other hand, deep learning approaches demonstrate promising results in handling large graph data, but they often fall short of providing interpr… ▽ More

    Submitted 12 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Because my organization does not allow members to privately upload papers to arXiv, I am requesting a withdrawal of my submission

  40. arXiv:2403.17870  [pdf, other

    cs.CV cs.MM

    Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

    Authors: Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei

    Abstract: Diffusion models have recently brought a powerful revolution in image generation. Despite showing impressive generative capabilities, most of these models rely on the current sample to denoise the next one, possibly resulting in denoising instability. In this paper, we reinterpret the iterative denoising process as model optimization and leverage a moving average mechanism to ensemble all the prio… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  41. arXiv:2403.16002  [pdf, other

    cs.CV

    SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

    Authors: Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu

    Abstract: Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the m… ▽ More

    Submitted 27 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  42. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT ) and streaming (referred to as SLongFNT ) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that th… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by TASLP 2024

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

  43. arXiv:2403.11210  [pdf, ps, other

    math.OC

    Error bounds for rank-one DNN reformulation of QAP and DC exact penalty approach

    Authors: Yitian Qian, Shaohua Pan, Shujun Bi, Houduo Qi

    Abstract: This paper concerns the quadratic assignment problem (QAP), a class of challenging combinatorial optimization problems. We provide an equivalent rank-one doubly nonnegative (DNN) reformulation with fewer equality constraints, and derive the local error bounds for its feasible set. By leveraging these error bounds, we prove that the penalty problem induced by the difference of convexity (DC) reform… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  44. arXiv:2403.01686  [pdf, other

    astro-ph.HE astro-ph.GA

    AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

    Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu , et al. (20 additional authors not shown)

    Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The… ▽ More

    Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures,accepted for publication by ApJL

  45. arXiv:2403.01257  [pdf

    eess.SY

    Secure and Scalable Network Slicing with Plug-and-Play Support for Power Distribution System Communication Networks

    Authors: Jian Zhong, Chen Chen, Yuqi Qian, Yiheng Bian, Yuxiong Huang, Zhaohong Bie

    Abstract: With the rapid development of power distribution systems (PDSs), the number of terminal devices and the types of delivered services involved are constantly growing. These trends make the operations of PDSs highly dependent on the support of advanced communication networks, which face two related challenges. The first is to provide sufficient flexibility, resilience, and security to meet varying de… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  46. arXiv:2403.00673  [pdf, other

    cs.LG

    Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency

    Authors: Yanxiao Zhao, Yangge Qian, Tianyi Wang, Jingyang Shan, Xiaolin Qin

    Abstract: Deep reinforcement learning (DRL) algorithms require substantial samples and computational resources to achieve higher performance, which restricts their practical application and poses challenges for further development. Given the constraint of limited resources, it is essential to leverage existing computational work (e.g., learned policies, samples) to enhance sample efficiency and reduce the c… ▽ More

    Submitted 12 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Under review

  47. Event-Triggered Robust Cooperative Output Regulation for a Class of Linear Multi-Agent Systems with an Unknown Exosystem

    Authors: Yangyang Qian, Lu Liu

    Abstract: This paper investigates the robust cooperative output regulation problem for a class of heterogeneous uncertain linear multi-agent systems with an unknown exosystem via event-triggered control (ETC). By utilizing the internal model approach and the adaptive control technique, a distributed adaptive internal model is constructed for each agent. Then, based on this internal model, a fully distribute… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 13 pages, 8 figures

  48. arXiv:2402.15173  [pdf, other

    cs.LG

    Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer

    Authors: Yanjun Zhao, Sizhe Dang, Haishan Ye, Guang Dai, Yi Qian, Ivor W. Tsang

    Abstract: Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using two forward passes. However, these optimizers are plagued by the heterogeneity of parameter curvatures across different dimensions. In this work, we… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  49. arXiv:2402.13220  [pdf, other

    cs.CV cs.CL

    How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

    Authors: Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan

    Abstract: The remarkable advancements in Multimodal Large Language Models (MLLMs) have not rendered them immune to challenges, particularly in the context of handling deceptive information in prompts, thus producing hallucinated responses under such conditions. To quantitatively assess this vulnerability, we present MAD-Bench, a carefully curated benchmark that contains 850 test samples divided into 6 categ… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  50. arXiv:2402.08875  [pdf, other

    cs.CV

    Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos

    Authors: Yang Qian, Yinan Sun, Ali Kargarandehkordi, Parnian Azizian, Onur Cezmi Mutlu, Saimourya Surabhi, Pingyi Chen, Zain Jabbar, Dennis Paul Wall, Peter Washington

    Abstract: The increasing variety and quantity of tagged multimedia content on a variety of online platforms offer a unique opportunity to advance the field of human action recognition. In this study, we utilize 283,582 unique, unlabeled TikTok video clips, categorized into 386 hashtags, to train a domain-specific foundation model for action recognition. We employ VideoMAE V2, an advanced model integrating M… ▽ More

    Submitted 15 July, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 10 pages