Showing 201–250 of 7,366 results for author: zhang, C

  1. arXiv:2406.05485  [pdf, other]

    cs.CV

    Training-Free Robust Interactive Video Object Segmentation

    Authors: Xiaoli Wei, Zhaoqing Wang, Yandong Guo, Chunxia Zhang, Tongliang Liu, Mingming Gong

    Abstract: Interactive video object segmentation is a crucial video task, having various applications from video editing to data annotating. However, current approaches struggle to accurately segment objects across diverse domains. Recently, Segment Anything Model (SAM) introduces interactive visual prompts and demonstrates impressive performance across different domains. In this paper, we propose a training…

    Submitted 8 June, 2024; originally announced June 2024.

  2. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  3. arXiv:2406.04835  [pdf, other]

    cs.RO

    SLR: Learning Quadruped Locomotion without Privileged Information

    Authors: Shiyi Chen, Zeyu Wan, Shiyang Yan, Chun Zhang, Weiyi Zhang, Qiang Li, Debing Zhang, Fasih Ud Din Farrukh

    Abstract: Traditional reinforcement learning control for quadruped robots often relies on privileged information, demanding meticulous selection and precise estimation, thereby imposing constraints on the development process. This work proposes a Self-learning Latent Representation (SLR) method, which achieves high-performance control policy learning without the need for privileged information. To enhance t…

    Submitted 7 June, 2024; originally announced June 2024.

  4. arXiv:2406.04828  [pdf, other]

    cs.IR

    QAGCF: Graph Collaborative Filtering for Q&A Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Yanping Zheng, Ruobing Xie, Qi Liu, Jun Xu, Ji-Rong Wen

    Abstract: Question and answer (Q&A) platforms usually recommend question-answer pairs to meet users' knowledge acquisition needs, unlike traditional recommendations that recommend only one item. This makes user behaviors more complex, and presents two challenges for Q&A recommendation, including: the collaborative information entanglement, which means user feedback is influenced by either the question or th…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2406.04802  [pdf, other]

    cs.CV cs.LG

    Predictive Dynamic Fusion

    Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu

    Abstract: Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability.…

    Submitted 13 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  6. arXiv:2406.04692  [pdf, other]

    cs.CL

    Mixture-of-Agents Enhances Large Language Model Capabilities

    Authors: Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou

    Abstract: Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met…

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2406.04677  [pdf, other]

    cond-mat.mtrl-sci

    Electric leakage suppression of phase-transforming ferroelectrics with donor impurities

    Authors: Chenbo Zhang, Xiaotong Peng, Bo Liu, Kai Zhang, Xian Chen

    Abstract: Phase-transforming ferroelectric materials are widely used in energy harvesting and conversion devices. However, the functionality of these devices is significantly impeded by electrical leakage at high temperatures. In this study, we fundamentally study the mechanism of electrical leakage suppression due to phase transformation in a series of donor-doped ferroelectric oxides, Ba0.955Eu0.03Ti(1-x)Z…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures

  8. arXiv:2406.04633  [pdf, ps, other]

    eess.AS

    Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study

    Authors: Chong Zhang, Yanqing Liu, Yang Zheng, Sheng Zhao

    Abstract: Scaling text-to-speech (TTS) with autoregressive language model (LM) to large-scale datasets by quantizing waveform into discrete speech tokens is making great progress to capture the diversity and expressiveness in human speech, but the speech reconstruction quality from discrete speech token is far from satisfaction depending on the compressed speech token compression ratio. Generative diffusion…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.04593  [pdf, other]

    physics.chem-ph q-bio.BM

    SynAsk: Unleashing the Power of Large Language Models in Organic Synthesis

    Authors: Chonghuan Zhang, Qianghua Lin, Biwei Zhu, Haopeng Yang, Xiao Lian, Hao Deng, Jiajun Zheng, Kuangbiao Liao

    Abstract: The field of natural language processing (NLP) has witnessed a transformative shift with the emergence of large language models (LLMs), revolutionizing various language tasks and applications, and the integration of LLM into specialized domains enhances their capabilities for domain-specific applications. Notably, NLP has made significant strides in organic chemistry, particularly in predicting sy…

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  10. arXiv:2406.04251  [pdf, other]

    cs.CV

    Gaussian Splatting with Localized Points Management

    Authors: Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu

    Abstract: Point management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initiation (e.g., via structure from motion) is distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applied, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and regular all-points opacity…

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.04027  [pdf, other]

    cs.CR cs.SE

    PowerPeeler: A Precise and General Dynamic Deobfuscation Method for PowerShell Scripts

    Authors: Ruijie Li, Chenyang Zhang, Huajun Chai, Lingyun Ying, Haixin Duan, Jun Tao

    Abstract: PowerShell is a powerful and versatile task automation tool. Unfortunately, it is also widely abused by cyber attackers. To bypass malware detection and hinder threat analysis, attackers often employ diverse techniques to obfuscate malicious PowerShell scripts. Existing deobfuscation tools suffer from the limitation of static analysis, which fails to simulate the real deobfuscation process accurat…

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: To appear in the ACM CCS 2024

  12. arXiv:2406.03889  [pdf, ps, other]

    math.AP

    Harnack inequality for doubly nonlinear mixed local and nonlocal parabolic equations

    Authors: Vicentiu Radulescu, Bin Shang, Chao Zhang

    Abstract: In this paper, we establish the Harnack inequality of nonnegative weak solutions to the doubly nonlinear mixed local and nonlocal parabolic equations. This result is obtained by combining a related comparison principle, a local boundedness estimate, and an integral Harnack-type inequality. Our proof is based on the expansion of positivity together with a comparison argument.

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03882  [pdf, other]

    cs.CL cs.SD eess.AS

    Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

    Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

    Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse…

    Submitted 9 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.03542  [pdf, other]

    astro-ph.HE astro-ph.GA

    Cosmic ray diffusion in magnetic fields amplified by nonlinear turbulent dynamo

    Authors: Chao Zhang, Siyao Xu

    Abstract: The diffusion of cosmic rays (CRs) in turbulent magnetic fields is fundamental to understanding various astrophysical processes. We explore the CR diffusion in the magnetic fluctuations amplified by the nonlinear turbulent dynamo, in the absence of a strong mean magnetic field. Using test particle simulations, we identify three distinct CR diffusion regimes: mirroring, wandering, and magnetic moment s…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 15 figures, submitted to ApJ

  15. arXiv:2406.03387  [pdf, other]

    hep-ex

    Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1063 additional authors not shown)

    Abstract: The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/LHCbProjectPublic/LHCb-PAPER-2024-007.html (LHCb public pages)

    Report number: LHCb-PAPER-2024-007, CERN-EP-2024-125

  16. arXiv:2406.03372  [pdf, other]

    physics.app-ph cs.LG

    Training of Physical Neural Networks

    Authors: Ali Momeni, Babak Rahmani, Benjamin Scellier, Logan G. Wright, Peter L. McMahon, Clara C. Wanjura, Yuhang Li, Anas Skalli, Natalia G. Berloff, Tatsuhiro Onodera, Ilker Oguz, Francesco Morichetti, Philipp del Hougne, Manuel Le Gallo, Abu Sebastian, Azalia Mirhoseini, Cheng Zhang, Danijela Marković, Daniel Brunner, Christophe Moser, Sylvain Gigan, Florian Marquardt, Aydogan Ozcan, Julie Grollier, Andrea J. Liu, et al. (3 additional authors not shown)

    Abstract: Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 29 pages, 4 figures

  17. arXiv:2406.03199  [pdf, other]

    cs.CL cs.AI cs.LG

    Bayesian WeakS-to-Strong from Text Classification to Generation

    Authors: Ziyun Cui, Ziyang Zhang, Wen Wu, Guangzhi Sun, Chao Zhang

    Abstract: Advances in large language models raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly. Weak-to-Strong mimics such a scenario where weak model supervision attempts to harness the full capabilities of a much stronger model. This work extends Weak-to-Strong to WeakS-to-Strong by exploring an ensemble of…

    Submitted 24 May, 2024; originally announced June 2024.

  18. arXiv:2406.03156  [pdf, other]

    hep-ex

    Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

    Abstract: A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contribu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-047.html (LHCb public pages)

    Report number: LHCb-PAPER-2023-047, CERN-EP-2024-096

  19. arXiv:2406.03088  [pdf, other]

    cs.AR cs.LG

    HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

    Authors: Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

    Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and i…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to FPL 2024

  20. arXiv:2406.03032  [pdf, other]

    cs.CV

    Instructing Prompt-to-Prompt Generation for Zero-Shot Learning

    Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

    Abstract: Zero-shot learning (ZSL) aims to explore the semantic-visual interactions to discover comprehensive knowledge transferred from seen categories to classify unseen categories. Recently, prompt engineering has emerged in ZSL, demonstrating impressive potential as it enables the zero-shot transfer of diverse visual concepts to downstream tasks. However, these methods are still not well generalized to…

    Submitted 5 June, 2024; originally announced June 2024.

  21. arXiv:2406.03006  [pdf, ps, other]

    quant-ph cs.DS cs.LG math.OC

    Quantum Algorithms and Lower Bounds for Finite-Sum Optimization

    Authors: Yexin Zhang, Chenyi Zhang, Cong Fang, Liwei Wang, Tongyang Li

    Abstract: Finite-sum optimization has wide applications in machine learning, covering important problems such as support vector machines, regression, etc. In this paper, we initiate the study of solving finite-sum optimization problems by quantum computing. Specifically, let $f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex functions and $ψ\colon\mathbb{R}^d\to\mathbb{R}$ be a $μ$-stro…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 27 pages. To appear in the Forty-first International Conference on Machine Learning (ICML 2024)

  22. arXiv:2406.02931  [pdf, other]

    hep-ex

    Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures

  23. arXiv:2406.02907  [pdf]

    cond-mat.mes-hall

    Room-temperature tunable tunneling magnetoresistance in Fe3GaTe2/WSe2/Fe3GaTe2 van der Waals heterostructures

    Authors: Haiyang Pan, Anil Kumar Singh, Chusheng Zhang, Xueqi Hu, Jiayu Shi, Liheng An, Naizhou Wang, Ruihuan Duan, Zheng Liu, Stuart S. P. Parkin, Pritam Deb, Weibo Gao

    Abstract: The exceptional properties of two-dimensional (2D) magnet materials present a novel approach to fabricate functional magnetic tunnel junctions (MTJ) by constructing full van der Waals (vdW) heterostructures with atomically sharp and clean interfaces. The exploration of vdW MTJ devices with high working temperature and adjustable functionalities holds great potential for advancing the application o…

    Submitted 5 June, 2024; originally announced June 2024.

    Journal ref: InfoMat.2023;e12504

  24. arXiv:2406.02888  [pdf, other]

    cs.CL cs.AI cs.LG

    HYDRA: Model Factorization Framework for Black-Box LLM Personalization

    Authors: Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai

    Abstract: Personalization has emerged as a critical research area in modern intelligent systems, focusing on mining users' behavioral history and adapting to their preferences for delivering tailored experiences. Despite the remarkable few-shot capabilities exhibited by black-box large language models (LLMs), the inherent opacity of their model parameters presents significant challenges in aligning the gene…

    Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 24 pages, 6 figures, work in progress

  25. arXiv:2406.02886  [pdf, other]

    cs.CL cs.AI

    PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

    Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

    Abstract: Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, includ…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  26. arXiv:2406.02874  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    Giant enhancement of hole mobility for 4H-silicon carbide through suppressing interband electron-phonon scattering

    Authors: Jianshi Sun, Shouhang Li, Zhen Tong, Cheng Shao, Meng An, Xiongfei Zhu, Chuang Zhang, Xiangchuan Chen, Yucheng Xiong, Thomas Frauenheim, Xiangjun Liu

    Abstract: 4H-Silicon Carbide (4H-SiC) possesses a high Baliga figure of merit, making it a promising material for power electronics. However, its applications are limited by its low hole mobility. Herein, we found that the hole mobility of 4H-SiC is mainly limited by the strong interband electron-phonon scattering using mode-level first-principles calculations. Our research indicates that applying compressi…

    Submitted 20 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 4 figures

  27. arXiv:2406.02537  [pdf, other]

    cs.CL cs.CV cs.LG

    TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

    Authors: Chengzu Li, Caiqi Zhang, Han Zhou, Nigel Collier, Anna Korhonen, Ivan Vulić

    Abstract: Top-view perspective denotes a typical way in which humans read and reason over different types of maps, and it is vital for localization and navigation of humans as well as of 'non-human' agents, such as the ones backed by large Vision-Language Models (VLMs). Nonetheless, spatial reasoning capabilities of modern VLMs remain unattested and underexplored. In this work, we thus study their capabilit…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, 3 tables (21 pages, 4 figures, 15 tables including references and appendices)

  28. arXiv:2406.02381  [pdf, other]

    q-bio.BM cs.AI

    Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction

    Authors: Marc Harary, Chengxin Zhang

    Abstract: We introduce a novel fully convolutional neural network (FCN) architecture for predicting the secondary structure of ribonucleic acid (RNA) molecules. Interpreting RNA structures as weighted graphs, we employ deep learning to estimate the probability of base pairing between nucleotide residues. Unique to our model are its massive 11-pixel kernels, which we argue provide a distinct advantage for FC…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Updated authorship and acknowledgements

  29. arXiv:2406.02087  [pdf, ps, other]

    math.AP

    Boundedness of variation, oscillation and maximal differential transform on BMO space

    Authors: Wenting Hu, Kai Wu, Dongyong Yang, Chao Zhang

    Abstract: In this paper, we prove that the oscillation operator, variation operator and maximal differential transform associated with the approximate identities are bounded from ${\rm BMO}({\mathbb R}^n)$ to its subspace ${\rm BLO}({\mathbb R}^n)$.

    Submitted 4 June, 2024; originally announced June 2024.

  30. arXiv:2406.02009  [pdf, other]

    eess.AS cs.CL cs.SD

    Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

    Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-su…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  31. arXiv:2406.01934  [pdf, other]

    cs.CL

    Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

    Authors: Zefeng Zhang, Jiawei Sheng, Chuang Zhang, Yunzhi Liang, Wenyuan Zhang, Siqi Wang, Tingwen Liu

    Abstract: Multimodal Entity Linking (MEL) aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph. A pivotal challenge is to fully leverage multi-element correlations between mentions and entities to bridge modality gap and enable fine-grained semantic matching. Existing methods attempt several local correlative mechanisms, relying heavily on the automatically lear…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  32. arXiv:2406.01917  [pdf, other]

    cs.CV cs.AI

    GOMAA-Geo: GOal Modality Agnostic Active Geo-localization

    Authors: Anindya Sarkar, Srikumar Sastry, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, Yevgeniy Vorobeychik

    Abstract: We consider the task of active geo-localization (AGL) in which an agent uses a sequence of visual cues observed during aerial navigation to find a target specified through multiple possible modalities. This could emulate a UAV involved in a search-and-rescue operation navigating through an area, observing a stream of aerial images as it goes. The AGL task is associated with two important challenge…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 23 pages, 17 figures

  33. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  34. arXiv:2406.01359  [pdf, other]

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  35. arXiv:2406.01332  [pdf, ps, other]

    hep-ex

    Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

    Abstract: We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 14 pages, 3 figures

  36. arXiv:2406.01210  [pdf, other]

    cs.CV

    GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

    Authors: Ding Jia, Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Chang Xu, Xinghao Chen

    Abstract: Cross-modal transformers have demonstrated superiority in various vision tasks by effectively integrating different modalities. This paper first critiques prior token exchange methods which replace less informative tokens with inter-modal features, and demonstrates that exchange-based methods underperform cross-attention mechanisms, while the computational demand of the latter inevitably restricts its u…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024, code and models are available at https://github.com/JiaDingCN/GeminiFusion

  37. arXiv:2406.01127  [pdf, other]

    cs.CV

    Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection

    Authors: Kunpeng Wang, Zhengzheng Tu, Chenglong Li, Cheng Zhang, Bin Luo

    Abstract: Multi-modal salient object detection (MSOD) aims to boost saliency detection performance by integrating visible sources with depth or thermal infrared ones. Existing methods generally design different fusion schemes to handle certain issues or challenges. Although these fusion schemes are effective at addressing specific issues or challenges, they may struggle to handle multiple complex challenges…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by TCSVT 2024

  38. arXiv:2406.01103  [pdf, other]

    cs.AI cs.HC cs.LG

    Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

    Authors: Chen Zhang, Qiang He, Zhou Yuan, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang

    Abstract: Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a p…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  39. arXiv:2406.01043  [pdf, ps, other]

    math.AP

    Generalized Young Measure Solutions for a Class of Quasilinear Parabolic Equations with Linear Growth

    Authors: Jingfeng Shao, Zhichang Guo, Chao Zhang

    Abstract: Using the generalized Young measure theory, we extend the theory of Young measure solutions to a class of quasilinear parabolic equations with linear growth, and introduce the concept of generalized Young measure solutions. We prove the existence and uniqueness of the generalized Young measure solutions. In addition, for the gradient flow of convex parabolic variational integral, we show that the…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper extends the theory of Young measure solutions to a class of quasilinear parabolic equations with linear growth, and introduces a concept of generalized Young measure solutions

    MSC Class: 35C99; 35D99; 35K59

  40. arXiv:2406.01007  [pdf, other]

    hep-ex

    Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

    Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

    Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…

    Submitted 3 June, 2024; originally announced June 2024.

  41. arXiv:2406.00654  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    Authors: Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

    Abstract: In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers. However, despite human subjective evaluations, such as the mean opinion score (MOS), remaining the gold standard for assessing the quality of synthetic speech, even st…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 19 pages, Preprint

  42. arXiv:2406.00562  [pdf, other]

    cs.CL

    SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

    Authors: Heidi C. Zhang, Sina J. Semnani, Farhad Ghassemi, Jialiang Xu, Shicheng Liu, Monica S. Lam

    Abstract: We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge base, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the Compmix dataset, the most comprehensive he…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  43. arXiv:2406.00468  [pdf, other]

    cond-mat.mtrl-sci physics.chem-ph

    Molecular Modelling of Aqueous Batteries

    Authors: Alicia van Hees, Zhan-Yun Zhang, Aishwarya Sudhama, Chao Zhang

    Abstract: Aqueous batteries play an increasingly important role in the development of sustainable and safety-prioritised energy storage solutions. Compared to conventional lithium-ion batteries, the cell chemistry in aqueous batteries shares many common features with those of electrolyzer and pseudo-capacitor systems because of the involvement of aqueous electrolyte and proton activity. This imposes the nee…

    Submitted 1 June, 2024; originally announced June 2024.

  44. arXiv:2406.00235  [pdf, other]

    hep-ex

    Amplitude analysis of the radiative decay $B^0_s\to K^+K^-γ$

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1061 additional authors not shown)

    Abstract: A search for radiative decay of $B^0_s$ mesons to orbitally excited $K^+K^-$ states is performed using proton-proton collisions recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The dikaon spectrum in the mass range $m_{KK}<2400$ MeV/$c^2$ is dominated by the $φ(1020)$ resonance that accounts for alm…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-002.html (LHCb public pages)

    Report number: LHCb-PAPER-2024-002, CERN-EP-2024-115

  45. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  46. arXiv:2406.00025  [pdf, other]

    cs.CL cs.AI

    SCALM: Towards Semantic Caching for Automated Chat Services with Large Language Models

    Authors: Jiaxing Li, Chi Xu, Feng Wang, Isaac M von Riedemann, Cong Zhang, Jiangchuan Liu

    Abstract: Large Language Models (LLMs) have become increasingly popular, transforming a wide range of applications across various domains. However, the real-world effectiveness of their query cache systems has not been thoroughly investigated. In this work, we conducted, for the first time, an analysis of real-world human-to-LLM interaction data, identifying key challenges in existing caching solutions for LL…

    Submitted 24 May, 2024; originally announced June 2024.

  47. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  48. arXiv:2405.21004  [pdf, other]

    cs.HC cs.ET

    MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

    Authors: Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

    Abstract: We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses, designed to track fine-grained dietary actions like hand-to-mouth movements for food intake, chewing, and drinking. MunchSonic emits inaudible ultrasonic waves from a commodity eyeglass frame. The reflected signals contain rich information about the position and movements of various body parts, includ…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures

  49. arXiv:2405.20990  [pdf, other]

    cs.CR cs.AI cs.LG

    Locking Machine Learning Models into Hardware

    Authors: Eleanor Clifford, Adhithya Saravanan, Harry Langford, Cheng Zhang, Yiren Zhao, Robert Mullins, Ilia Shumailov, Jamie Hayes

    Abstract: Modern Machine Learning models are expensive IP and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed -- for example it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as Multi-Party Computation or Homomorphic encryption r…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures of main text; 14 pages, 16 figures of appendices

  50. arXiv:2405.20984  [pdf, other]

    cs.LG

    Bayesian Design Principles for Offline-to-Online Reinforcement Learning

    Authors: Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang

    Abstract: Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimis…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Forty-first International Conference on Machine Learning (ICML), 2024