
Showing 1–50 of 4,639 results for author: Wu, X

  1. arXiv:2407.11780  [pdf, other]

    cs.CL cs.AI

    SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models

    Authors: Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R. Varshney

    Abstract: Large language models (LLMs) have exhibited impressive capabilities in various domains, particularly in general language understanding. However, these models, trained on massive text data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial to adapt LLMs to evolving tasks and domains, ensuring their effectiveness and relevance across a w…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11727  [pdf, ps, other]

    hep-ex hep-ph

    Measurement of the branching fraction of $D^+_s\to \ell^+\nu_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+\nu_{\ell}\,(\ell=\mu,\tau)$ are measured to be $\mathcal{B}(D_s^+\to\mu^+\nu_\mu)=(\bfmuv)\%$ and…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 27 pages, 13 figures

  3. arXiv:2407.11552  [pdf, other]

    hep-th hep-ph

    Holographic Lifshitz flows

    Authors: Matteo Baggioli, Oriol Pujolas, Xin-Meng Wu

    Abstract: Without Lorentz symmetry, generic fixed points of the renormalization group (RG) are labelled by their dynamical (or 'Lifshitz') exponent $z$. Hence, a rich variety of possible RG flows arises. The first example is already given by the standard non-relativistic limit, which can be viewed as the flow from a $z=1$ UV fixed point to a $z=2$ IR fixed point. In strongly coupled theories, there are good…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: v1: comments welcome

  4. arXiv:2407.11044  [pdf, other]

    cs.LG cs.AI

    Generalizing soft actor-critic algorithms to discrete action spaces

    Authors: Le Zhang, Yong Gu, Xin Zhao, Yanshuo Zhang, Shu Zhao, Yifei Jin, Xinxin Wu

    Abstract: ATARI is a suite of video games used by reinforcement learning (RL) researchers to test the effectiveness of the learning algorithm. Receiving only the raw pixels and the game score, the agent learns to develop sophisticated strategies, even to the comparable level of a professional human games tester. Ideally, we also want an agent requiring very few interactions with the environment. Previous co…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2024. GitHub Repo https://github.com/lezhang-thu/bigger-better-faster-SAC

  5. arXiv:2407.10770  [pdf, ps, other]

    math.OC

    Globally-Constrained Decentralized Optimization with Variable Coupling

    Authors: Dandan Wang, Xuyang Wu, Zichong Ou, Jie Lu

    Abstract: Many realistic decision-making problems in networked scenarios, such as formation control and collaborative task offloading, often involve complicatedly entangled local decisions, which, however, have not been sufficiently investigated yet. Motivated by this, we study a class of decentralized optimization problems with a variable coupling structure that is new to the literature. Specifically, we c…

    Submitted 15 July, 2024; originally announced July 2024.

  6. arXiv:2407.10484  [pdf, other]

    cs.CV cs.LG

    Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Gaowen Liu, Nicu Sebe

    Abstract: Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures

  7. arXiv:2407.10376  [pdf, other]

    q-bio.NC cs.CL

    Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder

    Authors: Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng

    Abstract: Functional magnetic resonance imaging (fMRI) is essential for developing encoding models that identify functional changes in language-related brain areas of individuals with Neurocognitive Disorders (NCD). While large language model (LLM)-based fMRI encoding has shown promise, existing studies predominantly focus on healthy, young adults, overlooking older NCD populations and cognitive level corre…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  8. arXiv:2407.10195  [pdf, other]

    cs.CV

    V2I-Calib: A Novel Calibration Approach for Collaborative Vehicle and Infrastructure LiDAR Systems

    Authors: Qianxin Qu, Yijin Xiong, Xin Wu, Hanyu Li, Shichun Guo

    Abstract: Cooperative vehicle and infrastructure LiDAR systems hold great potential, yet their implementation faces numerous challenges. Calibration of LiDAR systems across heterogeneous vehicle and infrastructure endpoints is a critical step to ensure the accuracy and consistency of perception system data, necessitating calibration methods that are real-time and stable. To this end, this paper introduces a…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: to be published in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  9. arXiv:2407.10156  [pdf, other]

    astro-ph.HE

    Triggering the Untriggered: The First Einstein Probe-Detected Gamma-Ray Burst 240219A and Its Implications

    Authors: Yi-Han Iris Yin, Bin-Bin Zhang, Jun Yang, Hui Sun, Chen Zhang, Yi-Xuan Shao, You-Dong Hu, Zi-Pei Zhu, Dong Xu, Li An, He Gao, Xue-Feng Wu, Bing Zhang, Alberto Javier Castro-Tirado, Shashi B. Pandey, Arne Rau, Weihua Lei, Wei Xie, Giancarlo Ghirlanda, Luigi Piro, Paul O'Brien, Eleonora Troja, Peter Jonker, Yun-Wei Yu, Jie An, et al. (26 additional authors not shown)

    Abstract: The Einstein Probe (EP) achieved its first detection and localization of a bright X-ray flare, EP240219a, on February 19, 2024, during its commissioning phase. Subsequent targeted searches triggered by the EP240219a alert identified a faint, untriggered gamma-ray burst (GRB) in the archived data of Fermi/GBM, Swift/BAT, Insight-HXMT/HE and INTEGRAL/SPI-ACS. The EP/WXT light curve reveals a long du…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 14 pages, 8 figures, 3 tables

  10. arXiv:2407.10131  [pdf, other]

    cs.CV

    WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

    Authors: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

    Abstract: Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained v…

    Submitted 14 July, 2024; originally announced July 2024.

  11. arXiv:2407.09943  [pdf, other]

    cs.CL

    Minimizing PLM-Based Few-Shot Intent Detectors

    Authors: Haode Zhang, Xiao-Ming Wu, Albert Y. S. Lam

    Abstract: Recent research has demonstrated the feasibility of training efficient intent detectors based on pre-trained language models (PLMs) with limited labeled data. However, deploying these detectors in resource-constrained environments such as mobile devices poses challenges due to their large sizes. In this work, we aim to address this issue by exploring techniques to minimize the size of PLM-based inte…

    Submitted 13 July, 2024; originally announced July 2024.

  12. arXiv:2407.09876  [pdf, other]

    astro-ph.HE hep-ph

    Detection of hidden emissions in two rotating radio transients with high surface magnetic fields

    Authors: S. B. Zhang, X. Yang, J. J. Geng, Y. P. Yang, X. F. Wu

    Abstract: Rotating Radio Transients (RRATs) are neutron stars emitting sporadic radio pulses. The unique emission of RRATs has been proposed to resemble those of known pulsar types, such as extreme nulling pulsars or pulsars with giant pulses. However, the presence of additional radiation beyond these sporadic pulses remains unclear. Through high-sensitivity observations and extended tracking, we detected t…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 10 pages, 1 table, 6 figures

  13. arXiv:2407.09817  [pdf, other]

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, which both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  14. arXiv:2407.09816  [pdf, other]

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling model capacity enhances its capabilities but significantly increases computation. Mixture-of-Experts models (MoEs) address this by allowing model capacity to scale without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, the dispersion of training tokens across multiple experts can lead to underfittin…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  15. arXiv:2407.09787  [pdf, other]

    cs.CV

    Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

    Authors: Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Semi-supervised learning aims to leverage large amounts of unlabeled data to improve model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of the pseudo labels is essential for the final performance. In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by AAAI 2024

  16. arXiv:2407.09751  [pdf, other]

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  17. arXiv:2407.09355  [pdf]

    q-bio.GN cs.AI

    FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313

    Authors: Aaron Ge, Jeya Balasubramanian, Xueyao Wu, Peter Kraft, Jonas S. Almeida

    Abstract: Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alte…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures. Web application: https://aaronge-2020.github.io/DeepImpute/ ; code repository: https://github.com/aaronge-2020/DeepImpute

  18. arXiv:2407.08570  [pdf, ps, other]

    hep-ph hep-th

    Determination of the QCD running coupling in the entire perturbative regime from a single experiment using the Principle of Maximum Conformality

    Authors: Leonardo Di Giustino, Stanley J. Brodsky, Philip G. Ratcliffe, Sheng-Quan Wang, Xing-Gang Wu

    Abstract: We present a new approach for determining the strong coupling $\alpha_s(Q)$ over the entire perturbative range of validity, for scales from $\Lambda_{\mathrm{QCD}}$ up to the Planck scale ${\sim}10^{19}$\,GeV, with the highest precision and using the data of just a single experiment. The results obtained with this method are consistent with world averages and exhibit improved precision with respect to previo…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 pages, 4 figures

    Report number: SLAC-PUB-17782

  19. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued-token-based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  20. arXiv:2407.08366  [pdf, other]

    cs.RO cs.CV

    An Economic Framework for 6-DoF Grasp Detection

    Authors: Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng

    Abstract: Robotic grasping in clutter is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection, aiming to economize the resource cost in training and meanwhile maintain effective grasp performance. To begin with, we discover that the dense supervision is the bottleneck of current SOTA methods that severely encumbers the entire training overload…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 7 figures. Accepted in ECCV 2024!

  21. arXiv:2407.07703  [pdf, other]

    math.GR math.KT

    Embedding groups into boundedly acyclic groups

    Authors: Fan Wu, Xiaolei Wu, Mengfei Zhao, Zixiang Zhou

    Abstract: We show that the labeled Thompson groups and the twisted Brin--Thompson groups are boundedly acyclic. This allows us to prove several new embedding results for groups. First, every group of type $F_n$ embeds quasi-isometrically into a boundedly acyclic group of type $F_n$ that has no proper finite index subgroups. This improves a result of Bridson \cite{Br98} and a theorem of Fournier-Facio--Löh--…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 30 pages; comments welcome!

    MSC Class: 57M07; 21J06

  22. arXiv:2407.07651  [pdf, other]

    hep-ex physics.data-an

    Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

    Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

    Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946 GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be…

    Submitted 10 July, 2024; originally announced July 2024.

  23. arXiv:2407.06975  [pdf]

    cond-mat.mtrl-sci

    Optimization of noncollinear magnetic ordering temperature in Y-type hexaferrite by machine learning

    Authors: Yonghong Li, Jing Zhang, Linfeng Jiang, Long Zhang, Yugang Zhang, Xueliang Wu, Yisheng Chai, Xiaoyuan Zhou, Zizhen Zhou

    Abstract: Searching for the optimal doping compositions of the Y-type hexaferrite $\mathrm{Ba_2Mg_2Fe_{12}O_{22}}$ remains a long-standing challenge for enhancing the non-collinear magnetic transition temperature ($T_{\mathrm{NC}}$). Instead of the conventional trial-and-error approach, the composition-property descriptor is established via a data-driven machine learning method named SISSO (sure independence screening and sparsifying operator). Bas…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: accepted by Applied Physics Letters in 2024

    Journal ref: Appl. Phys. Lett. 125, 032903 (2024)

  24. arXiv:2407.06425  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.supr-con physics.app-ph

    Precision frequency tuning of tunable transmon qubits using alternating-bias assisted annealing

    Authors: Xiqiao Wang, Joel Howard, Eyob A. Sete, Greg Stiehl, Cameron Kopas, Stefano Poletto, Xian Wu, Mark Field, Nicholas Sharac, Christopher Eckberg, Hilal Cansizoglu, Raja Katta, Josh Mutus, Andrew Bestwick, Kameshwar Yadavalli, David P. Pappas

    Abstract: Superconducting quantum processors are one of the leading platforms for realizing scalable fault-tolerant quantum computation (FTQC). The recent demonstration of post-fabrication tuning of Josephson junctions using the alternating-bias assisted annealing (ABAA) technique and a reduction in junction loss after ABAA illuminate a promising path towards precision tuning of qubit frequency while maintaini…

    Submitted 8 July, 2024; originally announced July 2024.

  25. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed editing and customization of 3D assets remain a long-standing challenge. Specifically, 3D generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  26. arXiv:2407.06137  [pdf, ps, other]

    cs.CV

    OMuSense-23: A Multimodal Dataset for Contactless Breathing Pattern Recognition and Biometric Analysis

    Authors: Manuel Lage Cañellas, Le Nguyen, Anirban Mukherjee, Constantino Álvarez Casado, Xiaoting Wu, Praneeth Susarla, Sasan Sharifipour, Dinesh B. Jayagopi, Miguel Bordallo López

    Abstract: In the domain of non-contact biometrics and human activity recognition, the lack of a versatile, multimodal dataset poses a significant bottleneck. To address this, we introduce the Oulu Multi Sensing (OMuSense-23) dataset that includes biosignals obtained from a mmWave radar and an RGB-D camera. The dataset features data from 50 individuals in three distinct poses: standing, sitting, and lying…

    Submitted 22 May, 2024; originally announced July 2024.

  27. arXiv:2407.06113  [pdf, other]

    cs.CV

    C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

    Authors: Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of previously observed verbs and objects, thus requiring so-called compositional generalization ability. To facilitate this research, we propose a nove…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  28. arXiv:2407.05688  [pdf]

    cs.CV cs.AI

    Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition

    Authors: Yuxiang Yang, Lu Wen, Xinyi Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Facial Expression Recognition (FER) holds significant importance in human-computer interactions. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information across multiple sources. Nevertheless, cross-multidomain FER (CMFER) is very challenging due to (i) the inherent inter-domain shifts…

    Submitted 8 July, 2024; originally announced July 2024.

  29. arXiv:2407.05420  [pdf, ps, other]

    cs.IR

    Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation

    Authors: Xinglong Wu, Anfeng Huang, Hongwei Yang, Hui He, Yu Tai, Weizhe Zhang

    Abstract: Multi-modal recommendation greatly enhances the performance of recommender systems by modeling the auxiliary information from multi-modality contents. Most existing multi-modal recommendation models primarily exploit multimedia information propagation processes to enrich item representations and directly utilize modal-specific embedding vectors independently obtained from upstream pre-trained mode…

    Submitted 7 July, 2024; originally announced July 2024.

  30. arXiv:2407.04996  [pdf, other]

    cs.LG cs.CV

    The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition

    Authors: Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang

    Abstract: This paper presents a data-free, parameter-isolation-based continual learning algorithm we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. S…

    Submitted 6 July, 2024; originally announced July 2024.

  31. arXiv:2407.04994  [pdf, other]

    cs.CV cs.LG

    The Solution for Language-Enhanced Image New Category Discovery

    Authors: Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

    Abstract: Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on textual labels to store visual information is insufficient for representing the diversity of visual objects. In this paper, we propose reversing the training proce…

    Submitted 6 July, 2024; originally announced July 2024.

  32. arXiv:2407.04726  [pdf, other]

    eess.SP cs.LG

    Data-Driven Prediction and Uncertainty Quantification of PWR Crud-Induced Power Shift Using Convolutional Neural Networks

    Authors: Aidan Furlong, Farah Alsafadi, Scott Palmtag, Andrew Godfrey, Xu Wu

    Abstract: The development of Crud-Induced Power Shift (CIPS) is an operational challenge in Pressurized Water Reactors that is due to the development of crud on the fuel rod cladding. The available predictive tools developed previously, usually based on fundamental physics, are computationally expensive and have shown differing degrees of accuracy. This work proposes a completely top-down approach to predic…

    Submitted 27 June, 2024; originally announced July 2024.

  33. arXiv:2407.04697  [pdf, other]

    cs.CV cs.MM

    VCoME: Verbal Video Composition with Multimodal Editing Effects

    Authors: Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu

    Abstract: Verbal videos, featuring voice-overs or text overlays, provide valuable content but present significant challenges in composition, especially when incorporating editing effects to enhance clarity and visual appeal. In this paper, we introduce the novel task of verbal video composition with editing effects. This task aims to generate coherent and visually appealing verbal videos by integrating mult…

    Submitted 5 July, 2024; originally announced July 2024.

  34. arXiv:2407.04255  [pdf, other]

    cs.CV

    Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge

    Authors: Xiangyu Wu, Zhouyang Chi, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our solution for the WSDM2023 Toloka Visual Question Answering Challenge. Inspired by the application of multimodal pre-trained models to various downstream tasks (e.g., visual question answering, visual grounding, and cross-modal retrieval), we approached this competition as a visual grounding task, where the input is an image and a question, guiding the model to answer t…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Second Place of WSDM2023 Toloka Visual Question Answering Challenge

  35. arXiv:2407.04237  [pdf, other]

    cs.CV cs.GR

    GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

    Authors: Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

    Abstract: We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  36. arXiv:2407.04156  [pdf, other]

    nucl-th hep-ph nucl-ex

    Probing the equilibration of the QCD matter created in heavy-ion collisions with dileptons

    Authors: Xiang-Yu Wu, Lipei Du, Charles Gale, Sangyong Jeon

    Abstract: A systematic study of intermediate invariant mass dilepton production in Pb+Pb collisions at $\sqrt{s_{NN}} = 5.02$ TeV is performed, using next-to-leading-order (NLO) thermal QCD dilepton emission rates with a multistage dynamical approach which includes event-by-event IP-Glasma initial conditions, relativistic viscous fluid dynamics, and a hadronic afterburner. Considering dilepton yield and ani…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  37. arXiv:2407.03872  [pdf, other]

    cs.CV cs.LG

    The Solution for the GAIIC2024 RGB-TIR object detection Challenge

    Authors: Xiangyu Wu, Jinling Xu, Longfei Huang, Yang Yang

    Abstract: This report introduces a solution to the task of RGB-TIR object detection from the perspective of unmanned aerial vehicles. Unlike traditional object detection methods, RGB-TIR object detection aims to utilize both RGB and TIR images for complementary information during detection. The challenges of RGB-TIR object detection from the perspective of unmanned aerial vehicles include highly complex ima…

    Submitted 4 July, 2024; originally announced July 2024.

  38. arXiv:2407.03788  [pdf, other]

    cs.CV cs.CL

    Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

    Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering t…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  39. arXiv:2407.03591  [pdf, other]

    physics.optics

    Controlling quasi-parametric amplifications: From multiple PT-symmetry phase transitions to non-Hermitian sensing

    Authors: Xiaoxiong Wu, Kai Bai, Penghong Yu, Zhaohui Dong, Yanyan He, Jingui Ma, Vladislav V. Yakovlev, Meng Xiao, Xianfeng Chen, Luqi Yuan

    Abstract: Quasi-parametric amplification (QPA) is a nonlinear interaction in which the idler wave is depleted through some loss mechanism. QPA plays an important role in signal amplification in ultrafast photonics and quantum light generation. The QPA process has a number of features characterized by the non-Hermitian parity-time ($\mathcal{PT}$) symmetry. In this report, we explore new interaction regimes…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 17 pages, 6 figures

  40. arXiv:2407.03040  [pdf, other]

    cs.CL cs.AI

    Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

    Authors: Xia Hou, Qifeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song

    Abstract: Instruction tuning, as an effective technique, aligns the outputs of large language models (LLMs) with human preference. However, how to generate the seasonal multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generat…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    MSC Class: 68T50

    ACM Class: I.2.7

  41. arXiv:2407.02899  [pdf, other]

    hep-ex

    Measurement of the branching fraction of the decay $J/\psi\to p \bar{p} \eta$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H.-R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: A high-precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10\,087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…

    Submitted 3 July, 2024; originally announced July 2024.

  42. arXiv:2407.02769  [pdf, other]

    cs.CV

    Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

    Authors: Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin

    Abstract: When dealing with the task of fine-grained scene image classification, most previous works lay much emphasis on global visual features when doing multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions about the importance of different modalities. In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter)…

    Submitted 2 July, 2024; originally announced July 2024.

  43. arXiv:2407.02607  [pdf, other]

    math.DG cs.LG math.MG

    Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Nicu Sebe

    Abstract: This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

    MSC Class: 47A64; 26E60; 53C22; 15B48; 58D17; 53C20; 58B20

  44. arXiv:2407.02318  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

    Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

    Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information…

    Submitted 1 July, 2024; originally announced July 2024.

  45. arXiv:2407.01850  [pdf, other]

    cs.CL

    Purple-teaming LLMs with Adversarial Defender Training

    Authors: Jingyan Zhou, Kun Li, Junan Li, Jiawen Kang, Minda Hu, Xixin Wu, Helen Meng

    Abstract: Existing efforts in safeguarding LLMs are limited in actively exposing the vulnerabilities of the target LLM and readily adapting to newly emerging safety risks. To address this, we present Purple-teaming LLMs with Adversarial Defender training (PAD), a pipeline designed to safeguard LLMs by incorporating, in a novel way, both red-teaming (attack) and blue-teaming (safety training) techniques. In PAD, we au…

    Submitted 1 July, 2024; originally announced July 2024.

  46. arXiv:2407.01700  [pdf, other]

    eess.SY

    Joint Design of Conventional Public Transport Network and Mobility on Demand

    Authors: Xiaoyi Wu, Nisrine Mouhrim, Andrea Araldo, Yves Molenbruch, Dominique Feillet, Kris Braekers

    Abstract: Conventional Public Transport (PT) is based on fixed lines, running with routes and schedules determined a priori. In low-demand areas, conventional PT is inefficient. Therein, Mobility on Demand (MoD) could serve users more efficiently and with an improved quality of service (QoS). The idea of integrating MoD into PT is therefore widely discussed by researchers and practitioners, mainly in th…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 26th Euro Working Group on Transportation Meeting

  47. arXiv:2407.01638  [pdf, other]

    cs.SE cs.AI cs.DC cs.PL

    LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes

    Authors: Matthew T. Dearing, Yiheng Tao, Xingfu Wu, Zhiling Lan, Valerie Taylor

    Abstract: This paper proposes a novel approach to sourcing significant training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the range of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework, called LASSI, designed to translate between parallel programming…

    Submitted 30 June, 2024; originally announced July 2024.

  48. arXiv:2407.01271  [pdf, other]

    cs.CL

    First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1

    Authors: Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu

    Abstract: In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we delete the masked language modeling task of CPT-BASE and instead reconstruct the vocabulary, adopting a span mask strategy and…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: First Place of 2023 Global Artificial Intelligence Technology Innovation Competition

  49. arXiv:2407.01213  [pdf, other]

    cs.SI

    EMIF: Evidence-aware Multi-source Information Fusion Network for Explainable Fake News Detection

    Authors: Qingxing Dong, Mengyi Zhang, Shiyuan Wu, Xiaozhen Wu

    Abstract: Extensive research on automatic fake news detection has been conducted due to the significant detrimental effects of fake news proliferation. Most existing approaches rely on a single source of evidence, such as comments or relevant news, to derive explanatory evidence for decision-making, demonstrating exceptional performance. However, their single evidence source suffers from two critical drawba…

    Submitted 1 July, 2024; originally announced July 2024.

  50. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.