Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contribution of resonances is subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.

Submitted 16 July, 2024; originally announced July 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-005.html (LHCb public pages)

Report number: LHCb-PAPER-2024-005, CERN-EP-2024-158

arXiv:2407.11398 [pdf, other]

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao

Abstract: Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Animate3D, a novel framework for animating any static 3D model. The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). 2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. Data, code, and models will be open-released.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Project Page: https://animate3d.github.io/

arXiv:2407.10990 [pdf]

MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLM. First, MedBench assembles the currently largest evaluation dataset (300,901 questions) to cover 43 clinical specialties and performs multi-facet evaluation on medical LLM. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separations for question and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer remembering. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results largely aligning with medical professionals' perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.

Submitted 23 June, 2024; originally announced July 2024.

Comments: 25 pages, 4 figures

arXiv:2407.10956 [pdf, other]

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflow. Our code and data are available at https://spider2-v.github.io.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 34 pages, 14 figures, 10 tables

arXiv:2407.10430 [pdf, other]

Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-Starting Progressive Propagation

Authors: Zhoutian Shao, Yuanning Cui, Wei Hu

Abstract: Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to the distance and scalability issues. In this paper, we propose a new inductive KG reasoning model, MStar, by leveraging conditional message passing neural networks (C-MPNNs). Our key insight is to select multiple query-specific starting entities to expand the scope of progressive propagation. To propagate query-related messages to a farther area within limited steps, we subsequently design a highway layer to propagate information toward these selected starting entities. Moreover, we introduce a training strategy called LinkVerify to mitigate the impact of noisy training samples. Experimental results validate that MStar achieves superior performance compared with state-of-the-art models, especially for distant entities.

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted in the 23rd International Semantic Web Conference (ISWC 2024)

arXiv:2407.07479 [pdf, other]

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a good teacher for dual-encoder? Our findings are threefold: (1) Cross-modal similarity score distribution of cross-encoder is more concentrated while the result of dual-encoder is nearly normal, making vanilla logit distillation less effective. However, ranking distillation remains practical as it is not affected by the score distribution. (2) Only the relative order between hard negatives conveys valid knowledge while the order information between easy negatives has little significance. (3) Maintaining the coordination between distillation loss and dual-encoder training loss is beneficial for knowledge transfer. Based on these findings, we propose a novel Contrastive Partial Ranking Distillation (CPRD) method which implements the objective of mimicking relative order between hard negative samples with contrastive learning. This approach coordinates with the training of the dual-encoder, effectively transferring valid knowledge from the cross-encoder to the dual-encoder. Extensive experiments on image-text retrieval and ranking tasks show that our method surpasses other distillation methods and significantly improves the accuracy of dual-encoder.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by CVPR 2024

arXiv:2407.07478 [pdf, other]

EA-VTR: Event-Aware Video-Text Retrieval

Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improvements from both data and model perspectives. In terms of pre-training data, we focus on supplementing the missing specific event content and event temporal transitions with the proposed event augmentation strategies. Based on the event-augmented data, we construct a novel Event-Aware Video-Text Retrieval model, i.e., EA-VTR, which achieves powerful video-text retrieval ability through superior video event awareness. EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events. Our method not only significantly outperforms existing approaches on multiple datasets for Text-to-Video Retrieval and Video Action Recognition tasks, but also demonstrates superior event content perception ability on Multi-event Video-Text Retrieval and Video Moment Retrieval tasks, as well as outstanding event temporal logic understanding ability on the Test of Time task.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.07403 [pdf, other]

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multi-resource real-world applications and the complexity of multi-modal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including the attack preliminary, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. Finally, we discuss promising research directions in the future. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM developments. The latest papers on LVLM attacks are continuously collected in https://github.com/liudaizong/Awesome-LVLM-Attack.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06111 [pdf, other]

Enhancing the Prediction of Glass Dynamics by Incorporating the Direction of Deviation from Equilibrium Positions

Authors: Xiao Jiang, Zean Tian, Kenli Li, Wangyu Hu

Abstract: Elucidating the intricate relationship between the structure and dynamics in the context of the glass transition has been a persistent challenge. Machine learning (ML) has emerged as a pivotal tool, offering novel pathways to predict dynamic behaviors from structural descriptors. Notably, recent research has highlighted that the distance between the initial particle positions and the equilibrium positions substantially enhances the prediction of glassy dynamics. However, these methodologies have been limited in their ability to capture the directional aspects of these deviations from the equilibrium positions, which are crucial for a comprehensive understanding of the complex particle interactions within the cage dynamics. Therefore, this paper introduces a novel structural parameter: the vectorial displacement of particles from their initial configuration to their equilibrium positions. Recognizing the inadequacy of current ML models in effectively handling such vectorial parameters, we have developed an Equivariance-Constrained Invariant Graph Neural Network (EIGNN). This innovative model not only bolsters the descriptive capacity of conventional rotation-invariant models but also streamlines the computational demands associated with rotation-equivariant graph neural networks. Our rigorous experimental validation on a 3D glassy system from the GlassBench dataset has yielded compelling evidence that the EIGNN model significantly enhances the correlation between structural representation and dynamic properties.

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05014 [pdf, ps, other]

Well-posedness and Bilinear Controllability of a Repairable System with Degraded State

Authors: Daniel Owusu Adu, Weiwei Hu

Abstract: In this work, we consider the dynamics of repairable systems characterized by three distinct states: one signifying normal operational states, another representing degraded conditions and a third denoting failed conditions. These systems are characterized by their ability to be repaired when failures and/or degradation occur. Typically described by transport equations, these systems exhibit a coupled nature, interlinked through integro-differential equations and integral boundary conditions that dictate the transitions among all the states. In this paper, we address two less-explored facets: 1) the well-posedness and the asymptotic behavior of such systems with maximum repair time being finite; and 2) the bilinear controllability of the system via repair actions. In particular, we focus on the case where only one degraded state and one failed state exist. We first discuss part 1) for given time-independent repair rates and then design the space-time dependent repair strategies that can manipulate system dynamics to achieve the desired level over a finite horizon. Our objective is to enhance the system availability -- the probability of being operational when needed over a fixed period of time. We present rigorous analysis and develop control strategies that leverage the bilinear structure of the system model.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.05009 [pdf, ps, other]

Bilinear Controllability of a Simple Reparable System

Authors: Daniel Owusu Adu, Weiwei Hu

Abstract: Reparable systems are systems that are characterized by their ability to undergo maintenance actions when failures occur. These systems are often described by transport equations, all coupled through an integro-differential equation. In this paper, we address the understudied aspect of the controllability of reparable systems. In particular, we focus on a two-state reparable system and our goal is to design a control strategy that enhances the system availability -- the probability of being operational when needed. We establish bilinear controllability, demonstrating that appropriate control actions can manipulate system dynamics to achieve desired availability levels. We provide theoretical foundations and develop control strategies that leverage the bilinear structure of the equations.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04675 [pdf, other]

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.

Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04553 [pdf, other]

The Nature of the High-energy Gamma-Ray Radiation Associated with the High-redshift Blazar B3 1343+451

Authors: Fan Wu, Wen Hu, Benzhong Dai

Abstract: High-redshift blazars are the most powerful extragalactic astrophysical sources ever detected in the high-energy gamma-ray band. In this study, we present a temporal and spectral analysis of the high-redshift blazar B3 1343+451 based on 14 years of Fermi-LAT observations, spanning from 2008 August 4 to 2022 June 6 (MJD 54686-59733). We extract a seven-day binned $γ$-ray light curve in the energy range 0.1--500 GeV and identify seven outburst periods with a peak flux of $>4.32\times10^{-7} \rm ph \cdot cm^{-2} \cdot s^{-1}$. The highest seven day flux (above 100 MeV) reaches $(8.06\pm0.56)\times10^{-7} \rm erg \ cm^{-2} \ s^{-1}$ on MJD = 56,177.16, which is 10 times higher than the flux in the quiescent period. To understand the properties of distant blazar jets, we employ a standard one-zone leptonic scenario and model the multiwavelength spectral energy distributions of one quiescent and seven flaring periods. We find that the $γ$-ray spectrum is better reproduced if we assume that the dissipation region of the jet, $R_{\rm diss}$, is located within the molecular torus, where infrared emission is the dominant external photon field. We infer that the jets in higher-redshift blazars have larger power and kinetic energy, where the kinetic energy is significantly greater than the radiation power, and the jet production efficiency suggests that we need to lower the accretion efficiency. These results imply that B3 1343+451 may have a standard thin disk surrounding its massive black hole, and the jets of B3 1343+451 may not be fully explained by the Blandford--Payne process.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 13 pages, 5 figures. Accepted for publication in ApJ

arXiv:2407.03884 [pdf, other]

Planning with Large Language Models for Conversational Agents

Authors: Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal when the user is uncooperative, such as in persuasive dialogue. Existing research cannot be unified with controllability, proactivity, and low manual annotation. To bridge this gap, we propose a new framework for planning-based conversational agents (PCA) powered by large language models (LLMs), which only requires humans to define tasks and goals for the LLMs. Before conversation, LLM plans the core and necessary SOP for dialogue offline. During the conversation, LLM plans the best action path online referring to the SOP, and generates responses to achieve process controllability. Subsequently, we propose a semi-automatic dialogue data creation framework and curate a high-quality dialogue dataset (PCA-D). Meanwhile, we develop multiple variants and evaluation metrics for PCA, e.g., planning with Monte Carlo Tree Search (PCA-M), which searches for the optimal dialogue action while satisfying SOP constraints and achieving dialogue proactivity. Experiment results show that LLMs finetuned on PCA-D can significantly improve the performance and generalize to unseen domains. PCA-M outperforms other CoT and ToT baselines in terms of conversation controllability, proactivity, task success rate, and overall logical coherence, and is applicable in industry dialogue scenarios. The dataset and codes are available at XXXX.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.19240 [pdf, other]

Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data preparation process. This systematic literature review analyzes the data preparation techniques used in DL-based CSD methods. We identify 36 relevant papers published by December 2023 and provide a thorough analysis of the critical considerations in constructing CSD datasets, including data requirements, collection, labeling, and cleaning. We also summarize seven primary challenges and corresponding solutions in the literature. Finally, we offer actionable recommendations for preparing and accessing high-quality CSD data, emphasizing the importance of data diversity, standardization, and accessibility. This survey provides valuable insights for researchers and practitioners to harness the full potential of DL techniques in CSD.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18442 [pdf, other]

Correlation of the L-mode density limit with edge collisionality

Authors: Andrew Maris, Cristina Rea, Alessandro Pau, Wenhui Hu, Bingjia Xiao, Robert Granetz, Earl Marmar, the EUROfusion Tokamak Exploitation team, the Alcator C-Mod team, the ASDEX Upgrade team, the DIII-D team, the EAST team, the TCV team

Abstract: The "density limit" is one of the fundamental bounds on tokamak operating space, and is commonly estimated via the empirical Greenwald scaling. This limit has garnered renewed interest in recent years as it has become clear that ITER and many tokamak pilot plant concepts must operate near or above the widely-used Greenwald limit to achieve their objectives. Evidence has also grown that the Greenwald scaling - in its remarkable simplicity - may not capture the full complexity of the disruptive density limit. In this study, we assemble a multi-machine database to quantify the effectiveness of the Greenwald limit as a predictor of the L-mode density limit and identify alternative stability metrics. We find that a two-parameter dimensionless boundary in the plasma edge, $ν_{*\rm, edge}^{\rm limit} = 3.0 β_{T,{\rm edge}}^{-0.4}$, achieves significantly higher accuracy (true negative rate of 97.7% at a true positive rate of 95%) than the Greenwald limit (true negative rate 86.1% at a true positive rate of 95%) across a multi-machine dataset including metal- and carbon-wall tokamaks (AUG, C-Mod, DIII-D, and TCV). The collisionality boundary presented here can be applied for density limit avoidance in current devices and in ITER, where it can be measured and responded to in real time.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 27 pages, 9 figures

arXiv:2406.18078 [pdf, other]

Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

Authors: Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu

Abstract: Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Main Conference

arXiv:2406.17006 [pdf, other]

Probing the nature of the $χ_{c1}(3872)$ state using radiative decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1094 additional authors not shown)

Abstract: The radiative decays $χ_{c1}(3872)\rightarrow ψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04, $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 31 pages, 2 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-015.html (LHCb public pages)

Report number: LHCb-PAPER-2024-015, CERN-EP-2024-157

arXiv:2406.16334 [pdf, other]

CONCERTO: Instrument model of Fourier transform spectroscopy, white-noise components

Authors: Alessandro Fasano, Peter Ade, Manuel Aravena, Emilio Barria, Alexandre Beelen, Alain Benoit, Matthieu Béthermin, Julien Bounmy, Olivier Bourrion, Guillaume Bres, Martino Calvo, Andrea Catalano, Carlos De Breuck, François-Xavier Désert, Cédric Dubois, Carlos Durán, Thomas Fenouillet, Jose Garcia, Gregory Garde, Johannes Goupy, Christophe Hoarau, Wenkai Hu, Guilaine Lagache, Jean-Charles Lambert, Florence Levy-Bertrand, et al. (12 additional authors not shown)

Abstract: Modern astrophysics relies on intricate instrument setups to meet the demands of sensitivity, sky coverage, and multi-channel observations. An example is the CONCERTO project, employing advanced technology like kinetic inductance detectors and a Martin-Puplett interferometer. This instrument, installed at the APEX telescope atop the Chajnantor plateau, began commissioning observations in April 2021. Following a successful commissioning phase that concluded in June 2021, CONCERTO was offered to the scientific community for observations, with a final observing run in December 2022. CONCERTO boasts an 18.5 arcmin field of view and a spectral resolution down to 1.45 GHz in the 130-310 GHz electromagnetic band. We developed a comprehensive instrument model of CONCERTO inspired by Fourier transform spectrometry principles to optimize performance and address systematic errors. This model integrates instrument noises, subsystem characteristics, and celestial signals, leveraging both physical data and simulations. Our methodology involves delineating simulation components, executing on-sky simulations, and comparing results with real observations. The resulting instrument model is pivotal, enabling a precise error correction and enhancing the reliability of astrophysical insights obtained from observational data. In this work, we focus on the description of three white-noise noise components included in the instrument model that characterize the white-noise level: the photon, the generation-recombination, and the amplifier noises.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 8 pages, 1 figure, Proceeding of the SPIE conference Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy XII, SPIE Astronomical Telescopes + Instrumentation 2024

arXiv:2406.16291 [pdf, other]

Integrated Study of X-ray Spectrum and Time Lags for HBL Mrk 421 within the Framework of the Multiple-Zone Leptonic Model

Authors: Wen Hu, Jia-Lai Kang, Zhen-Yi Cai, Jun-Xian Wang, Zhen-Bo Su, Guang-Cheng Xiao

Abstract: We present the timing analysis of 10 archived XMM-Newton observations with an exposure of $>40$ ks of Markarian 421. Mrk 421 is the brightest high-frequency-peaked BL Lac object (HBL) emitting in X-rays produced by electrons accelerated in the innermost regions of a relativistic jet pointing toward us. For each observation, we construct averaged X-ray spectra in 0.5--10 keV band, as well as 100 s binned light curves (LCs) in various subbands. During these observations, the source exhibited various intensity states differing by close to an order of magnitude in flux, with the fractional variability amplitude increasing with energy through the X-ray band. Bayesian power spectral density analysis reveals that the X-ray variability can be characterized by a colored noise, with an index ranging from $\sim-1.9$ to $-3.0$. Moreover, both the standard cross-correlation function and cross-spectral methods indicate that the amount of time lags increases with the energy difference between two compared LCs. A time-dependent two-zone jet model is developed to extract physical information from the X-ray emission of Mrk 421. In the model, we assume that the jet emission mostly comprises a quasi-stationary component and a highly variable one. Our results show that the two-zone model can simultaneously provide a satisfactory description for both the X-ray spectra and time lags observed in different epochs, with the model parameters constrained in a fully acceptable interval. We suggest that shocks within the jets may be the primary energy dissipation process responsible for triggering the rapid variability, although magnetic reconnection cannot be excluded.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 33 pages, 12 figures, 6 tables; Accepted for publication in ApJ supplement series

arXiv:2406.15572 [pdf, other]

CONCERTO at APEX -- On-sky performance in continuum

Authors: W. Hu, A. Beelen, G. Lagache, A. Fasano, A. Lundgren, P. Ade, M. Aravena, E. Barria, A. Benoit, M. Bethermin, J. Bounmy, O. Bourrion, G. Bres, C. De Breuck, M. Calvo, A. Catalano, F.-X. Desert, C. Dubois, C. A Duran, T. Fenouillet, J. Garcia, G. Garde, J. Goupy, C. Hoarau, J.-C. Lambert, et al. (14 additional authors not shown)

Abstract: We present the data-processing algorithms and the performance of CONCERTO (CarbON CII line in post-rEionisation and ReionisaTiOn epoch) in continuum by analysing the data from the commissioning and scientific observations. The beam pattern is characterized by an effective FWHM of 31.9 $\pm$ 0.6" and 34.4 $\pm$ 1.0" for high-frequency (HF) and low-frequency (LF) bands. The main beam is slightly elongated with a mean eccentricity of 0.46. Two error beams of $\sim$65" and $\sim$130" are characterized, enabling the estimate of a main beam efficiency of $\sim$0.52. The field of view is accurately reconstructed and presents coherent distortions between the HF and LF arrays. LEKID parameters were robustly determined for 80% of the read tones. Cross-talks between LEKIDs are the first cause of flagging, followed by an excess of eccentricity for $\sim$10% of the LEKIDs, all located in a given region of the field of view. On the 44 scans of Uranus selected for the absolute photometric calibration, 72.5% and 78.2% of the LEKIDs are selected as valid detectors with a probability >70%. By comparing Uranus measurements with a model, we obtain calibration factors of 19.5$\pm$0.6 [Hz/Jy] and 25.6$\pm$0.9 [Hz/Jy] for HF and LF. The point-source continuum measurement uncertainties are 3.0% and 3.4% for HF and LF bands. The RMS of CONCERTO maps is verified to evolve as proportional to the inverse square root of integration time. The measured NEFDs for HF and LF are 115$\pm$2 mJy/beam$\cdot$s$^{1/2}$ and 95$\pm$1 mJy/beam$\cdot$s$^{1/2}$, obtained using CONCERTO data on the COSMOS field for a mean precipitable water vapour and elevation of 0.81 mm and 55.7 deg. CONCERTO demonstrates unique capabilities in fast dual-band spectral mapping with a $\sim$18.5' instantaneous field-of-view. CONCERTO's performance in continuum is perfectly in line with expectations.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 23 pages, 22 figures

arXiv:2406.14473 [pdf, other]

Data-Centric AI in the Age of Large Language Models

Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.13511 [pdf, other]

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

Authors: Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang

Abstract: Large language models (LLMs) iteratively generate text token by token, with memory usage increasing with the length of generated token sequences. The unpredictability of generation lengths makes it difficult to estimate the time and memory needed to process requests, posing a challenge for effective request scheduling. Conventional sequence-level scheduling (SLS) serves requests in a first-come first-served (FCFS) manner with static batching where requests with short generation lengths are delayed until those with long ones have finished generation, which hurts computational efficiency. Besides, to avoid out-of-memory (OOM) errors, SLS batches requests with a small batch size, which limits throughput. Recently proposed iteration-level scheduling (ILS) enhances computational efficiency with continuous batching to return completed requests timely and dynamically add new requests for processing. However, many ILS schedulers limit the number of parallel-processing requests to avoid OOM errors while achieving a fast inference speed, which compromises throughput. Moreover, existing SLS and ILS schedulers fail to balance the workload across multiple deployed LLM instances. To tackle these challenges, we propose slice-level scheduling (SCLS). By splitting the predefined maximal generation length limit into slices and serving batches slice by slice, it provides a precise range of serving time and memory usage for batched requests, laying the foundation for effective scheduling. Experiments confirm that compared with SLS and ILS schedulers, SCLS can improve throughput by up to 315.8% and greatly mitigate load imbalance with proposed batching and offloading algorithms.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 13 pages, 22 figures

arXiv:2406.13409 [pdf, other]

doi 10.1145/3581783.3612007

PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

Authors: Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

Abstract: Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (e.g., from GPS). It can be used as a standalone solution or complementary step following a large-scale search with multiple satellite candidates. However, these existing works require an accurate initial orientation (angle) prior (e.g., from IMU) and/or do not efficiently search through all possible poses. To allow efficient search and to give accurate prediction regardless of the existence or the accuracy of the angle prior, we present PetalView extractors with multi-scale search. The PetalView extractors give semantically meaningful features that are equivalent across two drastically different views, and the multi-scale search strategy efficiently inspects the satellite image from coarse to fine granularity to provide sub-meter and sub-degree precision extraction. Moreover, when an angle prior is given, we propose a learnable prior angle mixer to utilize this information. Our method obtains the best performance on the VIGOR dataset and successfully improves the performance on KITTI dataset test 1 set with the recall within 1 meter (r@1m) for location estimation to 68.88% and recall within 1 degree (r@1d) 21.10% when no angle prior is available, and with angle prior achieves stable estimations at r@1m and r@1d above 70% and 21%, up to a 40-degree noise level.

Submitted 19 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by ACM Multimedia 2023. This version contains additional supplementary materials

Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023) 56-66

arXiv:2406.12970 [pdf, other]

Warm and Fuzzy Dark Matter: Free Streaming of Wave Dark Matter

Authors: Rayne Liu, Wayne Hu, Huangyu Xiao

Abstract: Wave or fuzzy dark matter that is produced with relativistic wavenumbers exhibits free streaming effects analogous to warm or hot particle dark matter with relativistic momenta. Axions produced after inflation provide such a warm or mildly relativistic candidate, where the enhanced suppression and observational bounds are only moderately stronger than that from wave propagation of initially cold axions. More generally, the free streaming damping also impacts isocurvature fluctuations from generation in causally disconnected patches. As coherent spatial fluctuations free stream away they leave incoherent and transient superpositions in their wakes. These multiple wave momentum streams are the wave analogue of particle phase space fluctuations or directional collisionless damping of massive neutrinos or hot dark matter. The observable impact on both adiabatic and isocurvature fluctuations of fuzzy dark matter can differ from their cold dark matter counterparts due to free streaming depending on how warm or hot is their momentum distribution.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 16 pages, 11 figures

Report number: FERMILAB-PUB-24-0296-T

arXiv:2406.12881 [pdf, other]

Towards Unlocking Insights from Logbooks Using AI

Authors: Antonin Sulc, Alex Bien, Annika Eichler, Daniel Ratner, Florian Rehm, Frank Mayet, Gregor Hartmann, Hayden Hoschouer, Henrik Tuennermann, Jan Kaiser, Jason St. John, Jennefer Maldonado, Kyle Hazelwood, Raimund Kammering, Thorsten Hellert, Tim Wilksen, Verena Kain, Wan-Lin Hu

Abstract: Electronic logbooks contain valuable information about activities and events concerning their associated particle accelerator facilities. However, the highly technical nature of logbook entries can hinder their usability and automation. As natural language processing (NLP) continues advancing, it offers opportunities to address various challenges that logbooks present. This work explores jointly testing a tailored Retrieval Augmented Generation (RAG) model for enhancing the usability of particle accelerator logbooks at institutes like DESY, BESSY, Fermilab, BNL, SLAC, LBNL, and CERN. The RAG model uses a corpus built on logbook contributions and aims to unlock insights from these logbooks by leveraging retrieval over facility datasets, including discussion about potential multimodal sources. Our goals are to increase the FAIR-ness (findability, accessibility, interoperability, and reusability) of logbooks by exploiting their information content to streamline everyday use, enable macro-analysis for root cause analysis, and facilitate problem-solving automation.

Submitted 25 May, 2024; originally announced June 2024.

Comments: 5 pages, 1 figure, 15th International Particle Accelerator Conference

arXiv:2406.12111 [pdf, other]

Precision measurement of the $Ξ^-_b$ baryon lifetime

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1064 additional authors not shown)

Abstract: A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-010.html (LHCb public pages)

Report number: LHCb-PAPER-2024-010, CERN-EP-2024-139

arXiv:2406.10765 [pdf, other]

PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems containing thousands of atoms. This situation is exacerbated in the new Sunway supercomputer, where each process is limited to a mere 16 GB of memory. Herein, we present a novel parallel implementation of plane wave density functional theory on the new Sunway supercomputer (PWDFT-SW). PWDFT-SW fully extracts the benefits of Sunway supercomputer by extensively refactoring and calibrating our algorithms to align with the system characteristics of the Sunway system. Through extensive numerical experiments, we demonstrate that our methods can substantially decrease both computational costs and memory usage. Our optimizations translate to a speedup of 64.8x for a physical system containing 4,096 silicon atoms, enabling us to push the limit of PW-based DFT calculations to large-scale systems containing 16,384 carbon atoms.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08835 [pdf, other]

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.05785 [pdf, other]

A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

Authors: Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

Abstract: Text-guided 3D visual grounding (T-3DVG), which aims to locate a specific object that semantically corresponds to a language query from a complicated 3D scene, has drawn increasing attention in the 3D research community over the past few years. Compared to 2D visual grounding, this task presents great potential and challenges due to its closer proximity to the real world and the complexity of data collection and 3D point cloud source processing. In this survey, we attempt to provide a comprehensive overview of the T-3DVG progress, including its fundamental elements, recent research advances, and future research directions. To the best of our knowledge, this is the first systematic survey on the T-3DVG task. Specifically, we first provide a general structure of the T-3DVG pipeline with detailed components in a tutorial style, presenting a complete background overview. Then, we summarize the existing T-3DVG approaches into different categories and analyze their strengths and weaknesses. We also present the benchmark datasets and evaluation metrics to assess their performances. Finally, we discuss the potential limitations of existing T-3DVG and share some insights on several promising research directions. The latest papers are continually collected at https://github.com/liudaizong/Awesome-3D-Visual-Grounding.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05608 [pdf]

Janus graphene nanoribbons with a single ferromagnetic zigzag edge

Authors: Shaotang Song, Yu Teng, Weichen Tang, Zhen Xu, Yuanyuan He, Jiawei Ruan, Takahiro Kojima, Wenping Hu, Franz J Giessibl, Hiroshi Sakaguchi, Steven G Louie, Jiong Lu

Abstract: Topological design of pi-electrons in zigzag-edged graphene nanoribbons (ZGNRs) leads to a wealth of magnetic quantum phenomena and exotic quantum phases. Symmetric ZGNRs typically exhibit antiferromagnetically coupled spin-ordered edge states. Eliminating cross-edge magnetic coupling in ZGNRs not only enables the realization of a new class of ferromagnetic quantum spin chains, enabling the exploration of quantum spin physics and entanglement of multiple qubits in the 1D limit, but also establishes a long-sought carbon-based ferromagnetic transport channel, pivotal for ultimate scaling of GNR-based quantum electronics. However, designing such GNRs entails overcoming daunting challenges, including simultaneous breaking of structural and spin symmetries, and designing elegant precursors for asymmetric fabrication of reactive zigzag edges. Here, we report a general approach for designing and fabricating such ferromagnetic GNRs in the form of Janus GNRs (JGNRs) with two distinct edge configurations. Guided by Lieb's theorem and topological classification theory, we devised two JGNRs by asymmetrically introducing a topological defect array of benzene motifs to one zigzag edge, while keeping the opposing zigzag edge unchanged. This breaks structural symmetry and creates a sublattice imbalance within each unit cell, initiating a spin symmetry breaking. Three Z-shape precursors are designed to fabricate one parent ZGNR and two JGNRs with an optimal lattice spacing of the defect array for a complete quench of the magnetic edge states at the defective edge. Characterization via scanning probe microscopy/spectroscopy and first-principles density functional theory confirms the successful fabrication of Janus GNRs with a ferromagnetic ground state delocalised along the pristine zigzag edge.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 19 pages, 4 figures

arXiv:2406.04983 [pdf, other]

CityCraft: A Real Crafter for 3D City Generation

Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neural rendering. These techniques often exhibit limited diversity and noticeable artifacts in the rendered city scenes. The rendered scenes lack variety, resembling the training images, resulting in monotonous styles. Additionally, these methods lack planning capabilities, leading to less realistic generated scenes. In this paper, we introduce CityCraft, an innovative framework designed to enhance both the diversity and quality of urban scene generation. Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts. Subsequently, a Large Language Model (LLM) is utilized to strategically make land-use plans within these layouts based on user prompts and language guidelines. Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction. Furthermore, we contribute two new datasets to the field: 1) the CityCraft-OSM dataset, including 2D semantic layouts of urban areas, corresponding satellite images, and detailed annotations; 2) the CityCraft-Buildings dataset, featuring thousands of diverse, high-quality 3D building assets. CityCraft achieves state-of-the-art performance in generating realistic 3D cities.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 20 pages, 9 figures

arXiv:2406.04785 [pdf, other]

Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

Authors: Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang

Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. First, the generation lengths of requests in a batch vary, and requests with short generation lengths must wait for requests with long generation lengths to finish during the batch serving procedure. Second, requests with longer generation lengths consume more memory during serving. Without knowing the generation lengths of batched requests, the batch size is always set small to avoid the out-of-memory (OOM) error, thus preventing the GPU from being fully utilized. In this paper, we find that a significant number of popular applications in the LMaaS scenario have a positive correlation between the generation length and the length of raw user input. Based on this observation, we propose Magnus, which can accurately predict the request generation length with the user input length, application-level, and user-level semantic features. Accordingly, Magnus can achieve high request throughput by batching requests of similar generation lengths together with adaptive batch sizes. Besides, Magnus can also schedule batches with the highest response ratio next (HRRN) policy to reduce request response time. Experiments conducted on our testbed show that Magnus improves request throughput by up to 234% and reduces response time by up to 89.7% compared to baselines.
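The highest response ratio next (HRRN) policy named in the abstract can be illustrated with a minimal sketch. It uses the textbook definition, response ratio = (waiting time + service time) / service time, and is not taken from the Magnus implementation. The `Batch` class, its field names, and the example numbers are hypothetical, and the predicted service time merely stands in for whatever estimate a generation-length predictor would supply.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical batch record; the fields are illustrative assumptions."""
    name: str
    arrival_time: float       # seconds, when the batch became ready to serve
    predicted_service: float  # seconds, predicted time needed to serve it


def response_ratio(batch: Batch, now: float) -> float:
    """Textbook HRRN ratio: (waiting time + service time) / service time."""
    waiting = max(0.0, now - batch.arrival_time)
    return (waiting + batch.predicted_service) / batch.predicted_service


def pick_next(batches: list[Batch], now: float) -> Batch:
    """Select the batch with the highest response ratio at time `now`."""
    return max(batches, key=lambda b: response_ratio(b, now))


# A short batch that arrived late can overtake a long batch that arrived early.
queue = [Batch("short", 8.0, 2.0), Batch("long", 0.0, 20.0)]
chosen = pick_next(queue, now=10.0)
print(chosen.name, response_ratio(chosen, 10.0))
```

Because waiting time appears in the numerator, a long batch's ratio keeps growing while it waits, so it is eventually served rather than starved by a stream of short batches.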

Submitted 7 June, 2024; originally announced June 2024.

Comments: 12 pages, 14 figures

arXiv:2406.03646 [pdf, other]

CLASSY X: Highlighting Differences Between Partial Covering and Semi-Analytic Modeling in the Estimate of Galactic Outflow Properties

Authors: M. Huberty, C. Carr, C. Scarlata, T. Heckman, A. Henry, X. Xu, K. Ariano-Cordoba, D. Berg, S. Charlot, J. Chisholm, S. Gazagnes, M. Hayes, W. Hu, B. James, R. M. Jennings, C. Leitherer, C. L. Martin, M. Mingozzi, E. Skillman, Y. Sugahara

Abstract: Feedback driven massive outflows play a crucial role in galaxy evolution by regulating star formation and influencing the dynamics of surrounding media. Extracting outflow properties from spectral lines is a notoriously difficult process for a number of reasons, including the possibility that a substantial fraction of the outflow is carried by dense gas in a very narrow range in velocity. This gas can hide in spectra with insufficient resolution. Empirically motivated analysis based on the Apparent Optical Depth method, commonly used in the literature, neglects the contribution of this gas, and may therefore underestimate the true gas column density. More complex semi-analytical line transfer (e.g., SALT) models, on the other hand, allow for the presence of this gas by modeling the radial density and velocity of the outflows as power laws. Here we compare the two approaches to quantify the uncertainties in the inferences of outflow properties based on 1-D "down-the-barrel" using the UV spectra of the CLASSY galaxy sample. We find that empirical modeling may significantly underestimate the column densities relative to SALT analysis, particularly in the optically thick regime. We use simulations to show that the main reason for this discrepancy is the presence of a large amount of dense material at low velocities, which can be hidden by the finite spectral resolution of the data. The SALT models in turn could overestimate the column densities if the assumed power laws of the density profiles are not a property of actual outflows.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Submitted to ApJ, Comments welcome

arXiv:2406.03387 [pdf, other]

Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1063 additional authors not shown)

Abstract: The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining $D^+$ mesons with $τ^-\toμ^-\overlineν_μν_τ$ candidates, where the $D^+$ is reconstructed via the $D^+\to K^-π^+π^+$ decay. The results are \begin{align*} R(D^{+}) &= 0.249 \pm 0.043 \pm 0.047, \\ R(D^{*+}) &= 0.402 \pm 0.081 \pm 0.085, \end{align*} where the first uncertainties are statistical and the second systematic. The two measurements have a correlation coefficient of $-0.39$ and are compatible with the Standard Model.
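As a reading aid, the quoted statistical and systematic uncertainties can be combined into a single total uncertainty per ratio. The sketch below assumes the two components are uncorrelated and therefore adds them in quadrature. It further assumes that the quoted correlation coefficient of $-0.39$ applies to the total uncertainties. Both are assumptions for illustration, not statements from the paper, and the function name is hypothetical.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Quadrature sum, assuming stat and syst components are uncorrelated."""
    return math.hypot(stat, syst)


# Central values and uncertainties quoted in the abstract.
sigma_d = total_uncertainty(0.043, 0.047)      # for R(D+)
sigma_dst = total_uncertainty(0.081, 0.085)    # for R(D*+)

# Off-diagonal covariance, ASSUMING the quoted correlation coefficient
# refers to the total (stat + syst) uncertainties.
rho = -0.39
covariance = rho * sigma_d * sigma_dst

print(f"R(D+)  = 0.249 +/- {sigma_d:.3f}")
print(f"R(D*+) = 0.402 +/- {sigma_dst:.3f}")
print(f"cov    = {covariance:.5f}")
```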

Submitted 5 June, 2024; originally announced June 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/LHCbProjectPublic/LHCb-PAPER-2024-007.html (LHCb public pages)

Report number: LHCb-PAPER-2024-007, CERN-EP-2024-125

arXiv:2406.03156 [pdf, other]

Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.

Submitted 5 June, 2024; originally announced June 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-047.html (LHCb public pages)

Report number: LHCb-PAPER-2023-047, CERN-EP-2024-096

arXiv:2406.02087 [pdf, ps, other]

Boundedness of variation, oscillation and maximal differential transform on BMO space

Authors: Wenting Hu, Kai Wu, Dongyong Yang, Chao Zhang

Abstract: In this paper, we prove that the oscillation operator, variation operator and maximal differential transform associated with the approximate identities are bounded from ${\rm BMO}({\mathbb R}^n)$ to its subspace ${\rm BLO}({\mathbb R}^n)$.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00403 [pdf, other]

Dual-perspective Cross Contrastive Learning in Graph Transformers

Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollable augmentation strategies that potentially alter the semantic information. To address these challenges, this paper proposes an innovative framework termed dual-perspective cross graph contrastive learning (DC-GCL), which incorporates three modifications designed to enhance positive sample diversity and reliability: 1) We propose a dual-perspective augmentation strategy that provides the model with more diverse training data, enabling the model to learn feature consistency across different views effectively. 2) From the data perspective, we slightly perturb the original graphs using controllable data augmentation, effectively preserving their semantic information. 3) From the model perspective, we enhance the encoder by utilizing more powerful graph transformers instead of graph neural networks. Based on the model's architecture, we propose three pruning-based strategies to slightly perturb the encoder, providing more reliable positive samples. These modifications collectively form the DC-GCL's foundation and provide more diverse and reliable training inputs, offering significant improvements over traditional GCL methods. Extensive experiments on various benchmarks demonstrate that DC-GCL consistently outperforms different baselines on various datasets and tasks.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures, submitted to IEEE TKDE

arXiv:2406.00235 [pdf, other]

Amplitude analysis of the radiative decay $B^0_s\to K^+K^-γ$

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1061 additional authors not shown)

Abstract: A search for radiative decay of $B^0_s$ mesons to orbitally excited $K^+K^-$ states is performed using proton-proton collisions recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The dikaon spectrum in the mass range $m_{KK}<2400$ MeV/$c^2$ is dominated by the $φ(1020)$ resonance that accounts for almost 70$\%$ of the decay rate. Considering the possible contributions of $f_2{(1270)}$, $f'_2{(1525)}$ and $f_2{(2010)}$ meson states, the overall tensor contribution to the amplitude is measured to be \begin{equation} {\cal F}_{\{f_2\}}=16.8\pm 0.5\mathrm{~(stat.)}\pm0.7\mathrm{~(syst.)}\%,\nonumber \end{equation} mostly dominated by the $f'_2(1525)$ state. Several statistically equivalent solutions are obtained for the detailed resonant structure depending on whether the smaller amplitudes interfere destructively or constructively with the dominant amplitude. The preferred solution that corresponds to the lowest values of the fit fractions along with constructive interference leads to the relative branching ratio measurement \begin{equation} \frac{{\cal B}(B^0_s\to f'_2γ)}{{\cal B}(B^0_s\toφγ)}= 19.4^{+0.9}_{-0.8}\mathrm{~(stat.)}{}^{+1.4}_{-0.5}\mathrm{~(syst.)}\pm0.5\mathrm{~(\cal{B})}\%\nonumber, \end{equation} where the last uncertainty is due to the ratio of measured branching fractions to the $K^+K^-$ final state. This result represents the first observation of the radiative $B^0_s\to f'_2(1525)γ$ decay, which is the second radiative transition observed in the $B^0_s$ sector.

Submitted 31 May, 2024; originally announced June 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-002.html (LHCb public pages)

Report number: LHCb-PAPER-2024-002, CERN-EP-2024-115

arXiv:2405.20279 [pdf, other]

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latents extracted by 2D VAEs without quantization. The temporal compression is simply realized by uniform frame sampling, which results in unsmooth motion between consecutive frames. Currently, the research community lacks a commonly used continuous video (3D) VAE for latent diffusion-based video models. Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them, and bridging this gap takes huge computational resources for training even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE of latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). The compatibility is achieved by the proposed novel latent space regularization, which involves formulating a regularization loss using the image VAE. Benefiting from the latent space compatibility, video models can be trained seamlessly from pre-trained T2I or video models in a truly spatio-temporally compressed latent space, rather than simply sampling video frames at equal intervals. With our CV-VAE, existing video models can generate four times more frames with minimal finetuning. Extensive experiments are conducted to demonstrate the effectiveness of the proposed video VAE.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

arXiv:2405.19958 [pdf, other]

Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation

Authors: Yi Liu, Xiangyu Liu, Xiangrong Zhu, Wei Hu

Abstract: Multi-aspect controllable text generation aims to control the generated texts in attributes from multiple aspects (e.g., "positive" from sentiment and "sport" from topic). For ease of obtaining training samples, existing works neglect attribute correlations formed by the intertwining of different attributes. Particularly, the stereotype formed by imbalanced attribute correlations significantly affects multi-aspect control. In this paper, we propose MAGIC, a new multi-aspect controllable text generation method with disentangled counterfactual augmentation. We alleviate the issue of imbalanced attribute correlations during training using counterfactual feature vectors in the attribute latent space by disentanglement. During inference, we enhance attribute correlations by target-guided counterfactual augmentation to further improve multi-aspect control. Experiments show that MAGIC outperforms state-of-the-art baselines in both imbalanced and balanced attribute correlation scenarios. Our source code and data are available at https://github.com/nju-websoft/MAGIC.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

arXiv:2405.19782 [pdf, other]

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Authors: Wei Cheng, Yuhan Wu, Wei Hu

Abstract: Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. DraCo parses a private repository into code entities and establishes their relations through an extended dataflow analysis, forming a repo-specific context graph. Whenever triggering code completion, DraCo precisely retrieves relevant background knowledge from the repo-specific context graph and generates well-formed prompts to query code LMs. Furthermore, we construct a large Python dataset, ReccEval, with more diverse completion targets. Our experiments demonstrate the superior accuracy and applicable efficiency of DraCo, improving code exact match by 3.43% and identifier F1-score by 3.27% on average compared to the state-of-the-art approach.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

arXiv:2405.19315 [pdf, other]

Matryoshka Query Transformer for Large Vision-Language Models

Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources? We answer this with an emphatic yes. Inspired by Matryoshka Representation Learning, we introduce the Matryoshka Query Transformer (MQT), capable of encoding an image into m visual tokens during inference, where m can be any number up to a predefined maximum. This is achieved by employing a query transformer with M latent query tokens to compress the visual embeddings. During each training step, we randomly select m <= M latent query tokens and train the model using only these first m tokens, discarding the rest. Combining MQT with LLaVA, we train a single model once, and flexibly and drastically reduce the number of inference-time visual tokens while maintaining similar or better performance compared to training independent models for each number of tokens. Our model, MQT-LLAVA, matches LLaVA-1.5 performance across 11 benchmarks using a maximum of 256 tokens instead of LLaVA's fixed 576. Reducing to 16 tokens (8x fewer TFLOPs) only sacrifices the performance by 2.4 points on MMBench. On certain tasks such as ScienceQA and MMMU, we can even go down to only 2 visual tokens with performance drops of just 3% and 6%, respectively. Our exploration of the trade-off between the accuracy and computational cost brought about by the number of visual tokens facilitates future research to achieve the best of both worlds.
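The per-step token selection described in the abstract (draw m <= M, keep the first m latent query tokens, discard the rest) can be sketched as follows. This is only an illustration of the prefix-truncation idea, not the MQT-LLaVA code: the function name is hypothetical, the queries are placeholder strings rather than learned embeddings, and drawing m uniformly from 1..M is an assumption, since the abstract does not state the sampling distribution.

```python
import random


def sample_query_prefix(latent_queries: list, rng: random.Random) -> list:
    """Keep only the first m latent query tokens for this training step.

    m is drawn uniformly from 1..M here; the uniform choice is an
    assumption made for illustration.
    """
    max_tokens = len(latent_queries)   # M, the predefined maximum
    m = rng.randint(1, max_tokens)     # randint is inclusive on both ends
    return latent_queries[:m]          # the remaining M - m tokens are dropped


# Placeholder tokens standing in for M = 256 learned query embeddings.
rng = random.Random(0)
queries = [f"q{i}" for i in range(256)]
kept = sample_query_prefix(queries, rng)
print(len(kept), "of", len(queries), "query tokens used this step")
```

Because every step keeps a prefix, the earliest tokens are trained in every step while later ones are trained less often, which is what lets a single model be truncated to any m at inference time.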

Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

arXiv:2405.18435 [pdf, other]

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation, which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report

arXiv:2405.17875 [pdf, other]

BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification

Authors: Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang

Abstract: This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution methods, we propose a derivative-free optimization approach based on Bayesian optimization, which we call BO4IO, to solve general IO problems. We treat the IO loss function as a black box and approximate it with a Gaussian process model. Using the predicted posterior function, an acquisition function is minimized at each iteration to query new candidate solutions and sequentially converge to the optimal parameter estimates. The main advantages of using Bayesian optimization for IO are two-fold: (i) it circumvents the need for complex reformulations of the bilevel program or specialized algorithms and can hence enable computational tractability even when the underlying optimization problem is nonconvex or involves discrete variables, and (ii) it allows approximations of the profile likelihood, which provide uncertainty quantification on the IO parameter estimates. We apply the proposed method to three computational case studies, covering different classes of forward optimization problems ranging from convex nonlinear to nonconvex mixed-integer nonlinear programs. Our extensive computational results demonstrate the efficacy and robustness of BO4IO to accurately estimate unknown model parameters from small and noisy datasets. In addition, the proposed profile likelihood analysis has proven to be effective in providing good approximations of the confidence intervals on the parameter estimates and assessing the identifiability of the unknown parameters.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17811 [pdf, other]

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

Authors: Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan

Abstract: Neural 3D representations such as Neural Radiance Fields (NeRF) excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing, which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not highly controllable and requires a long training and inference time. With the emergence of 3D Gaussian Splatting (3DGS), extremely high-fidelity novel view synthesis can be achieved using an explicit point-based 3D representation with much faster training and rendering speed. However, there is still a lack of effective means to manipulate 3DGS freely while maintaining rendering quality. In this work, we aim to tackle the challenge of achieving manipulable photo-realistic rendering. We propose to utilize a triangular mesh to manipulate 3DGS directly with self-adaptation. This approach reduces the need to design various algorithms for different types of Gaussian manipulation. By utilizing a triangle shape-aware Gaussian binding and adapting method, we can achieve 3DGS manipulation and preserve high-fidelity rendering after manipulation. Our approach is capable of handling large deformations, local manipulations, and soft body simulations while keeping high-quality rendering. Furthermore, we demonstrate that our method is also effective with inaccurate meshes extracted from 3DGS. Experiments conducted demonstrate the effectiveness of our method and its superiority over baseline approaches.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Project page here: https://gaoxiangjun.github.io/mani_gs/

arXiv:2405.17347 [pdf, other]

Comprehensive analysis of local and nonlocal amplitudes in the $B^0\rightarrow K^{*0}μ^+μ^-$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1070 additional authors not shown)

Abstract: A comprehensive study of the local and nonlocal amplitudes contributing to the decay $B^0\rightarrow K^{*0}(\to K^+π^-) μ^+μ^-$ is performed by analysing the phase-space distribution of the decay products. The analysis is based on proton-proton collision data corresponding to an integrated luminosity of 8.4 fb$^{-1}$ collected by the LHCb experiment. This measurement employs for the first time a model of both one-particle and two-particle nonlocal amplitudes, and utilises the complete dimuon mass spectrum without any veto regions around the narrow charmonium resonances. In this way it is possible to explicitly isolate the local and nonlocal contributions and capture the interference between them. The results show that interference with nonlocal contributions, although larger than predicted, only has a minor impact on the Wilson Coefficients determined from the fit to the data. For the local contributions, the Wilson Coefficient $C_9$, responsible for vector dimuon currents, exhibits a $2.1σ$ deviation from the Standard Model expectation. The Wilson Coefficients $C_{10}$, $C_{9}'$ and $C_{10}'$ are all in better agreement than $C_{9}$ with the Standard Model and the global significance is at the level of $1.5σ$. The model used also accounts for nonlocal contributions from $B^{0}\to K^{*0}\left[τ^+τ^-\to μ^+μ^-\right]$ rescattering, resulting in the first direct measurement of the $b sττ$ vector effective-coupling $C_{9τ}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-011.html (LHCb public pages)

Report number: LHCb-PAPER-2024-011, CERN-EP-2024-122

arXiv:2405.16209 [pdf]

Analytical photoresponses of Schottky contact MoS2 phototransistors

Authors: Jianyong Wei, Yumeng Liu, Yizhuo Wang, Kai Li, Zhentao Lian, Maosong Xie, Xinhan Yang, Seyed Saleh Mousavi Khaleghi, Fuxing Dai, Weida Hu, Xuejiao Gao, Rui Yang, Yaping Dan

Abstract: High-gain photodetectors based on two-dimensional (2D) semiconductors, in particular those in photoconductive mode, have been extensively investigated in the past decade. However, the classical photoconductive theory was derived on two misplaced assumptions. In this work, we established an explicit analytical device model for Schottky contact MoS2 phototransistors that fits well with experimental data. From the fitting results, we found that the Richardson constant of the MoS2 Schottky contact is temperature dependent, indicating that the Schottky contacts for the 2D material are best described by the mixed thermionic emission and diffusion model. Based on this device model, we further established an analytical photoresponse for the few-layer MoS2 phototransistors, from which we found the voltage distribution on the two Schottky contacts and the channel, and extracted the minority carrier recombination lifetimes. The lifetimes are comparable with the values found from transient photoluminescence measurements, which therefore validates our analytical photoresponses for Schottky contact 2D semiconducting phototransistors.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures

arXiv:2405.16122 [pdf, other]

Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

Authors: Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

Abstract: Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https://github.com/ZhaoxuanWu/EASE-Prompt-Optimization.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 23 pages, 1 figure, 23 tables

arXiv:2405.14628 [pdf, other]

Online robust estimation and bootstrap inference for function-on-scalar regression

Authors: Guanghui Cheng, Wenjuan Hu, Ruitao Lin, Chen Wang

Abstract: We propose a novel and robust online function-on-scalar regression technique via geometric median to learn associations between functional responses and scalar covariates based on massive or streaming datasets. The online estimation procedure, developed using the average stochastic gradient descent algorithm, offers an efficient and cost-effective method for analyzing sequentially augmented datasets, eliminating the need to store large volumes of data in memory. We establish the almost sure consistency, $L_p$ convergence, and asymptotic normality of the online estimator. To enable efficient and fast inference of the parameters of interest, including the derivation of confidence intervals, we also develop an innovative two-step online bootstrap procedure to approximate the limiting error distribution of the robust online estimator. Numerical studies under a variety of scenarios demonstrate the effectiveness and efficiency of the proposed online learning method. A real application analyzing PM$_{2.5}$ air-quality data is also included to exemplify the proposed online approach.

Submitted 23 May, 2024; originally announced May 2024.
