Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Chengming Li, Victor C. M. Leung, Yanyi Guo, Xiping Hu

Abstract: With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for resource allocation and workload scheduling in distributed deep learning, such as scheduling complexity, resource and workload heterogeneity, and fault tolerance. To uncover these challenges and corresponding solutions, this survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We explore these strategies by focusing on various resource types, scheduling granularity levels, and performance goals during distributed training and inference processes. We highlight critical challenges for each topic and discuss key insights of existing technologies. To illustrate practical large-scale resource allocation and workload scheduling in real distributed deep learning scenarios, we use a case study of training large language models. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances and explore future research directions for efficient framework strategies for large-scale distributed deep learning.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08090 [pdf, other]

From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably trailing artifacts and signal latency, which hinder their direct applicability and generalization. Addressing these issues, we propose a novel per-scene optimization strategy tailored for low-light conditions. This approach utilizes the internal statistics of a sequence to handle degraded event data under low-light conditions, improving the generalizability to different lighting and camera settings. To evaluate its robustness in low-light conditions, we further introduce EVFI-LL, a unique RGB+Event dataset captured under low-light conditions. Our results demonstrate state-of-the-art performance in low-light environments. Both the dataset and the source code will be made publicly available upon publication. Project page: https://naturezhanghn.github.io/sim2real.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07984 [pdf, ps, other]

On the density patch problem for the 2-D inhomogeneous Navier-Stokes equations

Authors: Tiantian Hao, Feng Shao, Dongyi Wei, Zhifei Zhang

Abstract: In this paper, we first construct a class of global strong solutions for the 2-D inhomogeneous Navier-Stokes equations under the very general assumption that the initial density is only bounded and the initial velocity is in $H^1(\mathbb{R}^2)$. With suitable assumptions on the initial density, which include the case of density patch and vacuum bubbles, we prove that Lions's weak solution is the same as the strong solution with the same initial data. In particular, this gives a complete resolution of the density patch problem proposed by Lions: {\it for the density patch data $ρ_0=1_{D}$ with a smooth bounded domain $D\subset\mathbb{R}^2$, the regularity of $D$ is preserved by the time evolution of Lions's weak solution.}

Submitted 12 June, 2024; originally announced June 2024.

Comments: 23 pages

arXiv:2406.07918 [pdf, other]

Micro-expression recognition based on depth map to point cloud

Authors: Ren Zhang, Jianqin Yin, Chao Qi, Zehao Wang, Zhicheng Zhang, Yonghao Dang

Abstract: Micro-expressions are nonverbal facial expressions that reveal the covert emotions of individuals, making the micro-expression recognition task receive widespread attention. However, the micro-expression recognition task is challenging due to the subtle facial motion and brevity in duration. Many 2D image-based methods have been developed in recent years to recognize MEs effectively, but these approaches are restricted by facial texture information and are susceptible to environmental factors, such as lighting. Conversely, depth information can effectively represent motion information related to facial structure changes and is not affected by lighting. Motion information derived from facial structures can describe motion features that pixel textures cannot delineate. We proposed a network for micro-expression recognition based on facial depth information, and our experiments have demonstrated the crucial role of depth maps in the micro-expression recognition task. Initially, we transform the depth map into a point cloud and obtain the motion information for each point by aligning the initiating frame with the apex frame and performing a differential operation. Subsequently, we adjusted all point cloud motion feature input dimensions and used them as inputs for multiple point cloud networks to assess the efficacy of this representation. PointNet++ was chosen as the ultimate outcome for micro-expression recognition due to its superior performance. Our experiments show that our proposed method significantly outperforms the existing deep learning methods, including the baseline, on the $CAS(ME)^3$ dataset, which includes depth information.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07601 [pdf, other]

IceCube Search for Neutrino Emission from X-ray Bright Seyfert Galaxies

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (400 additional authors not shown)

Abstract: The recent IceCube detection of TeV neutrino emission from the nearby active galaxy NGC 1068 suggests that active galactic nuclei (AGN) could make a sizable contribution to the diffuse flux of astrophysical neutrinos. The absence of TeV $γ$-rays from NGC 1068 indicates neutrino production in the vicinity of the supermassive black hole, where the high radiation density leads to $γ$-ray attenuation. Therefore, any potential neutrino emission from similar sources is not expected to correlate with high-energy $γ$-rays. Disk-corona models predict neutrino emission from Seyfert galaxies to correlate with keV X-rays, as they are tracers of coronal activity. Using through-going track events from the Northern Sky recorded by IceCube between 2011 and 2021, we report results from a search for individual and aggregated neutrino signals from 27 additional Seyfert galaxies that are contained in the BAT AGN Spectroscopic Survey (BASS). Besides the generic single power-law, we evaluate the spectra predicted by the disk-corona model. Assuming all sources to be intrinsically similar to NGC 1068, our findings constrain the collective neutrino emission from X-ray bright Seyfert galaxies in the Northern Hemisphere, but, at the same time, show excesses of neutrinos that could be associated with the objects NGC 4151 and CGCG 420-015. These excesses result in a 2.7$σ$ significance with respect to background expectations.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 17 pages, 9 figures

arXiv:2406.07499 [pdf, other]

Trim 3D Gaussian Splatting for Accurate Geometry Representation

Authors: Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang

Abstract: In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. Our project page is https://trimgs.github.io

Submitted 11 June, 2024; originally announced June 2024.

Comments: Project page: https://trimgs.github.io/

arXiv:2406.07362 [pdf, other]

AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI from being translated into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency -- Sepsis -- from 14 medical centers across China. For the patient cohorts well-chosen from MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset presently and within the next three hours. The clinician-related information includes the viewed examination data and sequence, viewed time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.

Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.07306 [pdf, other]

A directional total variation minimization algorithm for isotropic resolution in digital breast tomosynthesis

Authors: Emil Y. Sidky, Xiangyi Wu, Xiaoyu Duan, Hailing Huang, Wei Zhao, Leo Y. Zhang, John Paul Phillips, Zheng Zhang, Buxin Chen, Dan Xia, Ingrid S. Reiser, Xiaochuan Pan

Abstract: An optimization-based image reconstruction algorithm is developed for contrast enhanced digital breast tomosynthesis (DBT) using dual-energy scanning. The algorithm minimizes directional total variation (TV) with a data discrepancy and non-negativity constraints. Iodinated contrast agent (ICA) imaging is performed by reconstructing images from dual-energy DBT data followed by weighted subtraction. Physical DBT data is acquired with a Siemens Mammomat scanner of a structured breast phantom with ICA inserts. Results are shown for both directional TV minimization and filtered back-projection for reference. It is seen that directional TV is able to substantially reduce depth blur for the ICA objects.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Proceedings paper for accepted contribution to the 8th International Conference on Image Formation in X-Ray Computed Tomography (https://www.ct-meeting.org)

arXiv:2406.07238 [pdf]

Structures and Superconductivity of Hydrogen and Hydrides under Extreme Pressure

Authors: Zihan Zhang, Wendi Zhao, Defang Duan, Tian Cui

Abstract: Metallic hydrogen, existing in remarkably extreme environments, was predicted to exhibit long-sought room-temperature superconductivity. Although the superconductivity of metallic hydrogen has not been confirmed experimentally, superconductivity of hydrogen in hydrides was recently discovered with remarkably high critical temperature as theoretically predicted. In recent years, theoretical simulations have become a new paradigm for material science, especially for the exploration of materials at extreme pressure. As the typical high-pressure material, metallic hydrogen has been providing a fertile playground for advanced simulations for a long time. Simulations not only provide a substitute for experiments on hydrogen at high pressure, but also encouraged the discovery of almost all the experimentally discovered superconducting hydrides with the record high superconducting transition temperature. This work reviews recent progress in hydrogen and hydrides under extreme pressure, focusing on phase diagram, structures and the long-sought goal of high-temperature superconductivity. In the end, we highlight structural features of hydrides for realization of hydrogen-driven superconducting hydrides near ambient pressure.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 35 pages, 9 figures

arXiv:2406.07021 [pdf, other]

A Tool for Test Case Scenarios Generation Using Large Language Models

Authors: Abdul Malik Sami, Zeeshan Rasheed, Muhammad Waseem, Zheying Zhang, Herda Tomas, Pekka Abrahamsson

Abstract: Large Language Models (LLMs) are widely used in Software Engineering (SE) for various tasks, including generating code, designing and documenting software, adding code comments, reviewing code, and writing test scripts. However, creating test scripts or automating test cases demands test suite documentation that comprehensively covers functional requirements. Such documentation must enable thorough testing within a constrained scope and timeframe, particularly as requirements and user demands evolve. This article centers on generating user requirements as epics and high-level user stories and crafting test case scenarios based on these stories. It introduces a web-based software tool that employs an LLM-based agent and prompt engineering to automate the generation of test case scenarios against user requirements.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures, and 1 table

arXiv:2406.06829 [pdf, other]

Personalized Binomial DAGs Learning with Network Structured Covariates

Authors: Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar

Abstract: The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06684 [pdf, other]

Search for neutrino emission from hard X-ray AGN with IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (401 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are promising candidate sources of high-energy astrophysical neutrinos since they provide environments rich in matter and photon targets where cosmic ray interactions may lead to the production of gamma rays and neutrinos. We searched for high-energy neutrino emission from AGN using the $\textit{Swift}$-BAT Spectroscopic Survey (BASS) catalog of hard X-ray sources and 12 years of IceCube muon track data. First, upon performing a stacked search, no significant emission was found. Second, we searched for neutrinos from a list of 43 candidate sources and found an excess from the direction of two sources, Seyfert galaxies NGC 1068 and NGC 4151. We observed NGC 1068 at flux $φ_{ν_μ+\barν_μ}$ = $4.02_{-1.52}^{+1.58} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV, with power-law spectral index, $γ$ = 3.10$^{+0.26}_{-0.22}$, consistent with previous IceCube results. The observation of a neutrino excess from the direction of NGC 4151 is at a post-trial significance of 2.9$σ$. If interpreted as an astrophysical signal, the excess observed from NGC 4151 corresponds to a flux $φ_{ν_μ+\barν_μ}$ = $1.51_{-0.81}^{+0.99} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV and $γ$ = 2.83$^{+0.35}_{-0.28}$.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06626 [pdf, other]

Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

Authors: Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

Abstract: Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday usage. Implementing these decoding processes on edge devices, such as wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks to identify an optimal neural decoding backbone that boasts robust performance and swift inference capabilities suitable for edge deployment. We executed a series of neural decoding experiments involving nonhuman primates engaged in random reaching tasks, evaluating four prospective models, Gated Recurrent Unit (GRU), Transformer, Receptance Weighted Key Value (RWKV), and Selective State Space model (Mamba), across several metrics: single-session decoding, multi-session decoding, new session fine-tuning, inference speed, calibration speed, and scalability. The findings indicate that although the GRU model delivers sufficient accuracy, the RWKV and Mamba models are preferable due to their superior inference and calibration speeds. Additionally, RWKV and Mamba comply with the scaling law, demonstrating improved performance with larger data sets and increased model sizes, whereas GRU shows less pronounced scalability, and the Transformer model requires computational resources that scale prohibitively. This paper presents a thorough comparative analysis of the four models in various scenarios. The results are pivotal in pinpointing an optimal backbone that can handle increasing data volumes and is viable for edge implementation. This analysis provides essential insights for ongoing research and practical applications in the field.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.06593 [pdf, other]

Differentiable Combinatorial Scheduling at Scale

Authors: Mingju Liu, Yingjie Li, Jiaqi Yin, Zhiru Zhang, Cunxi Yu

Abstract: This paper addresses the complex issue of resource-constrained scheduling, an NP-hard problem that spans critical areas including chip design and high-performance computing. Traditional scheduling methods often stumble over scalability and applicability challenges. We propose a novel approach using a differentiable combinatorial scheduling framework, utilizing the Gumbel-Softmax differentiable sampling technique. This new technique allows for a fully differentiable formulation of linear programming (LP) based scheduling, extending its application to a broader range of LP formulations. To encode inequality constraints for scheduling tasks, we introduce the \textit{constrained Gumbel Trick}, which adeptly encodes arbitrary inequality constraints. Consequently, our method facilitates efficient and scalable scheduling via gradient descent without the need for training data. Comparative evaluations on both synthetic and real-world benchmarks highlight our capability to significantly improve the optimization efficiency of scheduling, surpassing state-of-the-art solutions offered by commercial and open-source solvers such as CPLEX, Gurobi, and CP-SAT in the majority of the designs.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 13 pages; International Conference on Machine Learning (ICML'24)

arXiv:2406.06567 [pdf, other]

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Authors: Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun

Abstract: Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on the analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation of clustering similar heads, we propose to progressively transform the MHA checkpoint into the DHA model through linear fusion of similar head parameters step by step, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming various scales of MHA checkpoints given target head budgets. Our experiments show that DHA remarkably requires a mere 0.25\% of the original model's pre-training budgets to achieve 97.6\% of performance while saving 75\% of KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5$\times$ training acceleration, a maximum of 13.93\% performance improvement under 0.01\% pre-training budget, and 4\% relative improvement under 0.05\% pre-training budget.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures, 3 tables

arXiv:2406.06518 [pdf, other]

Data Augmentation for Multivariate Time Series Classification: An Experimental Study

Authors: Romain Ilbert, Thai V. Hoang, Zonghua Zhang

Abstract: Our study investigates the impact of data augmentation on the performance of multivariate time series models, focusing on datasets from the UCR archive. Despite the limited size of these datasets, we achieved classification accuracy improvements in 10 out of 13 datasets using the Rocket and InceptionTime models. This highlights the essential role of sufficient data in training effective models, paralleling the advancements seen in computer vision. Our work delves into adapting and applying existing methods in innovative ways to the domain of multivariate time series classification. Our comprehensive exploration of these techniques sets a new standard for addressing data scarcity in time series analysis, emphasizing that diverse augmentation strategies are crucial for unlocking the potential of both traditional and deep learning models. Moreover, by meticulously analyzing and applying a variety of augmentation techniques, we demonstrate that strategic data enrichment can enhance model accuracy. This not only establishes a benchmark for future research in time series analysis but also underscores the importance of adopting varied augmentation approaches to improve model performance in the face of limited data availability.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Workshop on Multivariate Time Series Analytics (MulTiSA), ICDE Workshop

arXiv:2406.06118 [pdf, other]

Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.

Submitted 16 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06087 [pdf, other]

GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

Authors: Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

Abstract: Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features, thus rendering them inapplicable in AIGVs. To address these problems, we construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective, resulting in 971,244 ratings among 9,180 video-action pairs. Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their pros and cons on different categories of actions. We also extend GAIA as a testbed to benchmark the AQA capacity of existing automatic evaluation methods. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods correlate poorly with human opinions, indicating a sizable gap between current models and human action perception patterns in AIGVs. Our findings underscore the significance of action quality as a unique perspective for studying AIGVs and can catalyze progress towards methods with enhanced capacities for AQA in AIGVs.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 28 pages, 13 figures

arXiv:2406.06043 [pdf, other]

doi 10.1145/3637528.3671531

Modeling User Retention through Generative Flow Networks

Authors: Ziru Liu, Shuchang Liu, Bin Yang, Zhenghai Xue, Qingpeng Cai, Xiangyu Zhao, Zijian Zhang, Lantao Hu, Han Li, Peng Jiang

Abstract: Recommender systems aim to fulfill the user's daily demands. While most existing research focuses on maximizing the user's engagement with the system, it has recently been pointed out that how frequently the users come back for the service also reflects the quality and stability of recommendations. However, optimizing this user retention behavior is non-trivial and poses several challenges including the intractable leave-and-return user activities, the sparse and delayed signal, and the uncertain relations between users' retention and their immediate feedback towards each item in the recommendation list. In this work, we regard the retention signal as an overall estimation of the user's end-of-session satisfaction and propose to estimate this signal through a probabilistic flow. This flow-based modeling technique can back-propagate the retention reward towards each recommended item in the user session, and we show that the flow combined with traditional learning-to-rank objectives eventually optimizes a non-discounted cumulative reward for both immediate user feedback and user retention. We verify the effectiveness of our method through both offline empirical studies on two public datasets and online A/B tests in an industrial platform.

Submitted 10 June, 2024; originally announced June 2024.

Comments: KDD-ADS 2024

arXiv:2406.06039 [pdf, other]

Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

Authors: Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong

Abstract: With the breakthrough of large models, attempts have been made to apply the Segment Anything Model (SAM) and its extensions to diverse computer vision tasks. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, which often suffer from low segmentation accuracy due to the complex underwater circumstances and the adaptive ability of models. Moreover, the lack of large-scale datasets with pixel-level salient instance annotations has impeded the development of machine learning techniques in this field. To address these issues, we construct the first large-scale underwater salient instance segmentation dataset (USIS10K), which contains 10,632 underwater images with pixel-level annotations in 7 categories from various underwater scenes. Then, we propose an Underwater Salient Instance Segmentation architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain. We devise an Underwater Adaptive Visual Transformer (UA-ViT) encoder to incorporate underwater domain visual prompts into the segmentation network. We further design an out-of-the-box underwater Salient Feature Prompter Generator (SFPG) to automatically generate salient prompters instead of explicitly providing foreground points or boxes as prompts in SAM. Comprehensive experimental results show that our USIS-SAM method can achieve superior performance on USIS10K datasets compared to the state-of-the-art methods. Datasets and codes are released on https://github.com/LiamLian0727/USIS10K.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted to ICML 2024, Code released at: https://github.com/LiamLian0727/USIS10K

arXiv:2406.05961 [pdf, other]

BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation

Authors: Zihan Zhang, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

Abstract: Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs superiorly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, to reduce computational complexity and improve performance further. Specifically, to compensate for the missing future information, in the wide-band module, we design a dual-path encoder structure (with non-causal and causal paths) and leverage an intra-model knowledge distillation strategy to distill the future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to recover speech distortions and remove residual noise in the audio signal. With only 40% of original parameters in BS-PLCNet, BS-PLCNet 2 brings 0.18 PLCMOS improvement on the ICASSP 2024 PLC challenge blind set, achieving state-of-the-art performance on this dataset.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.05955 [pdf, other]

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Authors: Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen

Abstract: Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreover, inadequate training data can further increase the risk of performance degradation. To address these challenges, we propose a novel dReLU function, which is designed to improve LLM activation sparsity, along with a high-quality training data mixture ratio to facilitate effective sparsification. Additionally, we leverage sparse activation patterns within the Feed-Forward Network (FFN) experts of Mixture-of-Experts (MoE) models to further boost efficiency. By applying our neuron sparsification method to the Mistral and Mixtral models, only 2.5 billion and 4.3 billion parameters are activated per inference iteration, respectively, while achieving even more powerful model performance. Evaluation results demonstrate that this sparsity achieves a 2-5x decoding speedup. Remarkably, on mobile phones, our TurboSparse-Mixtral-47B achieves an inference speed of 11 tokens per second. Our models are available at https://huggingface.co/PowerInfer

Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05898 [pdf, other]

Async Learned User Embeddings for Ads Delivery Optimization

Authors: Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

Abstract: In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module. The async learned user representations embeddings (ALURE) are further converted to user similarity graphs through graph learning and then combined with users' real-time activities to retrieve highly related ad candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.

Submitted 23 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted by workshop on Multimodal Representation and Retrieval at SIGIR 2024, Washington DC

arXiv:2406.05827 [pdf, ps, other]

Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$ GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$ fb$^{-1}$, $8.157 \pm 0.031$ fb$^{-1}$, and $4.191 \pm 0.016$ fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05746 [pdf]

doi 10.1007/s10462-024-10763-w

Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, in which the diagnostic precision for every disease including uncommon ones was no less than 80%. After verification, the 46 DUCG models were applied in the real world in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently were improved significantly. Following the introduction to the earlier presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.

Submitted 9 June, 2024; originally announced June 2024.

Journal ref: Artificial Intelligence Review, (2024) 57:151

arXiv:2406.05657 [pdf, other]

Single channel PICOSEC Micromegas detector with improved time resolution

Authors: A. Utrobicic, R. Aleksan, Y. Angelis, J. Bortfeldt, F. Brunbauer, M. Brunoldi, E. Chatzianagnostou, J. Datta, K. Dehmelt, G. Fanourakis, D. Fiorina, K. J. Floethner, M. Gallinaro, F. Garcia, I. Giomataris, K. Gnanvo, F. J. Iguaz, D. Janssens, A. Kallitsopoulou, M. Kovacic, B. Kross, P. Legou, M. Lisowska, J. Liu, M. Lupberger , et al. (25 additional authors not shown)

Abstract: This paper presents design guidelines and experimental verification of a single-channel PICOSEC Micromegas (MM) detector with an improved time resolution. The design encompasses the detector board, vessel, auxiliary mechanical parts, and electrical connectivity for high voltage (HV) and signals, focusing on improving stability, reducing noise, and ensuring signal integrity to optimize timing performance. A notable feature is the simple and fast reassembly procedure, facilitating quick replacement of detector internal components that allows for an efficient measurement strategy involving different detector components. The paper also examines the influence of parasitics on the output signal integrity. To validate the design, a prototype assembly and three interchangeable detector boards with varying readout pad diameters were manufactured. The detectors were initially tested in the laboratory environment. Finally, the timing performance of detectors with different pad sizes was verified using a Minimum Ionizing Particle (MIP) beam test. Notably, a record time resolution for a PICOSEC Micromegas detector technology with a CsI photocathode of 12.5$\pm$0.8 ps was achieved with a 10 mm diameter readout pad size detector.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05625 [pdf, other]

ATLAS: Improving Lay Summarisation with Attribute-based Control

Authors: Zhihao Zhang, Tomas Goldsack, Carolina Scarton, Chenghua Lin

Abstract: Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences. However, previous work assumes a one-size-fits-all approach, where the content and style of the produced summary are entirely dependent on the data used to train the model. In practice, audiences with different levels of expertise will have specific needs, impacting what content should appear in a lay summary and how it should be presented. Aiming to address this, we propose ATLAS, a novel abstractive summarisation approach that can control various properties that contribute to the overall "layness" of the generated summary using targeted control attributes. We evaluate ATLAS on a combination of biomedical lay summarisation datasets, where it outperforms state-of-the-art baselines using mainstream summarisation metrics. Additional analyses provided on the discriminatory power and emergent influence of our selected controllable attributes further attest to the effectiveness of our approach.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05513 [pdf, ps, other]

A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

Authors: Jianzhao Wang, Yanyan Wei, Dehua Hu, Yilin Zhang, Shengeng Tang, Kun Li, Zhao Zhang

Abstract: This technical report presents our team's solution for the WeatherProof Dataset Challenge: Semantic Segmentation in Adverse Weather at CVPR'24 UG2+. We propose a two-stage deep learning framework for this task. In the first stage, we preprocess the provided dataset by concatenating images into video sequences. Subsequently, we leverage a low-rank video deraining method to generate high-fidelity pseudo ground truths. These pseudo ground truths offer superior alignment compared to the original ground truths, facilitating model convergence during training. In the second stage, we employ the InternImage network to train for the semantic segmentation task using the generated pseudo ground truths. Notably, our meticulously designed framework demonstrates robustness to degraded data captured under adverse weather conditions. In the challenge, our solution achieved a competitive score of 0.43 on the Mean Intersection over Union (mIoU) metric, securing a respectable rank of 4th.

Submitted 10 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05452 [pdf, other]

Near-Field Channel Estimation for Extremely Large-Scale Terahertz Communications

Authors: Songjie Yang, Yizhou Peng, Wanting Lyu, Ya Li, Hongjun He, Zhongpei Zhang, Chau Yuen

Abstract: Future Terahertz communications exhibit significant potential in accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realize this potential, as they can harness substantial beamforming gains to overcome the severe path loss and leverage the electromagnetic advantages in the near field. This paper proposes novel estimation methods designed to enhance efficiency in Terahertz widely-spaced multi-subarray (WSMS) systems. Initially, we introduce three sparse channel representation methods: polar-domain representation (PD-R), multi-angular-domain representation (MAD-R), and two-dimensional polar-angular-domain representation (2D-PAD-R). Each method is meticulously developed for near-field WSMS channels, capitalizing on their sparsity characteristics. Building on this, we propose four estimation frameworks using the sparse recovery theory: polar-domain estimation (PD-E), multi-angular-domain estimation (MAD-E), two-stage polar-angular-domain estimation (TS-PAD-E), and two-dimensional polar-angular-domain estimation (2D-PAD-E). Particularly, 2D-PAD-E, integrating a 2D dictionary process, and TS-PAD-E, with its sequential approach to angle and distance estimation, stand out as particularly effective for near-field angle-distance estimation, enabling decoupled calculation of these parameters. Overall, these frameworks provide versatile and efficient solutions for WSMS channel estimation, balancing low complexity with high-performance outcomes. Additionally, they represent a fresh perspective on near-field signal processing.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05318 [pdf]

Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning

Authors: Zijian Zhang, Wei Liu

Abstract: In this paper, we present our solution for the SMART-101 Challenge of the CVPR Multi-modal Algorithmic Reasoning Task 2024. Unlike traditional visual question answering tasks, this challenge evaluates the abstraction, deduction and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6-8 age group. Our model is based on two pre-trained models, dedicated to extracting features from text and image respectively. To integrate the features from different modalities, we employed a fusion layer with an attention mechanism. We explored different text and image pre-trained models, and fine-tuned the integrated classifier on the SMART-101 dataset. Experimental results show that under the puzzle-split data splitting style, our proposed integrated classifier achieves superior performance, verifying the effectiveness of multi-modal pre-trained representations.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.05168 [pdf, other]

doi 10.1103/PhysRevLett.132.223802

Topological photonic alloy

Authors: Tiantao Qu, Mudi Wang, Xiaoyu Cheng, Xiaohan Cui, Ruo-Yang Zhang, Zhao-Qing Zhang, Lei Zhang, Jun Chen, C. T. Chan

Abstract: We present the new concept of photonic alloy as a non-periodic topological material. By mixing non-magnetized and magnetized rods in a non-periodic 2D photonic crystal configuration, we realized photonic alloys in the microwave regime. Our experimental findings reveal that the photonic alloy sustains non-reciprocal chiral edge states (CESs) even at very low concentration of magnetized rods. The non-trivial topology and the associated edge states of these non-periodic systems can be characterized by the winding of the reflection phase. Our results indicate that the threshold concentrations for the investigated system within the first non-trivial band gap to exhibit topological behavior approach zero in the thermodynamic limit for substitutional alloys, while the threshold remains non-zero for interstitial alloys. At low concentration, the system exhibits an inhomogeneous structure characterized by isolated patches of non-percolating magnetic domains that are spaced far apart within a topologically trivial photonic crystal. Surprisingly, the system manifests CESs despite a local breakdown of time-reversal symmetry rather than a global one. Photonic alloys represent a new category of disordered topological materials, offering exciting opportunities for exploring topological materials with adjustable gaps.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. Lett. 132, 223802 (2024)

arXiv:2406.04906 [pdf, other]

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

Authors: Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shoujin Wang

Abstract: The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205, by combining the original datasets and their corresponding machine-generated pairs. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) from RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04819 [pdf, ps, other]

doi 10.1007/s11433-024-2427-2

Magnetism of $\mathrm{NaYbS_2}$: From finite temperatures to ground state

Authors: Weizhen Zhuo, Zheng Zhang, Mingtai Xie, Anmin Zhang, Jianting Ji, Feng Jin, Qingming Zhang

Abstract: Rare-earth chalcogenide compounds $\mathrm{ARECh_2}$ (A = alkali or monovalent metal, RE = rare earth, Ch = O, S, Se, Te) are a large family of quantum spin liquid (QSL) candidate materials. $\mathrm{NaYbS_2}$ is a representative member of the family. Several key issues on $\mathrm{NaYbS_2}$, particularly how to determine the highly anisotropic spin Hamiltonian and describe the magnetism at finite temperatures and the ground state, remain to be addressed. In this paper, we conducted an in-depth and comprehensive study on the magnetism of $\mathrm{NaYbS_2}$ from finite temperatures to the ground state. Firstly, we successfully detected three crystalline electric field (CEF) excitation energy levels using low-temperature Raman scattering technique. Combining them with the CEF theory and magnetization data, we worked out the CEF parameters, CEF energy levels, and CEF wavefunctions. We further determined a characteristic temperature of $\sim$40 K, above which the magnetism is dominated by CEF excitations while below which the spin-exchange interactions play a main role. The characteristic temperature has been confirmed by the temperature-dependent electron spin resonance (ESR) linewidth. Low-temperature ESR experiments on the dilute magnetic doped crystal of $\mathrm{NaYb_{0.1}Lu_{0.9}S_2}$ further helped us to determine the accurate $g$-factor. Next, we quantitatively obtained the spin-exchange interactions in the spin Hamiltonian by consistently simulating the magnetization and specific heat data. Finally, the above studies allow us to explore the ground state magnetism of $\mathrm{NaYbS_2}$ by using the density matrix renormalization group. We combined numerical calculations and experimental results to demonstrate that the ground state of $\mathrm{NaYbS_2}$ is a Dirac-like QSL.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures

Journal ref: Science China Physics, Mechanics & Astronomy (2024)

arXiv:2406.04504 [pdf, other]

Mixed Finite Element Method for Multi-layer Elastic Contact Systems

Authors: Zhizhuo Zhang, Mikaël Barboteu, Xiaobing Nie, Serge Dumont, Mahmoud Abdel-Aty, Jinde Cao

Abstract: With the development of multi-layer elastic systems in the field of engineering mechanics, the corresponding variational inequality theory and algorithm design have received more attention and research. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the mixed finite element method are proposed and analyzed. Then, the convergence of the numerical solution of the mixed finite element method is theoretically proven, and the corresponding algebraic dual algorithm is given. Finally, through numerical experiments, the mixed finite element method is not only compared with the layer decomposition method, but also its convergence relationship with respect to the spatial discretization parameter $H$ is verified.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04499 [pdf, other]

A layer decomposition method for multi-layer elastic contact systems with interlayer Tresca friction

Authors: Zhizhuo Zhang, Xiaobing Nie, Mikaël Barboteu, Jinde Cao

Abstract: With the increasing demand for the accuracy of numerical simulation of pavement mechanics, the variational inequality model and its induced finite element method, which can simulate the interlayer contact state, become a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction conditions is studied. Continuous and discrete versions of the algorithm and their convergence theorems have been proposed and proved successively. Then, the algebraic form of the executable optimization algorithm and the numerical experimental results verify the practicability of the variational inequality model and its algorithm in pavement mechanics modeling.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04324 [pdf, other]

SF-V: Single Forward Video Generation Model

Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/SF-V

arXiv:2406.03877 [pdf, other]

Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

Authors: Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, Junchi Yan

Abstract: In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.

Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Fix typos in text and Table 4. More reference

arXiv:2406.03876 [pdf]

Time-resolved optical assessment of exciton formation in mixed two-dimensional perovskite films

Authors: Zheng Zhang, Jianan Wang, Yijie Shi, Xi Wang, Zhong Wang, Xiangyu Zhu, Chunlong Hu, Zonghao Liu, Wei Chen, Wenxi Liang

Abstract: We report the observation of exciton formation from the cooled band-edge carriers in mixed two-dimensional hybrid organic-inorganic perovskites using femtosecond transient absorption spectroscopy. By monitoring the changes of bleach signal upon excitations with various photon energy, we are able to extract the values of exciton binding energy and the occupancies of carriers of free and bound states for each two-dimensional phase. We also confirm the existence of Mahan exciton when injected carrier density is above the Mott criterion.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Main text: 15 pages, 4 figures. Supplementary Information: 16 pages, 16 figures, 10 tables

arXiv:2406.03875 [pdf, other]

Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish

Authors: Xiaocun Liao, Chao Zhou, Junfeng Fan, Zhuoliang Zhang, Zhaoran Yin, Liangwei Deng

Abstract: The robotic fish with high propulsion efficiency and good maneuverability achieves underwater fishlike propulsion by commonly adopting the motor to drive the fishtail, causing the significant fluctuations of the motor power due to the uneven swing speed of the fishtail in one swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bionic spine can produce elastic deformation to store energy under the action of the wire driving and motor for responding to the fluctuations of the motor power. Further, we analyze the effects of the energy-storing of the active-segment elastic spine on the smoothness of motor power. Based on the developed Lagrangian dynamic model and cantilever beam model, the power-variance-based nonlinear optimization model for the stiffness of the active-segment elastic spine is established to respond to the sharp fluctuations of motor power during each fishtail swing cycle. Results validate that the energy-storing of the active-segment elastic spine plays a vital role in improving the power fluctuations and maximum frequency of the motor by adjusting its stiffness reasonably, which is beneficial to achieving high propulsion and high speed for robotic fish. Compared with the active-segment rigid spine that is incapable of storing energy, the energy-storing of the active-segment elastic spine is beneficial to increase the maximum frequency of the motor and the average thrust of the fishtail by 0.41 Hz, and 0.06 N, respectively.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 14 pages, 19 figures

arXiv:2406.03848 [pdf, other]

OceanCastNet: A Deep Learning Ocean Wave Model with Energy Conservation

Authors: Ziliang Zhang, Huaming Yu, Danqin Ren

Abstract: Traditional wave forecasting models, although based on energy conservation equations, are computationally expensive. On the other hand, existing deep learning geophysical fluid models, while computationally efficient, often suffer from issues such as energy dissipation in long-term forecasts. This paper proposes a novel energy-balanced deep learning wave forecasting model called OceanCastNet (OCN). By incorporating wind fields at the current, previous, and future time steps, as well as wave fields at the current and previous time steps as input variables, OCN maintains energy balance within the model. Furthermore, the model employs adaptive Fourier operators as its core components and designs a masked loss function to better handle the impact of land-sea boundaries. A series of experiments on the ERA5 dataset demonstrate that OCN can achieve short-term forecast accuracy comparable to traditional models while exhibiting an understanding of the wave generation process. In comparative experiments under both normal and extreme conditions, OCN consistently outperforms the widely used WaveWatch III model in the industry. Even after long-term forecasting, OCN maintains a stable and energy-rich state. By further constructing a simple meteorological model, OCN-wind, which considers energy balance, this paper confirms the importance of energy constraints for improving the long-term forecast performance of deep learning meteorological models. This finding provides new ideas for future research on deep learning geophysical fluid models.

Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03844 [pdf, other]

PREX and CREX: Evidence for Strong Isovector Spin-Orbit Interaction

Authors: Tong-Gang Yue, Zhen Zhang, Lie-Wen Chen

Abstract: The recent PREX-2 and CREX data on the model-independent extraction of the charge-weak form factor difference $ΔF_{\rm CW}$ in $^{208}$Pb and $^{48}$Ca challenge modern nuclear energy density functionals (EDFs) as well as our present understanding on the neutron skin and nuclear symmetry energy. Within the Skyrme-like EDFs, we demonstrate that the isovector spin-orbit interaction can strongly change the $ΔF_{\rm CW}$ in $^{48}$Ca while it has essentially no influence on the $ΔF_{\rm CW}$ in $^{208}$Pb, mainly due to the eight spin-orbit unpaired $1f_{7/2}$ neutrons in $^{48}$Ca. To simultaneously describe PREX-2 and CREX data in $1σ$ error, we find the strength of isovector spin-orbit interaction should be larger than about four times of that in the conventional Skyrme-like EDFs, implying the neutrons and protons have significantly different spin-orbit interaction. To further reconcile the data on electric dipole polarizability in $^{208}$Pb and $^{48}$Ca, we obtain $L \approx 55$ MeV for the slope parameter of the symmetry energy, $Δr_{\rm np}(^{208}\rm{Pb}) \approx 0.19$ fm and $Δr_{\rm np}(^{48}\rm{Ca}) \approx 0.12$ fm for the neutron skin thickness. The implications of the strong isovector spin-orbit interaction are discussed.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 figures (including Supplemental Material)

arXiv:2406.03842 [pdf, ps, other]

Blow-up of cylindrically symmetric solutions for Fractional NLS

Authors: Tianxiang Gou, Vicentiu D. Radulescu, Zhitao Zhang

Abstract: In this paper, we consider blow-up of solutions to the Cauchy problem for the following fractional NLS, $$ \textnormal{i} \, \partial_t u=(-Δ)^s u-|u|^{2 σ} u \quad \text{in} \,\, \mathbb{R} \times \mathbb{R}^N, $$ where $N \geq 2$, $1/2 <s<1$ and $0<σ<2s/(N-2s)$. In the mass critical and supercritical cases, we establish a criterion for blow-up of solutions to the problem for cylindrically symmetric data. The results extend the known ones with respect to blow-up of solutions to the problem for radially symmetric data in \cite{BHL}.

Submitted 7 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 9 pages

MSC Class: 35R11; 35B44

arXiv:2406.03840 [pdf, ps, other]

Global tensor polarization of spin $3/2$ hadrons and quark spin correlations in relativistic heavy ion collisions

Authors: Zhe Zhang, Ji-peng Lv, Zi-han Yu, Zuo-tang Liang

Abstract: We study the global polarization of spin-$3/2$ hadrons in relativistic heavy ion collisions. We show in particular that the global tensor polarizations of rank two or three for spin-$3/2$ hadrons are sensitive to the local two or three quark spin correlations respectively in the quark gluon plasma produced in the collision processes. We present the relationships between these measurable tensor polarizations and quark spin correlations in the quark matter system.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2406.03817 [pdf, other]

Field Theory of Active Brownian Particles with Dry Friction

Authors: Ziluo Zhang, Shurui Yuan, Shigeyuki Komura

Abstract: We present a field theoretic approach to capture the motion of a particle with dry friction for one- and two-dimensional diffusive particles, and further expand the framework for two-dimensional active Brownian particles. Starting with the Fokker-Planck equation and introducing the Hermite polynomials as the corresponding eigen-functions, we obtain the actions and propagators. Using a perturbation expansion, we calculate the effective diffusion coefficient in the presence of both wet and dry frictions in a perturbative way via the Green-Kubo relation. We further compare the analytical result with the numerical simulation. Our result can be used to estimate the values of dry friction coefficient in experiments.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03758 [pdf]

Phonon heat conduction across slippery interfaces in twisted graphite

Authors: Fuwei Yang, Wenjiang Zhou, Zhibin Zhang, Xuanyu Huang, Jingwen Zhang, Nianjie Liang, Wujuan Yan, Yuxi Wang, Mingchao Ding, Quanlin Guo, Yu Han, Te-Huan Liu, Kaihui Liu, Quanshui Zheng, Bai Song

Abstract: Interlayer rotation in van der Waals (vdW) materials offers great potential for manipulating phonon dynamics and heat flow in advanced electronics with ever higher compactness and power density. However, despite extensive theoretical efforts in recent years, experimental measurements remain scarce especially due to the critical challenges of preparing single-crystalline twisted interfaces and probing interfacial thermal transport with sufficient resolution. Here, we exploited the intrinsic twisted interfaces in highly oriented pyrolytic graphite (HOPG). By developing novel experimental schemes based on microfabricated mesas, we managed to achieve simultaneous mechanical characterizations and thermal measurements. In particular, we pushed the HOPG mesas with a microprobe to identify and rotate single-crystalline intrinsic interfaces owing to their slippery nature as is well known in structural superlubricity. Remarkably, we observed over 30-fold suppression of thermal conductance for the slippery interfaces by using epitaxial graphite as a control. Nonetheless, the interfacial conductance remains around 600 $\mathrm{MWm^{-2}K^{-1}}$ which surpasses the highest values for artificially stacked vdW structures by more than five times. Further, atomic simulations revealed the predominant role of the transverse acoustic phonons. Together, our findings highlight a general physical picture that directly correlates interfacial thermal transport with sliding resistance, and lay the foundation for twist-enabled thermal management, which is particularly beneficial to twistronics and slidetronics.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03746 [pdf, other]

Efficient Knowledge Infusion via KG-LLM Alignment

Authors: Zhouyu Jiang, Ling Zhong, Mengshu Sun, Jun Xu, Rui Sun, Hui Cai, Shuhan Luo, Zhiqiang Zhang

Abstract: To tackle the problem of domain-specific knowledge scarcity within large language models (LLMs), the knowledge graph-retrieval-augmented method has been proven to be an effective and efficient technique for knowledge infusion. However, existing approaches face two primary challenges: knowledge mismatch between publicly available knowledge graphs and the specific domain of the task at hand, and poor information compliance of LLMs with knowledge graphs. In this paper, we leverage a small set of labeled samples and a large-scale corpus to efficiently construct domain-specific knowledge graphs by an LLM, addressing the issue of knowledge mismatch. Additionally, we propose a three-stage KG-LLM alignment strategy to enhance the LLM's capability to utilize information from knowledge graphs. We conduct experiments with a limited-sample setting on two biomedical question-answering datasets, and the results demonstrate that our approach outperforms existing baselines.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ACL2024 Findings

arXiv:2406.03712 [pdf, other]

A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

Authors: Lei Liu, Xiaoyan Yang, Junchi Lei, Xiaoyang Liu, Yue Shen, Zhiqiang Zhang, Peng Wei, Jinjie Gu, Zhixuan Chu, Zhan Qin, Kui Ren

Abstract: Large language models (LLMs), such as GPT series models, have received substantial attention due to their impressive capabilities for generating and understanding human-level language. More recently, LLMs have emerged as an innovative and powerful adjunct in the medical field, transforming traditional practices and heralding a new era of enhanced healthcare services. This survey provides a comprehensive overview of Medical Large Language Models (Med-LLMs), outlining their evolution from general to the medical-specific domain (i.e., Technology and Application), as well as their transformative impact on healthcare (e.g., Trustworthiness and Safety). Concretely, starting from the fundamental history and technology of LLMs, we first delve into the progressive adaptation and refinements of general LLM models in the medical domain, especially emphasizing the advanced algorithms that boost the LLMs' performance in handling complicated medical environments, including clinical reasoning, knowledge graph, retrieval-augmented generation, human alignment, and multi-modal learning. Secondly, we explore the extensive applications of Med-LLMs across domains such as clinical decision support, report generation, and medical education, illustrating their potential to streamline healthcare services and augment patient outcomes. Finally, recognizing the imperative and responsible innovation, we discuss the challenges of ensuring fairness, accountability, privacy, and robustness in Med-LLMs applications. Finally, we conduct a concise discussion for anticipating possible future trajectories of Med-LLMs, identifying avenues for the prudent expansion of Med-LLMs. By consolidating above-mentioned insights, this review seeks to provide a comprehensive investigation of the potential strengths and limitations of Med-LLMs for professionals and researchers, ensuring a responsible landscape in the healthcare setting.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03700 [pdf]

Ferroelectricity-tuned band topology and superconductivity in two-dimensional materials and related heterostructures

Authors: Jianyong Chen, Ping Cui, Zhenyu Zhang

Abstract: Ferroelectricity, band topology, and superconductivity are respectively local, global, and macroscopic properties of quantum materials, and understanding their mutual couplings offers unique opportunities for exploring rich physics and enhanced functionalities. In this mini-review, we attempt to highlight some of the latest advances in this vibrant area, focusing in particular on ferroelectricity-tuned superconductivity and band topology in two-dimensional (2D) materials and related heterostructures. We will first present results from predictive studies of the delicate couplings between ferroelectricity and topology or superconductivity based on first-principles calculations and phenomenological modeling, with ferroelectricity-enabled topological superconductivity as an appealing objective. Next, we will cover the latest advances on experimental studies of ferroelectricity-tuned superconductivity based on different 2D materials or van der Waals heterostructures. Finally, as perspectives, we will outline schemes that may allow to materialize new types of 2D systems that simultaneously harbor ferroelectricity and superconductivity, or that may lead to enhanced ferroelectric superconductivity, ferroelectric topological superconductivity, and new types of superconducting devices such as superconducting diodes.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Invited Review for Adv. Funct. Mater.; comments are welcome

arXiv:2406.03684 [pdf, other]

Principles of Designing Robust Remote Face Anti-Spoofing Systems

Authors: Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, Mani Srivastava

Abstract: Protecting digital identities of human face from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor. Current approaches primarily focus on detecting spoofing attempts within individual frames to detect presentation attacks. However, the emergence of hyper-realistic generative models capable of real-time operation has heightened the risk of digitally generated attacks. In light of these evolving threats, this paper aims to address two key aspects. First, it sheds light on the vulnerabilities of state-of-the-art face anti-spoofing methods against digital attacks. Second, it presents a comprehensive taxonomy of common threats encountered in face anti-spoofing systems. Through a series of experiments, we demonstrate the limitations of current face anti-spoofing detection techniques and their failure to generalize to novel digital attack scenarios. Notably, the existing models struggle with digital injection attacks including adversarial noise, realistic deepfake attacks, and digital replay attacks. To aid in the design and implementation of robust face anti-spoofing systems resilient to these emerging vulnerabilities, the paper proposes key design principles from model accuracy and robustness to pipeline robustness and even platform robustness. In particular, we suggest implementing the proactive face anti-spoofing system using active sensors to significantly reduce the risks for unseen attack vectors and improve the user experience.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.03660 [pdf, other]

doi 10.1145/3643776

Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models

Authors: Zejun Zhang, Zhenchang Xing, Xiaoxue Ren, Qinghua Lu, Xiwei Xu

Abstract: Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting a rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts to instruct LLMs to complete tasks, but we also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. The ARIs are Python code generated by prompting LLMs to generate code. We first construct a knowledge module with three elements including ASTscenario, ASTcomponent and Condition, and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that, for any syntax-error-free Python code, we invoke ARIs from the ARI library to extract ASTcomponent from the ASTscenario, and then filter out ASTcomponent that does not meet the condition. Finally, we design prompts to instruct LLMs to abstract and idiomatize code, and then invoke ARIs from the ARI library to rewrite non-idiomatic code into the idiomatic code. Next, we conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on nine established Pythonic idioms in RIdiom. Our approach exhibits superior accuracy, F1-score, and recall, while maintaining precision levels comparable to RIdiom, all of which consistently exceed or come close to 90% for each metric of each idiom. Lastly, we extend our evaluation to encompass four new Pythonic idioms. Our approach consistently outperforms Prompt-LLM, achieving metrics with values consistently exceeding 90% for accuracy, F1-score, precision, and recall.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by FSE 2024, 22 pages

Showing 251–300 of 10,438 results for author: Zhang, Z