
Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

Authors: Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li

Abstract: The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance in the presence of these features. Studies in neuroscience indicate that spatial auditory attention can be reflected in the topological distribution of EEG energy across different frequency bands. This insight motivates us to propose Prototype Training, a neuroscience-inspired method for Sp-AAD. This method constructs prototypes with enhanced energy distribution representations and reduced trial-specific characteristics, enabling the model to better capture auditory attention features. To implement prototype training, an EEGWaveNet that employs the wavelet transform of EEG is further proposed. Detailed experiments indicate that the EEGWaveNet with prototype training outperforms other competitive models on various datasets, and the effectiveness of the proposed method is also validated. As a training method independent of model architecture, prototype training offers new insights into the field of Sp-AAD.

Submitted 8 July, 2024; originally announced July 2024.
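The abstract links spatial attention to the distribution of EEG energy across frequency bands and channels, computed via a wavelet transform. The sketch below shows one simple way to build such a channel-by-band energy map. It is an illustration under assumptions, not EEGWaveNet: the abstract does not name the wavelet, so an orthonormal Haar transform is used, and the function names (haar_dwt, band_energies, energy_topography) are hypothetical. Prototype training itself is not reproduced.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def band_energies(signal, levels):
    """Energy per sub-band: detail energies from the highest band down,
    followed by the energy of the final (lowest-band) approximation."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def energy_topography(eeg, levels):
    """eeg: list of channels (each a list of samples) -> channels x bands energy map."""
    return [band_energies(channel, levels) for channel in eeg]


# Toy example: two "channels" of 8 samples each, 2 decomposition levels.
eeg = [[1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
       [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]]
for row in energy_topography(eeg, levels=2):
    print([round(e, 3) for e in row])
```

Because the Haar transform is orthonormal, the band energies of each channel sum to that channel's total signal energy, so the map redistributes energy across bands without changing its total.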

arXiv:2407.05869 [pdf, other]

PORCA: Root Cause Analysis with Partially Observed Data

Authors: Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi

Abstract: Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure of complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume full observation of the system, which neglects the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity in partial observation and formulate a new problem of root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can explore reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.

Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.00541 [pdf]

Answering real-world clinical questions using large language model based systems

Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems to answer 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2%-10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. The agentic ChatRWD answered novel questions far more often than the other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence, working synergistically, would improve the availability of pertinent evidence for patient care.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

arXiv:2407.00014 [pdf]

Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach

Authors: Gang Liu, Zhenxiang Wang, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict kinematic parameters. Finger flexion is recorded as 1, extension as -1, and a near-linear model is employed to interpolate intermediate values, offering a viable alternative for kinematic data. We validated the approach with twenty participants through offline analysis and online experiments. The offline analysis confirmed the model's capability to fill in intermediate points, and the online experiments demonstrated that participants could control gestures and adjust force accurately. This study significantly reduces the complexity of collecting dynamic parameters in EMG-based regression prosthetics, thus enhancing usability for prosthetic hands.

Submitted 1 May, 2024; originally announced July 2024.

Comments: 13 pages
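The two-point labeling idea above (extension recorded as -1, flexion as 1, intermediate values interpolated) can be sketched as a target generator for a regression model. This is a sketch under assumptions: the abstract calls the model "near-linear", and a strictly linear ramp is substituted here; the onset/offset sample indices and the function name two_point_targets are hypothetical.

```python
def two_point_targets(n_samples, onset, offset, start_value=-1.0, end_value=1.0):
    """Per-sample kinematic targets for one extension-to-flexion movement.

    Samples up to `onset` hold `start_value` (extension, -1), samples from
    `offset` on hold `end_value` (flexion, +1), and samples in between are
    linearly interpolated (an assumed stand-in for the paper's near-linear model).
    """
    if not 0 <= onset < offset < n_samples:
        raise ValueError("need 0 <= onset < offset < n_samples")
    targets = []
    for t in range(n_samples):
        if t <= onset:
            targets.append(start_value)
        elif t >= offset:
            targets.append(end_value)
        else:
            frac = (t - onset) / (offset - onset)
            targets.append(start_value + frac * (end_value - start_value))
    return targets


# A movement that starts at sample 0 and reaches full flexion at sample 4.
print(two_point_targets(5, onset=0, offset=4))
```

The appeal of this labeling is that only the two endpoint states need to be known; no synchronized kinematic sensor is required to produce the intermediate targets.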

arXiv:2406.19065 [pdf, other]

STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address this gap, this paper dissects LLMs' capability on spatio-temporal data into four distinct dimensions: knowledge comprehension, spatio-temporal reasoning, accurate computation, and downstream applications. We curate several natural language question-answer tasks for each category and build the benchmark dataset, namely STBench, containing 13 distinct tasks and over 60,000 QA pairs. Moreover, we have assessed the capabilities of 13 LLMs, such as GPT-4o, Gemma and Mistral. Experimental results reveal that existing LLMs show remarkable performance on knowledge comprehension and spatio-temporal reasoning tasks, with potential for further enhancement on other tasks through in-context learning, chain-of-thought prompting, and fine-tuning. The code and datasets of STBench are released at https://github.com/LwbXc/STBench.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.16728 [pdf, other]

doi 10.1145/3616855.3635766

CausalMMM: Learning Causal Structure for Marketing Mix Modeling

Authors: Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi

Abstract: In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers adjust the budget allocation of various advertising channels. Traditional MMM methods leveraging regression techniques can fail to handle the complexity of marketing. Although some efforts try to encode the causal structures for better prediction, they have the strict restriction that causal structures are known a priori and unchangeable. In this paper, we define a new causal MMM problem that automatically discovers the interpretable causal structures from data and yields better GMV predictions. To achieve causal MMM, two essential challenges should be addressed: (1) Causal Heterogeneity. The causal structures of different kinds of shops vary a lot. (2) Marketing Response Patterns. Various marketing response patterns, i.e., the carryover effect and the shape effect, have been validated in practice. We argue that causal MMM needs to dynamically discover specific causal structures for different shops and that the predictions should comply with marketing response patterns known a priori. Thus, we propose CausalMMM, which integrates Granger causality in a variational inference framework to measure the causal relationships between different channels and predict the GMV with the regularization of both temporal and saturation marketing response patterns. Extensive experiments show that CausalMMM not only achieves superior performance of causal structure learning on synthetic datasets, with improvements of 5.7%-7.1%, but also enhances the GMV prediction results on a representative E-commerce platform.

Submitted 24 June, 2024; originally announced June 2024.

Comments: WSDM 2024, full version
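The two marketing response patterns named above, the carryover effect and the shape (saturation) effect, are often modeled in MMM with a geometric adstock and a Hill curve. The sketch below shows those two common transforms. It assumes these particular parameterizations because the abstract does not state CausalMMM's exact forms. The function names and the decay, half_sat and shape parameters are illustrative. It does not implement the Granger-causal or variational-inference parts of CausalMMM.

```python
def geometric_adstock(spend, decay):
    """Carryover effect: effective spend at t = spend[t] + decay * effective spend at t-1."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    out = []
    carry = 0.0
    for x in spend:
        carry = x + decay * carry
        out.append(carry)
    return out


def hill_saturation(x, half_sat, shape):
    """Shape effect: diminishing returns in (0, 1); equals 0.5 when x == half_sat."""
    if x <= 0.0:
        return 0.0
    return x ** shape / (x ** shape + half_sat ** shape)


# One channel: a burst of spend followed by two idle periods.
effective = geometric_adstock([10.0, 0.0, 0.0], decay=0.5)
response = [hill_saturation(x, half_sat=5.0, shape=2.0) for x in effective]
print(effective, [round(r, 3) for r in response])
```

Applying adstock before saturation is one common ordering. It lets spend from earlier periods keep contributing while the Hill curve caps how much any level of effective spend can add.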

arXiv:2406.10616 [pdf, other]

doi 10.1145/3637528.3671660

HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning

Authors: Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu

Abstract: Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. Although considerable efforts have been made for FGL under either the cross-device or the cross-silo paradigm, how to effectively capture graph knowledge in a more complicated cross-silo cross-device environment remains an under-explored problem. This task is challenging because of the inherent hierarchy and heterogeneity of decentralized clients, diversified privacy constraints across clients, and the cross-client graph integrity requirement. To this end, in this paper, we propose a Hierarchical Federated Graph Learning (HiFGL) framework for cross-silo cross-device FGL. Specifically, we devise a unified hierarchical architecture to safeguard federated GNN training on heterogeneous clients while ensuring graph integrity. Moreover, we propose a Secret Message Passing (SecMP) scheme to shield subgraph-level and node-level sensitive information from unauthorized access simultaneously. Theoretical analysis proves that HiFGL achieves multi-level privacy preservation with complexity guarantees. Extensive experiments on real-world datasets validate the superiority of the proposed framework against several baselines. Furthermore, HiFGL's versatile nature allows for its application in either solely cross-silo or cross-device settings, further broadening its utility in real-world FGL applications.

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted by SIGKDD 2024

arXiv:2406.08901 [pdf, other]

Fractional Chern insulator candidate in twisted bilayer checkboard lattice

Authors: Jia-Zheng Ma, Rui-Zhen Huang, Guo-Yi Zhu, Ji-Yao Chen, Dao-Xin Yao

Abstract: We investigate a fractional Chern insulator (FCI) candidate arising from moiré bands with higher Chern number $C=2$ on a magic-angle twisted bilayer checkboard lattice (MATBCB). There are two nearly flat low-lying bands in the single-particle energy spectrum at the first magic angle $\phi \approx 1.608^{\circ}$ and in the chiral limit. We find that MATBCB hosts a nearly uniform Berry curvature distribution and exhibits a tiny violation of the quantum geometric trace condition in the first moiré Brillouin zone (mBZ), indicating nearly ideal quantum geometry in MATBCB at the single-particle level. Turning on projected Coulomb interactions, we perform exact diagonalization and find a ten-fold ground-state quasi-degeneracy in the many-body energy spectrum at filling fraction $\nu=1/5$. The ten-fold quasi-degenerate ground states further show spectral flow under flux pumping. By diagnosing the particle entanglement spectrum (PES) of the ground states, we obtain a clear PES gap and quasi-hole state counting consistent with the Halperin spin-singlet generalized Pauli principle, suggesting that a fractional Chern insulator is realized in this system.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 9 figures

arXiv:2406.08165 [pdf, other]

Double pion photoproduction off nucleons in covariant chiral perturbation theory

Authors: Kai-Ge Kang, Xiong-Hui Cao, De-Liang Yao, Han-Qing Zheng

Abstract: The double pion photoproduction off nucleons near threshold is analyzed in covariant baryon chiral perturbation theory up to next-to-leading order, where the $\Delta(1232)$, $N^*(1440)$ and $\rho(770)$ resonances are included as explicit degrees of freedom. For the process $\gamma p \to \pi^+ \pi^0 n$, the chiral results for total cross sections, invariant-mass distributions and beam-helicity asymmetry are in good agreement with the experimental data within uncertainties. For the process $\gamma p \to \pi^0 \pi^0 p$, the predicted total cross section deviates from the existing experimental data. Once the final-state interaction of $\pi\pi$ in the isoscalar S-wave channel is taken into account, a good description of the cross section is achieved. The effect of the Roper resonance always turns out to be negligible, and hence it can be omitted in future studies of this process.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 27 pages, 11 figures, 1 table

arXiv:2406.04837 [pdf, other]

Normal and superconducting properties of La$_3$Ni$_2$O$_7$

Authors: Meng Wang, Hai-Hu Wen, Tao Wu, Dao-Xin Yao, Tao Xiang

Abstract: This review provides a comprehensive overview of current research on the structural, electronic, and magnetic characteristics of the recently discovered high-temperature superconductor La$_3$Ni$_2$O$_7$ under high pressures. We present the experimental results for synthesizing and characterizing this material, derived from measurements of transport, thermodynamics, and various spectroscopic techniques, and discuss their physical implications. We also explore theoretical models proposed to describe the electronic structures and superconducting pairing symmetry in La$_3$Ni$_2$O$_7$, highlighting the intricate interplay between electronic correlations and magnetic interactions. Despite these advances, challenges remain in growing high-quality samples free of extrinsic phases and oxygen deficiencies and in developing reliable measurement tools for determining diamagnetism and other physical quantities under high pressures. Further investigations in these areas are essential to deepening our understanding of the physical properties of La$_3$Ni$_2$O$_7$ and unlocking its superconducting pairing mechanism.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 15 pages, 11 figures

arXiv:2405.19161 [pdf]

Origin of the density wave instability in trilayer nickelate La4Ni3O10 revealed by optical and ultrafast spectroscopy

Authors: Shuxiang Xu, Cui-Qun Chen, Mengwu Huo, Deyuan Hu, Hao Wang, Qiong Wu, Rongsheng Li, Dong Wu, Meng Wang, Dao-Xin Yao, Tao Dong, Nanlin Wang

Abstract: Here we employed optical spectroscopy and ultrafast reflectivity measurements to investigate the density wave instability of trilayer nickelate La4Ni3O10 at ambient pressure. Our optical spectroscopy measurements indicate that La4Ni3O10 is metallic with a large plasma frequency at room temperature. As the temperature decreases, we observe the formation of an energy gap in reflectivity below $T_{\mathrm{DW}}$, signaling the charge/spin density wave transition. The Drude component is largely removed due to the gap opening on the Fermi surface. Our Drude-Lorentz analysis reveals that the energy gap in La4Ni3O10 is approximately 61 meV, about three times larger than that obtained from ARPES measurements. The density wave gap feature is more prominent than that observed in bilayer nickelate La3Ni2O7, suggesting that more carriers are gapped at the Fermi surface across the density wave transition. By comparing the measured plasma frequency with first-principles calculations, we categorize La4Ni3O10 as a material with moderate electronic correlation, similar to the parent compounds of iron-based superconductors but weaker than the bilayer nickelate La3Ni2O7. Our ultrafast pump-probe experiments also show that the relaxation time diverges near the transition temperature. By analyzing the amplitude and relaxation time with the Rothwarf-Taylor model, we estimate the energy gap to be 58 meV, which agrees with the result from optical spectroscopy. The more prominent gap feature and weaker electronic correlation might be the cause of the lower superconducting transition temperature in La4Ni3O10 under high pressure. These findings significantly contribute to understanding the origin of the density wave and superconductivity in trilayer nickelate La4Ni3O10.

Submitted 29 May, 2024; originally announced May 2024.
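For context on the Drude-Lorentz analysis and the plasma-frequency comparison mentioned in the abstract, a commonly used form of the model dielectric function is shown below as a sketch. The abstract does not say how many Drude and Lorentz terms were fitted, so the sums over $j$ and $k$ are generic. Drude terms describe free carriers with plasma frequency $\omega_{p,j}$ and scattering rate $1/\tau_j$. Lorentz terms describe interband or across-gap excitations at $\omega_k$ with strength $S_k$. The ratio on the right is a widely used gauge of electronic correlation (values well below 1 indicate stronger correlation). It is assumed, not confirmed by the abstract, to be the form of comparison with first-principles calculations used here.

```latex
\varepsilon(\omega) = \varepsilon_\infty
  - \sum_j \frac{\omega_{p,j}^2}{\omega^2 + i\omega/\tau_j}
  + \sum_k \frac{S_k^2}{\omega_k^2 - \omega^2 - i\omega/\tau_k},
\qquad
\frac{K_{\mathrm{exp}}}{K_{\mathrm{band}}}
  = \frac{\omega_{p,\mathrm{exp}}^2}{\omega_{p,\mathrm{band}}^2}
```

In this picture, a density wave gap opening below $T_{\mathrm{DW}}$ typically shows up as reduced Drude weight together with a Lorentz-like peak near the gap energy.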

arXiv:2405.17903 [pdf, other]

doi 10.1016/j.neunet.2024.106493

Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Authors: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo

Abstract: Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.16848 [pdf, other]

A re-calibration method for object detection with multi-modal alignment bias in autonomous driving

Authors: Zhihang Song, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

Abstract: Multi-modal object detection in autonomous driving has achieved great breakthroughs by fusing complementary information from different sensors. Previous work has always assumed that the calibration between sensors such as LiDAR and cameras is precise. In reality, however, calibration matrices are fixed when vehicles leave the factory, while vibration, bumps, and data lags may introduce calibration bias. Because research on how calibration affects fusion detection performance is relatively scarce, multi-sensor detection methods with flexible calibration dependency have long been attractive. In this paper, we conducted experiments on the state-of-the-art (SOTA) detection method EPNet++ and showed that slight calibration bias can seriously degrade performance. We also proposed a re-calibration model based on semantic segmentation, which can be combined with a detection algorithm to improve performance and robustness under multi-modal calibration bias.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2405.14953 [pdf, other]

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

Authors: Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

Abstract: Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLM). A weakness of DPO, however, lies in its lack of capability to characterize the diversity of human preferences. Inspired by Mallows' theory of preference ranking, we develop in this paper a new approach, the Mallows-DPO. A distinct feature of this approach is a dispersion index, which reflects the dispersion of human preference to prompts. We show that existing DPO models can be reduced to special cases of this dispersion index, thus unified with Mallows-DPO. More importantly, we demonstrate (empirically) how to use this dispersion index to enhance the performance of DPO in a broad array of benchmark tasks, from synthetic bandit selection to controllable generations and dialogues, while maintaining great generalization capabilities.

Submitted 23 May, 2024; originally announced May 2024.
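As a reference point for the abstract above, the sketch below computes the standard DPO loss for a single preference pair. This is the baseline objective that Mallows-DPO generalizes. It is not Mallows-DPO itself: the abstract does not give the formula for the dispersion index, so it is not reproduced. Inputs are assumed to be precomputed sequence log-probabilities, and the function names are illustrative.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Arguments are sequence log-probabilities log pi(y|x) of the preferred
    ("chosen") and dispreferred ("rejected") responses under the policy being
    trained and under the frozen reference model; beta scales the implicit reward.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) rewritten as softplus(-margin) for stability
    return softplus(-margin)


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy shifts probability toward the chosen response: loss drops below log(2).
print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
```

A single scalar beta applies the same preference sharpness to every prompt. The abstract's prompt-level dispersion index targets exactly this limitation.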

arXiv:2405.14291 [pdf, other]

Variational Bayes for Federated Continual Learning

Authors: Dezhong Yao, Sanmu Li, Yutong Dai, Zhiqiang Xu, Shengshan Hu, Peilin Zhao, Lichao Sun

Abstract: Federated continual learning (FCL) has received increasing attention due to its potential in handling real-world streaming data, characterized by evolving data distributions and varying client classes over time. Storage limitations and privacy concerns confine local models to accessing only the present data within each learning cycle. Consequently, this restriction induces performance degradation on previous data, termed "catastrophic forgetting". However, existing FCL approaches need to identify or know changes in data distribution, which is difficult in the real world. To relax these limitations, this paper directs attention to a broader continuous framework. Within this framework, we introduce Federated Bayesian Neural Network (FedBNN), a versatile and efficacious framework employing a variational Bayesian neural network across all clients. Our method continually integrates knowledge from local and historical data distributions into a single model, adeptly learning from new data distributions while retaining performance on historical distributions. We rigorously evaluate FedBNN's performance against prevalent methods in federated learning and continual learning using various metrics. Experimental analyses across diverse datasets demonstrate that FedBNN achieves state-of-the-art results in mitigating forgetting.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13888 [pdf, other]

Marrying Causal Representation Learning with Dynamical Systems for Science

Authors: Dingling Yao, Caroline Muller, Francesco Locatello

Abstract: Causal representation learning promises to extend causal models to hidden causal variables from raw entangled measurements. However, most progress has focused on proving identifiability results in different settings, and we are not aware of any successful real-world application. At the same time, the field of dynamical systems benefited from deep learning and scaled to countless applications but does not allow parameter identification. In this paper, we draw a clear connection between the two and their key assumptions, allowing us to apply identifiable methods developed in causal representation learning to dynamical systems. At the same time, we can leverage scalable differentiable solvers developed for differential equations to build models that are both identifiable and practical. Overall, we learn explicitly controllable models that isolate the trajectory-specific parameters for further downstream tasks such as out-of-distribution classification or treatment effect estimation. We experiment with a wind simulator with partially known factors of variation. We also apply the resulting model to real-world climate data and successfully answer downstream causal questions in line with existing literature on climate change.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 21 pages, 8 figures, 6 tables

arXiv:2405.07626 [pdf, other]

AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

Authors: Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

Abstract: Detecting anomaly edges in dynamic graphs aims to identify edges that deviate significantly from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. As time evolves, new types of anomaly edges emerge, and only a few labeled anomaly samples are available for each type. Current methods are either designed to detect randomly inserted edges or require sufficient labeled data for model training, which limits their applicability in real-world settings. In this paper, we study this problem by leveraging the rich knowledge encoded in large language models (LLMs) and propose a method, namely AnomalyLLM. To align the dynamic graph with LLMs, AnomalyLLM pre-trains a dynamic-aware encoder to generate the representations of edges and reprograms the edges using the prototypes of word embeddings. Along with the encoder, we design an in-context learning framework that integrates the information of a few labeled samples to achieve few-shot anomaly detection. Experiments on four datasets reveal that AnomalyLLM can not only significantly improve the performance of few-shot anomaly detection, but also achieve superior results on new anomalies without any update of model parameters.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 13 pages

arXiv:2405.04284 [pdf, ps, other]

Quasi-stationary distributions for subcritical branching Markov chains

Authors: Wenming Hong, Dan Yao

Abstract: Consider a subcritical branching Markov chain. Let $Z_n$ denote the counting measure of particles of generation $n$. Under some conditions, we give a probabilistic proof for the existence of the Yaglom limit of $(Z_n)_{n\in\mathbb{N}}$ by the moment method, based on the spinal decomposition and the many-to-few formula. As a result, we give explicit integral representations of all quasi-stationary distributions of $(Z_n)_{n\in\mathbb{N}}$, whose proofs are direct and probabilistic and do not rely on Martin boundary theory.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00696 [pdf, other]

Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process

Authors: Jingwei Ge, Pengbo Wang, Cheng Chang, Yi Zhang, Danya Yao, Li Li

Abstract: Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in the sampling space, it is difficult to efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization process that iteratively generates potential critical scenarios and simultaneously evaluates them. A bi-level loop is proposed for such life-long learning and testing. In the outer loop, we iteratively learn space knowledge by evaluating the AV in the already sampled scenarios and then sample new scenarios based on the retained knowledge. The outer loop stops when all generated samples cover the whole space. To maximize the coverage of the space in each outer loop, we set up an inner loop that receives the newly generated samples from the outer loop and outputs their updated positions. We assume that points in a small sphere-like subspace can be covered (or represented) by the point at the center of this sphere. Therefore, we can apply a multi-round heuristic strategy to move and pack these spheres in space to find the best covering solution. The simulation results show that faster and more accurate evaluation of AVs can be achieved with more critical scenarios.

Submitted 28 March, 2024; originally announced May 2024.

arXiv:2404.19582 [pdf, other]

Leveraging Label Information for Stealthy Data Stealing in Vertical Federated Learning

Authors: Duanyi Yao, Songze Li, Xueluan Gong, Sizai Hou, Gaoning Pan

Abstract: We develop DMAVFL, a novel attack strategy that evades current detection mechanisms. The key idea is to integrate a discriminator with an auxiliary classifier that takes full advantage of the label information (which was completely ignored in previous attacks): on the one hand, label information helps to better characterize embeddings of samples from distinct classes, yielding improved reconstruction performance; on the other hand, computing malicious gradients with label information better mimics honest training, making the malicious gradients indistinguishable from the honest ones and the attack much more stealthy. Our comprehensive experiments demonstrate that DMAVFL significantly outperforms existing attacks and successfully circumvents state-of-the-art (SOTA) defenses against malicious attacks. Additional ablation studies and evaluations on other defenses further underscore the robustness and effectiveness of DMAVFL.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19219 [pdf, other]

Chiral magnon in ferromagnetic chiral crystals

Authors: Dapeng Yao, Takehito Yokoyama

Abstract: We theoretically propose chiral magnons in ferromagnetic chiral crystals. We show that the crystal chirality is imprinted in the orbital angular momentum of magnons, which exhibits opposite signs for opposite chiralities of the crystal. We also show that a finite magnon orbital angular momentum can be induced by a temperature gradient, which is a magnonic analogue of the Edelstein effect.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

arXiv:2404.16738 [pdf, other]

Superconducting Klein and anti-Klein tunneling in Weyl junctions

Authors: Jiajia Huang, Luyang Wang, Dao-Xin Yao

Abstract: Klein tunneling is an old topic in relativistic quantum physics and has recently been observed in graphene, where massless particles reside. Here, we propose a new heterostructure platform for Klein tunneling, which consists of a Weyl-semimetal-based normal state/superconductor (NS) junction. By developing a Blonder-Tinkham-Klapwijk-like theory, we find that Klein tunneling occurs at normal incidence, which can lead to a doubling of the differential conductance. If the (single) Weyl semimetals are replaced by double Weyl semimetals, anti-Klein tunneling takes the place of Klein tunneling. Our work provides a theoretical guide for the detection of (anti-)Klein tunneling in three-dimensional chiral NS junctions.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 5 pages, 7 figures

arXiv:2404.11001 [pdf]

Modulation of the Octahedral Structure and Potential Superconductivity of La$_3$Ni$_2$O$_7$ through Strain Engineering

Authors: Zihao Huo, Zhihui Luo, Peng Zhang, Aiqin Yang, Zhengtao Liu, Xiangru Tao, Zihan Zhang, Shumin Guo, Qiwen Jiang, Wenxuan Chen, Dao-Xin Yao, Defang Duan, Tian Cui

Abstract: Recent transport measurements of La$_3$Ni$_2$O$_7$ uncovered a "right-triangle" shape of the superconducting dome in the pressure-temperature (P-T) phase diagram. Motivated by this, we perform theoretical first-principles studies of La$_3$Ni$_2$O$_7$ with pressures ranging from 0 to 100 GPa. Notably, we reveal a pressure dependence of the Ni-$d_{z^2}$ electron density at the Fermi energy ($n_z^{EF}$) that closely matches this shape. On this basis, we further explore the electronic structure under uniaxial stress. By tracking the stress response of $n_z^{EF}$, we propose that superconductivity can be achieved by applying only about 2 GPa of compression along the c axis. The idea is further exemplified from the perspectives of lattice distortion, band structure, Fermi surface and superconducting phase coherence. We also discuss the possible charge modulation under stress and provide insight into the relation between $n_z^{EF}$ and the superconducting $T_c$ in the La$_3$Ni$_2$O$_7$ system. Our study provides a helpful guide for future experiments.

Submitted 8 July, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.08498 [pdf, other]

Two-dimensional XY Ferromagnet Induced by Long-range Interaction

Authors: Tianning Xiao, Dingyun Yao, Chao Zhang, Zhijie Fan, Youjin Deng

Abstract: The crossover between short-range and long-range (LR) universal behaviors remains a central theme in the physics of long-range interacting systems. The competition between LR coupling and the Berezinskii-Kosterlitz-Thouless mechanism makes the problem more subtle and less understood in the two-dimensional (2D) XY model, a cornerstone for investigating low-dimensional phenomena and their implications in quantum computation. We study the 2D XY model with algebraically decaying interaction $\sim 1/r^{2+\sigma}$. Utilizing an advanced update strategy, we conduct large-scale Monte Carlo simulations of the model up to a linear size of $L=8192$. Our results demonstrate continuous phase transitions into a ferromagnetic phase for $\sigma \le 2$, which exhibits the simultaneous emergence of long-range order and a power-law decaying correlation function due to the Goldstone mode. Furthermore, we find logarithmic scaling behaviors in the low-temperature phase at $\sigma = 2$. The observed scaling behaviors in the low-temperature phase for $\sigma \le 2$ agree with our theoretical analysis. Our findings call for further theoretical understanding and may find practical application in cutting-edge experiments such as Rydberg atom arrays.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.04863 [pdf]

Microscopic Insights into Fatigue Mechanism in Wurtzite Ferroelectric Al$_{0.65}$Sc$_{0.35}$N: Oxygen Infiltration Enabled Grain Amorphization Spanning Boundary to Bulk

Authors: Ruiqing Wang, Danyang Yao, Jiuren Zhou, Yang Li, Zhi Jiang, Dongliang Chen, Xu Ran, Yu Gao, Zixuan Cheng, Yong Wang, Yan Liu, Yue Hao, Genquan Han

Abstract: For the first time, fatigue behavior involving external oxygen in highly Sc-doped AlN ferroelectric films was observed using transmission electron microscopy techniques. Although increasing the Sc composition in AlScN films helps reduce the device operating voltage, the inherent affinity of Sc for oxygen introduces instability in device performance. In this study, oxygen incorporation at top electrode edges and grain boundaries, accompanied by an increase in leakage current and the disappearance of ferroelectric properties, was observed at the nanoscale after long-term field cycling. This observation indicates the emergence of non-ferroelectric and even amorphous states. This work provides solid experimental evidence of an oxygen-involved fatigue mechanism, offering valuable insights into the physical nature of the ferroelectric properties of AlScN films.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 2 pages, 7 figures

arXiv:2404.02963 [pdf, other]

Unraveling the Mn $L_3$-edge RIXS spectrum of lightly manganese doped Sr$_{3}$Ru$_{2}$O$_{7}$

Authors: Wei-Yang Chen, Shih-Wen Huang, Yi Tseng, Wenliang Zhang, Eugenio Paris, Teguh Citra Asmara, Jenn-Min Lee, Thorsten Schmitt, Yu-Cheng Shao, Yi-De Chuang, Byron Freelon, Dao-Xin Yao, Trinanjan Datta

Abstract: A resonant inelastic x-ray scattering (RIXS) experiment was performed at the Mn $L_3$ edge. A 10% Mn-doped Sr$_{3}$Ru$_{2}$O$_{7}$ compound, in which the Mn$^{3+}$ ions are in the $3d^4$ state, was probed for $dd$ excitations. The dilute doping concentration allows one to treat the dopant Mn$^{3+}$ ions as effectively free in the host ruthenium compound. The local nature of $dd$ RIXS spectroscopy permits one to use a single-site model to simulate the experimental spectra. The simulated spectra reproduce the in-plane [100] experimental RIXS spectrum. We also predict the intensity for the in-plane [110] direction and the out-of-plane spin orientation configuration [001]. Based on our single-ion model, we were able to fit the experimental data to obtain the crystal field parameters, the 10Dq value, and the intra-orbital spin-flip energy $2\mathcal{J}$ (or $3J_{H}$, where $J_{H}$ is the Hund's energy) of the Mn$^{3+}$ ion. Utilizing our computed RIXS quantum transition amplitudes between the various $d$ orbitals of the Mn$^{3+}$ ion, the expression for the Kramers-Heisenberg cross section, and a self-consistent fitting procedure, we also identify the energy boundaries of the non-spin-flip and spin-flip $dd$ excitations present in the experimental data. From our fitting procedure we obtain $2\mathcal{J}\,(3J_{H}) = 2.06$ eV, a value in excellent agreement with that computed from the free-ion Racah parameters. We also identified the charge transfer boundary.
In addition to predicting the microscopic parameters, we find a quantum spin-flip transition in the non-cross ($\sigma_{in}-\sigma_{out}$, $\pi_{in}-\pi_{out}$) x-ray polarization channels of the $dd$ RIXS spectra. A similar transition was previously predicted to occur in the $\pi-\pi$ channel of the magnon spectrum in the non-collinear, non-coplanar Kagome compound composed of Cu$^{2+}$ $3d^{9}$ ions.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 12 pages, 7 figures, see PDF text for full abstract info

arXiv:2403.17743 [pdf, other]

Low-energy elastic (anti)neutrino-nucleon scattering in covariant baryon chiral perturbation theory

Authors: Jin-Man Chen, Ze-Rui Liang, De-Liang Yao

Abstract: Low-energy antineutrino- and neutrino-nucleon neutral current elastic scattering is studied within the framework of relativistic SU(2) baryon chiral perturbation theory up to order $\mathcal{O}(p^3)$. We have derived the model-independent hadronic amplitudes and extracted the form factors from them. It is found that the differential cross sections ${\rm d}\sigma/{\rm d}Q^2$ for (anti)neutrino-proton scattering are in good agreement with the existing MiniBooNE data in the $Q^2$ region $[0.13, 0.20]$ GeV$^2$, where nuclear effects are expected to be negligible. For $Q^2 \leq 0.13$ GeV$^2$, large deviations are observed, mainly owing to the sizeable Pauli blocking effect. Comparisons with simulation data produced by the NuWro and GENIE Monte Carlo event generators are also discussed. The chiral results obtained in this work can be utilized as inputs in various nuclear models to achieve a precise determination of the strangeness axial vector form factor, in particular when low-energy MicroBooNE data become available in the near future.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 25 pages, 8 figures, 2 tables

arXiv:2403.10801 [pdf, other]

Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples

Authors: Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, Hai Jin

Abstract: With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. An open question is whether the robustness of downstream models against DAEs can be strengthened, particularly when the pre-trained encoders are publicly accessible to the attackers. In this paper, we first examine existing defensive mechanisms against adversarial examples within the pre-training paradigm. Our findings reveal that the failure of current defenses stems from the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Our extensive experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.

Submitted 18 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.08335 [pdf, other]

A Sparsity Principle for Partially Observable Causal Representation Learning

Authors: Danru Xu, Dingling Yao, Sébastien Lachapelle, Perouz Taslakian, Julius von Kügelgen, Francesco Locatello, Sara Magliacane

Abstract: Causal representation learning aims at identifying high-level causal variables from perceptual data. Most methods assume that all latent causal variables are captured in the high-dimensional observations. We instead consider a partially observed setting, in which each measurement only provides information about a subset of the underlying causal state. Prior work has studied this setting with multiple domains or views, each depending on a fixed subset of latents. Here, we focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern. Our main contribution is to establish two identifiability results for this setting: one for linear mixing functions without parametric assumptions on the underlying causal model, and one for piecewise linear mixing functions with Gaussian latent causal variables. Based on these insights, we propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation. Experiments on different simulated datasets and established benchmarks highlight the effectiveness of our approach in recovering the ground-truth latents.

Submitted 15 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 45 pages, 32 figures, 16 tables

arXiv:2403.06215 [pdf]

Observation of in-gap states in a two-dimensional CrI2/NbSe2 heterostructure

Authors: Peigen Li, Jihai Zhang, Di Zhu, Cui-Qun Chen, Enkui Yi, Bing Shen, Yusheng Hou, Zhongbo Yan, Dao-Xin Yao, Donghui Guo, Dingyong Zhong

Abstract: Low-dimensional magnetic structures coupled with a superconductor are promising platforms for realizing Majorana zero modes, which are considered novel non-Abelian anyons with potential applications in topological quantum computing. Here, we report the observation of in-gap edge states and zero-bias conductance peaks (ZBCPs) in a two-dimensional (2D) magnetic-superconducting heterostructure consisting of single-layer chromium diiodide (CrI2) on a niobium diselenide (NbSe2) superconductor. Single-layer CrI2 nanosheets, which have antiferromagnetic (AFM) ground states according to our first-principles calculations, were epitaxially grown on the layered NbSe2 substrate. Using scanning tunneling microscopy/spectroscopy, we observed robust in-gap states spatially located at the edges of the nanosheets and defect-induced ZBCPs inside the CrI2 nanosheets. Magnetic-flux vortices induced by an external field break the threefold rotational symmetry of the pristine NbSe2 superconductor, implying efficient modulation of the interfacial superconducting states by the epitaxial CrI2 layer. A proposed phenomenological model suggests the existence of chiral edge states in a 2D AFM-superconducting hybrid system with an even Chern number, providing a qualitatively plausible explanation of our experimental observations.

Submitted 10 March, 2024; originally announced March 2024.

Comments: 19 pages, 5 figures

arXiv:2403.05951 [pdf]

Free-standing cubic gauche nitrogen stable at 760 K under ambient pressure

Authors: Yuxuan Xu, Guo Chen, Fei Du, Ming Li, Liangfei Wu, Deyuan Yao, Junfeng Ding, Zhi Zeng, Haiqing Lin, Xianlong Wang

Abstract: Cubic gauche nitrogen (cg-N) has received wide attention due to its high energy density and environmental friendliness. However, existing synthesis methods for cg-N predominantly rely on high-pressure techniques or on nanoconfinement effects using highly toxic and sensitive sodium azide as the precursor, which significantly restricts the practical application of cg-N as a high energy density material (HDEM). Here, based on first-principles simulations, we find that the adsorption of potassium on the cg-N surface provides superior stabilization compared to sodium. We therefore chose the safer potassium azide as the raw material for synthesizing cg-N. Through plasma-enhanced chemical vapor deposition treatment, free-standing cg-N was successfully synthesized without the need for high pressure or nanoconfinement effects. Importantly, it demonstrates excellent thermal stability up to 760 K, above which a rapid and intense thermal decomposition occurs, exhibiting typical HDEM thermal decomposition behavior. Our work significantly advances the practical application of cg-N as an HDEM.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.02975

A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching

Authors: Dong Yao

Abstract: Sentence semantic matching is a research hotspot in natural language processing and is highly significant in various key scenarios, such as community question answering, search, chatbots, and recommendation. Since most advanced models directly model the semantic relevance among words between two sentences while neglecting their keyword and intent concepts, DC-Match was proposed to disentangle keywords from intents and use them to optimize matching performance. Although DC-Match is a simple yet effective method for semantic matching, it depends heavily on external NER techniques to identify the keywords of sentences, which limits semantic matching performance for minor languages, since satisfactory NER tools are usually hard to obtain. In this paper, we propose to generally and flexibly resolve the text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models. To this end, we devise a Multi-Concept Parsed Semantic Matching framework based on pre-trained language models, abbreviated as MCP-SM, to extract various concepts and infuse them into the classification tokens. We conduct comprehensive experiments on the English datasets QQP and MRPC and the Chinese dataset Medical-SM. We also experiment on the Arabic datasets MQ2Q and XNLI, where the outstanding performance further demonstrates MCP-SM's applicability to low-resource languages.

Submitted 3 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

arXiv:2403.01686 [pdf, other]

doi 10.3847/2041-8213/ad319

AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu, et al. (20 additional authors not shown)

Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The bump represents the longest separation time from the main peak among known TDEs to date. The main UV/optical outburst declines as $t^{-4.10}$, making it one of the fastest-decaying optically selected TDEs. Furthermore, we detected sporadic X-ray emission 30 days after the UV/optical peak, accompanied by a reduction in the period of inactivity. It is proposed that the UV/optical bump could be caused by the self-intersection of the stream debris, whereas the primary peak is generated by the reprocessed emission of the accretion process. In addition, our results suggest that episodic X-ray radiation during the initial phase of decline may be due to a patchy obscurer surrounding the accretion disk, a phenomenon associated with the inhomogeneous reprocessing process. The double TDE scenario, in which two stars are disrupted in sequence, could also produce the observed early bump and main peak.
We anticipate that the multicolor light curves of TDEs, especially in the very early stages, and the underlying physics can be better understood in the near future with the assistance of dedicated surveys such as the deep high-cadence survey of the 2.5-meter Wide Field Survey Telescope (WFST).
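To make the quoted $t^{-4.10}$ decline concrete, the short sketch below converts a power-law luminosity decay $L \propto t^{-\alpha}$ into a magnitude change, $\Delta m = 2.5\,\alpha \log_{10}(t_2/t_1)$. It assumes $t$ is counted from a fixed reference epoch that the abstract does not specify, and the helper name `power_law_dimming` is hypothetical.

```python
import math


def power_law_dimming(alpha: float, t1: float, t2: float) -> float:
    """Magnitude change between epochs t1 and t2 for a luminosity L ∝ t**(-alpha).

    Hypothetical helper: t1 and t2 are times since the same (assumed) reference
    epoch; a positive result means the source became fainter.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("epochs must be positive times since the reference epoch")
    flux_ratio = (t2 / t1) ** (-alpha)      # L(t2) / L(t1)
    return -2.5 * math.log10(flux_ratio)    # Pogson magnitude difference


# Dimming per doubling of time for the decay index quoted in the abstract
print(round(power_law_dimming(4.10, 10.0, 20.0), 2))  # 3.09
```

For comparison, the canonical $t^{-5/3}$ fallback rate gives only about 1.25 mag per doubling of time, which is why a $t^{-4.10}$ decline counts as unusually fast.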

Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures, accepted for publication in ApJL

arXiv:2402.07196

doi 10.1103/PhysRevB.110.014503

Trilayer multi-orbital models of $\mathrm{La_{4}Ni_{3}O_{10}}$

Authors: Cui-Qun Chen, Zhihui Luo, Meng Wang, Wéi Wú, Dao-Xin Yao

Abstract: Recently, the discovery of superconductivity in the Ruddlesden-Popper (RP) nickelate $\mathrm{La_4Ni_3O_{10}}$ under pressure has further expanded the nickelate-based superconductor family. In this paper, we performed a first-principles study of $\mathrm{La_4Ni_3O_{10}}$ for both the $P2_1/a$ phase at ambient pressure and the $I4/mmm$ phase at high pressure, with $U = 0$ and $3.5$ eV. Our results confirmed the characteristic upward shift of the Ni-$d_{z^2}$ bonding band under pressure. Moreover, our analysis of the electronic spectrum and orbital occupancy unveils the dynamic mechanism of electronic reconstruction under pressure, embedded in a critical dual effect. Based on our results, we further proposed a trilayer two-orbital model by performing Wannier downfolding on the Ni-$e_g$ orbitals. Our model reveals four Fermi surface sheets with $\alpha$, $\beta$, $\beta'$, and $\gamma$ pockets, resembling those of bilayer $\mathrm{La_3Ni_2O_7}$. Within this model, our spin susceptibility calculated in the random phase approximation shows that the $d_{x^2-y^2}$ orbital is also important for the magnetic fluctuations in the RP series. Finally, a high-energy sixteen-orbital model with direct $dp$ and $pp$ hoppings is proposed, which implies that $\mathrm{La_4Ni_3O_{10}}$ also lies in the charge-transfer regime of the Zaanen-Sawatzky-Allen scheme. Our exposition of electronic reconstructions and multi-orbital models sheds light on theoretical studies of electronic correlation and on the experimental search for lower-pressure superconductors in the RP series.

Submitted 24 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

Comments: 11 pages, 6 figures

Journal ref: Phys. Rev. B 110, 014503 (2024)

arXiv:2402.01348

CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay

Authors: Jianshu Zhang, Yankai Fu, Ziheng Peng, Dongyu Yao, Kun He

Abstract: This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continual learning (CL), a setting that emphasizes a model's capacity to preserve existing knowledge while assimilating new information. Current replay-based methods treat every task and data sample equally and thus cannot fully exploit the potential of the replay buffer. In response, we propose COgnitive REplay (CORE), which draws inspiration from human cognitive review processes. CORE includes two key strategies: Adaptive Quantity Allocation and Quality-Focused Data Selection. The former adaptively modulates the replay buffer allocation for each task based on its forgetting rate, while the latter ensures that the buffer contains representative data that best encapsulate the characteristics of each task. Our approach achieves an average accuracy of 37.95% on split-CIFAR10, surpassing the best baseline method by 6.52%. Additionally, it improves the accuracy of the poorest-performing task by 6.30% compared to the top baseline. Code is available at https://github.com/sterzhang/CORE.
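The abstract describes Adaptive Quantity Allocation only at a high level: each task's share of the replay buffer is modulated by its forgetting rate. A minimal sketch, assuming slots are split in proportion to a nonnegative per-task forgetting score (for example, the drop in a task's accuracy since it was learned) and that rounding is resolved by the largest-remainder method, could look as follows. The function `allocate_replay_slots` and the proportional rule are illustrative assumptions, not CORE's published formula.

```python
import math
from fractions import Fraction
from typing import Dict


def allocate_replay_slots(forgetting: Dict[str, float], capacity: int) -> Dict[str, int]:
    """Split `capacity` buffer slots across tasks in proportion to forgetting scores.

    Hypothetical rule: quota_t = capacity * f_t / sum(f). Integer slots are assigned
    with the largest-remainder method so the total is exactly `capacity`.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    # exact rational arithmetic so the floors can never overshoot `capacity`
    scores = {task: Fraction(max(f, 0.0)) for task, f in forgetting.items()}
    total = sum(scores.values(), Fraction(0))
    if total == 0:  # no measured forgetting: fall back to a uniform split
        scores = {task: Fraction(1) for task in scores}
        total = Fraction(len(scores))
    quotas = {task: capacity * s / total for task, s in scores.items()}
    slots = {task: math.floor(q) for task, q in quotas.items()}
    leftover = capacity - sum(slots.values())
    # hand the remaining slots to the tasks with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda task: quotas[task] - slots[task], reverse=True)
    for task in by_remainder[:leftover]:
        slots[task] += 1
    return slots


print(allocate_replay_slots({"task1": 0.30, "task2": 0.10, "task3": 0.0}, capacity=200))
# {'task1': 150, 'task2': 50, 'task3': 0}
```

Tasks that are forgotten faster receive more replay slots, while a task with no measured forgetting receives none unless every task is at zero, in which case the sketch falls back to a uniform split.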

Submitted 9 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted by CogSci24 as an oral presentation

arXiv:2402.00272

Quantum phase transitions and composite excitations of antiferromagnetic quantum spin trimer chains in a magnetic field

Authors: Jun-Qing Cheng, Zhi-Yao Ning, Han-Qing Wu, Dao-Xin Yao

Abstract: Motivated by recent advances in theoretical and experimental studies of high-energy excitations, we theoretically explore the quantum phase transitions and composite dynamics of antiferromagnetic trimer chains in a magnetic field using exact diagonalization, the density matrix renormalization group, the time-dependent variational principle, and cluster perturbation theory. We measure the entanglement entropy to uncover the phase diagram, which encompasses the XY-I, $1/3$ magnetization plateau, XY-II, and ferromagnetic phases. The critical XY-I and XY-II phases are both described by conformal field theory with central charge $c \simeq 1$. By analyzing the dynamical structure factor, we elucidate the distinct features of spin dynamics across the different phases. In the regime of weak intertrimer interaction, we identify the intermediate-energy and high-energy modes in the XY-I and $1/3$ magnetization plateau phases as internal trimer excitations, corresponding to the propagation of doublons and quartons, respectively. Notably, the application of a magnetic field splits the high-energy spectra into two branches, labeled the upper quarton and the lower quarton. Furthermore, we explore the spin dynamics of a trimerized model closely related to the quantum magnet $\mathrm{Na_2Cu_3Ge_4O_{12}}$ and discuss the possibility of quarton Bose-Einstein condensation. Our results can be verified in inelastic neutron scattering experiments and provide insight for exploring high-energy exotic excitations.
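The doublon and quarton modes are identified above as internal trimer excitations. For a single isolated trimer with $H = J_1(\mathbf{S}_1\cdot\mathbf{S}_2 + \mathbf{S}_2\cdot\mathbf{S}_3)$, the eight states split into a ground-state doublet ($E = -J_1$), an excited doublet ($E = 0$), and a quartet ($E = J_1/2$). In the limit of vanishing intertrimer coupling, the doublon and quarton therefore sit at $J_1$ and $3J_1/2$ above the ground state. The pure-Python sketch below checks this by exactly diagonalizing the 8×8 matrix with a cyclic Jacobi eigenvalue solver. It ignores the intertrimer coupling and the magnetic field, and the helper names are hypothetical.

```python
import math


def trimer_hamiltonian(J1=1.0):
    """8x8 matrix of H = J1 (S1.S2 + S2.S3) for three spin-1/2 sites.

    Basis state `s` is a 3-bit integer; bit i = 1 means spin i points up.
    """
    dim = 8
    H = [[0.0] * dim for _ in range(dim)]
    for s in range(dim):
        for i, j in [(0, 1), (1, 2)]:               # open trimer: bonds 1-2 and 2-3
            szi = 0.5 if (s >> i) & 1 else -0.5
            szj = 0.5 if (s >> j) & 1 else -0.5
            H[s][s] += J1 * szi * szj               # Sz_i Sz_j (diagonal part)
            if szi != szj:                          # (S+_i S-_j + S-_i S+_j)/2 swaps an antiparallel pair
                H[s ^ (1 << i) ^ (1 << j)][s] += 0.5 * J1
    return H


def jacobi_eigenvalues(A, tol=1e-24, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in A]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) element vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


levels = jacobi_eigenvalues(trimer_hamiltonian(J1=1.0))
ground = levels[0]
doublon_gap = levels[2] - ground   # excited S=1/2 doublet
quarton_gap = levels[4] - ground   # S=3/2 quartet
print(f"doublon ≈ {doublon_gap:.3f} J1, quarton ≈ {quarton_gap:.3f} J1")
# doublon ≈ 1.000 J1, quarton ≈ 1.500 J1
```

These single-trimer energies are only the zeroth-order starting point. A finite intertrimer coupling lets the excitations propagate and disperse, and a magnetic field Zeeman-splits the quartet.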

Submitted 5 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: 14+7 pages, 16 figures

arXiv:2401.16687

Revisiting Gradient Pruning: A Dual Realization for Defending against Gradient Attacks

Authors: Lulu Xue, Shengshan Hu, Ruizhi Zhao, Leo Yu Zhang, Shengqing Hu, Lichao Sun, Dezhong Yao

Abstract: Collaborative learning (CL) is a distributed learning framework that aims to protect user privacy by allowing users to jointly train a model while sharing only their gradient updates. However, gradient inversion attacks (GIAs), which recover users' training data from shared gradients, pose severe privacy threats to CL. Existing defense methods adopt different techniques, e.g., differential privacy, cryptography, and perturbation defenses, to defend against GIAs. Nevertheless, all current defense methods suffer from a poor trade-off among privacy, utility, and efficiency. To mitigate the weaknesses of existing solutions, we propose a novel defense method, Dual Gradient Pruning (DGP), based on gradient pruning, which can improve communication efficiency while preserving the utility and privacy of CL. Specifically, DGP slightly modifies gradient pruning to obtain a stronger privacy guarantee. DGP also significantly improves communication efficiency, and we provide a theoretical analysis of its convergence and generalization. Our extensive experiments show that DGP can effectively defend against the most powerful GIAs and reduce the communication cost without sacrificing the model's utility.
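DGP builds on plain gradient pruning, and the abstract does not detail the modification it makes. For orientation only, the sketch below shows the conventional baseline, magnitude-based top-k pruning, in which a client zeroes all but the largest-magnitude fraction of gradient entries before sharing them. It is not DGP itself, and the function name `prune_gradient` and the `keep_ratio` parameter are hypothetical.

```python
import math
from typing import List


def prune_gradient(grad: List[float], keep_ratio: float) -> List[float]:
    """Keep the ceil(keep_ratio * n) largest-magnitude entries and zero the rest.

    Plain top-k gradient pruning (a baseline, not the DGP variant).
    Ties at the threshold are broken by position, so exactly k entries survive.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in [0, 1]")
    n = len(grad)
    k = math.ceil(keep_ratio * n)
    # indices of the k entries with the largest absolute value
    keep = set(sorted(range(n), key=lambda i: abs(grad[i]), reverse=True)[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


shared = prune_gradient([0.02, -0.9, 0.15, 0.4, -0.05, 0.0], keep_ratio=0.5)
print(shared)  # [0.0, -0.9, 0.15, 0.4, 0.0, 0.0]
```

In this generic scheme only the surviving coordinates and their indices need to be transmitted, which is the source of the communication savings. Discarding small entries also withholds part of the signal that gradient inversion attacks exploit.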

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15668

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes

Authors: Weifeng Liu, Tianyi She, Jiawei Liu, Run Wang, Dongyu Yao, Ziyou Liang

Abstract: In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, but these methods also pose potentially severe security threats. DeepFake applications can be divided into entertainment uses such as face swapping and illicit uses such as lip-syncing fraud. However, lip-forgery videos, which neither change identity nor have discernible visual artifacts, present a formidable challenge to existing DeepFake detection methods. Our preliminary experiments have shown that existing methods often degrade drastically or even fail when tackling lip-syncing videos. In this paper, for the first time, we propose a novel approach dedicated to lip-forgery identification that exploits the inconsistency between lip movements and audio signals. We also mimic natural human cognition by capturing subtle biological links between the lip and head regions to boost accuracy. To better illustrate the effectiveness and advances of our proposed method, we curate a high-quality LipSync dataset by employing the SOTA lip generator. We hope this high-quality and diverse dataset will serve further research in this challenging and interesting field. Experimental results show that our approach achieves an average accuracy of more than 95.3% in spotting lip-syncing videos, significantly outperforming the baselines. Extensive experiments demonstrate its capability to tackle deepfakes and its robustness to diverse input transformations.
Our method achieves an accuracy of up to 90.2% in real-world scenarios (e.g., WeChat video calls), demonstrating its capability for real-world deployment. To facilitate the progress of this research community, we release all resources at https://github.com/AaronComo/LipFD.

Submitted 28 January, 2024; originally announced January 2024.

Comments: The first two authors contributed equally to this work

arXiv:2401.14302

Correlation function and the inverse problem in the $BD$ interaction

Authors: Hai-Peng Li, Jing-Yu Yi, Chu-Wen Xiao, De-Liang Yao, Wei-Hong Liang, Eulogio Oset

Abstract: We study the correlation functions of the $B^0 D^+$, $B^+ D^0$ system, which develops a bound state of approximately $40$ MeV, using inputs consistent with the $T_{cc}(3875)$ state. We then address the inverse problem, starting from these correlation functions, to determine the scattering observables related to the system, including the existence of the bound state and its molecular nature. The important output of the approach is the uncertainty with which these observables can be obtained, assuming errors in the $B^0 D^+$, $B^+ D^0$ correlation functions typical of those in present correlation function measurements. We find that it is possible to obtain the scattering lengths and effective ranges with relatively high precision, and to establish the existence of a bound state. Although the pole position is obtained with errors of the order of $50\%$ of the binding energy, the molecular probability of the state is obtained with a very small error, of the order of $6\%$. All these findings can serve as motivation to perform such measurements in future runs of high-energy hadron collisions.

Submitted 28 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 16 pages, 3 figures, 7 tables; V2: version to be published in Chinese Physics C

arXiv:2401.11089

doi 10.1007/978-981-99-9896-8_6

FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph Enhancement

Authors: Dezhong Yao, Tongtong Liu, Qi Cao, Hai Jin

Abstract: Federated Learning (FL) has emerged as a promising approach for preserving data privacy in recommendation systems by training models locally. Recently, Graph Neural Networks (GNN) have gained popularity in recommendation tasks due to their ability to capture high-order interactions between users and items. However, privacy concerns prevent the global sharing of the entire user-item graph. To address this limitation, some methods create pseudo-interacted items or users in the graph to compensate for missing information for each client. Unfortunately, these methods introduce random noise and raise privacy concerns. In this paper, we propose FedRKG, a novel federated recommendation system, where a global knowledge graph (KG) is constructed and maintained on the server using publicly available item information, enabling higher-order user-item interactions. On the client side, a relation-aware GNN model leverages diverse KG relationships. To protect local interaction items and obscure gradients, we employ pseudo-labeling and Local Differential Privacy (LDP). Extensive experiments conducted on three real-world datasets demonstrate the competitive performance of our approach compared to centralized algorithms while ensuring privacy preservation. Moreover, FedRKG achieves an average accuracy improvement of 4% compared to existing federated learning baselines.
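The abstract states that FedRKG uses Local Differential Privacy to obscure gradients but does not name the mechanism. A common textbook choice, shown here purely as an assumed example, is to clip each client's update to an L1 norm of at most $C$ and add i.i.d. Laplace noise of scale $2C/\varepsilon$ to every coordinate. Any two clipped vectors differ by at most $2C$ in L1 norm, which makes the release $\varepsilon$-LDP. The function `ldp_laplace` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def ldp_laplace(update: List[float], clip: float, epsilon: float,
                rng: Optional[random.Random] = None) -> List[float]:
    """Release `update` under epsilon-local DP with the Laplace mechanism (assumed scheme).

    1. Scale the vector so its L1 norm is at most `clip`.
    2. Any two clipped vectors differ by at most 2*clip in L1 norm, so adding
       i.i.d. Laplace(0, 2*clip/epsilon) noise to each coordinate gives epsilon-LDP.
    """
    if clip <= 0 or epsilon <= 0:
        raise ValueError("clip and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    l1 = sum(abs(x) for x in update)
    factor = min(1.0, clip / l1) if l1 > 0 else 1.0
    clipped = [x * factor for x in update]
    scale = 2.0 * clip / epsilon
    # a Laplace(0, b) sample is the difference of two independent Exponential(mean b) draws
    return [x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for x in clipped]


noisy = ldp_laplace([0.8, -0.3, 0.1], clip=1.0, epsilon=2.0, rng=random.Random(0))
print(len(noisy))  # 3
```

Smaller $\varepsilon$ means larger noise and stronger privacy at the cost of utility, the same trade-off the federated baselines in the abstract must balance.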

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.04497

doi 10.1063/5.0207915

Electric field-induced nonreciprocal spin current due to chiral phonons in chiral-structure superconductors

Authors: Dapeng Yao, Mamoru Matsuo, Takehito Yokoyama

Abstract: A recent experiment [R. Nakajima, et al., Nature 613, 479 (2023)] has reported a pair of oppositely polarized spins under an alternating electric current in a superconductor with a chiral structure. However, these behaviors cannot be explained by the conventional Edelstein effect and require a new mechanism. In this Letter, we propose a mechanism of spin current generation under an external electric field due to chiral phonons in a chiral-structure superconductor, based on the Bogoliubov-de Gennes and Boltzmann equations. In our mechanism, chiral phonons are induced by the electric field through inversion symmetry breaking and the electron-phonon interaction. They act as an effective Zeeman field and hence spin-polarize Bogoliubov quasiparticles in the superconductor. As a result, the spin current carried by quasiparticles flows along the screw axis and shows a quadratic dependence on the electric field in the low-field range, leading to nonreciprocal spin transport. The spin current also shows a nonmonotonic temperature dependence, with a maximum around the superconducting transition temperature.

Submitted 12 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 5 pages, 5 figures

Journal ref: Appl. Phys. Lett. 124, 162603 (2024)

arXiv:2401.00376

Magnon, doublon and quarton excitations in 2D S=1/2 trimerized Heisenberg models

Authors: Yue-Yue Chang, Jun-Qing Cheng, Hui Shao, Dao-Xin Yao, Han-Qing Wu

Abstract: We investigate the magnetic excitations of trimerized Heisenberg models with intra-trimer interaction $J_1$ and inter-trimer interaction $J_2$ on four different two-dimensional lattices using a combination of stochastic series expansion quantum Monte Carlo (SSE QMC) and stochastic analytic continuation (SAC), complemented by cluster perturbation theory (CPT). These models exhibit quasi-particle-like excitations when $g=J_2/J_1$ is small, characterized by low-energy magnons, intermediate-energy doublons, and high-energy quartons. The low-energy magnons are associated with the magnetic ground states and can be described by linear spin-wave theory (LSWT) applied to both the effective block-spin model and the original spin model. Doublons and quartons emerge from the corresponding internal excitations of the trimers with distinct energy levels, which can be effectively analyzed using perturbation theory when the ratio of exchange interactions $g$ is small. In this small-$g$ regime, we observe a clear separation between the magnon and higher-energy spectra. However, as $g$ increases, these three spectra gradually merge into the magnon modes or continua. Nevertheless, LSWT fails to provide quantitative descriptions of the higher-energy excitation bands due to significant quantum fluctuations. Notably, in the Collinear II and trimerized hexagon lattices, a broad continuum emerges above the single-magnon spectrum, originating from quasi-1D physics due to the sparse connections between chains.
Our numerical analysis of these 2D trimer systems yields valuable theoretical predictions and explanations for the inelastic neutron scattering (INS) spectra of 2D magnetic materials featuring trimerized lattices.

Submitted 16 June, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2401.00258

doi 10.1103/PhysRevB.109.155122

Phase diagram and critical behavior of Hubbard model on the square-hexagon-octagon lattice

Authors: Xinwei Jia, Dao-Xin Yao, Han-Qing Wu

Abstract: Employing the projective formalism of determinant quantum Monte Carlo (DQMC) simulations, we meticulously explore the ground-state phase diagram and critical behavior of the half-filled Hubbard model on a square-hexagon-octagon (SHO) lattice. This lattice, a two-dimensional (2D) structure comprising squares, hexagons, and octagons, is representative of the biphenylene network (BPN). Our findings reveal an intriguing ground-state phase diagram, featuring an antiferromagnetic (AFM) Mott insulating phase enveloped by three valence-bond-solid-like (VBS-like) insulating phases. Analyzing the single-particle gap, spin gap, and single-particle spectral function, we observe that the metallic state in the noninteracting case becomes unstable under the Hubbard interaction $U$. This interaction drives the system into a hexagon insulating phase before it transitions into an AFM Mott insulating phase. To quantify the critical exponents, we use finite-size scaling techniques. The critical exponents of the quantum critical points between the AFM Mott insulating phase and two of the insulating phases, the plaquette insulator and the ethylene insulator, closely agree with the 3D O(3) universality class. However, the critical exponents of the quantum critical point between the hexagon insulating phase and the AFM Mott insulating phase deviate from the 3D O(3) universality class. This deviation is a finite-size effect and can be attributed to the coupling between fluctuations of the magnetic order parameter and very low-energy fermionic excitations.
Our comprehensive study not only advances the understanding of correlation effects on the SHO lattice but also sheds light on the less-explored critical exponents at weakly insulating quantum critical points.
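Finite-size scaling of this kind is usually built on an ansatz of the following textbook form, written here for the AFM structure factor per site $S_{\mathrm{AF}}$ (the squared order parameter) and a dimensionless ratio $R$ such as a correlation ratio, near the critical coupling $U_c$. The dynamic exponent $z = 1$ is assumed, so that $2\beta/\nu = d + z - 2 + \eta = 1 + \eta$ in two spatial dimensions. The specific observables and correction terms used in the paper may differ.

```latex
S_{\mathrm{AF}}(U, L) = L^{-(1+\eta)}\,
  \mathcal{F}\!\left[(U - U_c)\,L^{1/\nu}\right],
\qquad
R(U, L) = \mathcal{G}\!\left[(U - U_c)\,L^{1/\nu}\right]
```

Curves of $R$ for different system sizes $L$ cross at $U_c$, and collapsing $L^{1+\eta} S_{\mathrm{AF}}$ against $(U - U_c)L^{1/\nu}$ fixes $\nu$ and $\eta$. For the 3D O(3) universality class the reference values are approximately $\nu \approx 0.71$ and $\eta \approx 0.04$.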

Submitted 16 June, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Journal ref: Phys. Rev. B 109, 155122 (2024)

arXiv:2312.14297

Scalable nanoimprint manufacturing of multi-layer hybrid metasurface device

Authors: Shinhyuk Choi, Jiawei Zuo, Nabasindhu Das, Yu Yao, Chao Wang

Abstract: Optical metasurfaces, consisting of subwavelength-scale meta-atom arrays, hold great promise to overcome fundamental limitations of conventional optics. Scalable nanomanufacturing of metasurfaces with high uniformity and reproducibility is key to transferring technology from laboratory demonstrations to commercialization. Recently, nanoimprint lithography (NIL) has attracted increasing interest for metasurface fabrication because of its superior nanometer resolution, rapid prototyping, and large-area manufacturing capability. Despite NIL demonstrations of single-layer metasurfaces, scalable fabrication of double- and multi-layer metasurfaces remains challenging. Here we leverage the nanometer-scale resolution and 3D pattern transfer capability of NIL to fabricate multi-layered metasurfaces for on-chip polarimetric imaging devices. Our process achieved sub-100 nm nanostructures, high alignment accuracy (translational error <200 nm; rotational error <0.02°), and good uniformity (<4 nm linewidth deviation) over >20 mm². This NIL-based, low-cost and high-throughput nanomanufacturing approach paves the way toward scalable production of a plethora of metasurface structures for ultra-compact optical and optoelectronic devices and systems.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 5 figures

arXiv:2312.06197

MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

Authors: Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang

Abstract: Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up structure of music, we introduce MART, a hierarchical music representation learning approach that facilitates feature interactions among cropped music clips while considering their part-whole hierarchies. Specifically, we propose a hierarchical part-whole transformer to capture the structural relationships between music clips in a part-whole hierarchy. Furthermore, a hierarchical contrastive learning objective is crafted to align part-whole music representations at adjacent levels, progressively establishing a multi-hierarchy representation space. The effectiveness of our music representation learning from part-whole hierarchies has been empirically validated across multiple downstream tasks, including music classification and cover song identification.

Submitted 19 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1

arXiv:2311.16763

doi 10.1007/s11433-023-2329-x

Structural transition, electric transport, and electronic structures in the compressed trilayer nickelate La4Ni3O10

Authors: Jingyuan Li, Cui-Qun Chen, Chaoxin Huang, Yifeng Han, Mengwu Huo, Xing Huang, Peiyue Ma, Zhengyang Qiu, Junfeng Chen, Xunwu Hu, Lan Chen, Tao Xie, Bing Shen, Hualei Sun, Dao-Xin Yao, Meng Wang

Abstract: Atomic structure and electronic band structure are fundamental properties for understanding the mechanism of superconductivity. Motivated by the discovery of pressure-induced high-temperature superconductivity at 80 K in the bilayer Ruddlesden-Popper nickelate $\mathrm{La_3Ni_2O_7}$, the atomic structure and electronic band structure of the trilayer nickelate $\mathrm{La_4Ni_3O_{10}}$ under pressure up to 44.3 GPa are investigated. A structural transition from the monoclinic $P2_1/a$ space group to the tetragonal $I4/mmm$ space group around 12.6-13.4 GPa is identified, accompanied by a drop in resistance below 7 K. Density functional theory calculations suggest that the bonding state of the Ni $3d_{z^2}$ orbital rises and crosses the Fermi level at high pressures, which may give rise to the possible superconductivity indicated by the resistance of $\mathrm{La_4Ni_3O_{10}}$ under pressure. The trilayer nickelate $\mathrm{La_4Ni_3O_{10}}$ shares some similarities with bilayer $\mathrm{La_3Ni_2O_7}$ while having unique properties, providing a new platform to investigate the underlying mechanism of superconductivity in nickelates.

Submitted 30 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 19 pages, 4 figures

Journal ref: SCIENCE CHINA Physics, Mechanics & Astronomy 67.11(2024):117403

arXiv:2311.05690

doi 10.1103/PhysRevLett.132.206502

Measuring the Boundary Gapless State and Criticality via Disorder Operator

Authors: Zenan Liu, Rui-Zhen Huang, Yan-Cheng Wang, Zheng Yan, Dao-Xin Yao

Abstract: The disorder operator is often used to reveal conformal field theory (CFT) information in quantum many-body systems. Using large-scale quantum Monte Carlo simulations, we study the scaling behavior of disorder operators on the boundary of the two-dimensional Heisenberg model on the square-octagon lattice, which hosts a gapless topological edge state. In the Affleck-Kennedy-Lieb-Tasaki phase, the disorder operator is shown to obey perimeter-law scaling with a logarithmic term associated with the Luttinger liquid parameter K. This effective Luttinger liquid parameter K reflects the low-energy physics and the CFT of the (1+1)D boundary. At the bulk critical point, the effective K is suppressed but remains finite, indicating coupling between the gapless edge state and the bulk fluctuations. The logarithmic term numerically captures this coupling picture, which reveals the (1+1)D SU(2)$_1$ CFT and the (2+1)D O(3) CFT at boundary criticality. Our Letter paves a new way to study exotic boundary states and boundary criticality.

Submitted 16 June, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 8 pages, 7 figures

Journal ref: Phys. Rev. Lett. 132, 206502 (2024)

arXiv:2311.04056

Multi-View Causal Representation Learning with Partial Observability

Authors: Dingling Yao, Danru Xu, Sébastien Lachapelle, Sara Magliacane, Perouz Taslakian, Georg Martius, Julius von Kügelgen, Francesco Locatello

Abstract: We present a unified framework for studying the identifiability of representations learned from simultaneously observed views, such as different data modalities. We allow a partially observed setting in which each view constitutes a nonlinear mixture of a subset of underlying latent variables, which can be causally related. We prove that the information shared across all subsets of any number of views can be learned up to a smooth bijection using contrastive learning and a single encoder per view. We also provide graphical criteria indicating which latent variables can be identified through a simple set of rules, which we refer to as identifiability algebra. Our general framework and theoretical results unify and extend several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning. We experimentally validate our claims on numerical, image, and multi-modal data sets. Further, we demonstrate that the performance of prior methods is recovered in different special cases of our setup. Overall, we find that access to multiple partial views enables us to identify a more fine-grained representation, under the generally milder assumption of partial observability.

Submitted 8 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 28 pages, 10 figures, 11 tables

arXiv:2311.04044 [pdf, other]

PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models

Authors: Haoran Li, Dadi Guo, Donghao Li, Wei Fan, Qi Hu, Xin Liu, Chunkit Chan, Duanyi Yao, Yuan Yao, Yangqiu Song

Abstract: The rapid development of language models (LMs) brings unprecedented accessibility and usage for both models and users. On the one hand, powerful LMs achieve state-of-the-art performance over numerous downstream NLP tasks. On the other hand, more and more attention is paid to unrestricted model accesses that may bring malicious privacy risks of data leakage. To address these issues, many recent works propose privacy-preserving language models (PPLMs) with differential privacy (DP). Unfortunately, different DP implementations make it challenging for a fair comparison among existing PPLMs. In this paper, we present PrivLM-Bench, a multi-perspective privacy evaluation benchmark to empirically and intuitively quantify the privacy leakage of LMs. Instead of only reporting DP parameters, PrivLM-Bench sheds light on the neglected inference data privacy during actual usage. PrivLM-Bench first clearly defines multi-faceted privacy objectives. Then, PrivLM-Bench constructs a unified pipeline to perform private fine-tuning. Lastly, PrivLM-Bench performs existing privacy attacks on LMs with pre-defined privacy objectives as the empirical evaluation results. The empirical attack results are used to fairly and intuitively evaluate the privacy leakage of various PPLMs. We conduct extensive experiments on three datasets of GLUE for mainstream LMs.

Submitted 1 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: To appear at ACL 2024

arXiv:2310.20341 [pdf]

doi 10.1016/j.mtphys.2023.101309

Signature of Topological Semimetal in Harmonic-honeycomb ReO3

Authors: Yifeng Han, Cui-Qun Chen, Hualei Sun, Shuang Zhao, Long Jiang, Yuxuan Liu, Zhongxiong Sun, Meng Wang, Hongliang Dong, Ziyou Zhang, Zhiqiang Chen, Bin Chen, Dao-Xin Yao, Man-Rong Li

Abstract: Transition-metal honeycomb compounds are capturing scientific attention due to their distinctive electronic configurations, underscored by triangular-lattice spin-orbit coupling and competition between multiple interactions, paving the way for potential manifestations of phenomena such as Dirac semimetal, superconductivity, and quantum spin liquid states. These compounds can undergo discernible pressure-induced alterations in their crystallographic and electronic structures, as exemplified by our high-pressure (HP) synthesis and exploration of the honeycomb polymorph of ReO3 (P6₃22). This HP-P6₃22 polymorph undergoes a phase transition from P6₃22 to P6₃/mmc upon cooling at around Tp = 250 K, as evidenced by the evolution of temperature-dependent magnetization (M-T curves), cell dimensions, and conductivity, initiated by an inherent bifurcation of the oxygen position in the ab plane. Analysis of its band structure suggests that this HP-P6₃22 polymorph is a plausible candidate for Dirac semimetal properties. This phase transition evokes anomalies in the temperature-dependent variation of paramagnetism (non-linearity) and a crossover from semiconductor to temperature-independent metal, with temperature-independent conductivity below ~200 K. Under increasing external pressure, both Tp and the resistance of this HP polymorph are slightly magnetic-field dependent and undergo a "V"-style evolution (decreasing and then increasing) before becoming pressure independent up to 20.2 GPa. Theoretical calculations pinpoint this anionic disorder as a probable cause of the decreased conductive efficiency and the muted temperature-dependent conductivity response.

Submitted 28 December, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Journal ref: Materials Today Physics 40, 101309 (2024)

Showing 1–50 of 374 results for author: Yao, D