subscribe to arXiv mailings

arXiv:2407.09667 [pdf]

Real and imaginary part symmetries (RIPS) and artifacts removal in 1H magnetic resonance spectroscopy without water suppression

Authors: Zhengchao Dong

Abstract: Purpose: Proton MR spectroscopic imaging (1H MRSI) without water suppression (WS) possess some distinct advantages over the conventionally used 1H MRSI with WS. However, the sideband artifacts in the non-water suppressed spectra hinder the applications of the 1H MRSI without WS. Although many hardware or software techniques to tackle the sidebands have been developed, they suffer from various shor… ▽ More Purpose: Proton MR spectroscopic imaging (1H MRSI) without water suppression (WS) possess some distinct advantages over the conventionally used 1H MRSI with WS. However, the sideband artifacts in the non-water suppressed spectra hinder the applications of the 1H MRSI without WS. Although many hardware or software techniques to tackle the sidebands have been developed, they suffer from various shortcomings, such as the prolonged data acquisition time, insufficient elimination of the sidebands, or loss of some spectral information. Here, we present a software method that allows the elimination of sideband artifacts from 1H MRSI data acquired without WS. Methods The presented method is based on the following observations: (1) the real part of the sidebands is anti-symmetric, and the imaginary part of sidebands is symmetric about the unsuppressed water signal, which is at the center of the spectrum, and (2) there is no observable metabolite resonances in the downfield region spectrum acquired with conventional MRSI pulse sequences. We model the complex sidebands in the downfield region using a singular value decomposition-based method, reconstruct from them the upfield sidebands according to the symmetry features of the sidebands, and subtract the sidebands from the spectrum. We demonstrate the method using phantom and in vivo data. Results The method can remarkably reduce sideband artifacts in both the real and the imaginary parts of the spectrum. In both phantom data and high-quality in vivo data, the residual sidebands are smaller than those in the WS spectra. Conclusion The method allows to remove sideband artifacts in both the real and the imaginary parts of the spectrum. The performance of the method on phantom data and high-quality in vivo data outperforms on poor quality in vivo data. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 25 pages, 9 figures, 2 tables

arXiv:2407.07147 [pdf, other]

Analytical Insights on Hadronic Top Quark Polarimetry

Authors: Zhongtian Dong, Dorival Gonçalves, Kyoungchul Kong, Andrew J. Larkoski, Alberto Navarro

Abstract: Top quark polarization provides an important tool for studying its production mechanisms, spin correlations, top quark properties, and new physics searches. Unlike lighter quarks, the top quark's polarization remains intact until its decay, enabling precise spin measurements. While the down-type fermions from $W$ boson decay are known to be effective spin analyzers, charged leptons have typically… ▽ More Top quark polarization provides an important tool for studying its production mechanisms, spin correlations, top quark properties, and new physics searches. Unlike lighter quarks, the top quark's polarization remains intact until its decay, enabling precise spin measurements. While the down-type fermions from $W$ boson decay are known to be effective spin analyzers, charged leptons have typically been the main target for most analyses. In this paper, we investigate the relevance of global jet dynamics -- considering kinematics, jet charges, and particle multiplicity -- for hadronic top quark polarimetry. The formalism used allows for analytical derivations obtained throughout the manuscript, offering deeper insights into the corresponding phenomenology. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 14 pages and 3 figures

arXiv:2407.05563 [pdf, other]

LLMBox: A Comprehensive Library for Large Language Models

Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

Abstract: To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,… ▽ More To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical consideration, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments in a diverse coverage of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted by ACL 2024 Demo

arXiv:2407.04404 [pdf]

Fixed and Movable Antenna Technology for 6G Integrated Sensing and Communication

Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens… ▽ More By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sensing performance. However, wireless communication and radar sensing have undergone independent development over the past few decades. As a result, although multi-antenna technology has dramatically advanced in these two fields separately, it has not been deeply integrated by exploiting their synergy. A new opportunity to fill up this gap arises as the integration of sensing and communication has been identified as one of the typical usage scenarios of the 6G communication network. Motivated by the above, this article aims to explore the multi-antenna technology for 6G ISAC, with the focus on its future development trends such as continuous expansion of antenna array scale, more diverse array architectures, and more flexible antenna designs. First, we introduce several new and promising antenna architectures, including the centralized antenna architectures based on traditional compact arrays or emerging sparse arrays, the distributed antenna architectures exemplified by the cell-free massive MIMO, and the movable/fluid antennas with flexible positions and/or orientations in a given 3D space. Next, for each antenna architecture mentioned above, we present the corresponding far-field/near-field channel models and analyze the communication and sensing performance. Finally, we summarize the characteristics of different antenna architectures and look forward to new ideas for solving the difficulties in acquiring CSI caused by the continuous expansion of antenna array scale and flexible antenna designs. △ Less

Submitted 16 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: in Chinese language

arXiv:2407.03591 [pdf, other]

Controlling quasi-parametric amplifications: From multiple PT-symmetry phase transitions to non-Hermitian sensing

Authors: Xiaoxiong Wu, Kai Bai, Penghong Yu, Zhaohui Dong, Yanyan He, Jingui Ma, Vladislav V. Yakovlev, Meng Xiao, Xianfeng Chen, Luqi Yuan

Abstract: Quasi-parametric amplification (QPA) is a nonlinear interaction in which the idler wave is depleted through some loss mechanism. QPA plays an important role in signal amplification in ultrafast photonics and quantum light generation. The QPA process has a number of features characterized by the non-Hermitian parity-time ($\mathcal{PT}$) symmetry. In this report, we explore new interaction regimes… ▽ More Quasi-parametric amplification (QPA) is a nonlinear interaction in which the idler wave is depleted through some loss mechanism. QPA plays an important role in signal amplification in ultrafast photonics and quantum light generation. The QPA process has a number of features characterized by the non-Hermitian parity-time ($\mathcal{PT}$) symmetry. In this report, we explore new interaction regimes and uncover multiple $\mathcal{PT}$-symmetry phase transitions in such QPA process where transitions are particularly sensitive to external parameters. In particular, we demonstrate the feasibility of detection of $10^{-11}$ inhomogeneities of the doped absorber, which is order of magnitude more sensitive than similar measurements performed in a linear absorption regime. In doing so, we reveal a family of $\mathcal{PT}$-symmetry phase transitions appearing in the QPA process and provide a novel nonlinear optical sensing mechanism for precise optical measurements. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 17 pages, 6 figures

arXiv:2407.03531 [pdf, other]

OrbitGrasp: $SE(3)$-Equivariant Grasp Learning

Authors: Boce Hu, Xupeng Zhu, Dian Wang, Zihao Dong, Haojie Huang, Chenghao Wang, Robin Walters, Robert Platt

Abstract: While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit a lot from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our… ▽ More While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit a lot from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our main contribution is to propose an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$ using a spherical harmonic basis. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. In order to accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style backbone to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03442 [pdf, other]

Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

Authors: Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang

Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical… ▽ More The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical-category performance, as a crucial yet largely overlooked fine-grained objective for detection tasks. We analyze the impact of quantization at the category-level granularity, and propose methods to improve performance for the critical categories. Specifically, we find that certain critical categories have a higher sensitivity to quantization, and are prone to overfitting after quantization-aware training (QAT). To explain this, we provide theoretical and empirical links between their performance gaps and the corresponding loss landscapes with the Fisher information framework. Using this evidence, we apply a Fisher-aware mixed-precision quantization scheme, and a Fisher-trace regularization for the QAT on the critical-category loss landscape. The proposed methods improve critical-category metrics of the quantized transformer-based DETR detectors. They are even more significant in case of larger models and higher number of classes where the overfitting becomes more severe. For example, our methods lead to 10.4% and 14.5% mAP gains for, correspondingly, 4-bit DETR-R50 and Deformable DETR on the most impacted critical classes in the COCO Panoptic dataset. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Poster presentation at the 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

arXiv:2407.03014 [pdf]

Dielectric Fano Nanoantennas for Enabling Sub-Nanosecond Lifetimes in NV-based Single Photon Emitters

Authors: Shu An, Dmitry Kalashnikov, Wenqiao Shi, Zackaria Mahfoud, Ah Bian Chew, Yan Liu, Jing Wu, Di Zhu, Weibo Gao, Cheng-Wei Qiu, Victor Leong, Zhaogang Dong

Abstract: Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced due to the Purcell effect. Dielectric nanoantennas are promising as they provide… ▽ More Solid-state quantum emitters are essential sources of single photons, and enhancing their emission rates is of paramount importance for applications in quantum communications, computing, and metrology. One approach is to couple quantum emitters with resonant photonic nanostructures, where the emission rate is enhanced due to the Purcell effect. Dielectric nanoantennas are promising as they provide strong emission enhancement compared to plasmonic ones, which suffer from high Ohmic loss. Here, we designed and fabricated a dielectric Fano resonator based on a pair of silicon (Si) ellipses and a disk, which supports the mode hybridization between quasi-bound-states-in-the-continuum (quasi-BIC) and Mie resonance. We demonstrated the performance of the developed resonant system by interfacing it with single photon emitters (SPEs) based on nitrogen-vacancy (NV-) centers in nanodiamonds (NDs). We observed that the interfaced emitters have a Purcell enhancement factor of ~10, with sub-ns emission lifetime and a polarization contrast of 9. Our results indicate a promising method for developing efficient and compact single-photon sources for integrated quantum photonics applications. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 20 pages, 4 figures

arXiv:2407.02887 [pdf, other]

Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

Authors: Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang

Abstract: In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two m… ▽ More In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two modalities by leveraging the geometric nature of the completion task. Specifically, we propose an explicitly guided information interaction strategy supported by modal alignment for point cloud completion. First, in contrast to previous methods which simply use 2D and 3D backbones to encode features respectively, we unified the encoding process to promote modal alignment. Second, we propose a novel explicitly guided information interaction strategy that could help the network identify critical information within images, thus achieving better guidance for completion. Extensive experiments demonstrate the effectiveness of our framework, and we achieved a new state-of-the-art (+16% CD over XMFnet) in benchmark datasets despite using fewer parameters than the previous methods. The pre-trained model and code and are available at https://github.com/WHU-USI3DV/EGIInet. △ Less

Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.01663 [pdf, other]

Hadronic Top Quark Polarimetry with ParticleNet

Authors: Zhongtian Dong, Dorival Gonçalves, Kyoungchul Kong, Andrew J. Larkoski, Alberto Navarro

Abstract: Precision studies for top quark physics are a cornerstone of the Large Hadron Collider program. Polarization, probed through decay kinematics, provides a unique tool to scrutinize the top quark across its various production modes and to explore potential new physics effects. However, the top quark most often decays hadronically, for which unambiguous identification of its decay products sensitive… ▽ More Precision studies for top quark physics are a cornerstone of the Large Hadron Collider program. Polarization, probed through decay kinematics, provides a unique tool to scrutinize the top quark across its various production modes and to explore potential new physics effects. However, the top quark most often decays hadronically, for which unambiguous identification of its decay products sensitive to top quark polarization is not possible. In this Letter, we introduce a jet flavor tagging method to significantly improve spin analyzing power in hadronic decays, going beyond exclusive kinematic information employed in previous studies. We provide parametric estimates of the improvement from flavor tagging with any set of measured observables and demonstrate this in practice on simulated data using a Graph Neural Network (GNN). We find that the spin analyzing power in hadronic decays can improve by approximately 20% (40%) compared to the kinematic approach, assuming an efficiency of 0.5 (0.2) for the network. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: 7 pages and 3 figures

arXiv:2407.01118 [pdf, other]

Preliminary results of sky brightness measurements in near-infrared at Lenghu, China

Authors: Jinji Li, Bin Ma, Zhongnan Dong, Haoran Zhang

Abstract: Low sky brightness is crucial for ground-based astronomical observations, because it limits the observational capability to detect fainter sources. Lenghu, located on the Tibetan Plateau in China, has been identified as an high-quality astronomical site in China, including dark sky in optical band. In this work, we will report the preliminary results of near-infrared sky brightness measurements at… ▽ More Low sky brightness is crucial for ground-based astronomical observations, because it limits the observational capability to detect fainter sources. Lenghu, located on the Tibetan Plateau in China, has been identified as an high-quality astronomical site in China, including dark sky in optical band. In this work, we will report the preliminary results of near-infrared sky brightness measurements at Lenghu. Utilizing a wide-field small telescope equipped with an InGaAs camera, we have been conducting long-term monitoring of near-infrared sky brightness in the J and H' bands, respectively, since January 2024. For each image, photometry and astrometry were performed, then sky background was calibrated by standard stars from the 2MASS catalog. This report includes preliminary results on the sky brightness at zenith in the J and H' bands, as well as their variations with solar elevation at Lenghu. Our initial results indicate that the near-infrared sky brightness at Lenghu is comparable to that of other world-class sites, and long-term monitoring will be continued. △ Less

Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: 8 pages, 8 figures, presented at 2024 SPIE Astronomical Telescopes + Instrumentation conference

arXiv:2407.00433 [pdf]

Screening of half-Heuslers with temperature-induced band convergence and enhanced thermoelectric properties

Authors: Jinyang Xi, Zirui Dong, Menghan Gao, Jun Luo, Jiong Yang

Abstract: Enhancing band convergence is an effective way to optimize the thermoelectric (TE) properties of materials. However, the temperature-induced band renormalization is commonly ignored. By employing the recently-developed electron-phonon renormalization (EPR) method, the nature of band renormalization in half-Heusler (HH) compounds TiCoSb and NbFeSb is revealed, and the key factors for temperature-in… ▽ More Enhancing band convergence is an effective way to optimize the thermoelectric (TE) properties of materials. However, the temperature-induced band renormalization is commonly ignored. By employing the recently-developed electron-phonon renormalization (EPR) method, the nature of band renormalization in half-Heusler (HH) compounds TiCoSb and NbFeSb is revealed, and the key factors for temperature-induced conduction band convergence in HH are found out. Using these as the screening criteria, 3 out of 274 HHs (TiRhBi, TiPtSn, NbPtTl) are then stood out from our MatHub-3d database. Taking TiPtSn as the example, it shows the conduction band convergence at mid-high temperature, and further resulting in enhanced Seebeck coefficient S: e.g., at 600 K with electron concentration 10^20 cm^-3, the predicted S with and without renormalized band is 352.83 uV/K and 289.52 uV/K, respectively. Herein, the former is closer to our measurement value of 338.79 uV/K. Besides, the effective masses obtained from calculation and experiment are both enlarged with temperature, indicating the existence of band convergence. Our work demonstrates for the first time the significance of adding the temperature effect on electronic structure in the design of potential high-performance TE materials. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19853 [pdf, other]

YuLan: An Open-source Large Language Model

Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi… ▽ More Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework throughout across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training is finished on Jan, 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and codes are available at https://github.com/RUC-GSAI/YuLan-Chat. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17507 [pdf, other]

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights… ▽ More Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights for cross-modal retrieval. However, constructing identifiers for multimodal data remains an untapped problem, and the modality gap between natural language queries and multimodal candidates hinders retrieval performance due to the absence of additional encoders. To this end, we propose a pioneering generAtive Cross-modal rEtrieval framework (ACE), which is a comprehensive framework for end-to-end cross-modal retrieval based on coarse-to-fine semantic modeling. We propose combining K-Means and RQ-VAE to construct coarse and fine tokens, serving as identifiers for multimodal data. Correspondingly, we design the coarse-to-fine feature fusion strategy to efficiently align natural language queries and candidate identifiers. ACE is the first work to comprehensively demonstrate the feasibility of generative approach on text-to-image/audio/video retrieval, challenging the dominance of the embedding-based dual-tower architecture. Extensive experiments show that ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17036 [pdf, other]

Superconductivity from spin-canting fluctuations in rhombohedral graphene

Authors: Zhiyu Dong, Étienne Lantagne-Hurtubise, Jason Alicea

Abstract: Rhombohedral graphene multilayers host various broken-symmetry metallic phases as well as superconductors whose pairing mechanism and order parameter symmetry remain unsettled. Strikingly, experiments have revealed prominent new superconducting regions in rhombohedral bilayer and trilayer graphene devices with proximity-induced Ising spin-orbit coupling. We propose that these superconductors desce… ▽ More Rhombohedral graphene multilayers host various broken-symmetry metallic phases as well as superconductors whose pairing mechanism and order parameter symmetry remain unsettled. Strikingly, experiments have revealed prominent new superconducting regions in rhombohedral bilayer and trilayer graphene devices with proximity-induced Ising spin-orbit coupling. We propose that these superconductors descend from a common spin-canted normal state that spontaneously breaks a U(1) spin symmetry and thus supports a gapless (Goldstone) magnon mode. In particular, we develop a scenario wherein pairing in the spin-canted state emerges from a novel type of magnon-mediated attraction: Contrary to conventional mechanisms that involve the exchange of a single boson, we show that second-order processes that exchange two magnons are dominant and produce an $s$-wave pairing interaction featuring a unique logarithmic low-frequency divergence. This low-frequency divergence disappears when spin-orbit coupling vanishes, providing a promising explanation for spin-orbit-enabled pairing. Numerous other experimental observations -- including nontrivial dependence of superconductivity on the spin-orbit coupling strength, in-plane magnetic fields, and Fermi surface structure -- also naturally follow from our scenario. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 14 pages, 5 figures

arXiv:2406.16864 [pdf, other]

StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

Authors: Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the e… ▽ More This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing "Stable-and-Sharp" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy, which starts with a one-step normal estimator (YOSO) to derive an initial normal guess, that is relatively coarse but reliable, then followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance in standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results evidence that StableNormal retains both the "stability" and "sharpness" for accurate normal estimation. StableNormal represents a baby attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models have been publicly available in hf.co/Stable-X △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: HF Demo: hf.co/Stable-X, Video: https://www.youtube.com/watch?v=sylXTxG_U2U

arXiv:2406.16003 [pdf]

Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces

Authors: Dmitrii Gromyko, Shu An, Sergey Gorelik, Jiahui Xu, Li Jun Lim, Henry Yit Loong Lee, Febiana Tjiptoharsono, Zhi-Kuang Tan, Cheng-Wei Qiu, Zhaogang Dong, Lin Wu

Abstract: Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain… ▽ More Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 16 pages, 4 figures

arXiv:2406.15073 [pdf, other]

KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning

Authors: Jiahan Chen, Shuhan Qi, Yifan Li, Zeyu Dong, Mingfeng Ding, Yulin Wu, Xuan Wang

Abstract: Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to black-box prope… ▽ More Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to black-box property of RL-based method, the generated database tuning strategies still face the urgent problem of lack explainability. Besides, the redundant parameters in large scale database always make the strategy learning become unstable. This paper proposes KnobTree, an interpertable framework designed for the optimization of database parameter configuration. In this framework, an interpertable database tuning algorithm based on RL-based differentatial tree is proposed, which building a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, We also introduce a explainable method for parameter importance assessment, by utilizing Shapley Values to identify parameters that have significant impacts on database performance. Experiments conducted on MySQL and Gbase8s databases have verified exceptional transparency and interpretability of the KnobTree model. The good property makes generated strategies can offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms the existing RL-based tuning algorithms in aspects such as throughput, latency, and processing time. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14927 [pdf, other]

Gaussian-Informed Continuum for Physical Property Identification and Simulation

Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a… ▽ More This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages, 8 figures

arXiv:2406.14264 [pdf, other]

Zero-Shot Image Denoising for High-Resolution Electron Microscopy

Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we propose a super-resolution (SR) based self-supervised training strategy, incorporating the Random Sub-sampler module. The Random Sub-sampler is designed to generate approximate infinite noisy pairs from a single noisy image, serving as an effective data augmentation in zero-shot denoising. Noise2SR trains the network with paired noisy images of different resolutions, which is conducted via SR strategy. The SR-based training facilitates the network adopting more pixels for supervision, and the random sub-sampling helps compel the network to learn continuous signals enhancing the robustness. Meanwhile, we mitigate the uncertainty caused by random-sampling by adopting minimum mean squared error (MMSE) estimation for the denoised results. With the distinctive integration of training strategy and proposed designs, Noise2SR can achieve superior denoising performance using a single noisy HREM image. We evaluate the performance of Noise2SR in both simulated and real HREM denoising tasks. It outperforms state-of-the-art ZS-SSL methods and achieves comparable denoising performance with supervised methods. The success of Noise2SR suggests its potential for improving the SNR of images in material imaging domains. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 12 figures

arXiv:2406.14017 [pdf, other]

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

Authors: Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this… ▽ More Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this limitation, we introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information. Specifically, we identify three key challenges in combining these two types of information: a unified generative architecture capable of handling two feature types, ensuring sufficient and independent learning for each type, and fostering subtle interactions that enhance collaborative information utilization. To achieve these goals, we propose (1) a two-stream generation architecture leveraging a shared encoder and two separate decoders to decode behavior tokens and semantic tokens with a confidence-based ranking strategy; (2) a global contrastive task with summary tokens to achieve discriminative decoding for each type of information; and (3) a semantic-guided transfer task designed to implicitly promote cross-interactions through reconstruction and estimation objectives. We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods. △ Less

Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024. Code available at https://reczoo.github.io/EAGER

arXiv:2406.09807 [pdf, other]

Same App, Different Behaviors: Uncovering Device-specific Behaviors in Android Apps

Authors: Zikan Dong, Yanjie Zhao, Tianming Liu, Chao Wang, Guosheng Xu, Guoai Xu, Haoyu Wang

Abstract: The Android ecosystem faces a notable challenge known as fragmentation, which denotes the extensive diversity within the system. This issue is mainly related to differences in system versions, device hardware specifications, and customizations introduced by manufacturers. The growing divergence among devices leads to marked variations in how a given app behaves across diverse devices. This is refe… ▽ More The Android ecosystem faces a notable challenge known as fragmentation, which denotes the extensive diversity within the system. This issue is mainly related to differences in system versions, device hardware specifications, and customizations introduced by manufacturers. The growing divergence among devices leads to marked variations in how a given app behaves across diverse devices. This is referred to as device-specific behaviors. In this work, we present the first large-scale empirical study of device-specific behaviors in real-world Android apps. We have designed a three-phase static analysis framework to accurately detect and understand the device-specific behaviors. Upon employing our tool on a dataset comprising more than 20,000 apps, we detected device-specific behaviors in 2,357 of them. By examining the distribution of device-specific behaviors, our analysis revealed that apps within the Chinese third-party app market exhibit more relevant behaviors compared to their counterparts in Google Play. Additionally, these behaviors are more likely to feature dominant brands that hold larger market shares. Reflecting this, we have classified these device-specific behaviors into 29 categories based on implemented functionalities, providing structured insight into these behaviors. Beyond common behaviors like issue fixes and feature adaptations, we observed 33 aggressive apps, including popular ones with millions of downloads, abusing system properties of customized ROMs to obtain user-unresettable identifiers without requiring permission, substantially impacting user privacy. Finally, we investigated the origins of device-specific behaviors, revealing significant challenges developers face in implementing them comprehensively. Our research sheds light on the promising but less touched research direction of device-specific behaviors, benefiting community stakeholders. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09509 [pdf, other]

CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making

Authors: Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yi Ma, Pengyi Li, Yan Zheng

Abstract: Leveraging the powerful generative capability of diffusion models (DMs) to build decision-making agents has achieved extensive success. However, there is still a demand for an easy-to-use and modularized open-source library that offers customized and efficient development for DM-based decision-making algorithms. In this work, we introduce CleanDiffuser, the first DM library specifically designed f… ▽ More Leveraging the powerful generative capability of diffusion models (DMs) to build decision-making agents has achieved extensive success. However, there is still a demand for an easy-to-use and modularized open-source library that offers customized and efficient development for DM-based decision-making algorithms. In this work, we introduce CleanDiffuser, the first DM library specifically designed for decision-making algorithms. By revisiting the roles of DMs in the decision-making domain, we identify a set of essential sub-modules that constitute the core of CleanDiffuser, allowing for the implementation of various DM algorithms with simple and flexible building blocks. To demonstrate the reliability and flexibility of CleanDiffuser, we conduct comprehensive evaluations of various DM algorithms implemented with CleanDiffuser across an extensive range of tasks. The analytical experiments provide a wealth of valuable design choices and insights, reveal opportunities and challenges, and lay a solid groundwork for future research. CleanDiffuser will provide long-term support to the decision-making community, enhancing reproducibility and fostering the development of more robust solutions. The code and documentation of CleanDiffuser are open-sourced on the https://github.com/CleanDiffuserTeam/CleanDiffuser. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: The first two authors contribute equally to this work. Code and documentation: https://github.com/CleanDiffuserTeam/CleanDiffuser

arXiv:2406.07932 [pdf, other]

Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time

Authors: Haiyuan Zhao, Guohao Cai, Jieming Zhu, Zhenhua Dong, Jun Xu, Ji-Rong Wen

Abstract: In video recommendation, an ongoing effort is to satisfy users' personalized information needs by leveraging their logged watch time. However, watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately. Existing label-correction approaches attempt to uncover user interests through grouping and normalizing observed watch time according to video du… ▽ More In video recommendation, an ongoing effort is to satisfy users' personalized information needs by leveraging their logged watch time. However, watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately. Existing label-correction approaches attempt to uncover user interests through grouping and normalizing observed watch time according to video duration. Although effective to some extent, we found that these approaches regard completely played records (i.e., a user watches the entire video) as equally high interest, which deviates from what we observed on real datasets: users have varied explicit feedback proportion when completely playing videos. In this paper, we introduce the counterfactual watch time(CWT), the potential watch time a user would spend on the video if its duration is sufficiently long. Analysis shows that the duration bias is caused by the truncation of CWT due to the video duration limitation, which usually occurs on those completely played records. Besides, a Counterfactual Watch Model (CWM) is proposed, revealing that CWT equals the time users get the maximum benefit from video recommender systems. Moreover, a cost-based transform function is defined to transform the CWT into the estimation of user interest, and the model can be learned by optimizing a counterfactual likelihood function defined over observed user watch times. Extensive experiments on three real video recommendation datasets and online A/B testing demonstrated that CWM effectively enhanced video recommendation accuracy and counteracted the duration bias. △ Less

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024

arXiv:2406.04252 [pdf]

Sub-nanometer depth resolution and single dopant visualization achieved by tilt-coupled multislice electron ptychography

Authors: Zehao Dong, Yang Zhang, Chun-Chien Chiu, Sicheng Lu, Jianbing Zhang, Yu-Chen Liu, Suya Liu, Jan-Chi Yang, Pu Yu, Yayu Wang, Zhen Chen

Abstract: Real-space imaging of three-dimensional atomic structures is a critical yet challenging task in materials science. Although scanning transmission electron microscopy has achieved sub-angstrom lateral resolution through techniques like electron ptychography1,2, depth resolution remains limited to only 2 to 3 nanometers with a single projection setup3,4. Attaining better depth resolution typically n… ▽ More Real-space imaging of three-dimensional atomic structures is a critical yet challenging task in materials science. Although scanning transmission electron microscopy has achieved sub-angstrom lateral resolution through techniques like electron ptychography1,2, depth resolution remains limited to only 2 to 3 nanometers with a single projection setup3,4. Attaining better depth resolution typically necessitates large sample tilt angles and many projections, as seen in atomic electron tomography5,6. Here, we develop a new algorithm based on multislice electron ptychography which couples only a few projections at small tilt angles, but is sufficient to improve the depth resolution by more than threefold to the sub-nanometer scale, and potentially to the atomic level. This technique maintains high resolving power for both light and heavy atoms, and significantly improves the visibility of single dopants. We are thus able to experimentally detect dilute substitutional praseodymium dopants in a brownmillerite oxide, Ca2Co2O5, in three dimensions and observe the accompanying lattice distortion. This technique requires only a moderate level of data acquisition or processing, and can be seamlessly integrated into electron microscopes equipped with conventional components. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 27 pages, 5 figures, 10 supplementary figures

arXiv:2406.01238 [pdf, other]

EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

Authors: Zixuan Dong, Baoyun Peng, Yufei Wang, Jia Fu, Xiaodong Wang, Yongxue Shan, Xin Zhou

Abstract: While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propos… ▽ More While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propose a novel collaborative framework named EffiQA that can strike a balance between performance and efficiency via an iterative paradigm. EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection. Specifically, EffiQA leverages the commonsense capability of LLMs to explore potential reasoning pathways through global planning. Then, it offloads semantic pruning to a small plug-in model for efficient KG exploration. Finally, the exploration results are fed to LLMs for self-reflection to further improve the global planning and efficient KG exploration. Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness, achieving an optimal balance between reasoning accuracy and computational costs. We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying by redefining the integration of LLMs and KGs, fostering future research on knowledge-based question answering. △ Less

Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures, 3 tables

arXiv:2406.00974 [pdf, other]

Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach

Authors: Borui Zhang, Chaojie Li, Guo Chen, Zhaoyang Dong

Abstract: To incentivize flexible resources such as Battery Energy Storage Systems (BESSs) to offer Frequency Control Ancillary Services (FCAS), Australia's National Electricity Market (NEM) has implemented changes in recent years towards shorter-term bidding rules and faster service requirements. However, firstly, existing bidding optimization methods often overlook or oversimplify the key aspects of FCAS… ▽ More To incentivize flexible resources such as Battery Energy Storage Systems (BESSs) to offer Frequency Control Ancillary Services (FCAS), Australia's National Electricity Market (NEM) has implemented changes in recent years towards shorter-term bidding rules and faster service requirements. However, firstly, existing bidding optimization methods often overlook or oversimplify the key aspects of FCAS market procedures, resulting in an inaccurate depiction of the market bidding process. Thus, the BESS bidding problem is modeled based on the actual bidding records and the latest market specifications and then formulated as a deep reinforcement learning (DRL) problem. Secondly, the erratic decisions of the DRL agent caused by imperfectly predicted market information increases the risk of profit loss. Hence, a Conditional Value at Risk (CVaR)-based DRL algorithm is developed to enhance the risk resilience of bidding strategies. Thirdly, well-trained DRL models still face performance decline in uncommon scenarios during online operations. Therefore, a Large Language Models (LLMs)-assisted artificial intelligence (AI)-agent interactive decision-making framework is proposed to improve the strategy timeliness, reliability and interpretability in uncertain new scenarios, where conditional hybrid decision and self-reflection mechanisms are designed to address LLMs' hallucination challenge. The experiment results demonstrate that our proposed framework has higher bidding profitability compared to the baseline methods by effectively mitigating the profit loss caused by various uncertainties. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20228 [pdf, other]

Love-C relations for elastic hybrid stars

Authors: Zoey Zhiyuan Dong, Joshua Cole Faggert, Shu Yan Lau, Kent Yagi

Abstract: Neutron stars (NSs) provide a unique laboratory to study matter under extreme densities. Recent observations from gravitational and electromagnetic waves have enabled constraints on NS properties, such as tidal deformability (related to the tidal Love number) and stellar compactness. Although each of these two NS observables depends strongly on the stellar internal structure, the relation between… ▽ More Neutron stars (NSs) provide a unique laboratory to study matter under extreme densities. Recent observations from gravitational and electromagnetic waves have enabled constraints on NS properties, such as tidal deformability (related to the tidal Love number) and stellar compactness. Although each of these two NS observables depends strongly on the stellar internal structure, the relation between them (called the Love-C relation) is known to be equation-of-state insensitive. In this study, we investigate the effects of a possible crystalline phase in the core of hybrid stars (HSs) on the mass-radius and Love-C relations, where HSs are a subclass of NS models with a quark matter core and a nuclear matter envelope with a sharp phase transition in between. We find that both the maximum mass and the corresponding radius increase as one increases the stiffness of the quark matter core controlled by the speed of sound, while the density discontinuity at the nuclear-quark matter transition effectively softens the equations of state. Deviations of the Love-C relation for elastic HSs from that of fluid NSs become more pronounced with a larger shear modulus, lower transition pressure, and larger density gap and can be as large as 60%. These findings suggest a potential method for testing the existence of distinct phases within HSs, though deviations are not large enough to be detected with current measurements of the tidal deformability and compactness. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 23 pages, 9 fighures, submitted to GRG

arXiv:2405.19262 [pdf, other]

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

Authors: Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao

Abstract: Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce $\textit{weak-to-strong search}$, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large mo… ▽ More Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce $\textit{weak-to-strong search}$, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (i) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (ii) an instance of weak-to-strong generalization that enhances a strong model with weak test-time guidance. Empirically, we demonstrate the flexibility of weak-to-strong search across different tasks. In controlled-sentiment generation and summarization, we use tuned and untuned $\texttt{gpt2}$s to effectively improve the alignment of large models without additional training. Crucially, in a more difficult instruction-following benchmark, AlpacaEval 2.0, we show that reusing off-the-shelf small model pairs (e.g., $\texttt{zephyr-7b-beta}$ and its untuned version) can significantly improve the length-controlled win rates of both white-box and black-box large models against $\texttt{gpt-4-turbo}$ (e.g., $34.4 \rightarrow 37.9$ for $\texttt{Llama-3-70B-Instruct}$ and $16.0 \rightarrow 20.1$ for $\texttt{gpt-3.5-turbo-instruct}$), despite the small models' low win rates $\approx 10.0$. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18009 [pdf, other]

Exploring Context Window of Large Language Models via Decomposed Positional Vectors

Authors: Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, Weipeng Chen, Ji-Rong Wen

Abstract: Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond the length of the context window. Extensive studies have been proposed to extend the context window and achieve length extrapolation of LLMs, but there is still a lack of in-depth interpretation of these approaches. In this study, we e… ▽ More Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond the length of the context window. Extensive studies have been proposed to extend the context window and achieve length extrapolation of LLMs, but there is still a lack of in-depth interpretation of these approaches. In this study, we explore the positional information within and beyond the context window for deciphering the underlying mechanism of LLMs. By using a mean-based decomposition method, we disentangle positional vectors from hidden states of LLMs and analyze their formation and effect on attention. Furthermore, when texts exceed the context window, we analyze the change of positional vectors in two settings, i.e., direct extrapolation and context window extension. Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can effectively extend the context window length. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17998 [pdf, other]

Source Echo Chamber: Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop

Authors: Yuqi Zhou, Sunhao Dai, Liang Pang, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

Abstract: Recently, researchers have uncovered that neural retrieval models prefer AI-generated content (AIGC), called source bias. Compared to active search behavior, recommendation represents another important means of information acquisition, where users are more prone to source bias. Furthermore, delving into the recommendation scenario, as AIGC becomes integrated within the feedback loop involving user… ▽ More Recently, researchers have uncovered that neural retrieval models prefer AI-generated content (AIGC), called source bias. Compared to active search behavior, recommendation represents another important means of information acquisition, where users are more prone to source bias. Furthermore, delving into the recommendation scenario, as AIGC becomes integrated within the feedback loop involving users, data, and the recommender system, it progressively contaminates the candidate items, the user interaction history, and ultimately, the data used to train the recommendation models. How and to what extent the source bias affects the neural recommendation models within feedback loop remains unknown. In this study, we extend the investigation of source bias into the realm of recommender systems, specifically examining its impact across different phases of the feedback loop. We conceptualize the progression of AIGC integration into the recommendation content ecosystem in three distinct phases-HGC dominate, HGC-AIGC coexist, and AIGC dominance-each representing past, present, and future states, respectively. Through extensive experiments across three datasets from diverse domains, we demonstrate the prevalence of source bias and reveal a potential digital echo chamber with source bias amplification throughout the feedback loop. This trend risks creating a recommender ecosystem with limited information source, such as AIGC, being disproportionately recommended. To counteract this bias and prevent its escalation in the feedback loop, we introduce a black-box debiasing method that maintains model impartiality towards both HGC and AIGC. Our experimental results validate the effectiveness of the proposed debiasing method, confirming its potential to disrupt the feedback loop. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16546 [pdf, other]

Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration

Authors: Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

Abstract: The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for rese… ▽ More The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for researchers. In this paper, we introduce Cocktail, a comprehensive benchmark tailored for evaluating IR models in this mixed-sourced data landscape of the LLM era. Cocktail consists of 16 diverse datasets with mixed human-written and LLM-generated corpora across various text retrieval tasks and domains. Additionally, to avoid the potential bias from previously included dataset information in LLMs, we also introduce an up-to-date dataset, named NQ-UTD, with queries derived from recent events. Through conducting over 1,000 experiments to assess state-of-the-art retrieval models against the benchmarked datasets in Cocktail, we uncover a clear trade-off between ranking performance and source bias in neural retrieval models, highlighting the necessity for a balanced approach in designing future IR systems. We hope Cocktail can serve as a foundational resource for IR research in the LLM era, with all data and code publicly available at \url{https://github.com/KID-22/Cocktail}. △ Less

Submitted 2 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by Findings of ACL 2024; Datasets Link: https://huggingface.co/IR-Cocktail

arXiv:2405.16120 [pdf, other]

BankFair: Balancing Accuracy and Fairness under Varying User Traffic in Recommender System

Authors: Xiaopeng Ye, Chen Xu, Jun Xu, Xuyang Xie, Gang Wang, Zhenhua Dong

Abstract: Driven by sustainability and economic considerations, two-sided recommendation platforms are required to satisfy the needs of both users and providers. Previous studies often indicate that the two sides' needs differ in urgency: providers have relatively long-term exposure requirements, while users desire short-term, accurate services. However, our empirical study reveals that existing methods for… ▽ More Driven by sustainability and economic considerations, two-sided recommendation platforms are required to satisfy the needs of both users and providers. Previous studies often indicate that the two sides' needs differ in urgency: providers have relatively long-term exposure requirements, while users desire short-term, accurate services. However, our empirical study reveals that existing methods for balancing fairness and accuracy often fail to ensure both long-term fairness and short-term accuracy under fluctuating user traffic in real applications. Notably, when user traffic is low, user experience tends to decline significantly. Then, we conducted a theoretical analysis confirming that user traffic is a crucial factor in such a trade-off problem. Ensuring accuracy and fairness under variable user traffic remains a challenge. Inspired by the bankruptcy problem in economics, we propose a novel fairness-aware re-ranking approach called BankFair. BankFair intuitively uses the Talmud rule to leverage periods of high user traffic to compensate for periods of low traffic, ensuring consistent user service while maintaining long-term fairness. BankFair is composed of two modules: (1) utilizing the Talmud rule to determine the necessary degree of fairness across varying user traffic periods, and (2) implementing an online re-ranking algorithm based on the fairness degree established by the Talmud rule. Experiments on one publicly available and one real industrial dataset demonstrate that BankFair outperforms all baselines in terms of both accuracy and provider fairness. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.12954 [pdf, other]

A Method on Searching Better Activation Functions

Authors: Haoyuan Sun, Zihao Wu, Bo Xia, Pu Chang, Zibin Dong, Yifu Yuan, Yongzhe Chang, Xueqian Wang

Abstract: The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, introducing non-linearity into network and enabling them to model sophisticated relationships in data. However, the search of activation functions has largely relied on empirical knowledge in the past, lacking theoretical guidance, which has hindered the identification of more effe… ▽ More The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, introducing non-linearity into network and enabling them to model sophisticated relationships in data. However, the search of activation functions has largely relied on empirical knowledge in the past, lacking theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a proper solution to such issue. Firstly, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor expansion form of information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. EAFO methodology presents a novel perspective for designing static activation functions in deep neural networks and the potential of dynamically optimizing activation during iterative training. Utilizing EAFO methodology, we derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU). Experiments conducted with vision transformer and its variants on CIFAR-10, CIFAR-100 and ImageNet-1K datasets demonstrate the superiority of CRReLU over existing corrections of ReLU. Extensive empirical studies on task of large language model (LLM) fine-tuning, CRReLU exhibits superior performance compared to GELU, suggesting its broader potential for practical applications. △ Less

Submitted 22 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: 16 pages,3 figures

arXiv:2405.12892 [pdf, other]

Retrievable Domain-Sensitive Feature Memory for Multi-Domain Recommendation

Authors: Yuang Zhao, Zhaocheng Du, Qinglin Jia, Linxuan Zhang, Zhenhua Dong, Ruiming Tang

Abstract: With the increase in the business scale and number of domains in online advertising, multi-domain ad recommendation has become a mainstream solution in the industry. The core of multi-domain recommendation is effectively modeling the commonalities and distinctions among domains. Existing works are dedicated to designing model architectures for implicit multi-domain modeling while overlooking an in… ▽ More With the increase in the business scale and number of domains in online advertising, multi-domain ad recommendation has become a mainstream solution in the industry. The core of multi-domain recommendation is effectively modeling the commonalities and distinctions among domains. Existing works are dedicated to designing model architectures for implicit multi-domain modeling while overlooking an in-depth investigation from a more fundamental perspective of feature distributions. This paper focuses on features with significant differences across various domains in both distributions and effects on model predictions. We refer to these features as domain-sensitive features, which serve as carriers of domain distinctions and are crucial for multi-domain modeling. Experiments demonstrate that existing multi-domain modeling methods may neglect domain-sensitive features, indicating insufficient learning of domain distinctions. To avoid this neglect, we propose a domain-sensitive feature attribution method to identify features that best reflect domain distinctions from the feature set. Further, we design a memory architecture that extracts domain-specific information from domain-sensitive features for the model to retrieve and integrate, thereby enhancing the awareness of domain distinctions. Extensive offline and online experiments demonstrate the superiority of our method in capturing domain distinctions and improving multi-domain recommendation performance. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.10800 [pdf, other]

Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting

Authors: Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, Xuan Song

Abstract: Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heter… ▽ More Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heterogeneity through learning spatial and temporal embeddings, which can be viewed as a clustering process. Then, a novel spatiotemporal meta-parameter learning paradigm is proposed to learn spatiotemporal-specific parameters from meta-parameter pools, which is informed by the captured heterogeneity. Based on these ideas, we develop a Heterogeneity-Informed Spatiotemporal Meta-Network (HimNet) for spatiotemporal time series forecasting. Extensive experiments on five widely-used benchmarks demonstrate our method achieves state-of-the-art performance while exhibiting superior interpretability. Our code is available at https://github.com/XDZhelheim/HimNet. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by KDD'24 Research Track

arXiv:2405.10596 [pdf, other]

CELA: Cost-Efficient Language Model Alignment for CTR Prediction

Authors: Xingmei Wang, Weiwen Liu, Xiaolong Chen, Qi Liu, Xu Huang, Defu Lian, Xiangyang Li, Yasheng Wang, Zhenhua Dong, Ruiming Tang

Abstract: Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. The prevailing ID-based paradigm underperforms in cold-start scenarios due to the skewed distribution of feature frequency. Additionally, the utilization of a single modality fails to exploit the knowledge contained within textual features. Recent efforts have sought to mitigate these challenges by integrating P… ▽ More Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. The prevailing ID-based paradigm underperforms in cold-start scenarios due to the skewed distribution of feature frequency. Additionally, the utilization of a single modality fails to exploit the knowledge contained within textual features. Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs). They design hard prompts to structure raw features into text for each interaction and then apply PLMs for text processing. With external knowledge and reasoning capabilities, PLMs extract valuable information even in cases of sparse interactions. Nevertheless, compared to ID-based models, pure text modeling degrades the efficacy of collaborative filtering, as well as feature scalability and efficiency during both training and inference. To address these issues, we propose \textbf{C}ost-\textbf{E}fficient \textbf{L}anguage Model \textbf{A}lignment (\textbf{CELA}) for CTR prediction. CELA incorporates textual features and language models while preserving the collaborative filtering capabilities of ID-based models. This model-agnostic framework can be equipped with plug-and-play textual features, with item-level alignment enhancing the utilization of external information while maintaining training and inference efficiency. Through extensive offline experiments, CELA demonstrates superior performance compared to state-of-the-art methods. Furthermore, an online A/B test conducted on an industrial App recommender system showcases its practical effectiveness, solidifying the potential for real-world applications of CELA. △ Less

Submitted 17 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 10 pages, 5 figures

MSC Class: 68T07

arXiv:2405.10284 [pdf, other]

doi 10.3390/axioms13050323

Quantum Vision Transformers for Quark-Gluon Classification

Authors: Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu

Abstract: We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architect… ▽ More We introduce a hybrid quantum-classical vision transformer architecture, notable for its integration of variational quantum circuits within both the attention mechanism and the multi-layer perceptrons. The research addresses the critical challenge of computational efficiency and resource constraints in analyzing data from the upcoming High Luminosity Large Hadron Collider, presenting the architecture as a potential solution. In particular, we evaluate our method by applying the model to multi-detector jet images from CMS Open Data. The goal is to distinguish quark-initiated from gluon-initiated jets. We successfully train the quantum model and evaluate it via numerical simulations. Using this approach, we achieve classification performance almost on par with the one obtained with the completely classical architecture, considering a similar number of parameters. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 14 pages, 8 figures. Published in MDPI Axioms 2024, 13(5), 323

MSC Class: 68Q12 (Primary) 81P68; 68T07 (Secondary)

Journal ref: Axioms 2024, 13(5), 323

arXiv:2405.08686 [pdf]

Antiferromagnetic Quantum Anomalous Hall Effect Modulated by Spin Flips and Flops

Authors: Zichen Lian, Yongchao Wang, Yongqian Wang, Yang Feng, Zehao Dong, Shuai Yang, Liangcai Xu, Yaoxin Li, Bohan Fu, Yuetan Li, Wanjun Jiang, Chang Liu, Jinsong Zhang, Yayu Wang

Abstract: The interplay between nontrivial band topology and layered antiferromagnetism in MnBi2Te4 has opened up a new avenue for exploring topological phases of matter. Representative examples include the quantum anomalous Hall effect and axion insulator state observed in odd and even number layers of MnBi2Te4, when the top and bottom surfaces have parallel and antiparallel spin alignments respectively. T… ▽ More The interplay between nontrivial band topology and layered antiferromagnetism in MnBi2Te4 has opened up a new avenue for exploring topological phases of matter. Representative examples include the quantum anomalous Hall effect and axion insulator state observed in odd and even number layers of MnBi2Te4, when the top and bottom surfaces have parallel and antiparallel spin alignments respectively. The rich and complex spin dynamics associated with the van der Waals antiferromagnetic order is expected to generate novel topological phases and phase transitions that are unique to MnBi2Te4. Here we fabricate a device of 7-septuple-layer MnBi2Te4 covered with AlOx capping layer, which enables the investigation of antiferromagnetic quantum anomalous Hall effect over wide parameter spaces. By tuning the gate voltage and perpendicular magnetic field, we uncover a cascade of quantum phase transitions that can be attributed to the influence of spin configurations on charge transport. Furthermore, we find that an in-plane magnetic field enhances both the coercive field and exchange gap of the surface state, in sharp contrast to that in ferromagnetic quantum anomalous Hall state. We propose that these peculiar features arise from the spin flip and flop transitions inherent to van der Waals antiferromagnet. The versatile tunability of the quantum anomalous Hall effect in MnBi2Te4 paves the way for potential applications in topological antiferromagnetic spintronics. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2405.06999 [pdf, other]

Large Language Model-aided Edge Learning in Distribution System State Estimation

Authors: Renyou Xie, Xin Yin, Chaojie Li, Nian Liu, Bo Zhao, Zhaoyang Dong

Abstract: Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forec… ▽ More Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forecast-then-estimate framework of edge learning is proposed for DSSE, leveraging large language models (LLMs) to forecast missing measurements and provide pseudo-measurements. Firstly, natural language-based prompts and measurement sequences are integrated by the proposed LLM to learn patterns from historical data and provide accurate forecasting results. Secondly, a convolutional layer-based neural network model is introduced to improve the robustness of state estimation under missing measurement. Thirdly, to alleviate the overfitting of the deep learning-based DSSE, it is reformulated as a multi-task learning framework containing shared and task-specific layers. The uncertainty weighting algorithm is applied to find the optimal weights to balance different tasks. The numerical simulation on the Simbench case is used to demonstrate the effectiveness of the proposed forecast-then-estimate framework. △ Less

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06927 [pdf, ps, other]

Multimodal Pretraining and Generation for Recommendation: A Tutorial

Authors: Jieming Zhu, Chuhan Wu, Rui Zhang, Zhenhua Dong

Abstract: Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item conten… ▽ More Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item contents across diverse modalities, such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, particularly in the realm of multimedia services like news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges in the development of multimodal recommender systems. This tutorial seeks to provide a thorough exploration of the latest advancements and future trajectories in multimodal pretraining and generation techniques within the realm of recommender systems. The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in the field of recommendation. Our target audience encompasses scholars, practitioners, and other parties interested in this domain. By providing a succinct overview of the field, we aspire to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: Published in WWW 2024 Tutorial. Find the tutorial materials at https://mmrec.github.io/tutorial/www2024/

arXiv:2405.03952 [pdf, other]

HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

Authors: Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller

Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches highly rely on the Transformer architectures due to its efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying… ▽ More Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches highly rely on the Transformer architectures due to its efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out the redundant information. Moreover, we design a hierarchical structure to force it to learn a variety of information grains, from the frame level to the dialogue level. By conducting extensive experiments on the ADReSS-M dataset, the introduced HAFFormer can achieve competitive results (82.6% accuracy) with other recent work, but with significant computational complexity and model size reduction compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Journal ref: publised at ICASSP 2024

arXiv:2405.00742 [pdf, other]

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong

Abstract: Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among differe… ▽ More Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message passing that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, by concerning cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach. △ Less

Submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages,4 figures

arXiv:2404.18073 [pdf, other]

Charge and spin density wave orders in field-biased Bernal bilayer graphene

Authors: Zhiyu Dong, Patrick A. Lee, Leonid Levitov

Abstract: This paper aims to clarify the nature of a surprising ordered phase recently reported in biased Bernal bilayer graphene that occurs at the phase boundary between the isospin-polarized and unpolarized phases. Strong nonlinearity of transport at abnormally small currents, with $dI/dV$ vs. $I$ sharply rising and then falling back, is typical for a charge/spin-density-wave state (CDW or SDW) sliding t… ▽ More This paper aims to clarify the nature of a surprising ordered phase recently reported in biased Bernal bilayer graphene that occurs at the phase boundary between the isospin-polarized and unpolarized phases. Strong nonlinearity of transport at abnormally small currents, with $dI/dV$ vs. $I$ sharply rising and then falling back, is typical for a charge/spin-density-wave state (CDW or SDW) sliding transport. Here, however, it is observed at an isospin-order phase boundary, prompting a question about the CDW/SDW mechanism and its relation to the quantum critical point. We argue that the observed phase diagram cannot be understood within a standard weak-coupling picture. Rather, it points to a mechanism that relies on an effective interaction enhancement at a quantum critical point. We develop a detailed strong-coupling framework accounting for the soft collective modes that explain these observations. △ Less

Submitted 28 April, 2024; originally announced April 2024.

Comments: 11 pages, 7 figures

arXiv:2404.16304 [pdf, other]

BezierFormer: A Unified Architecture for 2D and 3D Lane Detection

Authors: Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

Abstract: Lane detection has made significant progress in recent years, but there is not a unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on Bézier curve lane representation. BézierFormer formulate queries as Bézier control points and incorporate a novel Bézier curve atten… ▽ More Lane detection has made significant progress in recent years, but there is not a unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on Bézier curve lane representation. BézierFormer formulate queries as Bézier control points and incorporate a novel Bézier curve attention mechanism. This attention mechanism enables comprehensive and accurate feature extraction for slender lane curves via sampling and fusing multiple reference points on each curve. In addition, we propose a novel Chamfer IoU-based loss which is more suitable for the Bézier control points regression. The state-of-the-art performance of BézierFormer on widely-used 2D and 3D lane detection benchmarks verifies its effectiveness and suggests the worthiness of further exploration. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: ICME 2024, 11 pages, 8 figures

arXiv:2404.15773 [pdf, other]

Possible gapless quantum spin liquid behavior in the triangular-lattice Ising antiferromagnet PrMgAl$_{11}$O$_{19}$

Authors: Zhen Ma, Shuhan Zheng, Yingqi Chen, Ruokai Xu, Zhao-Yang Dong, Jinghui Wang, Hong Du, Jan Peter Embs, Shuaiwei Li, Yao Li, Yongjun Zhang, Meifeng Liu, Ruidan Zhong, Jun-Ming Liu, Jinsheng Wen

Abstract: Quantum spin liquids (QSLs) represent a novel state where spins are highly entangled but do not order even at zero temperature due to strong quantum fluctuations. Such a state is mostly studied in Heisenberg models defined on geometrically frustrated lattices. Here, we turn to a new triangular-lattice antiferromagnet PrMgAl$_{11}$O$_{19}$, in which the interactions are believed to be of Ising type… ▽ More Quantum spin liquids (QSLs) represent a novel state where spins are highly entangled but do not order even at zero temperature due to strong quantum fluctuations. Such a state is mostly studied in Heisenberg models defined on geometrically frustrated lattices. Here, we turn to a new triangular-lattice antiferromagnet PrMgAl$_{11}$O$_{19}$, in which the interactions are believed to be of Ising type. Magnetic susceptibility measured with an external field along the $c$ axis is two orders of magnitude larger than that with a field in the $ab$ plane, displaying an ideal easy-axis behavior. Meanwhile, there is no magnetic phase transition or spin freezing observed down to 1.8 K. Ultralow-temperature specific heat measured down to 50 mK does not capture any phase transition either, but a hump at 4.5 K, below which the magnetic specific heat exhibits a quasi-quadratic temperature dependence that is consistent with a Dirac QSL state. Inelastic neutron scattering technique is also employed to elucidate the nature of its ground state. In the magnetic excitation spectra, there is a gapless broad continuum at the base temperature 55~mK, in favor of the realization of a gapless QSL. Our results provide a scarce example for the QSL behaviors observed in an Ising-type magnet, which can serve as a promising platform for future research on QSL physics based on an Ising model. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

Journal ref: Phys. Rev. B 109, 165143 (2024)

arXiv:2404.14774 [pdf, other]

Contrastive Quantization based Semantic Code for Generative Recommendation

Authors: Mengqun Jin, Zexuan Qiu, Jieming Zhu, Zhenhua Dong, Xiu Li

Abstract: With the success of large language models, generative retrieval has emerged as a new retrieval technique for recommendation. It can be divided into two stages: the first stage involves constructing discrete Codes (i.e., codes), and the second stage involves decoding the code sequentially via the transformer architecture. Current methods often construct item semantic codes by reconstructing based q… ▽ More With the success of large language models, generative retrieval has emerged as a new retrieval technique for recommendation. It can be divided into two stages: the first stage involves constructing discrete Codes (i.e., codes), and the second stage involves decoding the code sequentially via the transformer architecture. Current methods often construct item semantic codes by reconstructing based quantization on item textual representation, but they fail to capture item discrepancy that is essential in modeling item relationships in recommendation sytems. In this paper, we propose to construct the code representation of items by simultaneously considering both item relationships and semantic information. Specifically, we employ a pre-trained language model to extract item's textual description and translate it into item's embedding. Then we propose to enhance the encoder-decoder based RQVAE model with contrastive objectives to learn item code. To be specific, we employ the embeddings generated by the decoder from the samples themselves as positive instances and those from other samples as negative instances. Thus we effectively enhance the item discrepancy across all items, better preserving the item neighbourhood. Finally, we train and test semantic code with with generative retrieval on a sequential recommendation model. Our experiments demonstrate that our method improves NDCG@5 with 43.76% on the MIND dataset and Recall@10 with 80.95% on the Office dataset compared to the previous baselines. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14309 [pdf, other]

Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training

Authors: Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin

Abstract: Recently, diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks. However, previous studies have used questionable methods to evaluate the robustness of DBP models, their explanations of DBP robustness also lack experimental support. We re-examine DBP robustness using precise gradient, and discuss the impact of stochasticity on DBP robustne… ▽ More Recently, diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks. However, previous studies have used questionable methods to evaluate the robustness of DBP models, their explanations of DBP robustness also lack experimental support. We re-examine DBP robustness using precise gradient, and discuss the impact of stochasticity on DBP robustness. To better explain DBP robustness, we assess DBP robustness under a novel attack setting, Deterministic White-box, and pinpoint stochasticity as the main factor in DBP robustness. Our results suggest that DBP models rely on stochasticity to evade the most effective attack direction, rather than directly countering adversarial perturbations. To improve the robustness of DBP models, we propose Adversarial Denoising Diffusion Training (ADDT). This technique uses Classifier-Guided Perturbation Optimization (CGPO) to generate adversarial perturbation through guidance from a pre-trained classifier, and uses Rank-Based Gaussian Mapping (RBGM) to convert adversarial pertubation into a normal Gaussian distribution. Empirical results show that ADDT improves the robustness of DBP models. Further experiments confirm that ADDT equips DBP models with the ability to directly counter adversarial perturbations. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13655 [pdf, other]

SPGNN: Recognizing Salient Subgraph Patterns via Enhanced Graph Convolution and Pooling

Authors: Zehao Dong, Muhan Zhang, Yixin Chen

Abstract: Graph neural networks (GNNs) have revolutionized the field of machine learning on non-Euclidean data such as graphs and networks. GNNs effectively implement node representation learning through neighborhood aggregation and achieve impressive results in many graph-related tasks. However, most neighborhood aggregation approaches are summation-based, which can be problematic as they may not be suffic… ▽ More Graph neural networks (GNNs) have revolutionized the field of machine learning on non-Euclidean data such as graphs and networks. GNNs effectively implement node representation learning through neighborhood aggregation and achieve impressive results in many graph-related tasks. However, most neighborhood aggregation approaches are summation-based, which can be problematic as they may not be sufficiently expressive to encode informative graph structures. Furthermore, though the graph pooling module is also of vital importance for graph learning, especially for the task of graph classification, research on graph down-sampling mechanisms is rather limited. To address the above challenges, we propose a concatenation-based graph convolution mechanism that injectively updates node representations to maximize the discriminative power in distinguishing non-isomorphic subgraphs. In addition, we design a novel graph pooling module, called WL-SortPool, to learn important subgraph patterns in a deep-learning manner. WL-SortPool layer-wise sorts node representations (i.e. continuous WL colors) to separately learn the relative importance of subtrees with different depths for the purpose of classification, thus better characterizing the complex graph topology and rich information encoded in the graph. We propose a novel Subgraph Pattern GNN (SPGNN) architecture that incorporates these enhancements. We test the proposed SPGNN architecture on many graph classification benchmarks. Experimental results show that our method can achieve highly competitive results with state-of-the-art graph kernels and other GNN approaches. △ Less

Submitted 29 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13501 [pdf, other]

A Survey on the Memory Mechanism of Large Language Model based Agents

Authors: Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen

Abstract: Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the m… ▽ More Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective designing patterns for inspiring future studies. To bridge this gap, in this paper, we propose a comprehensive survey on the memory mechanism of LLM-based agents. In specific, we first discuss ''what is'' and ''why do we need'' the memory in LLM-based agents. Then, we systematically review previous studies on how to design and evaluate the memory module. In addition, we also present many agent applications, where the memory module plays an important role. At last, we analyze the limitations of existing work and show important future directions. To keep up with the latest advances in this field, we create a repository at \url{https://github.com/nuster1128/LLM_Agent_Memory_Survey}. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: 39 pages, 5 figures, 4 tables

Showing 1–50 of 668 results for author: Dong, Z