doi 10.1103/PhysRevB.110.045405

SQUID oscillations in PbTe nanowire networks

Authors: Yichun Gao, Wenyu Song, Zehao Yu, Shuai Yang, Yuhao Wang, Ruidong Li, Fangting Chen, Zuhan Geng, Lining Yang, Jiaye Xu, Zhaoyu Wang, Zonglin Li, Shan Zhang, Xiao Feng, Tiantian Wang, Yunyi Zang, Lin Li, Runan Shang, Qi-Kun Xue, Ke He, Hao Zhang

Abstract: Network structures by semiconductor nanowires hold great promise for advanced quantum devices, especially for applications in topological quantum computing. In this study, we created networks of PbTe nanowires arranged in loop configurations. Using shadow-wall epitaxy, we defined superconducting quantum interference devices (SQUIDs) using the superconductor Pb. These SQUIDs exhibit oscillations in supercurrent upon the scanning of a magnetic field. Most of the oscillations can be fitted assuming a sinusoidal current-phase relation for each Josephson junction. Under certain conditions, the oscillations are found to be skewed, suggesting possible deviation from a sinusoidal behavior. Our results highlight the potential of PbTe nanowires for building complex quantum devices in the form of networks.

Submitted 10 April, 2024; originally announced April 2024.

Journal ref: Phys. Rev. B 110, 045405 (2024)

arXiv:2404.06350 [pdf, other]

Rolling Shutter Correction with Intermediate Distortion Flow Estimation

Authors: Mingdeng Cao, Sidi Yang, Yujiu Yang, Yinqiang Zheng

Abstract: This paper proposes to correct the rolling shutter (RS) distorted images by estimating the distortion flow from the global shutter (GS) to RS directly. Existing methods usually perform correction using the undistortion flow from the RS to GS. They initially predict the flow from consecutive RS frames, subsequently rescaling it as the displacement fields from the RS frame to the underlying GS image using time-dependent scaling factors. Following this, RS-aware forward warping is employed to convert the RS image into its GS counterpart. Nevertheless, this strategy is prone to two shortcomings. First, the undistortion flow estimation is rendered inaccurate by merely linear scaling the flow, due to the complex non-linear motion nature. Second, RS-aware forward warping often results in unavoidable artifacts. To address these limitations, we introduce a new framework that directly estimates the distortion flow and rectifies the RS image with the backward warping operation. More specifically, we first propose a global correlation-based flow attention mechanism to estimate the initial distortion flow and GS feature jointly, which are then refined by the following coarse-to-fine decoder layers. Additionally, a multi-distortion flow prediction strategy is integrated to mitigate the issue of inaccurate flow estimation further. Experimental results validate the effectiveness of the proposed method, which outperforms state-of-the-art approaches on various benchmarks while maintaining high efficiency. The project is available at https://github.com/ljzycmd/DFRSC.

Submitted 9 April, 2024; originally announced April 2024.

Comments: CVPR2024

arXiv:2404.05686 [pdf, other]

Chebyshev pseudosite matrix product state approach for cluster perturbation theory

Authors: Pei-Yuan Zhao, Ke Ding, Shuo Yang

Abstract: We introduce the Chebyshev pseudosite matrix product state approach (ChePSMPS) as a solver for cluster perturbation theory (CPT), crucial for simulating spectral functions in two-dimensional electron-phonon ($e$-ph) coupling systems. ChePSMPS distinguishes itself from conventional exact diagonalization solvers by supporting larger clusters, thereby significantly mitigating finite-size effects. Free from the fermion sign problem, ChePSMPS enhances its ability to explore $e$-ph effects and generate high-resolution spectral functions in doped Mott insulators. We use this method to simulate the spectra for both one- and two-dimensional Hubbard-Holstein models, highlighting its superiority over other methods. Our findings validate ChePSMPS as a powerful and reliable Green's function solver. In conjunction with embedding methods, ChePSMPS emerges as an essential tool for simulating strongly correlated $e$-ph coupling systems.

Submitted 26 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures

arXiv:2404.05685 [pdf, other]

Global phase diagram of doped quantum spin liquid on the Kagome lattice

Authors: Zheng-Tao Xu, Zheng-Cheng Gu, Shuo Yang

Abstract: It has long been believed that doped quantum spin liquids (QSLs) can give rise to fascinating quantum phases, including the possibility of high-temperature superconductivity (SC) as proposed by P. W. Anderson's resonating valence bond (RVB) scenario. The Kagome lattice $t$-$J$ model is known to exhibit spin liquid behavior at half-filling, making it an ideal system for studying the properties of doped QSL. In this study, we employ the fermionic projected entangled simplex state (PESS) method to investigate the ground state properties of the Kagome lattice $t$-$J$ model with $t/J = 3.0$. Our results reveal a phase transition from charge density wave (CDW) states to uniform states around a critical doping level $δ_c \approx 0.27$. Within the CDW phase, we observe different types of Wigner crystal (WC) formulated by doped holes that are energetically favored. As we enter the uniform phase, a non-Fermi liquid (NFL) state emerges within the doping range $0.27 < δ < 0.32$, characterized by an exponential decay of all correlation functions. With further hole doping, we discover the appearance of a pair density wave (PDW) state within a narrow doping region $0.32 < δ < 1/3$. We also discuss the potential experimental implications of our findings.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 17 figures

arXiv:2404.04915 [pdf, other]

Measurement of the $e^+e^- \to π^+π^-π^0$ cross section in the energy range 0.62-3.50 GeV at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer, J. Becker, J. V. Bennett , et al. (338 additional authors not shown)

Abstract: We report a measurement of the $e^+e^- \to π^+π^-π^0$ cross section in the energy range from 0.62 to 3.50 GeV using an initial-state radiation technique. We use an $e^+e^-$ data sample corresponding to 191 $\text{fb}^{-1}$ of integrated luminosity, collected at a center-of-mass energy at or near the $Υ{(4S)}$ resonance with the Belle II detector at the SuperKEKB collider. Signal yields are extracted by fitting the two-photon mass distribution in $e^+e^- \to π^+π^-π^0γ$ events, which involve a $π^0 \to γγ$ decay and an energetic photon radiated from the initial state. Signal efficiency corrections with an accuracy of 1.6% are obtained from several control data samples. The uncertainty on the cross section at the $ω$ and $φ$ resonances is dominated by the systematic uncertainty of 2.2%. The resulting cross sections in the 0.62-1.80 GeV energy range yield $a_μ^{3π} = [48.91 \pm 0.23~(\mathrm{stat}) \pm 1.07~(\mathrm{syst})] \times 10^{-10}$ for the leading-order hadronic vacuum polarization contribution to the muon anomalous magnetic moment. This result differs by $2.5$ standard deviations from the most precise current determination.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 23 pages, 24 figures, submitted to PRD

Report number: KEK Preprint 2023-51, Belle II Preprint 2024-004

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04685 [pdf]

Recent Advances in Nanophotonics for Optofluidics

Authors: Sen Yang, Chuchuan Hong, Guodong Zhu, Theodore H. Anyika, Ikjun Hong, Justus C. Ndukaife

Abstract: Optofluidics is dedicated to achieving integrated control of particle and fluid motion, particularly on the micrometer scale, by utilizing light to direct fluid flow and particle motion. The field has seen significant growth recently, driven by the concerted efforts of researchers across various scientific disciplines, notably for its successful applications in biomedical science. In this review, we explore a range of optofluidic architectures developed over the past decade, with a primary focus on mechanisms for precise control of micro and nanoscale biological objects and their applications in sensing. Regarding nanoparticle manipulation, we delve into mechanisms based on optical nanotweezers using nanolocalized light fields and light-based hybrid effects with dramatically improved performance and capabilities. In the context of sensing, we emphasize those works that used optofluidics to aggregate molecules or particles to promote sensing and detection. Additionally, we highlight emerging research directions, encompassing both fundamental principles and practical applications in the field.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.03375 [pdf, other]

Search for the $B_s^0 \rightarrow μ^+μ^-γ$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1068 additional authors not shown)

Abstract: A search for the fully reconstructed $B_s^0 \rightarrow μ^+μ^-γ$ decay is performed at the LHCb experiment using proton-proton collisions at $\sqrt{s}=13$\,TeV corresponding to an integrated luminosity of $5.4\,\mathrm{fb^{-1}}$. No significant signal is found and upper limits on the branching fraction in intervals of the dimuon mass are set
\begin{align}
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[2m_μ,~1.70]\,\mathrm{GeV/c^2},\nonumber\\
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 7.7\times10^{-8},~&m(μμ)\in[1.70,~2.88]\,\mathrm{GeV/c^2},\nonumber\\
{\cal B}(B_s^0 \rightarrow μ^+μ^-γ) < 4.2\times10^{-8},~&m(μμ)\in[3.92,~m_{B_s^0}]\,\mathrm{GeV/c^2},\nonumber
\end{align}
at 95\% confidence level. Additionally, upper limits are set on the branching fraction in the $[2m_μ,~1.70]\,\mathrm{GeV/c^2}$ dimuon mass region excluding the contribution from the intermediate $φ(1020)$ meson, and in the region combining all dimuon-mass intervals.

Submitted 4 April, 2024; originally announced April 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-045.html

Report number: LHCb-PAPER-2023-045, CERN-EP-2024-065

arXiv:2404.03149 [pdf, other]

Design and Evaluation of a Compact 3D End-effector Assistive Robot for Adaptive Arm Support

Authors: Sibo Yang, Lincong Luo, Wei Chuan Law, Youlong Wang, Lei Li, Wei Tech Ang

Abstract: We developed a 3D end-effector type of upper limb assistive robot, named as Assistive Robotic Arm Extender (ARAE), that provides transparency movement and adaptive arm support control to achieve home-based therapy and training in the real environment. The proposed system composes five degrees of freedom, including three active motors and two passive joints at the end-effector module. The core structure of the system is based on a parallel mechanism. The kinematic and dynamic modeling are illustrated in detail. The proposed adaptive arm support control framework calculates the compensated force based on the estimated human arm posture in 3D space. It firstly estimates human arm joint angles using two proposed methods: fixed torso and sagittal plane models without using external sensors such as IMUs, magnetic sensors, or depth cameras. The experiments were carried out to evaluate the performance of the two proposed angle estimation methods. Then, the estimated human joint angles were input into the human upper limb dynamics model to derive the required support force generated by the robot. The muscular activities were measured to evaluate the effects of the proposed framework. The obvious reduction of muscular activities was exhibited when participants were tested with the ARAE under an adaptive arm gravity compensation control framework. The overall results suggest that the ARAE system, when combined with the proposed control framework, has the potential to offer adaptive arm support. This integration could enable effective training with Activities of Daily Living (ADLs) and interaction with real environments.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 11 pages

arXiv:2404.02760 [pdf, other]

doi 10.1073/pnas.2406884121

Gate-tunable subband degeneracy in semiconductor nanowires

Authors: Yuhao Wang, Wenyu Song, Zhan Cao, Zehao Yu, Shuai Yang, Zonglin Li, Yichun Gao, Ruidong Li, Fangting Chen, Zuhan Geng, Lining Yang, Jiaye Xu, Zhaoyu Wang, Shan Zhang, Xiao Feng, Tiantian Wang, Yunyi Zang, Lin Li, Runan Shang, Qi-Kun Xue, Dong E. Liu, Ke He, Hao Zhang

Abstract: Degeneracy and symmetry have a profound relation in quantum systems. Here, we report gate-tunable subband degeneracy in PbTe nanowires with a nearly symmetric cross-sectional shape. The degeneracy is revealed in electron transport by the absence of a quantized plateau. Utilizing a dual gate design, we can apply an electric field to lift the degeneracy, reflected as emergence of the plateau. This degeneracy and its tunable lifting were challenging to observe in previous nanowire experiments, possibly due to disorder. Numerical simulations can qualitatively capture our observation, shedding light on device parameters for future applications.

Submitted 3 April, 2024; originally announced April 2024.

Journal ref: PNAS 121, e2406884121 (2024)

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01862 [pdf, other]

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

Abstract: Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is needed to describe complex human movements with crucial appearance information. 2) Gestures and speech exhibit inherent dependencies and should be temporally aligned even of arbitrary length. To solve these problems, we present a novel motion-decoupled framework to generate co-speech gesture videos. Specifically, we first introduce a well-designed nonlinear TPS transformation to obtain latent motion features preserving essential appearance information. Then a transformer-based diffusion model is proposed to learn the temporal correlation between gestures and speech, and performs generation in the latent motion space, followed by an optimal motion selection module to produce long-term coherent and consistent gesture videos. For better visual perception, we further design a refinement network focusing on missing details of certain areas. Extensive experimental results show that our proposed framework significantly outperforms existing approaches in both motion and video-related evaluations. Our code, demos, and more resources are available at https://github.com/thuhcsi/S2G-MDDiffusion.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 22 pages, 8 figures, CVPR 2024

arXiv:2404.01168 [pdf, other]

Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Authors: Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma

Abstract: 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Our code will be made publicly available for reproducible research.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 22 pages, 7 figures

arXiv:2404.00979 [pdf, other]

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Authors: Jinfeng Xu, Siyuan Yang, Xianzhi Li, Yuan Tang, Yixue Hao, Long Hu, Min Chen

Abstract: Existing point cloud semantic segmentation networks cannot identify unknown classes and update their knowledge, due to a closed-set and static perspective of the real world, which would induce the intelligent agent to make bad decisions. To address this problem, we propose a Probability-Driven Framework (PDF) for open world semantic segmentation that includes (i) a lightweight U-decoder branch to identify unknown classes by estimating the uncertainties, (ii) a flexible pseudo-labeling scheme to supply geometry features along with probability distribution features of unknown classes by generating pseudo labels, and (iii) an incremental knowledge distillation strategy to incorporate novel classes into the existing knowledge base gradually. Our framework enables the model to behave like human beings, which could recognize unknown objects and incrementally learn them with the corresponding knowledge. Experimental results on the S3DIS and ScanNetv2 datasets demonstrate that the proposed PDF outperforms other methods by a large margin in both important tasks of open world semantic segmentation.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00706 [pdf, other]

Chiral Spin Textures Driven by Emergent Spin-Orbit Interaction: A Numerical Study

Authors: Shuai Yang, Zhiyu Dong, Yan Chen

Abstract: We explore numerically the intricate interplay between Berry phases in both real and momentum spaces within itinerant magnets. This interplay manifests as an emergent spin-orbit coupling, where charge carriers occupying a Berry-curved band generate an orbital magnetization, inducing a pseudo-magnetic field originating in chiral spin textures. Using density-matrix-renormalization-group techniques, we demonstrate that switching on a band Berry curvature in a metallic ferromagnetic phase results in chiral magnetic textures. Furthermore, employing a two-leg strip geometry, we establish a connection between charge and spin chirality, further supporting this emergent spin-orbit interaction.

Submitted 9 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00591 [pdf, other]

Task-Space Riccati Feedback based Whole Body Control for Underactuated Legged Locomotion

Authors: Shunpeng Yang, Zejun Hong, Sen Li, Patrick Wensing, Wei Zhang, Hua Chen

Abstract: This manuscript primarily aims to enhance the performance of whole-body controllers (WBC) for underactuated legged locomotion. We introduce a systematic parameter design mechanism for the floating-base feedback control within the WBC. The proposed approach involves utilizing the linearized model of unactuated dynamics to formulate a Linear Quadratic Regulator (LQR) and solving a Riccati gain while accounting for potential physical constraints through a second-order approximation of the log-barrier function. The user-tuned feedback gain for the floating base task is then replaced by a new one constructed from the solved Riccati gain. Extensive simulations conducted in MuJoCo with a point bipedal robot, as well as real-world experiments performed on a quadruped robot, demonstrate the effectiveness of the proposed method. In the different bipedal locomotion tasks, compared with the user-tuned method, the proposed approach is at least 12% better and up to 50% better at linear velocity tracking, and at least 7% better and up to 47% better at angular velocity tracking. In the quadruped experiment, linear velocity tracking is improved by at least 3% and angular velocity tracking is improved by at least 23% using the proposed method.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 6 pages, submitted to IROS 2024

arXiv:2404.00489 [pdf, other]

PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

Authors: Muhammad Asif Ali, Zhengping Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang

Abstract: Large language models (LLMs) have shown exceptional abilities for multiple different natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead to sub-standard results in terms of readability and interpretability of the compressed prompt, with a detrimental impact on prompt utility. To address this, we propose PROMPT-SAW: Prompt compresSion via Relation AWare graphs, an effective strategy for prompt compression over task-agnostic and task-aware prompts. PROMPT-SAW uses the prompt's textual information to build a graph, later extracts key information elements in the graph to come up with the compressed prompt. We also propose GSM8K-AUG, i.e., an extended version of the existing GSM8k benchmark for task-agnostic prompts in order to provide a comprehensive evaluation platform. Experimental evaluation using benchmark datasets shows that prompts compressed by PROMPT-SAW are not only better in terms of readability, but they also outperform the best-performing baseline models by up to 14.3 and 13.7 respectively for task-aware and task-agnostic settings while compressing the original prompt text by 33.0 and 56.7.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00486 [pdf, other]

Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs

Authors: Shu Yang, Jiayuan Su, Han Jiang, Mengdi Li, Keyuan Cheng, Muhammad Asif Ali, Lijie Hu, Di Wang

Abstract: With the rise of large language models (LLMs), ensuring they embody the principles of being helpful, honest, and harmless (3H), known as Human Alignment, becomes crucial. While existing alignment methods like RLHF, DPO, etc., effectively fine-tune LLMs to match preferences in the preference dataset, they often make LLMs highly receptive to human input and external evidence, even when this information is poisoned. This leads to a tendency for LLMs to be Adaptive Chameleons when external evidence conflicts with their parametric memory. This exacerbates the risk of LLMs being attacked by external poisoned data, which poses a significant security risk to LLM system applications such as retrieval-augmented generation (RAG). To address the challenge, we propose a novel framework: Dialectical Alignment (DA), which (1) utilizes AI feedback to identify optimal strategies for LLMs to navigate inter-context conflicts and context-memory conflicts with different external evidence in the context window (i.e., different ratios of poisoned factual contexts); (2) constructs the SFT dataset as well as the preference dataset based on the AI feedback and strategies above; (3) uses the above datasets for LLM alignment to defend against poisoned context attacks while preserving the effectiveness of in-context knowledge editing. Our experiments show that the dialectical alignment model improves poisoned data attack defense by 20 and does not require any additional prompt engineering or prior declaration of "you may be attacked" in the LLMs' context window.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00236 [pdf, other]

Enhancing Content-based Recommendation via Large Language Model

Authors: Wentao Xu, Qianqian Xie, Shuo Yang, Jiangxia Cao, Shuchao Pang

Abstract: In real-world applications, users express different behaviors when they interact with different items, including implicit click/like interactions, and explicit comments/reviews interactions. Nevertheless, almost all recommender works are focused on how to describe user preferences by the implicit click/like interactions, to find the synergy of people. For the content-based explicit comments/reviews interactions, some works attempt to utilize them to mine the semantic knowledge to enhance recommender models. However, they still neglect the following two points: (1) The content semantic is a universal world knowledge; how do we extract the multi-aspect semantic information to empower different domains? (2) The user/item ID feature is a fundamental element for recommender models; how do we align the ID and content semantic feature space? In this paper, we propose a "plugin" semantic knowledge transferring method, LoID, which includes two major components: (1) LoRA-based large language model pretraining to extract multi-aspect semantic information; (2) ID-based contrastive objective to align their feature spaces. We conduct extensive experiments with SOTA baselines on real-world datasets, and the detailed results demonstrate significant improvements of our method LoID.

Submitted 29 March, 2024; originally announced April 2024.

Comments: Work in progress

arXiv:2404.00201 [pdf, other]

Angular analysis of $B \to K^* e^+ e^-$ in the low-$q^2$ region with new electron identification at Belle

Authors: Belle Collaboration, D. Ferlewicz, P. Urquijo, I. Adachi, K. Adamczyk, H. Aihara, D. M. Asner, H. Atmacan, R. Ayad, V. Babu, Sw. Banerjee, P. Behera, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, B. Bhuyan, T. Bilka, D. Biswas, D. Bodrov, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola , et al. (145 additional authors not shown)

Abstract: We perform an angular analysis of the $B\to K^* e^+ e^-$ decay for the dielectron mass squared, $q^2$, range of $0.0008$ to $1.1200 ~\text{GeV}^2 /c^4$ using the full Belle data set in the $K^{*0} \to K^+ π^-$ and $K^{*+} \to K_S^0 π^+$ channels, incorporating new methods of electron identification to improve the statistical power of the data set. This analysis is sensitive to contributions from right-handed currents from physics beyond the Standard Model by constraining the Wilson coefficients $\mathcal{C}_7^{(\prime)}$. We perform a fit to the $B\to K^* e^+ e^-$ differential decay rate and measure the imaginary component of the transversality amplitude to be $A_T^{\rm Im} = -1.27 \pm 0.52 \pm 0.12$, and the $K^*$ transverse asymmetry to be $A_T^{(2)} = 0.52 \pm 0.53 \pm 0.11$. The resulting constraints on the value of $\mathcal{C}_7^{\prime}$ are consistent with the Standard Model within a $2σ$ confidence interval.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: Submitted to PRD

Report number: Belle preprint 2023-20, KEK preprint 2023-38

arXiv:2403.19942 [pdf, other]

Research on high-frequency quasi-periodic oscillations in generalized black-bounce spacetime

Authors: Jianbo Lu, Shining Yang, Yuying Zhang, Liu Yang, Mou Xu

Abstract: In order to solve the problem of spacetime singularity in theoretical physics, researchers proposed regular black holes (BH). The generalized black-bounce (GBB) spacetime, as a unified treatment of distinct kinds of geometries in the framework of general relativity (e.g. regular BH and wormholes), has been extensively studied. Firstly, we derive the explicit forms of the Lagrangian for a nonlinear electromagnetic field and the potential for a non-canonical phantom field in the action of the gravitational system corresponding to the GBB solution. Secondly, this paper computes the radius of the innermost stable circular orbit (ISCO) and the stable circular orbit region for different types of celestial bodies in GBB spacetime. The research suggests that traversable wormholes may have two ISCOs or one ISCO depending on the throat's scale, whereas regular BH and extremal BH possess only one ISCO. Thirdly, quasi-periodic oscillations (QPOs) have been found to be a reliable tool for testing gravitational theories. Therefore, we compute the radial and azimuthal epicyclic angular frequencies of particles oscillating on stable circular orbits around various celestial bodies and compare them with the oscillation frequency properties of the Schwarzschild BH.
Moreover, due to the limited amount of research on the high-frequency quasi-periodic oscillations (HFQPOs) phenomenon and its generation mechanisms around particles near wormholes using observational data, this paper aims to study theoretical models that can simultaneously describe both BH and wormholes by fitting observational data. Using resonance models and associated frequency ratios, we are able to locate the resonances of different celestial bodies within the GBB spacetime.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 20 pages, 8 figures

arXiv:2403.19902 [pdf, other]

Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification

Authors: Jianfeng Cai, Yue Ma, Zhixi Feng, Shuyuan Yang

Abstract: Polarimetric synthetic aperture radar (PolSAR) image interpretation is widely used in various fields. Recently, deep learning has made significant progress in PolSAR image classification. Supervised learning (SL) requires a large amount of high-quality labeled PolSAR data to achieve better performance; however, manually labeled data is insufficient. This causes SL to fall into overfitting and degrades its generalization performance. Furthermore, the scattering confusion problem is also a significant challenge that attracts more attention. To solve these problems, this article proposes a Heterogeneous Network based Contrastive Learning method (HCLNet). It aims to learn high-level representation from unlabeled PolSAR data for few-shot classification according to multi-features and superpixels. Beyond the conventional CL, HCLNet introduces the heterogeneous architecture for the first time to utilize heterogeneous PolSAR features better. It also develops two easy-to-use plugins to narrow the domain gap between optics and PolSAR, namely a feature filter and superpixel-based instance discrimination: the former is used to enhance the complementarity of multi-features, and the latter is used to increase the diversity of negative samples. Experiments demonstrate the superiority of HCLNet on three widely used PolSAR benchmark datasets compared with state-of-the-art methods. Ablation studies also verify the importance of each component.
Besides, this work has implications for how to efficiently utilize the multi-features of PolSAR data to learn better high-level representation in CL and how to construct networks suitable for PolSAR data better.

Submitted 3 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19460 [pdf, other]

RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation

Authors: Chongkai Gao, Zhengrong Xue, Shuying Deng, Tianhai Liang, Siqi Yang, Lin Shao, Huazhe Xu

Abstract: We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable action space of RiEMann facilitates the addition of custom equivariant actions such as the direction of turning the faucet, which makes articulated object manipulation possible for RiEMann. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors on predicted poses (reduced by 68.6%), and achieves a 5.4 frames per second (FPS) network inference speed. Code and video results are available at https://riemann-web.github.io/.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19238 [pdf, other]

Taming Lookup Tables for Efficient Image Retouching

Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, upkeeping the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19112 [pdf, other]

Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts

Authors: Shuo Yang, Jiachi Chen, Mingyuan Huang, Zibin Zheng, Yuan Huang

Abstract: Reentrancy, a notorious vulnerability in smart contracts, has led to millions of dollars in financial loss. However, current smart contract vulnerability detection tools suffer from a high false positive rate in identifying contracts with reentrancy vulnerabilities. Moreover, only a small portion of the detected reentrant contracts can actually be exploited by hackers, making these tools less effective in securing the Ethereum ecosystem in practice. In this paper, we propose BlockWatchdog, a tool that focuses on detecting reentrancy vulnerabilities by identifying attacker contracts. These attacker contracts are deployed by hackers to exploit vulnerable contracts automatically. By focusing on attacker contracts, BlockWatchdog effectively detects truly exploitable reentrancy vulnerabilities by identifying reentrant call flow. Additionally, BlockWatchdog is capable of detecting new types of reentrancy vulnerabilities caused by poor designs when using ERC tokens or user-defined interfaces, which cannot be detected by current rule-based tools. We implement BlockWatchdog using cross-contract static dataflow techniques based on attack logic obtained from an empirical study that analyzes attacker contracts from 281 attack incidents. BlockWatchdog is evaluated on 421,889 Ethereum contract bytecodes and identifies 113 attacker contracts that target 159 victim contracts, leading to the theft of Ether and tokens valued at approximately 908.6 million USD. Notably, only 18 of the identified 159 victim contracts can be reported by current reentrancy detection tools.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted by ICSE 2024

arXiv:2403.19093 [pdf, other]

doi 10.1109/IROS55552.2023.10341360

Task2Morph: Differentiable Task-inspired Framework for Contact-Aware Robot Design

Authors: Yishuai Cai, Shaowu Yang, Minglong Li, Xinglin Chen, Yunxin Mao, Xiaodong Yi, Wenjing Yang

Abstract: Optimizing the morphologies and the controllers that adapt to various tasks is a critical issue in the field of robot design, also known as embodied intelligence. Previous works typically model it as a joint optimization problem and use search-based methods to find the optimal solution in the morphology space. However, they ignore the implicit knowledge of task-to-morphology mapping which can directly inspire robot design. For example, flipping heavier boxes tends to require more muscular robot arms. This paper proposes a novel and general differentiable task-inspired framework for contact-aware robot design called Task2Morph. We abstract task features highly related to task performance and use them to build a task-to-morphology mapping. Further, we embed the mapping into a differentiable robot design process, where the gradient information is leveraged for both the mapping learning and the whole optimization. The experiments are conducted on three scenarios, and the results validate that Task2Morph outperforms DiffHand, which lacks a task-inspired morphology module, in terms of efficiency and effectiveness.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 9 pages, 10 figures, published to IROS

Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023: 452-459

arXiv:2403.18776 [pdf, other]

doi 10.1364/OE.510670

Breaking the Limitations with Sparse Inputs by Variational Frameworks (BLIss) in Terahertz Super-Resolution 3D Reconstruction

Authors: Yiyao Zhang, Ke Chen, Shang-Hua Yang

Abstract: Data acquisition, image processing, and image quality are the long-lasting issues for terahertz (THz) 3D reconstructed imaging. Existing methods are primarily designed for 2D scenarios, given the challenges associated with obtaining super-resolution (SR) data and the absence of an efficient SR 3D reconstruction framework in conventional computed tomography (CT). Here, we demonstrate BLIss, a new approach for THz SR 3D reconstruction with sparse 2D data input. BLIss seamlessly integrates conventional CT techniques and variational framework with the core of the adapted Euler-Elastica-based model. The quantitative 3D image evaluation metrics, including the standard deviation of Gaussian, mean curvatures, and the multi-scale structural similarity index measure (MS-SSIM), validate the superior smoothness and fidelity achieved with our variational framework approach compared with conventional THz CT modal. Beyond its contributions to advancing THz SR 3D reconstruction, BLIss demonstrates potential applicability in other imaging modalities, such as X-ray and MRI. This suggests extensive impacts on the broader field of imaging applications.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 15 pages, 7 figures. Supplemental Document: https://doi.org/10.6084/m9.figshare.24455206

Journal ref: Optics Express (OE) 2024

arXiv:2403.18354 [pdf, other]

Magnetic helicity evolution during active region emergence and subsequent flare productivity

Authors: Zheng Sun, Ting Li, Quan Wang, Shangbin Yang, Mei Zhang, Yajie Chen

Abstract: Aims. Solar active regions (ARs), which are formed by flux emergence, serve as the primary sources of solar eruptions. However, the specific physical mechanism that governs the emergence process and its relationship with flare productivity remains to be thoroughly understood. Methods. We examined 136 emerging ARs, focusing on the evolution of their magnetic helicity and magnetic energy during the emergence phase. Based on the relation between helicity accumulation and magnetic flux evolution, we categorized the samples and investigated their flare productivity. Results. The emerging ARs we studied can be categorized into three types, Type-I, Type-II, and Type-III, and they account for 52.2%, 25%, and 22.8% of the total number in our sample, respectively. Type-I ARs exhibit a synchronous increase in both the magnetic flux and magnetic helicity, while the magnetic helicity in Type-II ARs displays a lag in increasing behind the magnetic flux. Type-III ARs show obvious helicity injections of opposite signs. Significantly, 90% of the flare-productive ARs (flare index > 6) were identified as Type-I ARs, suggesting that this type of AR has a higher potential to become flare productive. In contrast, Type-II and Type-III ARs exhibited a low and moderate likelihood of becoming active, respectively. Our statistical analysis also revealed that Type-I ARs accumulate more magnetic helicity and energy, far beyond what is found in Type-II and Type-III ARs.
Moreover, we observed that flare-productive ARs consistently accumulate a significant amount of helicity and energy during their emergence phase. Conclusions. These findings provide valuable insight into the flux emergence phenomena, offering promising possibilities for early-stage predictions of solar eruptions.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18348 [pdf, other]

Sequential Recommendation with Latent Relations based on Large Language Model

Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeling of user historical sequences, where most relations are extracted from knowledge graphs. However, existing methods rely on manually predefined relations and suffer from the sparsity issue, limiting the generalization ability in diverse scenarios with varied item relations. In this paper, we propose a novel relation-aware sequential recommendation framework with Latent Relation Discovery (LRD). Different from previous relation-aware models that rely on predefined rules, we propose to leverage the Large Language Model (LLM) to provide new types of relations and connections between items. The motivation is that LLM contains abundant world knowledge, which can be adopted to mine latent relations of items for recommendation. Specifically, inspired by the fact that humans can describe relations between items using natural language, LRD harnesses the LLM that has demonstrated human-like knowledge to obtain language knowledge representations of items. These representations are fed into a latent relation discovery module based on the discrete state variational autoencoder (DVAE). Then the self-supervised relation discovery tasks and recommendation tasks are jointly optimized.
Experimental results on multiple public datasets demonstrate that our proposed latent relation discovery method can be incorporated with existing relation-aware sequential recommendation models and significantly improve the performance. Further analysis experiments indicate the effectiveness and reliability of the discovered latent relations.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted by SIGIR 2024

arXiv:2403.18325 [pdf, other]

Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue by leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the former provides limited information and the latter relies on sufficient interaction data and still suffers from the cold-start issue. Common sense, as a form of knowledge with generality and universality, can be used as a supplement to the metadata-based knowledge graph and provides a new perspective for modeling users' preferences. Recently, benefiting from the emergent world knowledge of the large language model, efficient acquisition of common sense has become possible. In this paper, we propose a novel knowledge-based recommendation framework incorporating common sense, CSRec, which can be flexibly coupled to existing knowledge-based methods. Considering the challenge of the knowledge gap between the common sense-based knowledge graph and metadata-based knowledge graph, we propose a knowledge fusion approach based on mutual information maximization theory. Experimental results on public datasets demonstrate that our approach significantly improves the performance of existing knowledge-based recommendation models.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted by DASFAA 2024

arXiv:2403.18208 [pdf, other]

An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition

Authors: Yizhang Xia, Shihao Song, Zhanglu Hou, Junwen Xu, Juan Zou, Yuan Liu, Shengxiang Yang

Abstract: Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require a lot of expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To automatically adapt to various datasets, the ENAS framework is designed to automatically search an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.