Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss

Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces tracking difficulties caused by large and irregular motion, and insufficient training due to the motion long-tailed distribution of current UAV-MOT datasets. Previous UAV-MOT methods either extract motion and detection features redundantly or supervise the motion model in a sparse scheme, which limits their tracking performance and speed. To this end, we propose a flowing-by-detection module to realize accurate motion modeling with a minimum cost. Focusing on the motion long-tailed problem that was ignored by previous works, the flow-guided margin loss is designed to enable more complete training of large moving objects. Experiments on two widely used open-source datasets show that our proposed model can successfully track objects with large and irregular motion and outperforms existing state-of-the-art methods in UAV-MOT tasks.

Submitted 15 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.07207

arXiv:2407.02073 [pdf, other]

Contribution Evaluation of Heterogeneous Participants in Federated Learning via Prototypical Representations

Authors: Qi Guo, Minghao Yao, Zhen Tian, Saiyu Qi, Yong Qi, Yun Lin, Jin Song Dong

Abstract: Contribution evaluation in federated learning (FL) has become a pivotal research area due to its applicability across various domains, such as detecting low-quality datasets, enhancing model robustness, and designing incentive mechanisms. Existing contribution evaluation methods, which primarily rely on data volume, model similarity, and auxiliary test datasets, have shown success in diverse scenarios. However, their effectiveness often diminishes due to the heterogeneity of data distributions, presenting a significant challenge to their applicability. In response, this paper explores contribution evaluation in FL from an entirely new perspective of representation. In this work, we propose a new method for the contribution evaluation of heterogeneous participants in federated learning (FLCE), which introduces a novel indicator, class contribution momentum, to conduct refined contribution evaluation. Our core idea is the construction and application of the class contribution momentum indicator from individual, relative, and holistic perspectives, thereby achieving an effective and efficient contribution evaluation of heterogeneous participants without relying on an auxiliary test dataset. Extensive experimental results demonstrate the superiority of our method in terms of fidelity, effectiveness, efficiency, and heterogeneity across various scenarios.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.08204 [pdf, other]

Diffusion-Promoted HDR Video Reconstruction

Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

Abstract: High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed HDR-V-Diff, which incorporates a diffusion model to capture the HDR distribution. As such, HDR-V-Diff can reconstruct HDR videos with realistic details while alleviating ghosting artifacts. However, the direct introduction of video diffusion models would impose massive computational burden. Instead, to alleviate this burden, we first propose an HDR Latent Diffusion Model (HDR-LDM) to learn the distribution prior of single HDR frames. Specifically, HDR-LDM incorporates a tonemapping strategy to compress HDR frames into the latent space and a novel exposure embedding to aggregate the exposure information into the diffusion process. We then propose a Temporal-Consistent Alignment Module (TCAM) to learn the temporal information as a complement for HDR-LDM, which conducts coarse-to-fine feature alignment at different scales among video frames. Finally, we design a Zero-Init Cross-Attention (ZiCA) mechanism to effectively integrate the learned distribution prior and temporal information for generating HDR frames. Extensive experiments validate that HDR-V-Diff achieves state-of-the-art results on several representative datasets.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Arxiv Preprint

arXiv:2406.01003 [pdf, other]

Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

Authors: Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

Abstract: Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras, offering an accurate and versatile processor to multiple camera models. The core of Uni-ISP is leveraging device-aware embeddings through learning inverse/forward ISPs and its special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since there is no dataset synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5dB/2.4dB PSNR), its versatility in enabling new applications, and its adaptability to new camera models.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.16466 [pdf, other]

High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boosts memory requirements during training and increases inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the forward propagation of SNNs. We turn off the temporal dynamics of most spiking neurons and design multi-level temporal reversible interactions at temporal turn-on spiking neurons, resulting in an $O(L)$ training memory. Combined with the temporal reversible nature, we redesign the input encoding and network organization of SNNs to achieve $O(1)$ inference energy cost. Then, we finely adjust the internal units and residual connections of the basic SNN block to ensure the effectiveness of sparse temporal information interaction. T-RevSNN achieves excellent accuracy on ImageNet, while the memory efficiency, training time acceleration, and inference energy efficiency can be significantly improved by $8.6 \times$, $2.0 \times$, and $1.6 \times$, respectively. This work is expected to break the technical bottleneck of significantly increasing memory cost and training time for large-scale SNNs while maintaining high performance and low inference energy cost. Source code and models are available at: https://github.com/BICLab/T-RevSNN.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

arXiv:2405.15519 [pdf]

Confocal structured illumination microscopy

Authors: Weishuai Zhou, Manhong Yao, Xi Lin, Quan Yu, Junzheng Peng, Jingang Zhong

Abstract: Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent anti-noise ability. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome the limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce the concept of confocal imaging into OS-SIM and propose confocal structured illumination microscopy (CSIM) to enhance the imaging performance of OS-SIM. CSIM exploits the principle of dual photography to reconstruct a dual image from each pixel of the camera. The reconstructed dual image is equivalent to the image obtained by using the spatial light modulator (SLM) as a virtual camera, enabling the separation of the conjugate and non-conjugate signals recorded by the camera pixel. We can reject the non-conjugate signals by extracting the conjugate signal from each dual image to reconstruct a confocal image when establishing the conjugate relationship between the camera and the SLM. We have constructed the theoretical framework of CSIM. Optical-sectioning experimental results demonstrate that CSIM can reconstruct images with superior SNR and greater imaging depth compared with existing OS-SIM. CSIM is expected to expand the application scope of OS-SIM.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14839 [pdf, other]

A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis

Authors: Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael S. Yao, Chris Callison-Burch, James C. Gee, Mark Yatskar

Abstract: While deep networks have achieved broad success in analyzing natural images, when applied to medical scans, they often fail in unexpected situations. We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex, race, etc., in the context of chest X-rays and skin lesion images. A key finding we show empirically is that existing visual backbones lack an appropriate prior from the architecture for reliable generalization in these settings. Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language. To this end, we introduce Knowledge-enhanced Bottlenecks (KnoBo), a class of concept bottleneck models that incorporates knowledge priors that constrain it to reason with clinically relevant factors found in medical textbooks or PubMed. KnoBo uses retrieval-augmented language models to design an appropriate concept space paired with an automatic training procedure for recognizing the concept. We evaluate different resources of knowledge and recognition architectures on a broad range of domain shifts across 20 datasets. In our comprehensive evaluation with two imaging modalities, KnoBo outperforms fine-tuned models on confounded datasets by 32.4% on average. Finally, evaluations reveal that PubMed is a promising resource for making medical models less sensitive to domain shift, outperforming other resources on both diversity of information and final prediction performance.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 23 pages, 9 figures, 12 tables, project page: https://yueyang1996.github.io/knobo/

arXiv:2405.10987 [pdf, other]

Manifold-based Incomplete Multi-view Clustering via Bi-Consistency Guidance

Authors: Huibing Wang, Mingze Yao, Yawei Chen, Yunqiu Xu, Haipeng Liu, Wei Jia, Xianping Fu, Yang Wang

Abstract: Incomplete multi-view clustering primarily focuses on dividing unlabeled data into corresponding categories with missing instances, and has received intensive attention due to its superiority in real applications. Considering the influence of incomplete data, the existing methods mostly attempt to recover data by adding extra terms. However, for the unsupervised methods, a simple recovery strategy will cause errors and outlying value accumulations, which will affect the performance of the methods. Broadly, the previous methods have not taken the effectiveness of recovered instances into consideration, or cannot flexibly balance the discrepancies between recovered data and original data. To address these problems, we propose a novel method termed Manifold-based Incomplete Multi-view clustering via Bi-consistency guidance (MIMB), which flexibly recovers incomplete data among various views, and attempts to achieve biconsistency guidance via reverse regularization. In particular, MIMB adds reconstruction terms to representation learning by recovering missing instances, which dynamically examines the latent consensus representation. Moreover, to preserve the consistency information among multiple views, MIMB implements a biconsistency guidance strategy with reverse regularization of the consensus representation and proposes a manifold embedding measure for exploring the hidden structure of the recovered data. Notably, MIMB aims to balance the importance of different views, and introduces an adaptive weight term for each view. Finally, an optimization algorithm with an alternating iteration optimization strategy is designed for final clustering. Extensive experimental results on 6 benchmark datasets are provided to confirm that MIMB can significantly obtain superior results as compared with several state-of-the-art baselines.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.07704 [pdf, other]

Shell structure and shape transition in odd-$Z$ superheavy nuclei with proton numbers $Z=117, 119$: insights from deformed relativistic Hartree-Bogoliubov in continuum

Authors: Y. X. Zhang, B. R. Liu, K. Y. Zhang, J. M. Yao

Abstract: We present a systematic study on the structural properties of odd-$Z$ superheavy nuclei with proton numbers $Z=117, 119$, and neutron numbers $N$ increasing from $N=170$ to the neutron dripline within the framework of axially deformed relativistic Hartree-Bogoliubov theory in continuum (DRHBc). The results are compared with those of even-even superheavy nuclei with proton numbers $Z=118$ and $120$. We analyze various bulk properties of their ground states, including binding energies, quadrupole deformations, root-mean-square radii, nucleon separation energies, and $α$-decay energies. The coexistence of competing prolate and oblate or spherical shapes leads to abrupt changes in both quadrupole deformations and charge radii as functions of neutron numbers. Compared to even-even nuclei, the odd-mass ones exhibit a more complicated transition picture, in which the quantum numbers of $K^π$ of the lowest-energy configuration may change with deformation. This may result in the change of angular momentum in the ground-state to ground-state $α$-decay and thus quench the decay rate in odd-mass nuclei. Moreover, our results demonstrate a pronounced proton shell gap at $Z=120$, instead of $Z=114$, which is consistent with the predictions of most covariant density functional theories. Moreover, large neutron shell gaps are found at $N=172$ and $N=258$ in the four isotopic chains, as well as at $N=184$ in the light two isotopic chains with $Z=117$ and $Z=118$, attributed to the nearly-degenerate $3d$ and $4p$ spin-orbit doublet states due to the presence of bubble structure.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 15 pages with 18 figures

arXiv:2404.19534 [pdf, other]

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.15244 [pdf, other]

Efficient Transformer Encoders for Mask2Former-style models

Authors: Manyi Yao, Abhishek Aich, Yumin Suh, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker

Abstract: Vision transformer based models bring significant improvements for image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation tasks, their use of computational resources can be taxing on deployed devices. One way to overcome this challenge is by adapting the computation level to the specific needs of the input image rather than the current one-size-fits-all approach. To this end, we introduce ECO-M2F or EffiCient TransfOrmer Encoders for Mask2Former-style models. Noting that the encoder module of M2F-style models incurs highly resource-intensive computations, ECO-M2F provides a strategy to self-select the number of hidden layers in the encoder, conditioned on the input image. To enable this self-selection ability for providing a balance between performance and computational efficiency, we present a three-step recipe. The first step is to train the parent architecture to enable early exiting from the encoder. The second step is to create a derived dataset of the ideal number of encoder layers required for each training example. The third step is to use the aforementioned derived dataset to train a gating network that predicts the number of encoder layers to be used, conditioned on the input image. Additionally, to change the computational-accuracy tradeoff, only steps two and three need to be repeated, which significantly reduces retraining time. Experiments on the public datasets show that the proposed approach reduces expected encoder computational cost while maintaining performance, adapts to various user compute resources, is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.08581 [pdf, other]

Emulating generator coordinate method with extended eigenvector continuation: Lipkin-Meshkov-Glick model

Authors: Q. Y. Luo, X. Zhang, L. H. Chen, J. M. Yao

Abstract: We present a benchmark study of generator coordinate method (GCM) combined with eigenvector continuation (EC) in two different schemes for the low-lying states of Lipkin-Meshkov-Glick (LMG) model, where the interaction strength is treated as a controlling parameter, simulating quantum many-body systems with the phase transition from non-collective to collective states. We demonstrate that the EC$_{\rm kmax}$ scheme accurately reproduces the low-lying states of the LMG model. In this scheme, the EC basis consists of the wave functions of low-lying states up to the $k_{\rm max}$-th state of sampling Hamiltonians. Compared to EC$_1$, which only includes the wave functions of the $k$-th state of sampling Hamiltonians for the $k$-th state of a target Hamiltonian, the EC$_{\rm kmax}$ scheme exhibits significantly improved efficiency and accuracy. This study suggests the potential utilization of the extended EC scheme as an efficient emulator for GCM calculations.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 9 pages with 12 figures

arXiv:2404.06015 [pdf, other]

Passive None-line-of-sight imaging with arbitrary scene condition and detection pattern in small amount of prior data

Authors: Yunting Gui, Yuegang Fu, Xueming Xiao, Meibao Yao

Abstract: Passive Non-Line-of-Sight (NLOS) imaging requires reconstructing objects that are outside the direct line of sight without using external controllable light sources. It can be widely applied in areas like counter-terrorism, urban warfare, autonomous driving, and robot vision. Existing methods for passive NLOS typically require extensive prior information and significant computational resources to establish light transport matrices or train neural networks. These constraints pose significant challenges for transitioning models to different NLOS scenarios. Thus, the pressing issue in passive NLOS imaging currently lies in whether it is possible to estimate the light transport matrices corresponding to relay surfaces and scenes, as well as the specific distribution of targets, with a small amount of prior knowledge. In this work, we hypothesized a high-dimensional manifold and mathematically proved its existence. Within this high-dimensional manifold, the structural information of obscured targets is minimally disrupted. Therefore, we proposed a universal framework named High-Dimensional Projection Selection (HDPS), which can establish this high-dimensional manifold and output its projection onto the corresponding low-dimensional surfaces. HDPS can be applied to most mature network architectures and can estimate the distribution of the target and the light spot obtained by the camera with only minimal prior data. With the help of the estimated information, it can establish a high-dimensional manifold consisting of target and input. As demonstrated in experiments, our framework, even when applied to the most basic network structures, can achieve higher accuracy with significantly smaller amounts of prior data. Our approach thereby enables target reconstruction in passive NLOS scenarios with limited prior data and computational resources.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03663 [pdf, other]

Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

Authors: Man Yao, Jiakui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li

Abstract: Neuromorphic computing, which exploits Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged, and their performance is only on par with CNN-based SNNs, offering no distinct advantage. In this work, we propose a general Transformer-based SNN architecture, termed "Meta-SpikeFormer", whose goals are: 1) Lower-power, supports the spike-driven paradigm that there is only sparse addition in the network; 2) Versatility, handles various vision tasks; 3) High-performance, shows overwhelming performance advantages over CNN-based SNNs; 4) Meta-architecture, provides inspiration for future next-generation Transformer-based neuromorphic chip designs. Specifically, we extend the Spike-driven Transformer in \citet{yao2023spike} into a meta architecture, and explore the impact of structure, spike-driven self-attention, and skip connection on its performance. On ImageNet-1K, Meta-SpikeFormer achieves 80.0% top-1 accuracy (55M), surpassing the current state-of-the-art (SOTA) SNN baselines (66M) by 3.7%. This is the first direct training SNN backbone that can simultaneously support classification, detection, and segmentation, obtaining SOTA results in SNNs. Finally, we discuss the inspiration of the meta SNN architecture for neuromorphic chip design. Source code and models are available at https://github.com/BICLab/Spike-Driven-Transformer-V2.

Submitted 15 February, 2024; originally announced April 2024.

Comments: Accepted by ICLR2024. Code and Model: https://github.com/BICLab/Spike-Driven-Transformer-V2

arXiv:2404.00714 [pdf, other]

Neural Radiance Field-based Visual Rendering: A Comprehensive Review

Authors: Mingyuan Yao, Yukang Huo, Yang Ran, Qingbin Tian, Ruifeng Wang, Haihua Wang

Abstract: In recent years, Neural Radiance Fields (NeRF) has made remarkable progress in the field of computer vision and graphics, providing strong technical support for solving key tasks including 3D scene understanding, new perspective synthesis, human body reconstruction, robotics, and so on. Academic attention to this line of research is growing. As a revolutionary neural implicit field representation, NeRF has caused a continuous research boom in the academic community. Therefore, the purpose of this review is to provide an in-depth analysis of the research literature on NeRF within the past two years, to provide a comprehensive academic perspective for budding researchers. In this paper, the core architecture of NeRF is first elaborated in detail, followed by a discussion of various improvement strategies for NeRF, and case studies of NeRF in diverse application scenarios, demonstrating its practical utility in different domains. In terms of datasets and evaluation metrics, this paper details the key resources needed for NeRF model training. Finally, this paper provides a prospective discussion on the future development trends and potential challenges of NeRF, aiming to provide research inspiration for researchers in the field and to promote the further development of related technologies.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 35 pages, 22 figures, 14 tables, 18 formulas

arXiv:2404.00142 [pdf, other]

Loss resilience of driven-dissipative remote entanglement in chiral waveguide quantum electrodynamics

Authors: Abdullah Irfan, Mingxing Yao, Andrew Lingenfelter, Xi Cao, Aashish A. Clerk, Wolfgang Pfaff

Abstract: Establishing limits of entanglement in open quantum systems is a problem of fundamental interest, with strong implications for applications in quantum information science. Here, we study limits of entanglement stabilization between remote qubits. We theoretically investigate the loss resilience of driven-dissipative entanglement between remote qubits coupled to a chiral waveguide. We find that by coupling a pair of storage qubits to the two driven qubits, the steady state can be tailored such that the storage qubits show a degree of entanglement that is higher than what can be achieved with only two driven qubits coupled to the waveguide. By reducing the degree of entanglement of the driven qubits, we show that the entanglement between the storage qubits becomes more resilient to waveguide loss. Our analytical and numerical results offer insights into how waveguide loss limits the degree of entanglement in this driven-dissipative system, and offer important guidance for remote entanglement stabilization in the laboratory, for example using superconducting circuits.

Submitted 29 March, 2024; originally announced April 2024.

Comments: 12 pages, 4 figures

arXiv:2403.17722 [pdf, other]

Nuclear matrix elements of neutrinoless double-beta decay in covariant density functional theory with different mechanisms

Authors: C. R. Ding, Gang Li, J. M. Yao

Abstract: Nuclear matrix elements (NMEs) for neutrinoless double-beta ($0νββ$) decay in candidate nuclei play a crucial role in interpreting results from current experiments and in designing future ones. Accurate NME values serve as important nuclear inputs for constraining parameters in new physics, such as neutrino mass and the Wilson coefficients of lepton-number-violating (LNV) operators. In this study, we present a comprehensive calculation of NMEs for $0νββ$ decay in $^{76}$Ge, $^{82}$Se, $^{100}$Mo, $^{130}$Te, and $^{136}$Xe, using nuclear wave functions obtained from multi-reference covariant density functional theory (MR-CDFT). We employ three types of transition potentials at the leading order in chiral effective field theory. Our results, along with recent data, are utilized to constrain the coefficients of LNV operators. The results demonstrate that the combined NMEs based on the Feynman diagrams at the hadronic scale for the nonstandard mechanisms lead to an uncertainty among different nuclear models comparable to that for the standard mechanism. The use of NMEs from various nuclear models does not dramatically change the parameter space intervals for the coefficients, although MR-CDFT yields the most stringent constraint. Furthermore, our NMEs can also be used to perform a more comprehensive analysis with multiple isotopes.

Submitted 10 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 9 pages with 7 figures and 1 table

arXiv:2403.15919 [pdf, other]

Negotiating the Shared Agency between Humans & AI in the Recommender System

Authors: Mengke Wu, Weizi Liu, Yanyun Wang, Mike Yao

Abstract: Smart recommendation algorithms have revolutionized information dissemination, enhancing efficiency and reshaping content delivery across various domains. However, concerns about user agency have arisen due to the inherent opacity (information asymmetry) and the one-way nature of output (power asymmetry) of algorithms. While both issues have been criticized by scholars advocating explainable AI (XAI) and human-AI collaborative decision-making (HACD), little research evaluates their integrated effects on users, and few HACD discussions in recommender systems go beyond improving and filtering the results. This study proposes an incubating idea as a missing step in HACD that allows users to control the degrees of AI-recommended content. Then, we integrate it with existing XAI into a flow prototype aimed at assessing the enhancement of user agency. We seek to understand how types of agency impact user perception and experience, and bring empirical evidence to refine the guidelines and designs for human-AI interactive systems.

Submitted 19 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.05606 [pdf, other]

A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data

Authors: Yifan Wu, Yang Liu, Yue Yang, Michael S. Yao, Wenli Yang, Xuehui Shi, Lihong Yang, Dongjun Li, Yueming Liu, James C. Gee, Xuan Yang, Wenbin Wei, Shi Gu

Abstract: Diagnosing rare diseases presents a common challenge in clinical practice, necessitating the expertise of specialists for accurate identification. The advent of machine learning offers a promising solution, but the development of such technologies is hindered by the scarcity of data on rare conditions and the demand for models that are both interpretable and trustworthy in a clinical context. Interpretable AI, with its capacity for human-readable outputs, can facilitate validation by clinicians and contribute to medical education. In the current work, we focus on choroid neoplasias, the most prevalent form of eye cancer in adults, albeit rare at 5.1 per million. We built the largest dataset so far, consisting of 750 patients and incorporating three distinct imaging modalities collected from 2004 to 2022. Our work introduces a concept-based interpretable model that distinguishes between three types of choroidal tumors, integrating insights from domain experts via radiological reports. Remarkably, this model not only achieves an F1 score of 0.91, rivaling that of black-box models, but also boosts the diagnostic accuracy of junior doctors by 42%. This study highlights the significant potential of interpretable machine learning in improving the diagnosis of rare diseases, laying a groundwork for future breakthroughs in medical AI that could tackle a wider array of complex health scenarios.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05283 [pdf]

Closely piling up of multiple adhesive fronts in adhesive friction due to re-attachment

Authors: Puyu Cao, Meicheng Yao, Bin Chen

Abstract: To understand why the adhesive frictional force was in linear proportion to the real contact area in experiments, we investigate the adhesive friction generated by sliding elastic solids adhered to a rigid surface via multiple adhesive springs. Our results indicate that the shear-off force of the interface increases with the energetically guided re-attachment rate of adhesive springs, reaching saturation at high re-attachment rates. Remarkably, this shear-off force can surpass the predictions made by the fracture theory. By plotting the adhesive forces along the interface, we observe substantially high adhesive forces distributed throughout the interface, based on which we identify multiple adhesive fronts closely piling up along the interface. These regions can exhibit similar force profiles, and their number appears to increase with the size of the interface, leading to a linear increase in the calculated shear-off force with the size of the interface. We then suggest that multiple adhesive fronts closely pile up to back up each other in adhesive friction due to re-attachments, which may provide profound insights into understanding the observed phenomena associated with adhesive friction along an interface.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.03324 [pdf]

Observation of Chiral Surface State in Superconducting NbGe$_2$

Authors: Mengyu Yao, Martin Gutierrez-Amigo, Subhajit Roychowdhury, Ion Errea, Alexander Fedorov, Vladimir N. Strocov, Maia G. Vergniory, Claudia Felser

Abstract: The interplay between topology and superconductivity in quantum materials harbors rich physics ripe for discovery. In this study, we investigate the topological properties and superconductivity of the nonsymmorphic chiral superconductor NbGe$_2$ using high-resolution angle-resolved photoemission spectroscopy (ARPES), transport measurements, and ab initio calculations. The ARPES data revealed exotic chiral surface states on the (100) surface originating from the inherent chiral crystal structure. Supporting calculations indicate that NbGe$_2$ likely hosts elusive Weyl fermions in its bulk electronic structure. Furthermore, we uncovered the signatures of van Hove singularities that can enhance many-body interactions. Additionally, transport measurements demonstrated that NbGe$_2$ exhibits superconductivity below 2 K. Overall, our comprehensive results provide the first concrete evidence that NbGe$_2$ is a promising platform for investigating the interplay between non-trivial band topology, possible Weyl fermions, van Hove singularities, and superconductivity in chiral quantum materials.

Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01177 [pdf, other]

Quantum-number projected generator coordinate method for $^{21}$Ne with a chiral two-nucleon-plus-three-nucleon interaction

Authors: W. Lin, E. F. Zhou, J. M. Yao, H. Hergert

Abstract: We report a study of the low-lying states of deformed $^{21}$Ne within the framework of the quantum-number projected generator coordinate method (PGCM), starting from a chiral two-nucleon-plus-three-nucleon (NN+3N) interaction. The wave functions of states are constructed as a linear combination of a set of axially-deformed Hartree-Fock-Bogoliubov (HFB) wave functions with different quadrupole deformations. These HFB wave functions are projected onto different angular momenta and the correct neutron and proton numbers for $^{21}$Ne. The results of calculations based on the effective Hamiltonians derived by normal-ordering the 3N interaction with respect to three different reference states, including the quantum-number projected HFB wave functions for $^{20}$Ne, $^{22}$Ne, and an ensemble of them with equal weights, are compared. This study serves as a key step towards ab initio calculations of odd-mass deformed nuclei with the in-medium GCM.

Submitted 5 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: 22 pages, 8 figures, an invited contribution to the special issue of Symmetry journal: "Restoration of Broken Symmetries in the Nuclear Many-Body Problem", edited by Prof. Javid Sheikh and Prof. Peter Ring

arXiv:2402.19129 [pdf]

Stacking faults enabled second harmonic generation in centrosymmetric van der Waals RhI3

Authors: Yue Liu, Wen He, Bingze Wu, Fengyuan Xuan, Yuqiang Fang, Zhengbo Zhong, Jierui Fu, Jiapeng Wang, Zhipeng Li, Jinzhong Wang, Mingguang Yao, Fuqiang Huang, Liang Zhen, Yang Li, Chengyan Xu

Abstract: Second harmonic generation (SHG) in van der Waals (vdWs) materials has garnered significant attention due to its potential for integrated nonlinear optical and optoelectronic applications. Stacking faults in vdWs materials, a typical kind of planar defect, can introduce a new degree of freedom to modulate the crystal symmetry and resultant SHG response; however, the physical origin and tunability of stacking-fault-governed SHG in vdWs materials remain unclear. Here, taking the intrinsically centrosymmetric vdWs RhI3 as an example, we theoretically reveal the origin of stacking-fault-governed SHG response, where the SHG response comes from the energetically favorable AC-C stacking fault, of which the electrical transitions along the high symmetry paths Gamma-M and Gamma-K in the Brillouin zone play the dominant role at 810 nm. Such stacking-fault-governed SHG response is further confirmed via structural characterizations and SHG measurements. Furthermore, by applying hydrostatic pressure on RhI3, the correlation between structural evolution and SHG response is revealed with SHG enhancement up to 6.9 times, where the decreased electronic transition energies and larger momentum matrix elements due to the stronger interlayer interactions upon compression magnify the SHG susceptibility. This study develops a promising foundation based on strategically designed stacking faults for pioneering new avenues in nonlinear nano-optics.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17043 [pdf, other]

Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, et al. (39 additional authors not shown)

Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves" are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system, referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experiment leveraged a heterogeneous fleet of 100 longitudinally-controlled vehicles as Lagrangian traffic actuators, each of which ran a controller with the architecture described in this paper. The MegaController is a hierarchical control architecture, which consists of two main layers. The upper layer is called Speed Planner, and is a centralized optimal control algorithm. It assigns speed targets to the vehicles, conveyed through the LTE cellular network. The lower layer is a control layer, running on each vehicle. It performs local actuation by overriding the stock adaptive cruise controller, using the stock on-board sensors. The Speed Planner ingests live data feeds provided by third parties, as well as data from our own control vehicles, and uses both to perform the speed assignment. The architecture of the Speed Planner allows for modular use of standard control techniques, such as optimal control, model predictive control, kernel methods, and others, including Deep RL and explicit controllers. Depending on the vehicle architecture, all onboard sensing data can be accessed by the local controllers, or only some. Control inputs vary across different automakers, ranging from torque or acceleration requests for some cars to electronic selection of ACC set points in others. The proposed architecture allows for the combination of all possible settings proposed above. Most configurations were tested throughout the ramp up to the MegaVandertest.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.13100 [pdf]

An Introduction to Causal Inference Methods with Multi-omics Data

Authors: Minhao Yao, Zhonghua Liu

Abstract: Omics biomarkers play a pivotal role in personalized medicine by providing molecular-level insights into the etiology of diseases, guiding precise diagnostics, and facilitating targeted therapeutic interventions. Recent advancements in omics technologies have resulted in an increasing abundance of multimodal omics data, providing unprecedented opportunities for identifying novel omics biomarkers for human diseases. Mendelian randomization (MR) is a practically useful causal inference method that uses genetic variants as instrumental variables (IVs) to infer causal relationships between omics biomarkers and complex traits/diseases by removing hidden confounding bias. In this article, we first present current challenges in performing MR analysis with omics data, and then describe four MR methods for analyzing multi-omics data including epigenomics, transcriptomics, proteomics, and metabolomics data, all executable within the R software environment.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10458 [pdf, ps, other]

doi 10.1103/PhysRevB.109.075127

Charge bond order and s-wave superconductivity in the kagome lattice with electron-phonon coupling and electron-electron interaction

Authors: Qing-Geng Yang, Meng Yao, Da Wang, Qiang-Hua Wang

Abstract: The effects of optical bond phonons coupled to electrons in two-dimensional lattices have attracted much interest recently, with the hope of exploring unconventional superconducting mechanisms and pairing symmetries. Here we conduct a systematic investigation of such phonon modes in the kagome lattice at and around the upper van Hove filling, in order to unravel new effects of the bond phonons in the presence of the unique sublattice frustration. We combine the singular-mode functional renormalization group and the projector determinant quantum Monte Carlo methods. At the upper van Hove filling and in the absence of the Hubbard interaction $U$, we find there exists an s-wave superconducting state at weaker electron-phonon coupling constant $λ$ and higher phonon frequency $ω$, and a charge bond order (or the valence bond solid) state at larger $λ$ and lower $ω$. The Hubbard interaction $U$ suppresses drastically the s-wave pairing, so that only the charge bond order survives. On the other hand, upon slight doping away from the van Hove filling, we observe that the charge bond order is suppressed due to the breakdown of the perfect Fermi surface nesting, while the superconductivity persists. The s-wave superconductivity and charge bond order may be relevant in the layered kagome superconductors AV$_3$Sb$_5$ (A=K, Rb, Cs).

Submitted 16 February, 2024; originally announced February 2024.

Journal ref: Phys. Rev. B 109, 075127 (2024)

arXiv:2402.07369 [pdf, other]

Diff-RNTraj: A Structure-aware Diffusion Model for Road Network-constrained Trajectory Generation

Authors: Tonglong Wei, Youfang Lin, Shengnan Guo, Yan Lin, Yiheng Huang, Chenyang Xiang, Yuqing Bai, Menglu Ya, Huaiyu Wan

Abstract: Trajectory data is essential for various applications as it records the movement of vehicles. However, publicly available trajectory datasets remain limited in scale due to privacy concerns, which hinders the development of trajectory data mining and trajectory-based applications. To address this issue, some methods for generating synthetic trajectories have been proposed to expand the scale of the dataset. However, all existing methods generate trajectories in the geographical coordinate system, which poses two limitations for their utilization in practical applications: 1) the inability to ensure that the generated trajectories are constrained to the road; 2) the lack of road-related information. In this paper, we propose a new problem to meet the practical application need, i.e., road network-constrained trajectory (RNTraj) generation, which can directly generate trajectories on the road network with road-related information. RNTraj is a hybrid type of data, in which each point is represented by a discrete road segment and a continuous moving rate. To generate RNTraj, we design a diffusion model called Diff-RNTraj. This model can effectively handle the hybrid RNTraj using a continuous diffusion framework by incorporating a pre-training strategy to embed hybrid RNTraj into continuous representations. During the sampling stage, an RNTraj decoder is designed to map the continuous representation generated by the diffusion model back to the hybrid RNTraj format. Furthermore, Diff-RNTraj introduces a novel loss function to enhance the spatial validity of the generated trajectories. Extensive experiments conducted on two real-world trajectory datasets demonstrate the effectiveness of the proposed model.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.06532 [pdf, other]

Generative Adversarial Bayesian Optimization for Surrogate Objectives

Authors: Michael S. Yao, Yimeng Zeng, Hamsa Bastani, Jacob Gardner, James C. Gee, Osbert Bastani

Abstract: Offline model-based policy optimization seeks to optimize a learned surrogate objective function without querying the true oracle objective during optimization. However, inaccurate surrogate model predictions are frequently encountered along the optimization trajectory. To address this limitation, we propose generative adversarial Bayesian optimization (GABO) using adaptive source critic regularization, a task-agnostic framework for Bayesian optimization that employs a Lipschitz-bounded source critic model to constrain the optimization trajectory to regions where the surrogate function is reliable. We show that under certain assumptions for the continuous input space prior, our algorithm dynamically adjusts the strength of the source critic regularization. GABO outperforms existing baselines on a number of different offline optimization tasks across a variety of scientific domains. Our code is available at https://github.com/michael-s-yao/gabo

Submitted 9 February, 2024; originally announced February 2024.

Comments: 15 pages, 3 figures

arXiv:2401.08145 [pdf]

Low power consumption grating magneto-optical trap based on planar elements

Authors: Zhilong Yu, Yumeng Zhu, Minghao Yao, Feng Qi, Liang Chen, Chang-ling Zou, Junyi Duan, Xiaochi Liu

Abstract: The grating-based magneto-optical trap (GMOT) is a promising approach for miniaturizing cold-atom systems. However, the power consumption of a GMOT system dominates its feasibility in practical applications. In this study, we demonstrated a GMOT system based on planar elements that can operate with low power consumption. A high-diffraction-efficiency grating chip was used to cool atoms with a single incident beam. A planar coil chip was designed and fabricated with a low power consumption nested architecture. The grating and coil chips were adapted to a passive pump vacuum chamber, and up to $10^6$ $^{87}$Rb atoms were trapped. These elements effectively reduce the power consumption of the GMOT and have great potential for applications in practical cold-atom-based devices.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2311.18131 [pdf, other]

Isolating effects of large and small scale turbulence on thermodiffusively unstable premixed hydrogen flames

Authors: Matthew X. Yao, Guillaume Blanquart

Abstract: Lean turbulent premixed hydrogen/air flames have substantially increased flame speeds, commonly attributed to differential diffusion effects. In this work, the effect of turbulence on lean hydrogen combustion is studied through Direct Numerical Simulation using detailed chemistry and detailed transport. Simulations are conducted at six Karlovitz numbers and three integral length scales. A general expression for the burning efficiency is proposed which depends on the conditional mean chemical source term and gradient of a progress variable. At a fixed Karlovitz number, the normalized turbulent flame speed and area both increase linearly with the integral length scale ratio. The effect on the mean source term profile is minimal, indicating that the increase in flame speed can solely be attributed to the increase in flame area. At a fixed integral length scale, both the flame speed and area first increase with Karlovitz number before decreasing. At higher Karlovitz numbers, the diffusivity is enhanced due to penetration of turbulence into the reaction zone, significantly dampening differential diffusion effects.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15305 [pdf, other]

doi 10.1103/PhysRevC.109.034305

Multireference covariant density-functional theory for the low-lying states of odd-mass nuclei

Authors: E. F. Zhou, X. Y. Wu, J. M. Yao

Abstract: We extend multireference covariant density-functional theory (MR-CDFT) based on a relativistic point-coupling energy functional to describe the low-lying states of odd-mass nuclei. The nuclear wave function is constructed as a superposition of quadrupole-octupole deformed mean-field configurations, with projection onto angular momentum, particle numbers, and parity within the framework of the generator coordinate method. Using $^{25}$Mg as an example, we calculate the energy spectrum, electric multipole, and magnetic dipole transition strengths based on three different schemes for the mean-field configurations of odd-mass nuclei. We find that the low-energy structure of $^{25}$Mg is reasonably reproduced in all three schemes. In particular, the effect of octupole correlation is illustrated in the application to the low-lying parity doublets of $^{21}$Ne. This work demonstrates the success of the MR-CDFT for the low-lying states of odd-mass nuclei with possible strong quadrupole-octupole correlations.

Submitted 8 January, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 23 pages with 32 figures and 3 tables

Journal ref: Phys. Rev. C 109, 034305 (2024)

arXiv:2311.13217 [pdf, other]

Controllable orbital angular momentum monopoles in chiral topological semimetals

Authors: Yun Yen, Jonas A. Krieger, Mengyu Yao, Iñigo Robredo, Kaustuv Manna, Qun Yang, Emily C. McFarlane, Chandra Shekhar, Horst Borrmann, Samuel Stolz, Roland Widmer, Oliver Gröning, Vladimir N. Strocov, Stuart S. P. Parkin, Claudia Felser, Maia G. Vergniory, Michael Schüler, Niels B. M. Schröter

Abstract: The emerging field of orbitronics aims at generating and controlling currents of electronic orbital angular momentum (OAM) for information processing. Structurally chiral topological crystals could be particularly suitable orbitronic materials because they have been predicted to host topological band degeneracies in reciprocal space that are monopoles of OAM. Around such a monopole, the OAM is locked isotropically parallel or antiparallel to the direction of the electron's momentum, which could be used to generate large and controllable OAM currents. However, OAM monopoles have not yet been directly observed in chiral crystals, and no handle to control their polarity has been discovered. Here, we use circular dichroism in angle-resolved photoelectron spectroscopy (CD-ARPES) to image OAM monopoles in the chiral topological semimetals PtGa and PdGa. Moreover, we also demonstrate that the polarity of the monopole can be controlled via the structural handedness of the host crystal by imaging OAM monopoles and anti-monopoles in the two enantiomers of PdGa, respectively. For most photon energies used in our study, we observe a sign change in the CD-ARPES spectrum when comparing positive and negative momenta along the light direction near the topological degeneracy. This is consistent with the conventional view that CD-ARPES measures the projection of the OAM monopole along the photon momentum. For some photon energies, however, this sign change disappears, which can be understood from our numerical simulations as the interference of polar atomic OAM contributions, consistent with the presence of OAM monopoles. Our results highlight the potential of chiral crystals for orbitronic device applications, and our methodology could enable the discovery of even more complicated nodal OAM textures that could be exploited for orbitronics.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 16 pages, 8 figures

arXiv:2311.02810 [pdf]

doi 10.1007/s10409-024-24010-x

Bending short DNAs as transversely isotropic rings in series

Authors: Chenyu Shi, Meicheng Yao, Bin Chen

Abstract: Despite the significance of the high flexibility exhibited by short DNAs, there remains an incomplete understanding of their anomalous persistence length. In this study, we propose a novel approach wherein each fundamental characteristic of gene sequences within short DNAs is modeled as a transversely isotropic ring. Our comprehensive model analysis not only successfully replicates the observed high flexibility of short DNAs but also sheds light on the impact of sequence dependence, aligning with experimental findings. Furthermore, our analysis suggests that the bending behavior of short DNAs can be effectively described by the Timoshenko beam theory, accounting for shear considerations.

Submitted 5 November, 2023; originally announced November 2023.

Comments: 12 pages, 4 figures

arXiv:2310.12848 [pdf, other]

Neural Degradation Representation Learning for All-In-One Image Restoration

Authors: Mingde Yao, Ruikang Xu, Yuanshen Guan, Jie Huang, Zhiwei Xiong

Abstract: Existing methods have demonstrated effective performance on a single degradation type. In practical applications, however, the degradation is often unknown, and the mismatch between the model and the degradation will result in a severe performance drop. In this paper, we propose an all-in-one image restoration network that tackles multiple degradations. Due to the heterogeneous nature of different types of degradations, it is difficult to process multiple degradations in a single network. To this end, we propose to learn a neural degradation representation (NDR) that captures the underlying characteristics of various degradations. The learned NDR decomposes different types of degradations adaptively, similar to a neural dictionary that represents basic degradation components. Subsequently, we develop a degradation query module and a degradation injection module to effectively recognize and utilize the specific degradation based on NDR, enabling the all-in-one restoration ability for multiple degradations. Moreover, we propose a bidirectional optimization strategy to effectively drive NDR to learn the degradation representation by optimizing the degradation and restoration processes alternately. Comprehensive experiments on representative types of degradations (including noise, haze, rain, and downsampling) demonstrate the effectiveness and generalization capability of our method.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.06297 [pdf, other]

Reducing Detailed Vehicle Energy Dynamics to Physics-Like Models

Authors: Nour Khoudari, Sulaiman Almatrudi, Rabie Ramadan, Joy Carpio, Mengsha Yao, Kenneth Butts, Alexandre M. Bayen, Jonathan W. Lee, Benjamin Seibold

Abstract: The energy demand of vehicles, particularly in unsteady drive cycles, is affected by complex dynamics internal to the engine and other powertrain components. Yet, in many applications, particularly macroscopic traffic flow modeling and optimization, structurally simple approximations to the complex vehicle dynamics are needed that nevertheless reproduce the correct effective energy behavior. This work presents a systematic model reduction pipeline that starts from complex vehicle models based on the Autonomie software and derives a hierarchy of simplified models that are fast to evaluate, easy to disseminate in open-source frameworks, and compatible with optimization frameworks. The pipeline, based on a virtual chassis dynamometer and subsequent approximation strategies, is reproducible and is applied to six different vehicle classes to produce concrete explicit energy models that represent an average vehicle in each class and leverage the accuracy and validation work of the Autonomie software.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 40 pages, 9 figures

arXiv:2309.17334 [pdf, other]

Multi-Depth Branch Network for Efficient Image Super-Resolution

Authors: Huiyuan Tian, Li Zhang, Shijian Li, Min Yao, Gang Pan

Abstract: A longstanding challenge in Super-Resolution (SR) is how to efficiently enhance high-frequency details in Low-Resolution (LR) images while maintaining semantic coherence. This is particularly crucial in practical applications where SR models are often deployed on low-power devices. To address this issue, we propose an innovative asymmetric SR architecture featuring Multi-Depth Branch Module (MDBM). These MDBMs contain branches of different depths, designed to capture high- and low-frequency information simultaneously and efficiently. The hierarchical structure of MDBM allows the deeper branch to gradually accumulate fine-grained local details under the contextual guidance of the shallower branch. We visualize this process using feature maps, and further demonstrate the rationality and effectiveness of this design using proposed novel Fourier spectral analysis methods. Moreover, our model exhibits more significant spectral differentiation between branches than existing branch networks. This suggests that MDBM reduces feature redundancy and offers a more effective method for integrating high- and low-frequency information. Extensive qualitative and quantitative evaluations on various datasets show that our model can generate structurally consistent and visually realistic HR images. It achieves state-of-the-art (SOTA) results at a very fast inference speed. Our code is available at https://github.com/thy960112/MDBN.

Submitted 15 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.11753 [pdf, other]

Improve the efficiency of deep reinforcement learning through semantic exploration guided by natural language

Authors: Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin

Abstract: Reinforcement learning is a powerful technique for learning from trial and error, but it often requires a large number of interactions to achieve good performance. In some domains, such as sparse-reward tasks, an oracle that can provide useful feedback or guidance to the agent during the learning process is really of great importance. However, querying the oracle too frequently may be costly or impractical, and the oracle may not always have a clear answer for every situation. Therefore, we propose a novel method for interacting with the oracle in a selective and efficient way, using a retrieval-based approach. We assume that the interaction can be modeled as a sequence of templated questions and answers, and that there is a large corpus of previous interactions available. We use a neural network to encode the current state of the agent and the oracle, and retrieve the most relevant question from the corpus to ask the oracle. We then use the oracle's answer to update the agent's policy and value function. We evaluate our method on an object manipulation task. We show that our method can significantly improve the efficiency of RL by reducing the number of interactions needed to reach a certain level of performance, compared to baselines that do not use the oracle or use it in a naive way.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.09488 [pdf, other]

doi 10.1142/S0218301323400116

Generator coordinate method for nuclear octupole excitations: status and perspectives

Authors: E. F. Zhou, J. M. Yao

Abstract: Strong octupole correlations have been observed in the low-lying states of atomic nuclei across various mass regions. In this review, we provide an overview of Beyond Mean-Field (BMF) studies of nuclear octupole collective motions with Generator Coordinate Method (GCM) in combination with quantum-number projections that are implemented to restore the broken symmetries in nuclear mean-field states. We highlight recent developments within this framework and their applications to excitation spectra and electromagnetic transition rates in octupole-shaped nuclei and hypernuclei. We discuss the novel phenomena of nucleon clustering in light nuclei. Additionally, we explore the phase transition from octupole vibrations to rotational motions as spin increases in heavy nuclei. Lastly, we examine the status and future prospects of studies on octupole deformation effects in nuclear Schiff moments. These studies, along with the upper limits of atomic Electric Dipole Moment (EDM), impose stringent constraints on beyond-standard-model time-reversal-violating nucleon-nucleon interactions.

Submitted 12 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 56 pages, 21 figures, an invited review for the journal Int. J. Mod. Phys. E

Journal ref: Int. J. Mod. Phys. E (2023)

arXiv:2308.15634 [pdf, other]

doi 10.1103/PhysRevLett.132.182502

Ab initio uncertainty quantification of neutrinoless double-beta decay in $^{76}$Ge

Authors: A. Belley, J. M. Yao, B. Bally, J. Pitcher, J. Engel, H. Hergert, J. D. Holt, T. Miyagi, T. R. Rodriguez, A. M. Romero, S. R. Stroberg, X. Zhang

Abstract: The observation of neutrinoless double-beta ($0νββ$) decay would offer proof of lepton number violation, demonstrating that neutrinos are Majorana particles, while also helping us understand why there is more matter than antimatter in the Universe. If the decay is driven by the exchange of the three known light neutrinos, a discovery would, in addition, link the observed decay rate to the neutrino mass scale through a theoretical quantity known as the nuclear matrix element (NME). Accurate values of the NMEs for all nuclei considered for use in $0νββ$ experiments are therefore crucial for designing and interpreting those experiments. Here, we report the first comprehensive ab initio uncertainty quantification of the $0νββ$-decay NME, in the key nucleus $^{76}$Ge. Our method employs nuclear strong and weak interactions derived within chiral effective field theory and recently developed many-body emulators. Our result, with a conservative treatment of uncertainty, is an NME of $2.60^{+1.28}_{-1.36}$, which, together with the best-existing half-life sensitivity and phase-space factor, sets an upper limit for effective neutrino mass of $187^{+205}_{-62}$ meV. The result is important for designing next-generation germanium detectors aiming to cover the entire inverted hierarchy region of neutrino masses.

Submitted 19 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 7 pages, 1 table, and 2 figures

Journal ref: Phys. Rev. Lett. 132, 182502 (2024)

arXiv:2308.14018 [pdf, other]

VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization

Authors: Mingshuai Yao, Yabo Zhang, Xianhui Lin, Xiaoming Li, Wangmeng Zuo

Abstract: Few-shot font generation is challenging, as it needs to capture the fine-grained stroke styles from a limited set of reference glyphs, and then transfer to other characters, which are expected to have similar styles. However, due to the diversity and complexity of Chinese font styles, the synthesized glyphs of existing methods usually exhibit visible artifacts, such as missing details and distorted strokes. In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement. Specifically, we pre-train a VQGAN to encapsulate font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes. Furthermore, our VQ-Font leverages the inherent design of Chinese characters, where structure components such as radicals and character components are combined in specific arrangements, to recalibrate fine-grained styles based on references. This process improves the matching and fusion of styles at the structure level. Both modules collaborate to enhance the fidelity of the generated fonts. Experiments on a collected font dataset show that our VQ-Font outperforms the competing methods both quantitatively and qualitatively, especially in generating challenging styles.

Submitted 27 August, 2023; originally announced August 2023.

Comments: 13 pages, 14 figures

arXiv:2308.13783 [pdf, other]

Generalized Lightness Adaptation with Channel Selective Normalization

Authors: Mingde Yao, Jie Huang, Xin Jin, Ruikang Xu, Shenglong Zhou, Man Zhou, Zhiwei Xiong

Abstract: Lightness adaptation is vital to the success of image processing to avoid unexpected visual deterioration, which covers multiple aspects, e.g., low-light image enhancement, image retouching, and inverse tone mapping. Existing methods typically work well on their trained lightness conditions but perform poorly in unknown ones due to their limited generalization ability. To address this limitation, we propose a novel generalized lightness adaptation algorithm that extends conventional normalization techniques through a channel filtering design, dubbed Channel Selective Normalization (CSNorm). The proposed CSNorm purposely normalizes the statistics of lightness-relevant channels and keeps other channels unchanged, so as to improve feature generalization and discrimination. To optimize CSNorm, we propose an alternating training strategy that effectively identifies lightness-relevant channels. The model equipped with our CSNorm only needs to be trained on one lightness condition and can be well generalized to unknown lightness conditions. Experimental results on multiple benchmark datasets demonstrate the effectiveness of CSNorm in enhancing the generalization ability for the existing lightness adaptation methods. Code is available at https://github.com/mdyao/CSNorm.

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023. Code: https://github.com/mdyao/CSNorm/

arXiv:2308.12538 [pdf, other]

Mutual-Guided Dynamic Network for Image Fusion

Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Lizhi Wang, Zhiwei Xiong

Abstract: Image fusion aims to generate a high-quality image from multiple images captured under varying conditions. The key problem of this task is to preserve complementary information while filtering out irrelevant information for the fused result. However, existing methods address this problem by leveraging static convolutional neural networks (CNNs), suffering two inherent limitations during feature extraction, i.e., being unable to handle spatial-variant contents and lacking guidance from multiple inputs. In this paper, we propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs. Specifically, we design a mutual-guided dynamic filter (MGDF) for adaptive feature extraction, composed of a mutual-guided cross-attention (MGCA) module and a dynamic filter predictor, where the former incorporates additional guidance from different inputs and the latter generates spatial-variant kernels for different locations. In addition, we introduce a parallel feature fusion (PFF) module to effectively fuse local and global information of the extracted features. To further reduce the redundancy among the extracted features while simultaneously preserving their shared structural information, we devise a novel loss function that combines the minimization of normalized mutual information (NMI) with an estimated gradient mask. Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks. The code and model are publicly available at: https://github.com/Guanys-dar/MGDN.

Submitted 1 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: ACMMM 2023 accepted

arXiv:2308.10683 [pdf, other]

doi 10.1093/mnras/stad2595

Variability, polarimetry, and timing properties of single pulses from PSR J2222-0137 using FAST

Authors: X. L. Miao, W. W. Zhu, M. Kramer, P. C. C. Freire, L. Shao, M. Yuan, L. Q. Meng, Z. W. Wu, C. C. Miao, Y. J. Guo, D. J. Champion, E. Fonseca, J. M. Yao, M. Y. Xue, J. R. Niu, H. Hu, C. M. Zhang

Abstract: In our work, we analyse $5\times10^{4}$ single pulses from the recycled pulsar PSR J2222$-$0137 in one of its scintillation maxima observed by the Five-hundred-meter Aperture Spherical radio Telescope (FAST). PSR J2222$-$0137 is one of the nearest and best-studied binary pulsars and a unique laboratory for testing gravitational theories. We report single pulses' energy distribution and polarization from the pulsar's main-pulse region. The single pulse energy follows the log-normal distribution. We resolve a steep polarization swing, but at the current time resolution ($64\,μ{\rm s}$), we find no evidence for the orthogonal jump in the main-pulse region, as has been suspected. We find a potential sub-pulse drifting period of $P_{3} \sim 3.5\,P$. We analyse the jitter noise from different integrated numbers of pulses and find that its $σ_{j}$ is $270\pm{9}\,{\rm ns}$ for 1-hr integration at 1.25 GHz. This result is useful for optimizing future timing campaigns with FAST or other radio telescopes.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 11 pages, 14 figures, accepted by Monthly Notices of the Royal Astronomical Society

Journal ref: MNRAS 526 (2023) 2156

arXiv:2308.08227 [pdf, other]

Inherent Redundancy in Spiking Neural Networks

Authors: Man Yao, Jiakui Hu, Guangshe Zhao, Yaoyuan Wang, Ziyang Zhang, Bo Xu, Guoqi Li

Abstract: Spiking Neural Networks (SNNs) are well known as a promising energy-efficient alternative to conventional artificial neural networks. Subject to the preconceived impression that SNNs are sparse firing, the analysis and optimization of inherent redundancy in SNNs have been largely overlooked, thus the potential advantages of spike-based neuromorphic computing in accuracy and energy efficiency are hindered. In this work, we pose and focus on three key questions regarding the inherent redundancy in SNNs. We argue that the redundancy is induced by the spatio-temporal invariance of SNNs, which enhances the efficiency of parameter utilization but also invites lots of noise spikes. Further, we analyze the effect of spatio-temporal invariance on the spatio-temporal dynamics and spike firing of SNNs. Then, motivated by these analyses, we propose an Advance Spatial Attention (ASA) module to harness SNNs' redundancy, which can adaptively optimize their membrane potential distribution by a pair of individual spatial attention sub-modules. In this way, noise spike features are accurately regulated. Experimental results demonstrate that the proposed method can significantly drop the spike firing with better performance than state-of-the-art SNN baselines. Our code is available at https://github.com/BICLab/ASA-SNN.

Submitted 16 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.07207 [pdf, other]

FOLT: Fast Multiple Object Tracking from UAV-captured Videos Based on Optical Flow

Authors: Mufeng Yao, Jiaqi Wang, Jinlong Peng, Mingmin Chi, Chao Liu

Abstract: Multiple object tracking (MOT) has been successfully investigated in computer vision. However, MOT for the videos captured by unmanned aerial vehicles (UAV) is still challenging due to small object size, blurred object appearance, and very large and/or irregular motion in both ground objects and UAV platforms. In this paper, we propose FOLT to mitigate these problems and reach fast and accurate MOT in UAV view. Aiming at speed-accuracy trade-off, FOLT adopts a modern detector and light-weight optical flow extractor to extract object detection features and motion features at a minimum cost. Given the extracted flow, the flow-guided feature augmentation is designed to augment the object detection feature based on its optical flow, which improves the detection of small objects. Then the flow-guided motion prediction is also proposed to predict the object's position in the next frame, which improves the tracking performance of objects with very large displacements between adjacent frames. Finally, the tracker matches the detected objects and predicted objects using a spatially matching scheme to generate tracks for every object. Experiments on Visdrone and UAVDT datasets show that our proposed model can successfully track small objects with large and irregular motion and outperform existing state-of-the-art methods in UAV-MOT tasks.

Submitted 14 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted by ACM Multi-Media 2023

arXiv:2308.04322 [pdf, other]

Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos

Authors: Huibing Wang, Tianxiang Cui, Mingze Yao, Huijuan Pang, Yushan Du

Abstract: Person search has recently been a challenging task in the computer vision domain, which aims to search specific pedestrians from real cameras. Nevertheless, most surveillance videos comprise only a handful of images of each pedestrian, which often feature identical backgrounds and clothing. Hence, it is difficult to learn more discriminative features for person search in real scenes. To tackle this challenge, we draw on Generative Adversarial Networks (GAN) to synthesize data from surveillance videos. GAN has thrived in computer vision problems because it produces high-quality images efficiently. We merely alter the popular Fast R-CNN model, which is capable of processing videos and yielding accurate detection outcomes. In order to appropriately relieve the pressure brought by the two-stage model, we design an Assisted-Identity Query Module (AIDQ) to provide positive images for the behind part. Besides, we propose a novel GAN-based Scene Synthesis model that can synthesize high-quality cross-id person images for person search tasks. In order to facilitate the feature learning of the GAN-based Scene Synthesis model, we adopt an online learning strategy that collaboratively learns the synthesized images and original images. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, have shown that our method has achieved great performance, and the extensive ablation study further justifies that our GAN-synthetic data can effectively increase the variability of the datasets and be more realistic.

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.13198 [pdf, other]

Change of rotation measure during eclipse of a black widow PSR J2051$-$0827

Authors: S. Q. Wang, J. B. Wang, D. Z. Li, J. M. Yao, R. N. Manchester, G. Hobbs, N. Wang, S. Dai, H. Xu, R. Luo, Y. Feng, W. Y. Wang, D. Li, Y. W. Yu, Z. X. Du, C. H. Niu, S. B. Zhang, C. M. Zhang

Abstract: Black widows are millisecond pulsars ablating their companions. The material blown from the companion blocks the radio emission, resulting in radio eclipses. The properties of the eclipse medium are poorly understood. Here, we present direct evidence of the existence of magnetic fields in the eclipse medium of the black widow PSR J2051$-$0827 using observations made with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). We detect a regular decrease in rotation measure (RM) in the egress of eclipse, changing from $60\,\rm rad\,m^{-2}$ to $-28.7\,\rm rad\,m^{-2}$. The RM gradually changes back to normal when the line-of-sight moves away from the eclipse. The estimated line-of-sight magnetic field strength in the eclipse medium is $\sim 0.1$ G. The RM reversal could be caused by a change of the magnetic field strength along the line of sight due to binary orbital motion. The RM reversal phenomenon has also been observed in some repeating fast radio bursts (FRBs), and the study of spider pulsars may provide additional information about the origin of FRBs.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 7 pages, 3 figures, accepted for publication in ApJ

arXiv:2307.09482 [pdf, other]

doi 10.1103/PhysRevX.14.021028

Exact Results for a Boundary-Driven Double Spin Chain and Resource-Efficient Remote Entanglement Stabilization

Authors: Andrew Lingenfelter, Mingxing Yao, Andrew Pocklington, Yu-Xin Wang, Abdullah Irfan, Wolfgang Pfaff, Aashish A. Clerk

Abstract: We derive an exact solution for the steady state of a setup where two $XX$-coupled $N$-qubit spin chains (with possibly non-uniform couplings) are subject to boundary Rabi drives, and common boundary loss generated by a waveguide (either bidirectional or unidirectional). For a wide range of parameters, this system has a pure entangled steady state, providing a means for stabilizing remote multi-qubit entanglement without the use of squeezed light. Our solution also provides insights into a single boundary-driven dissipative $XX$ spin chain that maps to an interacting fermionic model. The non-equilibrium steady state exhibits surprising correlation effects, including an emergent pairing of hole excitations that arises from dynamically constrained hopping. Our system could be implemented in a number of experimental platforms, including circuit QED.

Submitted 20 May, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 12 pages main text, 13 figures, 15 page appendix; equivalent to published version

Journal ref: Phys. Rev. X 14, 021028 (2024)

arXiv:2307.01694 [pdf, other]

Spike-driven Transformer

Authors: Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

Abstract: Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus has up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2305.14725 [pdf, other]

AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes

Authors: Barry Menglong Yao, Yu Chen, Qifan Wang, Sijia Wang, Minqian Liu, Zhiyang Xu, Licheng Yu, Lifu Huang

Abstract: We propose attribute-aware multimodal entity linking, where the input is a mention described with a text and image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) where each entity is also described with a text description, a visual image and a set of attributes and values. To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products. To establish baseline performance on AMELI, we experiment with the current state-of-the-art multimodal entity linking approaches and our enhanced attribute-aware model and demonstrate the importance of incorporating the attribute information into the entity linking process. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for the attribute-aware multimodal entity linking task. Datasets and codes will be made publicly available.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 12 pages, 4 figures

ACM Class: I.2.7

Showing 1–50 of 261 results for author: Yao, M