Signature of Orbital Driven Finite Momentum Pairing in a 3D Ising Superconductor

Authors: F. Z. Yang, H. D. Zhang, Saswata Mandal, F. Y. Meng, G. Fabbris, A. Said, P. Mercado Lozano, A. Rajapitamahuni, E. Vescovo, C. Nelson, S. Lin, Y. Park, E. M. Clements, T. Z. Ward, H. -N. Lee, H. C. Lei, C. X. Liu, H. Miao

Abstract: The finite momentum superconducting pairing states (FMPs), where Cooper pairs carry non-zero momentum, are believed to give rise to exotic physical phenomena including the pseudogap phase of cuprate high-Tc superconductors and Majorana fermions in topological superconductivity. FMPs can emerge in intertwined electronic liquids with strong spin-spin interactions or be induced by lifting the spin degeneracy under a magnetic field, as originally proposed by Fulde-Ferrell and Larkin-Ovchinnikov. In quantum materials with strong Ising-type spin-orbit coupling, such as the 2D transition metal dichalcogenides (TMDs), the spin degree of freedom is frozen, enabling novel orbital driven FMPs via the magnetoelectric effect. While evidence of orbital driven FMPs has been revealed in bilayer TMDs, their realization in 3D bulk materials remains an unresolved challenge. Here we report experimental signatures of FMP in a locally noncentrosymmetric bulk superconductor 4Hb-TaS2. Using hard X-ray diffraction and angle-resolved photoemission spectroscopy, we reveal an unusual 2D chiral charge density wave (CDW) and weak interlayer hopping in 4Hb-TaS2. Below the superconducting transition temperature, the upper critical field, Hc2, increases linearly with decreasing temperature and well exceeds the Pauli limit, thus establishing the dominant orbital pair-breaking mechanism. Remarkably, we discover a field-induced superconductivity-to-superconductivity transition that breaks the continuous rotational symmetry of the s-wave uniform pairing in the Bardeen-Cooper-Schrieffer theory down to six-fold rotation symmetry. Combined with a Ginzburg-Landau free energy analysis that incorporates the magnetoelectric effect, our observations provide strong evidence of orbital driven FMP in the 3D quantum heterostructure 4Hb-TaS2.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10279 [pdf, other]

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Authors: Chang Lei, Huan Lei

Abstract: Artificial intelligence for card games has long been a popular topic in AI research. In recent years, complex card games like Mahjong and Texas Hold'em have been solved, with corresponding AI programs reaching the level of human experts. However, the game of Dou Di Zhu presents significant challenges due to its vast state/action space and unique characteristics involving reasoning about competition and cooperation, making the game extremely difficult to solve. The RL model DouZero, trained using the Deep Monte Carlo algorithm framework, has shown excellent performance in DouDiZhu. However, there are differences between its simplified game environment and the actual Dou Di Zhu environment, and its performance is still a considerable distance from that of human experts. This paper modifies the Deep Monte Carlo algorithm framework by using reinforcement learning to obtain a neural network that simultaneously estimates win rates and expectations. The action space is pruned using expectations, and strategies are generated based on win rates. This RL model is trained in a realistic DouDiZhu environment and achieves a state-of-the-art level among publicly available models.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2406.18129 [pdf, other]

CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.11591 [pdf, other]

Capacitive detection of the topological magnetoelectric effect

Authors: Chao Lei, Perry T. Mahon, C. M. Canali, A. H. MacDonald

Abstract: The topological magnetoelectric effect (TME) is a defining property of 3-dimensional $\mathbb{Z}_{2}$ topological insulators that was predicted on theoretical grounds more than a decade ago, but has still not been directly measured. In this Letter we propose a strategy for direct measurement of the TME, and discuss the precision of the effect in real devices with charge and spin disorder.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 5 + 1 pages, 3 figures

arXiv:2406.09534 [pdf, other]

FeatNavigator: Automatic Feature Augmentation on Tabular Data

Authors: Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality features in relational tables for ML models. FeatNavigator evaluates a feature from two aspects: (1) the intrinsic value of a feature towards an ML task (i.e., feature importance) and (2) the efficacy of a join path connecting the feature to the base table (i.e., integration quality). FeatNavigator strategically selects a small set of available features and their corresponding join paths to train a feature importance estimation model and an integration quality prediction model. Furthermore, FeatNavigator's search algorithm exploits both estimated feature importance and integration quality to identify the optimized feature augmentation plan. Our experimental results show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 15 pages, 41 figures

arXiv:2406.03882 [pdf, other]

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.

Submitted 9 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.03461 [pdf, other]

Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts

Authors: Dominik Scheuble, Chenyang Lei, Seung-Hwan Baek, Mario Bijelic, Felix Heide

Abstract: Lidar has become a cornerstone sensing modality for 3D vision, especially for large outdoor scenarios and autonomous driving. Conventional lidar sensors are capable of providing centimeter-accurate distance information by emitting laser pulses into a scene and measuring the time-of-flight (ToF) of the reflection. However, the polarization of the received light that depends on the surface orientation and material properties is usually not considered. As such, the polarization modality has the potential to improve scene reconstruction beyond distance measurements. In this work, we introduce a novel long-range polarization wavefront lidar sensor (PolLidar) that modulates the polarization of the emitted and received light. Departing from conventional lidar sensors, PolLidar allows access to the raw time-resolved polarimetric wavefronts. We leverage polarimetric wavefronts to estimate normals, distance, and material properties in outdoor scenarios with a novel learned reconstruction method. To train and evaluate the method, we introduce a simulated and real-world long-range dataset with paired raw lidar data, ground truth distance, and normal maps. We find that the proposed method improves normal and distance reconstruction by 53% mean angular error and 41% mean absolute error compared to existing shape-from-polarization (SfP) and ToF methods. Code and data are open-sourced at https://light.princeton.edu/pollidar.

Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024; Project Website: https://light.princeton.edu/publication/pollidar

arXiv:2406.01555 [pdf, other]

Towards Flexible Interactive Reflection Removal with Human Guidance

Authors: Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang

Abstract: Single image reflection removal is inherently ambiguous, as both the reflection and transmission components requiring separation may follow natural image statistics. Existing methods attempt to address the issue by using various types of low-level and physics-based cues as sources of reflection signals. However, these cues are not universally applicable, since they are only observable in specific capture scenarios. This leads to a significant performance drop when test images do not align with their assumptions. In this paper, we aim to explore a novel flexible interactive reflection removal approach that leverages various forms of sparse human guidance, such as points and bounding boxes, as an auxiliary high-level prior to achieve robust reflection removal. However, incorporating the raw user guidance naively into the existing reflection removal network does not result in performance gains. To this end, we innovatively transform raw user input into a unified form -- reflection masks using an Interactive Segmentation Foundation Model. Such a design absorbs the quintessence of the foundational segmentation model and flexible human guidance, thereby mitigating the challenges of reflection separations. Furthermore, to fully utilize user guidance and reduce user annotation costs, we design a mask-guided reflection removal network, comprising our proposed self-adaptive prompt block. This block adaptively incorporates user guidance as anchors and refines transmission features via cross-attention mechanisms. Extensive results on real-world images validate that our method demonstrates state-of-the-art performance on various datasets with the help of flexible and sparse user guidance. Our code and dataset will be publicly available at https://github.com/ShawnChenn/FlexibleReflectionRemoval.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.17705 [pdf, other]

DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

Authors: Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu

Abstract: We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across various types of vehicles and capture a broader range of scenarios. Dash cam videos often suffer from severe obstructions such as reflections and occlusions on the windshields, which significantly impede the application of neural rendering techniques. To address this challenge, we develop DC-Gaussian based on the recent real-time neural rendering technique 3D Gaussian Splatting (3DGS). Our approach includes an adaptive image decomposition module to model reflections and occlusions in a unified manner. Additionally, we introduce illumination-aware obstruction modeling to manage reflections and occlusions under varying lighting conditions. Lastly, we employ a geometry-guided Gaussian enhancement strategy to improve rendering details by incorporating additional geometry priors. Experiments on self-captured and public dash cam videos show that our method not only achieves state-of-the-art performance in novel view synthesis, but also accurately reconstructs captured scenes free of obstructions.

Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: 9 pages, 7 figures; project page: https://linhanwang.github.io/dcgaussian/

arXiv:2405.17213 [pdf]

Highly inhomogeneous interactions between background climate and urban warming across typical local climate zones in heatwave and non-heatwave days

Authors: Jing Kong, Yongling Zhao, Kai Gao, Dominik Strebel, Jan Carmeliet, Chengwang Lei

Abstract: Urban heat island (UHI) in conjunction with heatwave (HW) leads to exacerbation of thermal stress in urban areas. Prior research on UHI and HW has predominantly concentrated on examining the thermal conditions at the surface and near-surface, with few investigations extending to the radiative and dynamical interactions of UHI and HW, particularly with a focus on the inhomogeneities across local climate zones (LCZs). Here, we analyse the temperature disparity between HW and non-HW conditions across LCZs in the Sydney area by quantifying the contributions of individual radiative and dynamical processes using the coupled surface-atmosphere climate feedback-response analysis method (CFRAM). Three HW events in 2017, 2019, and 2020 are simulated using the Weather Research and Forecasting (WRF) model coupled with the Single-Layer Urban Canopy Model (SLUCM). The maximum temperature difference between HW and non-HW days may reach up to 10 K, with the increased net solar radiation during HWs being comparable to the typical level of anthropogenic heat flux in urban areas. It is also found that the reduction of clouds, the presence of vapor, and the increase of sensible heat contribute to the warming effect at different levels, with the contribution of clouds being the most dominant. Conversely, the generation of dry convection and the increase of latent heat flux lead to mitigating effects, with the latter being more dominant and capable of causing up to 10 K surface temperature difference between LCZ1 (compact high-rise) and LCZ9 (sparsely built). The differences in the contributions of climate feedback processes across different LCZs become more evident during more severe and humid HWs. These findings underscore the necessity of implementing local climate zone-tailored heat mitigation strategies.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.05163 [pdf, other]

Fast Fourier transforms and fast Wigner and Weyl functions in large quantum systems

Authors: C. Lei, A. Vourdas

Abstract: Two methods for fast Fourier transforms are used in a quantum context. The first method is for systems with dimension of the Hilbert space $D=d^n$ with $d$ an odd integer, and is inspired by the Cooley-Tukey formalism. The `large Fourier transform' is expressed as a sequence of $n$ `small Fourier transforms' (together with some other transforms) in quantum systems with $d$-dimensional Hilbert space. Limitations of the method are discussed. In some special cases, the $n$ Fourier transforms can be performed in parallel. The second method is for systems with dimension of the Hilbert space $D=d_0...d_{n-1}$ with $d_0,...,d_{n-1}$ odd integers coprime to each other. It is inspired by the Good formalism, which in turn is based on the Chinese remainder theorem. In this case also the `large Fourier transform' is expressed as a sequence of $n$ `small Fourier transforms' (that involve some constants related to the number theory that describes the formalism). The `small Fourier transforms' can be performed in a classical computer or in a quantum computer (in which case we have the additional well known advantages of quantum Fourier transform circuits). In the case that the small Fourier transforms are performed with a classical computer, complexity arguments for both methods show the reduction in computational time from ${\cal O}(D^2)$ to ${\cal O}(D\log D)$. The second method is also used for the fast calculation of Wigner and Weyl functions, in quantum systems with large finite dimension of the Hilbert space.

Submitted 8 May, 2024; originally announced May 2024.

Journal ref: Eur. Phys. J. Plus 139, 394 (2024)

arXiv:2404.18209 [pdf, other]

4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

Authors: Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of which highlight the importance of considering each such dimension in the design of RDB predictive models, as well as the limitations of more naive approaches such as simply joining adjacent tables. Our source code is released at https://github.com/awslabs/multi-table-benchmark.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Under review

arXiv:2404.05661 [pdf, other]

Automatic Controllable Colorization via Imagination

Authors: Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

Abstract: We propose a framework for automatic colorization that allows for iterative editing and modifications. The core of our framework lies in an imagination module: by understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content. These images serve as references for coloring, mimicking the process of human experts. As the synthesized images can be imperfect or different from the original grayscale image, we propose a Reference Refinement Module to select the optimal reference composition. Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples. Extensive experiments demonstrate the superiority of our framework over existing automatic colorization algorithms in editability and flexibility. Project page: https://xy-cong.github.io/imagine-colorization.

Submitted 8 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page: https://xy-cong.github.io/imagine-colorization

arXiv:2404.04318 [pdf, other]

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

Authors: Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei

Abstract: Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects. In this work, we present a general framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors. Previous polarization-based depth enhancement methods focus on utilizing pure physics-based formulas for a single sensor. In contrast, our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors. To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets, as the size of the polarization dataset is limited to train a strong model from scratch. We conducted extensive experiments on a public dataset, and the results demonstrate that the proposed method performs favorably compared to existing depth enhancement baselines. Code and demos are available at https://lastbasket.github.io/PPFT/.

Submitted 5 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page: https://lastbasket.github.io/PPFT/. The first two authors contributed equally

arXiv:2403.13956 [pdf]

Epitaxially defined Luttinger liquids on MoS$_2$ bicrystals

Authors: Bingchen Deng, Heonsu Ahn, Jue Wang, Gunho Moon, Ninad Dongre, Chao Lei, Giovanni Scuri, Jiho Sung, Elise Brutschea, Kenji Watanabe, Takashi Taniguchi, Fan Zhang, Moon-Ho Jo, Hongkun Park

Abstract: A mirror twin boundary (MTB) in a transition metal dichalcogenide (TMD) monolayer can host a one-dimensional electron liquid of a topological nature with tunable interactions. Unfortunately, the electrical characterization of such boundaries has been challenging due to the paucity of samples with large enough size and high quality. Here, we report an epitaxial growth of monolayer molybdenum disulfide (MoS$_2$) bicrystals with well-isolated MTBs that are tens of micrometers long. Conductance measurements of these MTBs exhibit power-law behaviors as a function of temperature and bias voltage up to room temperature, consistent with electrons tunneling into a Luttinger liquid. Transport measurements of two distinct types of MTBs reveal the critical role of the atomic-scale defects. This study demonstrates that MTBs in TMD monolayers provide an exciting new platform for studying the interplay between electronic interactions and topology.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12372 [pdf, other]

Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

Authors: Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

Abstract: Advancements in self-supervised pre-training (SSL) have significantly advanced the field of learning transferable time series representations, which can be very useful in enhancing the downstream task. Despite being effective, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main challenge lies in the significant differences in the characteristics of time-series data across different domains, such as variations in the number of channels and temporal resolution scales. To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task. One of the key characteristics of CrossTimeNet is the newly designed time series tokenization module, which could effectively convert the raw time series into a sequence of discrete tokens based on a reconstruction optimization process. Besides, we highlight that predicting a high proportion of corrupted tokens can be very helpful for extracting informative patterns across different domains during SSL pre-training, which has been largely overlooked in past years. Furthermore, unlike previous works, our work treats the pre-training language model (PLM) as the initialization of the encoder network, investigating the feasibility of transferring the knowledge learned by the PLM to the time series area. Through these efforts, the path to cross-domain pre-training of a generic time series model can be effectively paved. We conduct extensive experiments in a real-world scenario across various time series classification domains. The experimental results clearly confirm CrossTimeNet's superior performance.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.07653 [pdf, other]

OmniMatch: Effective Self-Supervised Any-Join Discovery in Tabular Data Repositories

Authors: Christos Koutras, Jiani Zhang, Xiao Qin, Chuan Lei, Vasileios Ioannidis, Christos Faloutsos, George Karypis, Asterios Katsifodimos

Abstract: How can we discover join relationships among columns of tabular data in a data repository? Can this be done effectively when metadata is missing? Traditional column matching works mainly rely on similarity measures based on exact value overlaps, hence missing important semantics or failing to handle noise in the data. At the same time, recent dataset discovery methods focusing on deep table representation learning techniques do not take into consideration the rich set of column similarity signals found in prior matching and discovery methods. Finally, existing methods heavily depend on user-provided similarity thresholds, hindering their deployability in real-world settings. In this paper, we propose OmniMatch, a novel join discovery technique that detects equi-joins and fuzzy-joins between columns by combining column-pair similarity measures with Graph Neural Networks (GNNs). OmniMatch's GNN can capture column relatedness leveraging graph transitivity, significantly improving the recall of join discovery tasks. At the same time, OmniMatch also increases the precision by augmenting its training data with negative column join examples through an automated negative example generation process. Most importantly, compared to the state-of-the-art matching and discovery methods, OmniMatch exhibits up to 14% higher effectiveness in F1 score and AUC without relying on metadata or user-provided thresholds for each similarity metric.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06579 [pdf, other]

Edge Information Hub: Orchestrating Satellites, UAVs, MEC, Sensing and Communications for 6G Closed-Loop Controls

Authors: Chengleyang Lei, Wei Feng, Peng Wei, Yunfei Chen, Ning Ge, Shiwen Mao

Abstract: An increasing number of field robots would be used for mission-critical tasks in remote or post-disaster areas. Due to usually-limited individual abilities, these robots require an edge information hub (EIH), which is capable of not only communications but also sensing and computing. Such EIH could be deployed on a flexibly-dispatched unmanned aerial vehicle (UAV). Different from traditional aerial base stations or mobile edge computing (MEC), the EIH would direct the operations of robots via sensing-communication-computing-control ($\textbf{SC}^3$) closed-loop orchestration. This paper aims to optimize the closed-loop control performance of multiple $\textbf{SC}^3$ loops, under the constraints of satellite-backhaul rate, computing capability, and on-board energy. Specifically, the linear quadratic regulator (LQR) control cost is used to measure the closed-loop utility, and a sum LQR cost minimization problem is formulated to jointly optimize the splitting of sensor data and allocation of communication and computing resources. We first derive the optimal splitting ratio of sensor data, and then recast the problem to a more tractable form. An iterative algorithm is finally proposed to provide a sub-optimal solution. Simulation results demonstrate the superiority of the proposed algorithm. We also uncover the influence of $\textbf{SC}^3$ parameters on closed-loop controls, highlighting more systematic understanding.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

arXiv:2402.14361 [pdf, other]

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Utilizing the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system.

Submitted 12 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR 2024

arXiv:2402.09504 [pdf, other]

Superconducting Quantum Memory with a Suspended Coaxial Resonator

Authors: Lev Krayzman, Chan U Lei, Suhas Ganjam, James Teoh, Luigi Frunzio, Robert J. Schoelkopf

Abstract: A promising way to store quantum information is by encoding it in the bosonic excitations of microwave resonators. This provides for long coherence times, low dephasing rates, as well as a hardware-efficient approach to quantum error correction. There are two main methods used to make superconducting microwave resonators: traditionally machined out of bulk material, and lithographically fabricated on-chip in thin film. 3D resonators have few loss channels and larger mode volumes, and therefore smaller participations in the lossy parts, but it can be challenging to reach high material qualities. On-chip resonators can use low-loss thin films, but confine the field more tightly, resulting in higher participations and additional loss channels from the dielectric substrate. In this work, we present a design in which a dielectric scaffold supports a thin-film conductor within a 3D package, thus combining the low surface participations of bulk-machined cavities with the high quality and control over materials of thin-film circuits. By incorporating a separate chip containing a transmon qubit, we realize a quantum memory and measure single-photon lifetimes in excess of a millisecond. This hybrid 3D architecture has several advantages for scaling, as it relaxes the importance of the package and permits modular construction with separately-replaceable qubit and resonator devices.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures

arXiv:2402.04722 [pdf, ps, other]

Ten simple rules for teaching sustainable software engineering

Authors: Kit Gallagher, Richard Creswell, Ben Lambert, Martin Robinson, Chon Lok Lei, Gary R. Mirams, David J. Gavaghan

Abstract: Computational methods and associated software implementations are central to every field of scientific investigation. Modern biological research, particularly within systems biology, has relied heavily on the development of software tools to process and organize increasingly large datasets, simulate complex mechanistic models, provide tools for the analysis and management of data, and visualize and organize outputs. However, developing high-quality research software requires scientists to develop a host of software development skills, and teaching these skills to students is challenging. There has been a growing importance placed on ensuring reproducibility and good development practices in computational research. However, less attention has been devoted to informing the specific teaching strategies which are effective at nurturing in researchers the complex skillset required to produce high-quality software that, increasingly, is required to underpin both academic and industrial biomedical research. Recent articles in the Ten Simple Rules collection have discussed the teaching of foundational computer science and coding techniques to biology students. We advance this discussion by describing the specific steps for effectively teaching the necessary skills scientists need to develop sustainable software packages which are fit for (re-)use in academic research or more widely. Although our advice is likely to be applicable to all students and researchers hoping to improve their software development skills, our guidelines are directed towards an audience of students that have some programming literacy but little formal training in software development or engineering, typical of early doctoral students. These practices are also applicable outside of doctoral training environments, and we believe they should form a key part of postgraduate training schemes more generally in the life sciences.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Prepared for submission to PLOS Computational Biology's 10 Simple Rules collection

arXiv:2401.11098 [pdf, other]

Neural auto-designer for enhanced quantum kernels

Authors: Cong Lei, Yuxuan Du, Peng Mi, Jun Yu, Tongliang Liu

Abstract: Quantum kernels hold great promise for offering computational advantages over classical learners, with the effectiveness of these kernels closely tied to the design of the quantum feature map. However, the challenge of designing effective quantum feature maps for real-world datasets, particularly in the absence of sufficient prior information, remains a significant obstacle. In this study, we present a data-driven approach that automates the design of problem-specific quantum feature maps. Our approach leverages feature-selection techniques to handle high-dimensional data on near-term quantum machines with limited qubits, and incorporates a deep neural predictor to efficiently evaluate the performance of various candidate quantum kernels. Through extensive numerical simulations on different datasets, we demonstrate the superiority of our proposal over prior methods, especially for the capability of eliminating the kernel concentration issue and identifying the feature map with prediction advantages. Our work not only unlocks the potential of quantum kernels for enhancing real-world tasks but also highlights the substantial role of deep learning in advancing quantum machine learning.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 24 pages, 14 figures, 9 tables, ICLR 2024

arXiv:2401.07426 [pdf, other]

Generalized Planning for the Abstraction and Reasoning Corpus

Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger

Abstract: The Abstraction and Reasoning Corpus (ARC) is a general artificial intelligence benchmark that poses difficulties for pure machine learning methods due to its requirement for fluid intelligence with a focus on reasoning and abstraction. In this work, we introduce an ARC solver, Generalized Planning for Abstract Reasoning (GPAR). It casts an ARC problem as a generalized planning (GP) problem, where a solution is formalized as a planning program with pointers. We express each ARC problem using the standard Planning Domain Definition Language (PDDL) coupled with external functions representing object-centric abstractions. We show how to scale up GP solvers via domain knowledge specific to ARC in the form of restrictions over the actions model, predicates, arguments and valid structure of planning programs. Our experiments demonstrate that GPAR outperforms the state-of-the-art solvers on the object-centric tasks of the ARC, showing the effectiveness of GP and the expressiveness of PDDL to model ARC problems. The challenges provided by the ARC benchmark motivate research to advance existing GP solvers and understand new relations with other planning computational models. Code is available at github.com/you68681/GPAR.

Submitted 14 January, 2024; originally announced January 2024.

Comments: Accepted at AAAI 2024 (extended version)

arXiv:2312.16245 [pdf, other]

iKUN: Speak to Trackers without Retraining

Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su

Abstract: Referring multi-object tracking (RMOT) aims to track multiple objects based on input textual descriptions. Previous works realize it by simply integrating an extra textual module into the multi-object tracker. However, they typically need to retrain the entire framework and have difficulties in optimization. In this work, we propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers in a plug-and-play manner. Concretely, a knowledge unification module (KUM) is designed to adaptively extract visual features based on textual guidance. Meanwhile, to improve the localization accuracy, we present a neural version of Kalman filter (NKF) to dynamically adjust process noise and observation noise based on the current motion status. Moreover, to address the problem of open-set long-tail distribution of textual descriptions, a test-time similarity calibration method is proposed to refine the confidence score with pseudo frequency. Extensive experiments on Refer-KITTI dataset verify the effectiveness of our framework. Finally, to speed up the development of RMOT, we also contribute a more challenging dataset, Refer-Dance, by extending public DanceTrack dataset with motion and dressing descriptions. The codes and dataset are available at https://github.com/dyhBUPT/iKUN.

Submitted 11 March, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

Comments: CVPR 2024 camera-ready

arXiv:2312.15252 [pdf, other]

DTIAM: A unified framework for predicting drug-target interactions, binding affinities and activation/inhibition mechanisms

Authors: Zhangli Lu, Chuqi Lei, Kaili Wang, Libo Qin, Jing Tang, Min Li

Abstract: Accurate and robust prediction of drug-target interactions (DTIs) plays a vital role in drug discovery. Although extensive efforts have been invested in predicting novel DTIs, existing approaches still suffer from insufficient labeled data and cold start problems. More importantly, there is currently a lack of studies focusing on elucidating the mechanism of action (MoA) between drugs and targets. Distinguishing the activation and inhibition mechanisms is critical and challenging in drug development. Here, we introduce a unified framework called DTIAM, which aims to predict interactions, binding affinities, and activation/inhibition mechanisms between drugs and targets. DTIAM learns drug and target representations from large amounts of label-free data through self-supervised pre-training, which accurately extracts the substructure and contextual information of drugs and targets, and thus benefits the downstream prediction based on these representations. DTIAM achieves substantial performance improvement over other state-of-the-art methods in all tasks, particularly in the cold start scenario. Moreover, independent validation demonstrates the strong generalization ability of DTIAM. All these results suggest that DTIAM can provide a practically useful tool for predicting novel DTIs and further distinguishing the MoA of candidate drugs. DTIAM, for the first time, provides a unified framework for accurate and robust prediction of drug-target interactions, binding affinities, and activation/inhibition mechanisms.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.15139 [pdf, other]

Automatic Tooth Arrangement with Joint Features of Point and Mesh Representations via Diffusion Probabilistic Models

Authors: Changsong Lei, Mengfei Xia, Shaofeng Wang, Yaqian Liang, Ran Yi, Yuhui Wen, Yongjin Liu

Abstract: Tooth arrangement is a crucial step in orthodontics treatment, in which aligning teeth could improve overall well-being, enhance facial aesthetics, and boost self-confidence. To improve the efficiency of tooth arrangement and minimize errors associated with unreasonable designs by inexperienced practitioners, some deep learning-based tooth arrangement methods have been proposed. Currently, most existing approaches employ MLPs to model the nonlinear relationship between tooth features and transformation matrices to achieve tooth arrangement automatically. However, the limited datasets (which to our knowledge, have not been made public) collected from clinical practice constrain the applicability of existing methods, making them inadequate for addressing diverse malocclusion issues. To address this challenge, we propose a general tooth arrangement neural network based on the diffusion probabilistic model. Conditioned on the features extracted from the dental model, the diffusion probabilistic model can learn the distribution of teeth transformation matrices from malocclusion to normal occlusion by gradually denoising from a random variable, thus more adeptly managing real orthodontic data. To take full advantage of effective features, we exploit both mesh and point cloud representations by designing different encoding networks to extract the tooth (local) and jaw (global) features, respectively. In addition to traditional metrics ADD, PA-ADD, CSA, and ME_{rot}, we propose a new evaluation metric based on dental arch curves to judge whether the generated teeth meet the individual normal occlusion. Experimental results demonstrate that our proposed method achieves state-of-the-art tooth alignment results and satisfactory occlusal relationships between dental arches. We will publish the code and dataset.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.14235 [pdf, other]

Neural Spline Fields for Burst Image Fusion and Layer Separation

Authors: Ilya Chugunov, David Shustin, Ruyu Yan, Chenyang Lei, Felix Heide

Abstract: Each photo in an image burst can be considered a sample of a complex 3D scene: the product of parallax, diffuse and specular materials, scene motion, and illuminant variation. While decomposing all of these effects from a stack of misaligned images is a highly ill-conditioned task, the conventional align-and-merge burst pipeline takes the other extreme: blending them into a single image. In this work, we propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields -- networks trained to map input coordinates to spline control points. Our method is able to, during test-time optimization, jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers. Then, by discarding the obstruction layer, we can perform a range of tasks including seeing through occlusions, reflection suppression, and shadow removal. Validated on complex synthetic and in-the-wild captures we find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.

Submitted 21 December, 2023; originally announced December 2023.

Comments: project website: https://light.princeton.edu/publication/nsf

arXiv:2312.08571 [pdf, other]

PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition

Authors: Chengxi Lei, Satwinder Singh, Feng Hou, Xiaoyun Jia, Ruili Wang

Abstract: Most of the current speech data augmentation methods operate on either the raw waveform or the amplitude spectrum of speech. In this paper, we propose a novel speech data augmentation method called PhasePerturbation that operates dynamically on the phase spectrum of speech. Instead of statically rotating a phase by a constant degree, PhasePerturbation utilizes three dynamic phase spectrum operations, i.e., a randomization operation, a frequency masking operation, and a temporal masking operation, to enhance the diversity of speech data. We conduct experiments on wav2vec2.0 pre-trained ASR models by fine-tuning them with the PhasePerturbation augmented TIMIT corpus. The experimental results demonstrate a 10.9% relative reduction in the word error rate (WER) compared with the baseline model fine-tuned without any augmentation operation. Furthermore, the proposed method achieves additional improvements (12.9% and 15.9%) in WER by complementing the Vocal Tract Length Perturbation (VTLP) and the SpecAug, which are both amplitude spectrum-based augmentation methods. The results highlight the capability of PhasePerturbation to improve the current amplitude spectrum-based augmentation methods.
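The abstract names three phase-spectrum operations but does not define them. The toy sketch below shows one plausible reading on a small phase matrix (frames by frequency bins); it is not the authors' implementation. The function names, the uniform random offset, and the choice to set masked phases to zero are all assumptions made for illustration.

```python
import math
import random


def wrap_phase(angle: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def randomize_phase(phase, max_shift, rng):
    """Add an independent random offset in [-max_shift, max_shift] to every bin (assumed form)."""
    return [[wrap_phase(p + rng.uniform(-max_shift, max_shift)) for p in frame]
            for frame in phase]


def mask_frequency(phase, start_bin, width):
    """Zero the phase of `width` consecutive frequency bins in every frame (assumed form)."""
    return [[0.0 if start_bin <= k < start_bin + width else p
             for k, p in enumerate(frame)]
            for frame in phase]


def mask_time(phase, start_frame, width):
    """Zero the phase of every bin in `width` consecutive frames (assumed form)."""
    return [[0.0] * len(frame) if start_frame <= t < start_frame + width else list(frame)
            for t, frame in enumerate(phase)]


rng = random.Random(0)
# Toy phase spectrogram: 6 frames x 4 frequency bins, values in [-pi, pi].
phase = [[rng.uniform(-math.pi, math.pi) for _ in range(4)] for _ in range(6)]
# Chain the three operations: randomize, mask bins 1-2, then mask frames 3-4.
augmented = mask_time(mask_frequency(randomize_phase(phase, 0.5, rng), 1, 2), 3, 2)
```

In a real pipeline the perturbed phase would be recombined with the untouched magnitude spectrum and inverted back to a waveform; that step is omitted here.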

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.07254 [pdf, other]

The GUA-Speech System Description for CNVSRC Challenge 2023

Authors: Shengqiang Li, Chao Lei, Baozhong Ma, Binbin Zhang, Fuping Pan

Abstract: This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate connectionist temporal classification (Inter CTC) residual modules to relax the conditional independence assumption of CTC in our model. Then we use a bi-transformer decoder to enable the model to capture both past and future contextual information. In addition, we use Chinese characters as the modeling units to improve the recognition accuracy of our model. Finally, we use a recurrent neural network language model (RNNLM) for shallow fusion in the inference stage. Experiments show that our system achieves a character error rate (CER) of 38.09% on the Eval set which reaches a relative CER reduction of 21.63% over the official baseline, and obtains a second place in the challenge.
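The two figures quoted in the abstract imply a baseline CER that the abstract itself does not state. As a rough check, assuming the relative reduction is defined as (baseline - system) / baseline, the baseline can be back-calculated as below. The function name is hypothetical, and the definition of "relative reduction" is an assumption.

```python
def implied_baseline_cer(system_cer: float, relative_reduction: float) -> float:
    """Back out a baseline error rate from a system error rate and a relative reduction.

    Assumes relative_reduction = (baseline - system) / baseline, with both
    arguments given in percent. This convention is an assumption; the
    abstract does not spell it out.
    """
    if not 0.0 <= relative_reduction < 100.0:
        raise ValueError("relative_reduction must be in [0, 100)")
    return system_cer / (1.0 - relative_reduction / 100.0)


# Figures taken from the abstract: CER of 38.09%, relative reduction of 21.63%.
baseline = implied_baseline_cer(38.09, 21.63)
print(f"implied baseline CER: {baseline:.2f}%")
```

Under that assumption the official baseline would sit at roughly 48.6% CER, i.e. an absolute improvement of about 10.5 percentage points.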

Submitted 12 December, 2023; originally announced December 2023.

Comments: CNVSRC 2023 Challenge

arXiv:2311.17859 [pdf, other]

Non-universal surface magnetoelectric response in antiferromagnetic topological insulators

Authors: Chao Lei, Perry T. Mahon, Allan H. MacDonald

Abstract: The electronic ground state of a three-dimensional (3D) band insulator with time-reversal ($Θ$) symmetry or time-reversal times a discrete translation ($ΘT_{1/2}$) symmetry is classified by a $\mathbb{Z}_{2}$-valued topological invariant and characterized by quantized magnetoelectric response. Here we demonstrate by explicit calculation in model $\mathbb{Z}_{2}$ topological insulator thin-films that whereas the magnetoelectric response is localized at the surface in the $Θ$ symmetry (non-magnetic) case, it is non-universally partitioned between surface and interior contributions in the $ΘT_{1/2}$ (anti-ferromagnetic) case, while remaining quantized. Within our model the magnetic field induced polarization arises entirely from an anomalous ${\cal N}=0$ Landau level subspace within which the projected Hamiltonian is a generalized Su-Schrieffer-Heeger model whose topological properties are consistent with those of the starting 3D model. We identify a novel connection between the ground state geometry of that 3D model and surface-interior-partitioning in thin films.

Submitted 8 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 6+13 pages, 4 figures, comments welcome

arXiv:2311.15571 [pdf, other]

Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples

Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Yuan Dong, Fei Su

Abstract: Visible-infrared person re-identification (VI-ReID) aims to match persons captured by visible and infrared cameras, allowing person retrieval and tracking in 24-hour surveillance systems. Previous methods focus on learning from cross-modality person images in different cameras. However, temporal information and single-camera samples tend to be neglected. To crack this nut, in this paper, we first contribute a large-scale VI-ReID dataset named BUPTCampus. Different from most existing VI-ReID datasets, it 1) collects tracklets instead of images to introduce rich temporal information, 2) contains pixel-aligned cross-modality sample pairs for better modality-invariant learning, 3) provides one auxiliary set to help enhance the optimization, in which each identity only appears in a single camera. Based on our constructed dataset, we present a two-stream framework as baseline and apply Generative Adversarial Network (GAN) to narrow the gap between the two modalities. To exploit the advantages introduced by the auxiliary set, we propose a curriculum learning based strategy to jointly learn from both primary and auxiliary sets. Moreover, we design a novel temporal k-reciprocal re-ranking method to refine the ranking list with fine-grained temporal correlation cues. Experimental results demonstrate the effectiveness of the proposed methods. We also reproduce 9 state-of-the-art image-based and video-based VI-ReID methods on BUPTCampus and our methods show substantial superiority to them. The codes and dataset are available at: https://github.com/dyhBUPT/BUPTCampus.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted by Transactions on Information Forensics & Security 2023

arXiv:2310.09469 [pdf, other]

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

Authors: Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-jin Liu

Abstract: A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed. Existing acceleration algorithms simplify the sampling by skipping most steps yet exhibit considerable performance degradation. By viewing the generation of diffusion models as a discretized integrating process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval. To rectify this issue, we propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost. Specifically, at each denoising step, we replace the original parameterization by conditioning the network on a new timestep, which is obtained by aligning the sampling distribution to the real distribution. Extensive experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods, especially when there are few denoising steps. For example, when using 10 denoising steps on the popular LSUN Bedroom dataset, we improve the FID of DDIM from 9.65 to 6.07, simply by adopting our method for a more appropriate set of timesteps. Code will be made publicly available.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.07474 [pdf, other]

A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization

Authors: Changyi Lei, Quanmin Zhu

Abstract: This paper proposes a novel fuzzy cascaded Proportional-Derivative (PD) controller for under-actuated single-link flexible joint manipulators. The original flexible joint system is considered as two coupled $2^{nd}$-order sub-systems. The proposed controller is composed of two cascaded PD controllers and two fuzzy logic regulators (FLRs). The first (virtual) PD controller is used to generate desired control input that stabilizes the first $2^{nd}$-order sub-system. Solving the equation by considering the coupling terms as design variables, the reference signal is generated for the second sub-system. Then through simple compensation design, together with the second PD controller, the cascaded PD controller is derived. In order to further improve the performance, two FLRs are implemented that adaptively tune the parameters of PD controllers. Under natural assumptions, the cascaded fuzzy PD controller is proved to possess locally asymptotic stability. All the offline tuning processes are completed data-efficiently by Bayesian Optimization. The results in simulation illustrate the stability and validity of our proposed method. Besides, the idea of cascaded PD controller presented here may be extended as a novel control method for other under-actuated systems, and the stability analysis renders a new perspective towards the stability proof of all other fuzzy-enhanced PID controllers.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 19 pages, 23 figures, 6 tables

MSC Class: 93C42; 93D15; 93C42 (Primary); 93C10 (Secondary)

arXiv:2309.04669 [pdf, other]

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Authors: Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Abstract: Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment of vision and language heavily constrains the model's potential. In this paper, we break through this limitation by representing both vision and language in a unified form. Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language that LLM can read. The resulting visual tokens encompass high-level semantics worthy of a word and also support dynamic sequence length varying from the image. Coupled with this tokenizer, the presented foundation model called LaVIT can handle both image and text indiscriminately under the same generative learning paradigm. This unification empowers LaVIT to serve as an impressive generalist interface to understand and generate multi-modal content simultaneously. Extensive experiments further showcase that it outperforms the existing models by a large margin on massive vision-language tasks. Our code and models are available at https://github.com/jy0205/LaVIT.

Submitted 22 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: ICLR 2024

arXiv:2308.15539 [pdf, other]

Surpassing millisecond coherence times in on-chip superconducting quantum memories by optimizing materials, processes, and circuit design

Authors: Suhas Ganjam, Yanhao Wang, Yao Lu, Archan Banerjee, Chan U Lei, Lev Krayzman, Kim Kisslinger, Chenyu Zhou, Ruoshui Li, Yichen Jia, Mingzhao Liu, Luigi Frunzio, Robert J. Schoelkopf

Abstract: The performance of superconducting quantum circuits for quantum computing has advanced tremendously in recent decades; however, a comprehensive understanding of relaxation mechanisms does not yet exist. In this work, we utilize a multimode approach to characterizing energy losses in superconducting quantum circuits, with the goals of predicting device performance and improving coherence through materials, process, and circuit design optimization. Using this approach, we measure significant reductions in surface and bulk dielectric losses by employing a tantalum-based materials platform and annealed sapphire substrates. With this knowledge we predict and experimentally verify the relaxation times of aluminum- and tantalum-based transmon qubits. We additionally optimize device geometry to maximize coherence within a coaxial tunnel architecture, and realize on-chip quantum memories with single-photon Ramsey times of 2.0$-$2.7 ms, limited by their energy relaxation times of 1.0$-$1.4 ms. To our knowledge this is the highest coherence achieved in an on-chip quantum memory, and demonstrates an advancement towards a more modular and compact coaxial circuit architecture for bosonic qubits with reproducibly high coherence.

Submitted 14 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: Updated submission 23/09/14: corrected grant number under Acknowledgements

arXiv:2308.13241 [pdf, other]

WSTac: Interactive Surface Perception based on Whisker-Inspired and Self-Illuminated Vision-Based Tactile Sensor

Authors: Kai Chong Lei, Kit Wa Sou, Wang Sing Chan, Jiayi Yan, Siqi Ping, Dengfeng Peng, Wenbo Ding, Xiao-Ping Zhang

Abstract: Modern Visual-Based Tactile Sensors (VBTSs) use cost-effective cameras to track elastomer deformation, but struggle with ambient light interference. Solutions typically involve using internal LEDs and blocking external light, thus adding complexity. Creating a VBTS resistant to ambient light with just a camera and an elastomer remains a challenge. In this work, we introduce WStac, a self-illuminating VBTS comprising a mechanoluminescence (ML) whisker elastomer, camera, and 3D printed parts. The ML whisker elastomer, inspired by the touch sensitivity of vibrissae, offers both light isolation and high ML intensity under stress, thereby removing the necessity for additional LED modules. With the incorporation of machine learning, the sensor effectively utilizes the dynamic contact variations of 25 whiskers to successfully perform tasks like speed regression, directional identification, and texture classification. Videos are available at: https://sites.google.com/view/wstac/.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.09282 [pdf, other]

doi 10.1103/PhysRevResearch.6.023289

Symmetry, topology, and geometry: The many faces of the topological magnetoelectric effect

Authors: Perry T. Mahon, Chao Lei, Allan H. MacDonald

Abstract: A delicate tension complicates the relationship between the topological magnetoelectric effect (TME) in three-dimensional (3D) $\mathbb{Z}_2$ topological insulators (TIs) and time-reversal symmetry (TRS). TRS underlies a particular $\mathbb{Z}_2$ topological classification of the electronic ground state of crystalline band insulators and the associated quantization of the magnetoelectric response coefficient calculated using bulk linear response theory but, according to standard symmetry arguments, simultaneously forbids a nonzero magnetoelectric coefficient in any physical finite-size system. This contrast between theories of magnetoelectric response in formal bulk models and in real finite-sized materials originates from the distinct approaches required to introduce notions of (electronic) polarization and orbital magnetization in these fundamentally different environments. In this work we argue for a modified interpretation of the bulk linear response calculations in non-magnetic $\mathbb{Z}_2$ TIs that is more plainly consistent with TRS, and use this interpretation to discuss the effect's observation - still absent over a decade after its prediction. Motivated by analytical results, we conjecture a type of microscopic bulk-boundary correspondence: a bulk insulator with (generalized) TRS supports a magnetoelectric coefficient that is purely itinerant (which is generically related to the geometry of the ground state) if and only if magnetic surface dopants are required for the TME to manifest in finite samples thereof. We conclude that in non-magnetic $\mathbb{Z}_2$ TIs the TME is activated by magnetic surface dopants, that the charge density response to a uniform dc magnetic field is localized at the surface and specified by the configuration of those dopants, and that the TME is qualitatively less robust against disorder than the integer quantum Hall effect.

Submitted 17 June, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 33 pages, 5 figures

Journal ref: Phys. Rev. Research 6, 023289 (2024)

arXiv:2308.02797 [pdf, other]

Thin On-Sensor Nanophotonic Array Cameras

Authors: Praneeth Chakravarthula, Jipeng Sun, Xiao Li, Chenyang Lei, Gene Chou, Mario Bijelic, Johannes Froesch, Arka Majumdar, Felix Heide

Abstract: Today's commodity camera systems rely on compound optics to map light originating from the scene to positions on the sensor where it gets recorded as an image. To record images without optical aberrations, i.e., deviations from Gauss' linear model of optics, typical lens systems introduce increasingly complex stacks of optical elements which are responsible for the height of existing commodity cameras. In this work, we investigate flat nanophotonic computational cameras as an alternative that employs an array of skewed lenslets and a learned reconstruction approach. The optical array is embedded on a metasurface that, at 700 nm height, is flat and sits on the sensor cover glass at 2.5 mm focal distance from the sensor. To tackle the highly chromatic response of a metasurface and design the array over the entire sensor, we propose a differentiable optimization method that continuously samples over the visible spectrum and factorizes the optical modulation for different incident fields into individual lenses. We reconstruct a megapixel image from our flat imager with a learned probabilistic reconstruction method that employs a generative diffusion model to sample an implicit prior. To tackle scene-dependent aberrations in broadband, we propose a method for acquiring paired captured training data in varying illumination conditions. We assess the proposed flat camera design in simulation and with an experimental prototype, validating that the method is capable of recovering images from diverse scenes in broadband with a single nanophotonic layer.

Submitted 5 August, 2023; originally announced August 2023.

Comments: 18 pages, 12 figures, to be published in ACM Transactions on Graphics

ACM Class: I.4.0

arXiv:2307.05139 [pdf, ps, other]

Coherent phonon and unconventional carriers in the magnetic kagome metal Fe$_3$Sn$_2$

Authors: M. V. Gonçalves-Faria, A. Pashkin, Q. Wang, H. C. Lei, S. Winnerl, A. A. Tsirlin, M. Helm, E. Uykur

Abstract: Temperature- and fluence-dependent carrier dynamics of the magnetic Kagome metal Fe$_3$Sn$_2$ were studied using the ultrafast optical pump-probe technique. Two carrier relaxation processes ($τ_1$ and $τ_2$) and a laser induced coherent optical phonon were observed. By using the two-temperature model for metals, we ascribe the shorter relaxation $τ_1$ (~1 ps) to hot electrons transferring their energy to the crystal lattice via electron-phonon scattering. $τ_2$ (~25 ps), on the other hand, cannot be explained as a conventional process and is attributed to the unconventional (localized) carriers in the material. The observed coherent oscillation is assigned to be a totally symmetric A$_{1g}$ optical phonon dominated by Sn displacements out of the Kagome planes, and possesses a prominently large amplitude, on the order of 10$^{-3}$, comparable to the maximum of the reflectivity change ($Δ$R/R). This amplitude is equivalent to that of charge-density-wave (CDW) systems, although no signs of such an instability were hitherto reported in Fe$_3$Sn$_2$. Our results set an unexpected connection between Fe$_3$Sn$_2$ and kagome metals with CDW instabilities, and suggest a unique interplay between phonon and electron dynamics in this compound.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 12 pages, 14 figures

arXiv:2307.03169 [pdf, other]

Demonstrating a superconducting dual-rail cavity qubit with erasure-detected logical measurements

Authors: Kevin S. Chou, Tali Shemma, Heather McCarrick, Tzu-Chiao Chien, James D. Teoh, Patrick Winkel, Amos Anderson, Jonathan Chen, Jacob Curtis, Stijn J. de Graaf, John W. O. Garmon, Benjamin Gudlewski, William D. Kalfus, Trevor Keen, Nishaad Khedkar, Chan U Lei, Gangqiang Liu, Pinlei Lu, Yao Lu, Aniket Maiti, Luke Mastalli-Kelly, Nitish Mehta, Shantanu O. Mundhada, Anirudh Narla, Taewan Noh, et al. (9 additional authors not shown)

Abstract: A critical challenge in developing scalable error-corrected quantum systems is the accumulation of errors while performing operations and measurements. One promising approach is to design a system where errors can be detected and converted into erasures. Such a system utilizing erasure qubits is known to have relaxed requirements for quantum error correction. A recent proposal aims to do this using a dual-rail encoding with superconducting cavities. However, experimental characterization and demonstration of a dual-rail cavity qubit have not yet been realized. In this work, we implement such a dual-rail cavity qubit; we demonstrate a projective logical measurement with integrated erasure detection and use it to measure dual-rail qubit idling errors. We measure logical state preparation and measurement errors at the $0.01\%$-level and detect over $99\%$ of cavity decay events as erasures. We use the precision of this new measurement protocol to distinguish different types of errors in this system, finding that while decay errors occur with probability $\sim 0.2\%$ per microsecond, phase errors occur 6 times less frequently and bit flips occur at least 140 times less frequently. These findings represent the first confirmation of the expected error hierarchy necessary to concatenate dual-rail erasure qubits into a highly efficient erasure code.

Submitted 13 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.00749 [pdf, other]

Understanding the impact of numerical solvers on inference for differential equation models

Authors: Richard Creswell, Katherine M. Shepherd, Ben Lambert, Gary R. Mirams, Chon Lok Lei, Simon Tavener, Martin Robinson, David J. Gavaghan

Abstract: Most ordinary differential equation (ODE) models used to describe biological or physical systems must be solved approximately using numerical methods. Perniciously, even those solvers which seem sufficiently accurate for the forward problem, i.e., for obtaining an accurate simulation, may not be sufficiently accurate for the inverse problem, i.e., for inferring the model parameters from data. We show that for both fixed step and adaptive step ODE solvers, solving the forward problem with insufficient accuracy can distort likelihood surfaces, which may become jagged, causing inference algorithms to get stuck in local "phantom" optima. We demonstrate that biases in inference arising from numerical approximation of ODEs are potentially most severe in systems involving low noise and rapid nonlinear dynamics. We reanalyze an ODE changepoint model previously fit to the COVID-19 outbreak in Germany and show the effect of the step size on simulation and inference results. We then fit a more complicated rainfall-runoff model to hydrological data and illustrate the importance of tuning solver tolerances to avoid distorted likelihood surfaces. Our results indicate that when performing inference for ODE model parameters, adaptive step size solver tolerances must be set cautiously and likelihood surfaces should be inspected for characteristic signs of numerical issues.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2307.00735 [pdf, other]

Novelty and Lifted Helpful Actions in Generalized Planning

Authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger

Abstract: It has been shown recently that successful techniques in classical planning, such as goal-oriented heuristics and landmarks, can improve the ability to compute planning programs for generalized planning (GP) problems. In this work, we introduce the notion of action novelty rank, which computes novelty with respect to a planning program, and propose novelty-based generalized planning solvers, which prune a newly generated planning program if its most frequent action repetition is greater than a given bound $v$, implemented by novelty-based best-first search BFS($v$) and its progressive variant PGP($v$). Besides, we introduce lifted helpful actions in GP derived from action schemes, and propose new evaluation functions and structural program restrictions to scale up the search. Our experiments show that the new algorithms BFS($v$) and PGP($v$) outperform the state-of-the-art in GP over the standard generalized planning benchmarks. Practical findings on the above-mentioned methods in generalized planning are briefly discussed.

Submitted 2 July, 2023; originally announced July 2023.

Comments: Accepted at SoCS 2023 (extended version)

arXiv:2306.10325 [pdf]

Pressure tunable quantum anomalous Hall states in a topological antiferromagnet

Authors: Su Kong Chong, Chao Lei, Jie Li, Yang Cheng, David Graf, Seng Huat Lee, Masaki Tanabe, Ting-Hsun Yang, Zhiqiang Mao, Allan H. MacDonald, Kang L. Wang

Abstract: Mechanical modulation of the lattice parameter can modify the electronic structure and manipulate the magnetic coupling of a material without introducing impurities. Inspired by success in pressure-controlled magnetism, we investigate the effect of hydrostatic pressure on quantized Chern states in the antiferromagnetic topological insulator MnBi2Te4, using transport as a probe. We show that pressure can enhance the robustness of quantum anomalous Hall (QAH) phases that are otherwise delicate in 7SL MnBi2Te4 and in the spin-flop (SF) state of 8SL MnBi2Te4. We explain our findings using a coupled Dirac cone model of MnBi2Te4, which identifies stronger hybridization between van der Waals layers as the driver of topological states. We further demonstrate that moderate pressures readily available in laboratory systems can provide reversible control of magnetic and topological phases. Our results reveal a strong connection between the mechanical engineering of band topology and magnetism.

Submitted 17 June, 2023; originally announced June 2023.

Comments: 11 pages, 4 figures

arXiv:2305.09152 [pdf]

Security Enhancement of Quantum Noise Stream Cipher Based on Probabilistic Constellation Shaping

Authors: Sheng Liu, Shuang Wei, Wei Wang, Chao Lei, Tianhe Liu, Yajie Li, Yunbo Li, Dawei Ge, Dong Wang, Yongli Zhao, Dechao Zhang, Han Li, Jie Zhang

Abstract: We propose a QNSC pre-coding scheme based on probabilistic shaping of the basis, to reduce the probability of ciphertext bits that are easier to intercept. Experimental results show this scheme can improve the security performance by 100% in terms of Eve's ciphertext BER.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.01947 [pdf]

Understanding the Impact of Heatwave on Urban Heat Island in Greater Sydney: Temporal Surface Energy Budget Change with Land Types

Authors: Jing Kong, Yongling Zhao, Dominik Strebel, Kai Gao, Jan Carmeliet, Chengwang Lei

Abstract: The impact of heatwaves (HWs) on urban heat island (UHI) is a contentious topic with contradictory research findings. A comprehensive understanding of the response of urban and rural areas to HWs, considering the underlying cause of surface energy budget changes, remains elusive. This study attempts to address this gap by investigating a 2020 HW event in the Greater Sydney Area using the Advanced Weather Research and Forecasting (WRF) model. Findings indicate that the HW intensifies the nighttime surface UHI by approximately 4°C. An analysis of surface energy budgets reveals that urban areas store more heat during the HW due to receiving more solar radiation and less evapotranspiration compared to rural areas. The maximum heat storage flux in urban areas during the HW can be around 200 W/m2 higher than that during post-HW. The stored heat is released at nighttime, raising the air temperature in the urban areas. Forests and savannas have relatively lower storage heat fluxes due to high transpiration and albedo, and the maximum heat storage flux is only around 50 W/m2 higher than that during post-HW. In contrast, a negative synergistic effect is detected between the 2-m UHI and HW. This may be because other meteorological conditions including wind have substantial impacts on the air temperature pattern. The strong hot and dry winds coming from the west and the proximity of tall buildings to the coast diminish the sea breeze coming from the east, resulting in a higher air temperature in the western urban district. Meanwhile, the western forest area also experiences higher temperatures due to the westward winds. In addition, changes in wind direction alter the temperature distribution in the northern rural region. Based on the present study, urban climate simulation data and associated findings can be used to develop urban heat mitigation strategies for UHI during HW.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2305.01872 [pdf, other]

Microwave loss characterization using multi-mode superconducting resonators

Authors: Chan U Lei, Suhas Ganjam, Lev Krayzman, Archan Banerjee, Kim Kisslinger, Sooyeon Hwang, Luigi Frunzio, Robert J. Schoelkopf

Abstract: Measuring the losses arising from different materials and interfaces is crucial to improving the coherence of superconducting quantum circuits. Although this has been of interest for a long time, current studies can either only provide bounds to those losses, or require several devices for a complete characterization. In this work, we introduce a method to measure the microwave losses of materials and interfaces with a single multi-mode superconducting resonator. We demonstrate a formalism for analyzing the loss sensitivity of multi-mode systems and discuss the design strategies of multi-mode resonators for material loss studies. We present two types of multi-mode superconducting resonators for the study of bulk superconductors: the forky whispering-gallery-mode resonator (FWGMR) and the ellipsoidal cavity. We use these resonators to measure the surface dielectric, conductor, and seam losses of high-purity (5N5) aluminum and aluminum alloy (6061), as well as how they are affected by chemical etching, diamond turning, and thin-film coating. We find that chemical etching and diamond turning reduce both the surface dielectric and conductive losses of high-purity aluminum, but provide no appreciable improvement to the seam. Coating the surfaces of diamond-turned aluminum alloys with e-beam evaporated or sputtered aluminum thin-films significantly reduces all three losses under study. In addition, we study the effect of chemical etching on the surface of high-purity aluminum using transmission electron microscopy (TEM) and find that the chemical etching process creates a thinner and more uniform oxide layer, consistent with the observed improvement in the surface dielectric loss.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.10226 [pdf, other]

Domain Generalization for Mammographic Image Analysis with Contrastive Learning

Authors: Zheren Li, Zhiming Cui, Lichi Zhang, Sheng Wang, Chenjin Lei, Xi Ouyang, Dongdong Chen, Xiangyu Zhao, Yajia Gu, Zaiyi Liu, Chunling Liu, Dinggang Shen, Jie-Zhi Cheng

Abstract: The deep learning technique has been shown to effectively address several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of scanners from various vendors. In practice, however, it is infeasible to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning scheme is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.

Submitted 7 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: arXiv admin note: text overlap with arXiv:2111.10827

arXiv:2304.00219 [pdf, other]

ConvBLS: An Effective and Efficient Incremental Convolutional Broad Learning System for Image Classification

Authors: Chunyu Lei, C. L. Philip Chen, Jifeng Guo, Tong Zhang

Abstract: Deep learning generally suffers from enormous computational resource demands and time-consuming training processes. Broad Learning System (BLS) and its convolutional variants have been proposed to mitigate these issues and have achieved superb performance in image classification. However, the existing convolutional-based broad learning system (C-BLS) either lacks an efficient training method and incremental learning capability or suffers from poor performance. To this end, we propose a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion, which consists of the convolutional feature (CF) layer, convolutional enhancement (CE) layer, TSMS feature fusion layer, and output layer. First, unlike the current C-BLS, the simple yet efficient SKM algorithm is utilized to learn the weights of CF layers. Compared with random filters, the SKM algorithm makes the CF layer learn more comprehensive spatial features. Second, similar to the vanilla BLS, CE layers are established to expand the feature space. Third, the TSMS feature fusion layer is proposed to extract more effective multi-scale features through the integration of CF layers and CE layers. Thanks to the above design and the pseudo-inverse calculation of the output layer weights, our proposed ConvBLS method is unprecedentedly efficient and effective. Finally, the corresponding incremental learning algorithms are presented for rapid remodeling if the model needs to be expanded. Experiments and comparisons demonstrate the superiority of our method.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.15494 [pdf, other]

Semantic-visual Guided Transformer for Few-shot Class-incremental Learning

Authors: Wenhao Qiu, Sichao Fu, Jingyi Zhang, Chengxiang Lei, Qinmu Peng

Abstract: Few-shot class-incremental learning (FSCIL) has recently attracted extensive attention in various areas. Existing FSCIL methods depend heavily on the robustness of the feature backbone pre-trained on base classes. In recent years, different Transformer variants have made significant progress in feature representation learning across many fields. Nevertheless, the progress of the Transformer in FSCIL scenarios has not achieved the potential promised in other fields so far. In this paper, we develop a semantic-visual guided Transformer (SV-T) to enhance the feature extracting capacity of the pre-trained feature backbone on incremental classes. Specifically, we first utilize the visual (image) labels provided by the base classes to supervise the optimization of the Transformer. Then, a text encoder is introduced to automatically generate the corresponding semantic (text) labels for each image from the base classes. Finally, the constructed semantic labels are further applied to the Transformer to guide its hyperparameter updating. Our SV-T can take full advantage of more supervision information from base classes and further enhance the training robustness of the feature backbone. More importantly, our SV-T is an independent method, which can be directly applied to the existing FSCIL architectures for acquiring embeddings of various incremental classes. Extensive experiments on three benchmarks, two FSCIL architectures, and two Transformer variants show that our proposed SV-T obtains a significant improvement in comparison to the existing state-of-the-art FSCIL methods.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2023)

arXiv:2303.14635 [pdf, other]

doi 10.1103/PhysRevB.108.125424

Kerr, Faraday, and Magnetoelectric Effects in MnBi$_2$Te$_4$ Thin Films

Authors: Chao Lei, Allan H. MacDonald

Abstract: The topological magneto-electric effect (TME) is a characteristic property of topological insulators. In this article, we use a simplified coupled-Dirac-cone electronic structure model to theoretically evaluate the THz and far infrared Kerr and Faraday responses of thin films of MnBi$_2$Te$_4$ with up to $N=10$ septuple layers with the goal of clarifying the relationship between these convenient magneto-optical observables and the TME. We find that for even $N$ the linear Kerr and Faraday responses to an electric field vanish in the low-frequency limit, even though the magnetoelectric response is large and approximately quantized.

Submitted 9 July, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 5+7 pages

Journal ref: Phys. Rev. B 108, 125424 (2023)

Showing 1–50 of 198 results for author: Lei, C