Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Authors: Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang

Abstract: Future work sentences (FWS) are the particular sentences in academic papers that contain the author's description of their proposed follow-up research direction. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. Existing work on automatic identification of future work sentences is relatively scarce, and it cannot accurately identify FWS from academic papers, and thus cannot support data mining on a large scale. Furthermore, there are many aspects to the content of future work, and the subdivision of the content is conducive to the analysis of specific development directions. In this paper, Natural Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS and gain a deep understanding of the key content described in FWS, and we also demonstrate that content determination in FWS will be reflected in the subsequent research work by measuring the similarity between future work sentences and the abstracts.

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.10746 [pdf, other]

SLGTformer: An Attention-Based Approach to Sign Language Recognition

Authors: Neil Song, Yu Xiang

Abstract: Sign language is the preferred method of communication of deaf or mute people, but similar to any language, it is difficult to learn and represents a significant barrier for those who are hard of hearing or unable to speak. A person's entire frontal appearance dictates and conveys specific meaning. However, this frontal appearance can be quantified as a temporal sequence of human body pose, leading to Sign Language Recognition through the learning of spatiotemporal dynamics of skeleton keypoints. We propose a novel, attention-based approach to Sign Language Recognition exclusively built upon decoupled graph and temporal self-attention: the Sign Language Graph Time Transformer (SLGTformer). SLGTformer first deconstructs spatiotemporal pose sequences separately into spatial graphs and temporal windows. SLGTformer then leverages novel Learnable Graph Relative Positional Encodings (LGRPE) to guide spatial self-attention with the graph neighborhood context of the human skeleton. By modeling the temporal dimension as intra- and inter-window dynamics, we introduce Temporal Twin Self-Attention (TTSA) as the combination of locally-grouped temporal attention (LTA) and global sub-sampled temporal attention (GSTA). We demonstrate the effectiveness of SLGTformer on the Word-Level American Sign Language (WLASL) dataset, achieving state-of-the-art performance with an ensemble-free approach on the keypoint modality. The code is available at https://github.com/neilsong/slt

Submitted 22 December, 2022; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 12 pages, 3 figures

arXiv:2212.05571 [pdf, other]

doi 10.1016/j.jcp.2023.112343

DOSnet as a Non-Black-Box PDE Solver: When Deep Learning Meets Operator Splitting

Authors: Yuan Lan, Zhen Li, Jie Sun, Yang Xiang

Abstract: Deep neural networks (DNNs) recently emerged as a promising tool for analyzing and solving complex differential equations arising in science and engineering applications. As an alternative to traditional numerical schemes, learning-based solvers utilize the representation power of DNNs to approximate the input-output relations in an automated manner. However, the lack of physics-in-the-loop often makes it difficult to construct a neural network solver that simultaneously achieves high accuracy, low computational burden, and interpretability. In this work, focusing on a class of evolutionary PDEs characterized by having decomposable operators, we show that the classical "operator splitting" numerical scheme of solving these equations can be exploited to design neural network architectures. This gives rise to a learning-based PDE solver, which we name Deep Operator-Splitting Network (DOSnet). Such non-black-box network design is constructed from the physical rules and operators governing the underlying dynamics, contains learnable parameters, and is thus more flexible than the standard operator splitting scheme. Once trained, it enables the fast solution of the same type of PDEs. To validate the special structure inside DOSnet, we take the linear PDEs as the benchmark and give the mathematical explanation for the weight behavior. Furthermore, to demonstrate the advantages of our new AI-enhanced PDE solver, we train and validate it on several types of operator-decomposable differential equations. We also apply DOSnet to nonlinear Schrödinger equations (NLSE), which have important applications in the signal processing for modern optical fiber transmission systems, and experimental results show that our model has better accuracy and lower computational complexity than numerical schemes and the baseline DNNs.

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2211.16703 [pdf, other]

An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning

Authors: Shaohuai Shi, Qing Yang, Yang Xiang, Shuhan Qi, Xuan Wang

Abstract: To enable the pre-trained models to be fine-tuned with local data on edge devices without sharing data with the cloud, we design an efficient split fine-tuning (SFT) framework for edge and cloud collaborative learning. We propose three novel techniques in this framework. First, we propose a matrix decomposition-based method to compress the intermediate output of a neural network to reduce the communication volume between the edge device and the cloud server. Second, we eliminate particular links in the model without affecting the convergence performance in fine-tuning. Third, we implement our system atop PyTorch to allow users to easily extend their existing training scripts to enjoy the efficient edge and cloud collaborative learning. Experimental results on 9 NLP datasets show that our framework can reduce the communication traffic by 96 times with little impact on the model accuracy.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 7 pages

arXiv:2211.16059 [pdf, ps, other]

On Large-Scale Multiple Testing Over Networks: An Asymptotic Approach

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: This work concerns developing communication- and computation-efficient methods for large-scale multiple testing over networks, which is of interest to many practical applications. We take an asymptotic approach and propose two methods, proportion-matching and greedy aggregation, tailored to distributed settings. The proportion-matching method achieves the global BH performance yet only requires a one-shot communication of the (estimated) proportion of true null hypotheses as well as the number of p-values at each node. By focusing on the asymptotic optimal power, we go beyond the BH procedure by providing an explicit characterization of the asymptotic optimal solution. This leads to the greedy aggregation method that effectively approximates the optimal rejection regions at each node, while computation efficiency comes from the greedy-type approach naturally. Moreover, for both methods, we provide the rate of convergence for both the FDR and power. Extensive numerical results over a variety of challenging settings are provided to support our theoretical findings.

Submitted 16 March, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Published in the IEEE Transactions on Signal and Information Processing over Networks

arXiv:2211.11679 [pdf, other]

Mean Shift Mask Transformer for Unseen Object Instance Segmentation

Authors: Yangxiao Lu, Yuqiao Chen, Nicholas Ruozzi, Yu Xiang

Abstract: Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it can facilitate a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate it into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation. The project page, appendix, video, and code are available at https://irvlutd.github.io/MSMFormer

Submitted 21 September, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Add pixel confidence maps

arXiv:2211.09166 [pdf, other]

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's importance is often ignored in many DNN-based SE algorithms. To obtain a higher quality enhanced speech, we propose a two-stage DRL-based SE method through adversarial training. In the first stage, we disentangle different latent variables because disentangled representations can help DNN generate a better enhanced speech. Specifically, we use the $β$-variational autoencoder (VAE) algorithm to obtain the speech and noise posterior estimations and related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure that these estimations are always accurate, which may degrade the final accuracy of the signal estimation. To further improve the quality of enhanced speech, in the second stage, we introduce adversarial training to reduce the effect of the inaccurate posterior on signal reconstruction and improve the signal estimation accuracy, making our algorithm more robust to potentially inaccurate posterior estimations. As a result, better SE performance can be achieved. The experimental results indicate that the proposed strategy can help similar DNN-based SE algorithms achieve higher short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and scale-invariant signal-to-distortion ratio (SI-SDR) scores. Moreover, the proposed algorithm can also outperform recent competitive SE algorithms.

Submitted 27 September, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2211.03885 [pdf, other]

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, et al. (13 additional authors not shown)

Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.00235 [pdf, other]

Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism

Authors: Guoxia Wang, Zhihua Wu, Xiaomin Fang, Yingfei Xiang, Yiqun Liu, Dianhai Yu, Yanjun Ma

Abstract: The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of the experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires lots of computational resources and time to train AlphaFold2 from scratch. Efficient AlphaFold2 training could accelerate the development of life science. In this paper, we propose a Parallel Evoformer and Branch Parallelism to speed up the training of AlphaFold2. We conduct sufficient experiments on UniFold implemented in PyTorch and HelixFold implemented in PaddlePaddle, and Branch Parallelism can improve the training performance by 38.67% and 36.93%, respectively. We also demonstrate that the accuracy of Parallel Evoformer could be on par with AlphaFold2 on the CASP14 and CAMEO datasets. The source code is available on https://github.com/PaddlePaddle/PaddleFleetX

Submitted 31 October, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2207.05477

arXiv:2210.17408 [pdf, ps, other]

Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation

Authors: Xutao Guo, Yanwu Yang, Chenfei Ye, Shang Lu, Yang Xiang, Ting Ma

Abstract: Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows computing pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate the issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specially used for medical image segmentation. The key idea is to obtain pre-segmentation results based on a separately trained segmentation network, and construct noise predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start with noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results over representative baseline methods even if the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined to further improve the segmentation performance.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.13721 [pdf, other]

Multi-modal Dynamic Graph Network: Coupling Structural and Functional Connectome for Disease Diagnosis and Classification

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Ting Ma

Abstract: Multi-modal neuroimaging technology has greatly facilitated the efficiency and diagnosis accuracy, which provides complementary information in discovering objective disease biomarkers. Conventional deep learning methods, e.g. convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have been proven to be of great importance in modeling brain connectome networks and relating disease-specific patterns. However, most existing graph methods explicitly require known graph structures, which are not available in the sophisticated brain system. Especially in heterogeneous multi-modal brain networks, there exists a great challenge to model interactions among brain regions in consideration of inter-modal dependencies. In this study, we propose a Multi-modal Dynamic Graph Convolution Network (MDGCN) for structural and functional brain network learning. Our method benefits from modeling inter-modal representations and relating attentive multi-model associations into dynamic graphs with a compositional correspondence matrix. Moreover, a bilateral graph convolution layer is proposed to aggregate multi-modal representations in terms of multi-modal associations. Extensive experiments on three datasets demonstrate the superiority of our proposed method in terms of disease classification, with the accuracy of 90.4%, 85.9% and 98.3% in predicting Mild Cognitive Impairment (MCI), Parkinson's disease (PD), and schizophrenia (SCHZ) respectively. Furthermore, our statistical evaluations on the correspondence matrix exhibit a high correspondence with previous evidence of biomarkers.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.11834 [pdf, other]

Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles

Authors: Yuxuan Han, Jialin Zeng, Yang Wang, Yang Xiang, Jiheng Zhang

Abstract: We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.

Submitted 22 February, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: AISTATS2023

arXiv:2210.11262 [pdf, other]

doi 10.48550/ARXIV.2210.11262

RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control

Authors: Yanfei Xiang, Xin Wang, Shu Hu, Bin Zhu, Xiaomeng Huang, Xi Wu, Siwei Lyu

Abstract: Reinforcement learning is applied to solve actual complex tasks from high-dimensional, sensory inputs. The last decade has developed a long list of reinforcement learning algorithms. Recent progress benefits from deep learning for raw sensory signal representation. One question naturally arises: how well do they perform concerning different robotic manipulation tasks? Benchmarks use objective performance metrics to offer a scientific way to compare algorithms. In this paper, we present RMBench, the first benchmark for robotic manipulations, which have high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that directly use observed pixels as inputs. We report their average performance and learning curves to show their performance and stability of training. Our study concludes that none of the studied algorithms can handle all tasks well, soft Actor-Critic outperforms most algorithms in average reward and stability, and an algorithm combined with data augmentation may facilitate learning policies. Our code is publicly available at https://github.com/xiangyanfei212/RMBench-2022, including all benchmark tasks and studied algorithms.

Submitted 7 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 8 pages, 3 figures, 2 tables; update code's link

ACM Class: I.2.9

arXiv:2210.10978 [pdf, other]

A Comprehensive Survey on Edge Data Integrity Verification: Fundamentals and Future Trends

Authors: Yao Zhao, Youyang Qu, Yong Xiang, Longxiang Gao

Abstract: Recent advances in edge computing have pushed cloud-based data caching services to the edge; however, such emerging edge storage comes with numerous challenging and unique security issues. One of them is the problem of edge data integrity verification (EDIV), which coordinates multiple participants (e.g., data owners and edge nodes) to inspect whether data cached on the edge is authentic. To date, various solutions have been proposed to address the EDIV problem, but there is no systematic review. Thus, we offer a comprehensive survey for the first time, aiming to show current research status, open problems, and potentially promising insights for readers to further investigate this under-explored field. Specifically, we begin with stating the significance of the EDIV problem, the integrity verification difference between data cached on cloud and edge, and three typical system models with corresponding inspection processes. Then, we synthesize a universal criteria framework that an effective verification approach should satisfy. Subsequently, we adopt a schematic development timeline to reveal the research advance on EDIV in a sequential manner, followed by a detailed review on the existing EDIV solutions. Finally, we highlight intriguing research challenges and possible directions for future research.

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2210.04435 [pdf, other]

Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

Authors: Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, Yiming Ni, Yufeng Chi, Yunhao Li, Lizhi Yang, Xue Bin Peng, Koushil Sreenath

Abstract: We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping using quadrupeds is a challenging problem that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion maneuvers in a very short amount of time, usually less than one second. In this paper, we propose to address this problem using a hierarchical model-free RL framework. The first component of the framework contains multiple control policies for distinct locomotion skills, which can be used to cover different regions of the goal. Each control policy enables the robot to track random parametric end-effector trajectories while performing one specific locomotion skill, such as jump, dive, and sidestep. These skills are then utilized by the second part of the framework, which is a high-level planner to determine a desired skill and end-effector trajectory in order to intercept a ball flying to different regions of the goal. We deploy the proposed framework on a Mini Cheetah quadrupedal robot and demonstrate the effectiveness of our framework for various agile interceptions of a fast-moving ball in the real world.

Submitted 10 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally. Accompanying video is at https://youtu.be/iX6OgG67-ZQ

arXiv:2210.03301 [pdf, other]

GOLLIC: Learning Global Context beyond Patches for Lossless High-Resolution Image Compression

Authors: Yuan Lan, Liang Qin, Zhaoyi Sun, Yang Xiang, Jie Sun

Abstract: Neural-network-based approaches recently emerged in the field of data compression and have already led to significant progress in image compression, especially in achieving a higher compression ratio. In the lossless image compression scenario, however, existing methods often struggle to learn a probability model of full-size high-resolution images due to limited computational resources. The current strategy is to crop high-resolution images into multiple non-overlapping patches and process them independently. This strategy ignores long-term dependencies beyond patches, thus limiting modeling performance. To address this problem, we propose a hierarchical latent variable model with a global context to capture the long-term dependencies of high-resolution images. Besides the latent variable unique to each patch, we introduce shared latent variables between patches to construct the global context. The shared latent variables are extracted by a self-supervised clustering module inside the model's encoder. This clustering module assigns each patch the confidence that it belongs to any cluster. Later, shared latent variables are learned according to latent variables of patches and their confidence, which reflects the similarity of patches in the same cluster and benefits the global context modeling. Experimental results show that our global context model improves compression ratio compared to the engineered codecs and deep learning models on three benchmark high-resolution image datasets, DIV2K, CLIC.pro, and CLIC.mobile.

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.02555 [pdf, ps, other]

Sample-and-Forward: Communication-Efficient Control of the False Discovery Rate in Networks

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: This work concerns controlling the false discovery rate (FDR) in networks under communication constraints. We present sample-and-forward, a flexible and communication-efficient version of the Benjamini-Hochberg (BH) procedure for multihop networks with general topologies. Our method shows that the nodes in a network do not need to communicate p-values to each other to achieve decent statistical power under the global FDR control constraint. For a network with a total of $m$ p-values, our method consists of first sampling the (empirical) CDF of the p-values at each node and then forwarding $\mathcal{O}(\log m)$ bits to its neighbors. Under the same assumptions as for the original BH procedure, our method has both provable finite-sample FDR control and competitive empirical detection power, even with a few samples at each node. We provide an asymptotic analysis of power under a mixture model assumption on the p-values.

Submitted 15 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted to the 2023 IEEE International Symposium on Information Theory (ISIT)
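
For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of the classical, centralized Benjamini-Hochberg step-up procedure. It is not the paper's sample-and-forward method, and the function name and the default level `alpha` are illustrative assumptions.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Classical (centralized) BH step-up procedure.

    Returns one flag per input p-value, True where the hypothesis is rejected.
    The name and the default alpha are illustrative, not taken from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p-value that fails its own threshold is still rejected when a larger-ranked p-value passes; for example, three p-values of 0.04 at level 0.05 are all rejected.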

arXiv:2210.00237 [pdf, other]

doi 10.1038/s41598-022-17540-1

Physical interpretation of nonlocal quantum correlation through local description of subsystems

Authors: Tanumoy Pramanik, Xiaojiong Chen, Yu Xiang, Xudong Li, Jun Mao, Jueming Bao, Yaohao Deng, Tianxiang Dai, Bo Tang, Yan Yang, Zhihua Li, Qihuang Gong, Qiongyi He, Jianwei Wang

Abstract: Characterization and categorization of quantum correlations are both fundamentally and practically important in quantum information science. Although quantum correlations such as non-separability, steerability, and non-locality can be characterized by different theoretical models in different scenarios with either known (trusted) or unknown (untrusted) knowledge of the associated systems, such characterization is sometimes ambiguous to experimentalists. In this work, we propose a physical interpretation of nonlocal quantum correlation between two systems. In the absence of a {\it complete local description} of one of the subsystems, quantified by the {\it local uncertainty relation}, the correlation between subsystems becomes nonlocal. Remarkably, different nonlocal quantum correlations can be discriminated using a single uncertainty relation derived under the local hidden state (LHS)-LHS model only. We experimentally characterize the two-qubit Werner state in different scenarios.

Submitted 1 October, 2022; originally announced October 2022.

Comments: 13 pages, 10 figures. Comments are welcome

Journal ref: Sci Rep 12, 16400 (2022)

arXiv:2209.12642 [pdf]

Design of Automatic Driving Safety Level and Positioning Accuracy

Authors: Tiantian Tang, Hao Xu, Chengcheng Wu, Sijie Lye, Yan Xiang

Abstract: Autonomous driving is a hot research topic in the frontier of science and technology. Technology companies and traditional car companies are developing and designing autonomous driving technology from two different directions. Based on the automatic driving classification standard and ISO safety level, combined with the number of traffic accidents and death data in China, and referring to the risk allocation method of the automated driving virtual drive system in the United States, the risk allocation of China's virtual drive system will be carried out. In addition, combined with the vehicle "positioning box" model, the theoretical calculation of the alarm limit of positioning accuracy in China will be carried out and the positioning accuracy requirements of related vehicles will be designed.

Submitted 26 September, 2022; originally announced September 2022.

Comments: in Chinese language

arXiv:2209.11715 [pdf, other]

The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices

Authors: Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen, Yang Xiang

Abstract: Deep Neural Networks (DNNs) are susceptible to backdoor attacks during training. The model corrupted in this way functions normally, but when triggered by certain patterns in the input, produces a predefined target label. Existing defenses usually rely on the assumption of the universal backdoor setting in which poisoned samples share the same uniform trigger. However, recent advanced backdoor attacks show that this assumption is no longer valid in dynamic backdoors where the triggers vary from input to input, thereby defeating the existing defenses. In this work, we propose a novel technique, Beatrix (backdoor detection via Gram matrix). Beatrix uses Gram matrices to capture not only the feature correlations but also the appropriately high-order information of the representations. By learning class-conditional statistics from activation patterns of normal samples, Beatrix can identify poisoned samples by capturing the anomalies in activation patterns. To further improve the performance in identifying target labels, Beatrix leverages kernel-based testing without making any prior assumptions on representation distribution. We demonstrate the effectiveness of our method through extensive evaluation and comparison with state-of-the-art defensive techniques. The experimental results show that our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art can only reach 36.9%.

Submitted 18 December, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: 18 pages, 23 figures. Accepted to NDSS 2023. Camera-ready version. Code availability: https://github.com/wanlunsec/Beatrix
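
As a rough illustration of the feature-correlation statistic named in the abstract above, the sketch below computes a plain first-order Gram matrix from a layer's activations. It assumes the activations are given as a list of channels, each a flattened vector; the function name is hypothetical, and the high-order variant and detection pipeline that Beatrix actually uses are not reproduced here (see the authors' repository for those).

```python
from typing import List


def gram_matrix(features: List[List[float]]) -> List[List[float]]:
    """Return G with G[i][j] = <f_i, f_j>, the inner product of channels i and j.

    `features` holds C channels, each a flattened activation vector of equal
    length. This layout is an assumption made for the illustration.
    """
    return [
        [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        for f_i in features
    ]


# Two channels with two activations each.
G = gram_matrix([[1.0, 2.0], [3.0, 4.0]])
```

The result is symmetric, with the diagonal holding each channel's squared norm and the off-diagonal entries holding pairwise channel correlations.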

arXiv:2209.08933 [pdf, ps, other]

Estimating Brain Age with Global and Local Dependencies

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

Abstract: Brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to exploit accurately with models using feature engineering and local processing such as local convolution and recurrent operations that process one local neighborhood at a time. Instead, Vision Transformers learn global attentive interaction of patch tokens, introducing less inductive bias and modeling long-range dependencies. In light of this, we propose a novel network for learning brain age with global and local dependencies, where the corresponding representations are captured by Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computation efficiency and locates the 3D spatial information indirectly via continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects with ages ranging from 14 to 97, and our network performed the best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.00514 [pdf, other]

Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecular Simulation

Authors: Yan Xiang, Yu-Hang Tang, Zheng Gong, Hongyi Liu, Liang Wu, Guang Lin, Huai Sun

Abstract: We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and marginalized graph kernel (GPR-MGK) to explore chemical space with minimum cost. Using high-throughput molecular dynamics simulation to generate data and graph neural network (GNN) to predict, we constructed an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (densities, heat capacities, and vaporization enthalpies), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 (0.124% of the total) molecules were sufficient to train an accurate GNN model with $\rm R^2 > 0.99$ for computational test sets and $\rm R^2 > 0.94$ for experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.

Submitted 1 September, 2022; originally announced September 2022.

Comments: 9 pages, 5 figures

arXiv:2208.10027 [pdf, other]

Learning Invariant Representations under General Interventions on the Response

Authors: Kang Du, Yu Xiang

Abstract: It has become increasingly common nowadays to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt the structural causal models to describe training and test models, following the invariance principle which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we focus on linear structural causal models (SCMs) and introduce invariant matching property (IMP), an explicit relation to capture interventions through an additional feature, leading to an alternative form of invariance that enables a unified treatment of general interventions on the response as well as the predictors. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. We present algorithms that show competitive performance compared to existing methods over various experimental settings including a COVID dataset.

Submitted 30 October, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

Comments: Accepted to the IEEE Journal on Selected Areas in Information Theory. Special Issue: Causality: Fundamental Limits and Applications

arXiv:2208.09804 [pdf, ps, other]

doi 10.1103/PhysRevB.107.165123

Electronic Correlations and Evolution of the Charge-Density Wave in the Kagome Metals $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, Cs)

Authors: Xiaoxiang Zhou, Yongkai Li, Xinwei Fan, Jiahao Hao, Ying Xiang, Zhe Liu, Yaomin Dai, Zhiwei Wang, Yugui Yao, Hai-Hu Wen

Abstract: The kagome metals $A$V$_{3}$Sb$_{5}$ ($A$ = K, Rb, Cs) have attracted enormous interest as they exhibit intertwined charge-density wave (CDW) and superconductivity. The alkali-metal dependence of these characteristics contains pivotal information about the CDW and its interplay with superconductivity. Here, we report optical studies of $A$V$_{3}$Sb$_{5}$ across the whole family. With increasing alkali-metal atom radius from K to Cs, the CDW gap increases monotonically, whereas $T_{\text{CDW}}$ first rises and then drops, at variance with conventional CDW. While the Fermi surface gapped by the CDW grows, $T_{c}$ is elevated in CsV$_{3}$Sb$_{5}$, indicating that the interplay between the CDW and superconductivity is not simply a competition for the density of states near $E_{\text{F}}$. More importantly, we observe an enhancement of electronic correlations in CsV$_{3}$Sb$_{5}$, which suppresses the CDW but enhances superconductivity, thus accounting for the above peculiar observations. Our results suggest electronic correlations as an important factor in manipulating the CDW and its entanglement with superconductivity in $A$V$_{3}$Sb$_{5}$.

Submitted 21 August, 2022; originally announced August 2022.

Comments: 6 pages, 4 figures. Comments are welcome and appreciated

arXiv:2208.07194 [pdf, other]

doi 10.1109/TVT.2022.3232603

An Efficient and Reliable Asynchronous Federated Learning Scheme for Smart Public Transportation

Authors: Chenhao Xu, Youyang Qu, Tom H. Luan, Peter W. Eklund, Yong Xiang, Longxiang Gao

Abstract: Since traffic conditions change over time, machine learning models that predict traffic flows must be updated continuously and efficiently in smart public transportation. Federated learning (FL) is a distributed machine learning scheme that allows buses to receive model updates without waiting for model training on the cloud. However, FL is vulnerable to poisoning or DDoS attacks since buses travel in public. Some work introduces blockchain to improve reliability, but the additional latency from the consensus process reduces the efficiency of FL. Asynchronous Federated Learning (AFL) is a scheme that reduces the latency of aggregation to improve efficiency, but the learning performance is unstable due to unreasonably weighted local models. To address the above challenges, this paper offers a blockchain-based asynchronous federated learning scheme with a dynamic scaling factor (DBAFL). Specifically, the novel committee-based consensus algorithm for blockchain improves reliability at the lowest possible cost of time. Meanwhile, the devised dynamic scaling factor allows AFL to assign reasonable weights to stale local models. Extensive experiments conducted on heterogeneous devices validate the superior learning performance, efficiency, and reliability of DBAFL.

Submitted 26 December, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.00183 [pdf, other]

Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network

Authors: Zhen Xing, Yijiang Chen, Zhixin Ling, Xiangdong Zhou, Yu Xiang

Abstract: 3D reconstruction of novel categories based on few-shot learning is appealing in real-world applications and attracts increasing research interest. Previous approaches mainly focus on how to design shape prior models for different categories. Their performance on unseen categories is not very competitive. In this paper, we present a Memory Prior Contrastive Network (MPCN) that can store shape prior knowledge in a few-shot learning based 3D reconstruction framework. With the shape memory, a multi-head attention module is proposed to capture different parts of a candidate shape prior and fuse these parts together to guide 3D reconstruction of novel categories. Besides, we introduce a 3D-aware contrastive learning method, which can not only complement the retrieval accuracy of the memory network, but also better organize image features for downstream tasks. Compared with previous few-shot 3D reconstruction methods, MPCN can handle the inter-class variability without category annotations. Experimental results on a benchmark synthetic dataset and the Pascal3D+ real-world dataset show that our model outperforms the current state-of-the-art methods significantly.

Submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.13921 [pdf, other]

doi 10.1038/s42256-023-00721-6

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Authors: Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Xiaonan Zhang, Hua Wu, Hui Li, Le Song

Abstract: AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) with thousands of millions of primary sequences utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. HelixFold-Single is validated in datasets CASP14 and CAMEO, achieving competitive accuracy with the MSA-based methods on the targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions.
The code of HelixFold-Single is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein-single/forecast.

Submitted 21 February, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Journal ref: Nature Machine Intelligence, 2023

arXiv:2207.13342 [pdf, other]

Quantum Steering: Practical Challenges and Perspectives

Authors: Yu Xiang, Shuming Cheng, Qihuang Gong, Zbigniew Ficek, Qiongyi He

Abstract: Einstein-Podolsky-Rosen (EPR) steering, or quantum steering, describes the "spooky action at a distance" by which one party is able to remotely alter the states of the other if they share a certain entangled state. Generally, it admits an operational interpretation as the task of verifying entanglement without trust in the steering party's devices, which places it between Bell nonlocality and entanglement. Together with its asymmetrical nature, this has meant that quantum steering has attracted considerable interest from both the theoretical and experimental sides over past decades. In this Perspective, we present a brief overview of EPR steering with emphasis on recent progress, discuss current challenges and opportunities, and propose various future directions. We look to a future in which research moves to a larger-scale level beyond massless and microscopic systems to reveal steering of higher dimensionality, and to build up steered networks composed of multiple parties.

Submitted 27 July, 2022; originally announced July 2022.

Comments: PRX Quantum accepted

arXiv:2207.06638 [pdf, other]

doi 10.1103/PhysRevApplied.18.024065

Characterizing Multipartite Non-Gaussian Entanglement for Three-Mode Spontaneous Parametric Down-Conversion Process

Authors: Mingsheng Tian, Yu Xiang, Feng-Xiao Sun, Matteo Fadel, Qiongyi He

Abstract: Very recently, strongly non-Gaussian states have been observed via a direct three-mode spontaneous parametric down-conversion in a superconducting cavity [Phys. Rev. X 10, 011011 (2020)]. The created multi-photon non-Gaussian correlations are attractive and useful for various quantum information tasks. However, how to detect and classify multipartite non-Gaussian entanglement has not yet been completely understood. Here, we present an experimentally practical method to characterize continuous-variable multipartite non-Gaussian entanglement, by introducing a class of nonlinear squeezing parameters involving accessible higher-order moments of phase-space quadratures. As these parameters can depend on arbitrary operators, we consider their analytical optimization over a set of practical measurements, in order to detect different classes of multipartite non-Gaussian entanglement ranging from fully separable to fully inseparable. We demonstrate that the nonlinear squeezing parameters act as an excellent approximation to the quantum Fisher information within accessible third-order moments. The level of the nonlinear squeezing quantifies the metrological advantage provided by those entangled states. Moreover, by analyzing the above mentioned experiment, we show that our method can be readily used to confirm fully inseparable tripartite non-Gaussian entangled states by performing a limited number of measurements without requiring full knowledge of the quantum state.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 10 pages, 4 figures

Journal ref: Phys. Rev. Applied 18, 024065 (2022)

arXiv:2207.05477 [pdf, other]

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle

Authors: Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma

Abstract: Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of the experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires lots of computational resources and time to implement the training and inference of AlphaFold2 from scratch. The cost of running the original AlphaFold2 is expensive for most individuals and institutions. Therefore, reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. The performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while the memory is optimized through Recompute, BFloat16, and memory read/write in-place. Compared with the original AlphaFold2 (implemented with Jax) and OpenFold (implemented with PyTorch), HelixFold needs only 7.5 days to complete the full end-to-end training and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days. HelixFold saves 1x training time. We verified that HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets.
HelixFold's code is available on GitHub for free download: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein/forecast.

Submitted 13 July, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2207.04579 [pdf, ps, other]

doi 10.3847/1538-4357/ac7ffa

Reconfiguration and eruption of a solar filament by magnetic reconnection with an emerging magnetic field

Authors: Leping Li, Hardi Peter, Lakshmi Pradeep Chitta, Hongqiang Song, Zhe Xu, Yongyuan Xiang

Abstract: Both observations and simulations suggest that the solar filament eruption is closely related to magnetic flux emergence. It is thought that the eruption is triggered by magnetic reconnection between the filament and the emerging flux. However, the details of such a reconnection are rarely presented. In this study, we report the detailed reconnection between a filament and its nearby emerging fields, which led to the reconfiguration and subsequent partial eruption of the filament located over the polarity inversion line of active region 12816. Before the reconnection, we observed repeated brightenings in the filament at a location that overlies a site of magnetic flux cancellation. Plasmoids form at this brightening region, and propagate bi-directionally along the filament. These indicate the tether-cutting reconnection that results in the formation and eruption of a flux rope. To the northwest of the filament, magnetic fields emerge, and reconnect with the context ones, resulting in repeated jets. Afterwards, other magnetic fields emerge near the northwestern filament endpoints, and reconnect with the filament, forming the newly reconnected filament and loops. A current sheet repeatedly occurs at the interface, with a mean temperature and emission measure of 1.7 MK and 1.1$\times$10$^{28}$ cm$^{-5}$. Plasmoids form in the current sheet, and propagate along it and further along the newly reconnected filament and loops. The newly reconnected filament then erupts, while the unreconnected filament remains stable.
We propose that besides the orientation of emerging fields, some other parameters, such as the position, distance, strength, and area, are also crucial for triggering the filament eruption.

Submitted 10 July, 2022; originally announced July 2022.

Comments: 15 pages, 7 figures, accepted for publication in ApJ

arXiv:2207.04434 [pdf, other]

Hiding Your Signals: A Security Analysis of PPG-based Biometric Authentication

Authors: Lin Li, Chao Chen, Lei Pan, Yonghang Tai, Jun Zhang, Yang Xiang

Abstract: Recently, physiological signal-based biometric systems have received wide attention. Unlike traditional biometric features, physiological signals cannot be easily compromised (they are usually unobservable to human eyes). The photoplethysmography (PPG) signal is easy to measure, making it more attractive than many other physiological signals for biometric authentication. However, with the advent of remote PPG (rPPG), unobservability has been challenged when the attacker can remotely steal the rPPG signals by monitoring the victim's face, subsequently posing a threat to PPG-based biometrics. In PPG-based biometric authentication, current attack approaches require the victim's PPG signal, so rPPG-based attacks have been neglected. In this paper, we first analyze the security of PPG-based biometrics, including user authentication and communication protocols. We evaluate the signal waveforms, heart rate and inter-pulse-interval information extracted by five rPPG methods, including four traditional optical computing methods (CHROM, POS, LGI, PCA) and one deep learning method (CL_rPPG). We conducted experiments on five datasets (PURE, UBFC_rPPG, UBFC_Phys, LGI_PPGI, and COHFACE) to collect a comprehensive set of results. Our empirical studies show that rPPG poses a serious threat to the authentication system. The success rate of the rPPG signal spoofing attack in the user authentication system reached 0.35. The bit hit rate is 0.6 in inter-pulse-interval-based security protocols.
Further, we propose an active defence strategy to hide the physiological signals of the face to resist the attack. It reduces the success rate of rPPG spoofing attacks in user authentication to 0.05. The bit hit rate was reduced to 0.5, which is at the level of a random guess. Our strategy effectively prevents the exposure of PPG signals to protect users' sensitive physiological data.

Submitted 10 July, 2022; originally announced July 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2207.03333 [pdf, other]

FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments

Authors: Jishnu Jaykumar P, Yu-Wei Chao, Yu Xiang

Abstract: We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object. We captured 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. We investigated (i) few-shot object classification and (ii) joint object segmentation and few-shot classification with the state-of-the-art methods for few-shot learning and meta-learning using our dataset. The evaluation results show that there is still a large margin to be improved for few-shot object classification in robotic environments. Our dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences and attribute recognition. The dataset and code are available at https://irvlutd.github.io/FewSOL.

Submitted 5 March, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:2207.02959 [pdf, other]

NeuralGrasps: Learning Implicit Representations for Grasps of Multiple Robotic Hands

Authors: Ninad Khargonkar, Neil Song, Zesheng Xu, Balakrishnan Prabhakaran, Yu Xiang

Abstract: We introduce a neural implicit representation for grasps of objects from multiple robotic hands. Different grasps across multiple robotic hands are encoded into a shared latent space. Each latent vector is learned to decode to the 3D shape of an object and the 3D shape of a robotic hand in a grasping pose in terms of the signed distance functions of the two 3D shapes. In addition, the distance metric in the latent space is learned to preserve the similarity between grasps across different robotic hands, where the similarity of grasps is defined according to contact regions of the robotic hands. This property enables our method to transfer grasps between different grippers including a human hand, and grasp transfer has the potential to share grasping skills between robots and enable robots to learn grasping skills from humans. Furthermore, the encoded signed distance functions of objects and grasps in our implicit representation can be used for 6D object pose estimation with grasping contact optimization from partial point clouds, which enables robotic grasping in the real world.

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2207.00728 [pdf, other]

doi 10.1109/TCSVT.2022.3207516

Multi-scale Attentive Image De-raining Networks via Neural Architecture Search

Authors: Lei Cai, Yuli Fu, Wanliang Huo, Youjun Xiang, Tao Zhu, Ying Zhang, Huanqiang Zeng, Delu Zeng

Abstract: Multi-scale architectures and attention modules have shown effectiveness in many deep learning-based image de-raining methods. However, manually designing and integrating these two components into a neural network requires a bulk of labor and extensive expertise. In this article, a high-performance multi-scale attentive neural architecture search (MANAS) framework is technically developed for image deraining. The proposed method formulates a new multi-scale attention search space with multiple flexible modules that are favorite to the image de-raining task. Under the search space, multi-scale attentive cells are built, which are further used to construct a powerful image de-raining network. The internal multiscale attentive architecture of the de-raining network is searched automatically through a gradient-based search algorithm, which avoids the daunting procedure of the manual design to some extent. Moreover, in order to obtain a robust image de-raining model, a practical and effective multi-to-one training strategy is also presented to allow the de-raining network to get sufficient background information from multiple rainy images with the same background scene, and meanwhile, multiple loss functions including external loss, internal loss, architecture regularization loss, and model complexity loss are jointly optimized to achieve robust de-raining performance and controllable model complexity. Extensive experimental results on both synthetic and realistic rainy images, as well as the down-stream vision applications (i.e., objection detection and segmentation) consistently demonstrate the superiority of our proposed method. The code is publicly available at https://github.com/lcai-gz/MANAS.

Submitted 4 April, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol.33, no.2, pp.618-633 September 2022

arXiv:2207.00268 [pdf, ps, other]

doi 10.1088/1674-4527/ac7cba

High-resolution Solar Image Reconstruction Based on Non-rigid Alignment

Authors: Hui Liu, Zhenyu Jin, Yongyuan Xiang, Kaifan Ji

Abstract: Suppressing the interference of atmospheric turbulence and obtaining observation data with a high spatial resolution is an issue to be solved urgently for ground observations. One way to solve this problem is to perform a statistical reconstruction of short-exposure speckle images. Combining the rapidity of Shift-Add and the accuracy of speckle masking, this paper proposes a novel reconstruction algorithm-NASIR (Non-rigid Alignment based Solar Image Reconstruction). NASIR reconstructs the phase of the object image at each frequency by building a computational model between geometric distortion and intensity distribution and reconstructs the modulus of the object image on the aligned speckle images by speckle interferometry. We analyzed the performance of NASIR by using the correlation coefficient, power spectrum, and coefficient of variation of intensity profile (CVoIP) in processing data obtained by the NVST (1m New Vacuum Solar Telescope). The reconstruction experiments and analysis results show that the quality of images reconstructed by NASIR is close to speckle masking when the seeing is good, while NASIR has excellent robustness when the seeing condition becomes worse. Furthermore, NASIR reconstructs the entire field of view in parallel in one go, without phase recursion and block-by-block reconstruction, so its computation time is less than half that of speckle masking. Therefore, we consider NASIR is a robust and high-quality fast reconstruction method that can serve as an effective tool for data filtering and quick look.

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2206.14362 [pdf, other]

Lower Bounds on the Error Probability for Invariant Causal Prediction

Authors: Austin Goddard, Yu Xiang, Ilya Soloveychik

Abstract: It is common practice to collect observations of feature and response pairs from different environments. A natural question is how to identify features that have consistent prediction power across environments. The invariant causal prediction framework proposes to approach this problem through invariance, assuming a linear model that is invariant under different environments. In this work, we make an attempt to shed light on this framework by connecting it to the Gaussian multiple access channel problem. Specifically, we incorporate optimal code constructions and decoding methods to provide lower bounds on the error probability. We illustrate our findings by various simulation settings.

Submitted 29 June, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted to the 2022 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:2206.12254 [pdf, other]

A Manifold-based Airfoil Geometric-feature Extraction and Discrepant Data Fusion Learning Method

Authors: Yu Xiang, Guangbo Zhang, Liwei Hu, Jun Zhang, Wenyong Wang

Abstract: Geometrical shape of airfoils, together with the corresponding flight conditions, are crucial factors for aerodynamic performances prediction. The obtained airfoils geometrical features in most existing approaches (e.g., geometrical parameters extraction, polynomial description and deep learning) are in Euclidean space. State-of-the-art studies showed that curves or surfaces of an airfoil formed a manifold in Riemannian space. Therefore, the features extracted by existing methods are not sufficient to reflect the geometric-features of airfoils. Meanwhile, flight conditions and geometric features are greatly discrepant with different types, the relevant knowledge of the influence of these two factors that on final aerodynamic performances predictions must be evaluated and learned to improve prediction accuracy. Motivated by the advantages of manifold theory and multi-task learning, we propose a manifold-based airfoil geometric-feature extraction and discrepant data fusion learning method (MDF) to extract geometric-features of airfoils in Riemannian space (we call them manifold-features) and further fuse the manifold-features with flight conditions to predict aerodynamic performances. Experimental results show that our method could extract geometric-features of airfoils more accurately compared with existing methods, that the average MSE of re-built airfoils is reduced by 56.33%, and while keeping the same predicted accuracy level of CL, the MSE of CD predicted by MDF is further reduced by 35.37%.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.10736 [pdf]

Imitate then Transcend: Multi-Agent Optimal Execution with Dual-Window Denoise PPO

Authors: Jin Fang, Jiacheng Weng, Yi Xiang, Xinwen Zhang

Abstract: A novel framework for solving the optimal execution and placement problems using reinforcement learning (RL) with imitation was proposed. The RL agents trained from the proposed framework consistently outperformed the industry benchmark time-weighted average price (TWAP) strategy in execution cost and showed great generalization across out-of-sample trading dates and tickers. The impressive performance was achieved from three aspects. First, our RL network architecture called Dual-window Denoise PPO enabled efficient learning in a noisy market environment. Second, a reward scheme with imitation learning was designed, and a comprehensive set of market features was studied. Third, our flexible action formulation allowed the RL agent to tackle optimal execution and placement collectively resulting in better performance than solving individual problems separately. The RL agent's performance was evaluated in our multi-agent realistic historical limit order book simulator in which price impact was accurately assessed. In addition, ablation studies were also performed, confirming the superiority of our framework.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.07257 [pdf, other]

doi 10.1093/mnras/stac1693

Investigation of stellar magnetic activity using variational autoencoder based on low-resolution spectroscopic survey

Authors: Yue Xiang, Shenghong Gu, Dongtao Cao

Abstract: We apply the variational autoencoder (VAE) to the LAMOST-K2 low-resolution spectra to detect the magnetic activity of the stars in the K2 field. After the training on the spectra of the selected inactive stars, the VAE model can efficiently generate the synthetic reference templates needed by the spectral subtraction procedure, without knowing any stellar parameters. Then we detect the peculiar spectral features, such as chromospheric emissions, strong nebular emissions and lithium absorptions, in our sample. We measure the emissions of the chromospheric activity indicators, H$α$ and Ca II infrared triplet (IRT) lines, to quantify the stellar magnetic activity. The excess emissions of H$α$ and Ca II IRT lines of the active stars are correlated well to the rotational periods and the amplitudes of light curves derived from the K2 photometry. We degrade the LAMOST spectra to simulate the slitless spectra of the China Space Station Telescope (CSST) and apply the VAE to the simulated data. For cool active stars, we reveal a good agreement between the equivalent widths (EWs) of H$α$ line derived from the spectra with two resolutions. The result indicates the ability of identifying the magnetically active stars in the future CSST survey, which will deliver an unprecedented large database of low-resolution spectra as well as simultaneous multi-band photometry of stars.

Submitted 6 July, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 13 pages, 19 figures, accepted for publication in MNRAS. Table 1 is available on Zenodo at https://doi.org/10.5281/zenodo.6802956 and the code can be found on GitHub at https://github.com/xylib/vae-for-spectroscopic-survey

arXiv:2205.14421 [pdf, ps, other]

doi 10.4208/jml.221018

Approximation of Functionals by Neural Network without Curse of Dimensionality

Authors: Yahong Yang, Yang Xiang

Abstract: In this paper, we establish a neural network to approximate functionals, which are maps from infinite dimensional spaces to finite dimensional spaces. The approximation error of the neural network is $O(1/\sqrt{m})$ where $m$ is the size of networks, which overcomes the curse of dimensionality. The key idea of the approximation is to define a Barron spectral space of functionals.

Submitted 18 October, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Journal ref: J. Mach. Learn. , 1 (2022), pp. 342-372

arXiv:2205.09747 [pdf, other]

HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers

Authors: Yu-Wei Chao, Chris Paxton, Yu Xiang, Wei Yang, Balakumar Sundaralingam, Tao Chen, Adithyavairavan Murali, Maya Cakmak, Dieter Fox

Abstract: We introduce a new simulation benchmark "HandoverSim" for human-to-robot object handovers. To simulate the giver's motion, we leverage a recent motion capture dataset of hand grasping of objects. We create training and evaluation environments for the receiver with standardized protocols and metrics. We analyze the performance of a set of baselines and show a correlation with a real-world evaluation. Code is open sourced at https://handover-sim.github.io.

Submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted to ICRA 2022

arXiv:2205.09470 [pdf, other]

Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters

Authors: Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu

Abstract: The ever-growing model size and scale of compute have attracted increasing interests in training deep learning models over multiple nodes. However, when it comes to training on cloud clusters, especially across remote clusters, huge challenges are faced. In this work, we introduce a general framework, Nebula-I, for collaboratively training deep learning models over remote heterogeneous clusters, the connections between which are low-bandwidth wide area networks (WANs). We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases that include: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models, which run through the most popular paradigm of recent deep learning. To balance the accuracy and communication efficiency, in Nebula-I, parameter-efficient training strategies, hybrid parallel computing methods and adaptive communication acceleration techniques are jointly applied. Meanwhile, security strategies are employed to guarantee the safety, reliability and privacy in intra-cluster computation and inter-cluster communication. Nebula-I is implemented with the PaddlePaddle deep learning framework, which can support collaborative training over heterogeneous hardware, e.g. GPU and NPU. Experiments demonstrate that the proposed framework could substantially maximize the training efficiency while preserving satisfactory NLP performance. By using Nebula-I, users can run large-scale training tasks over cloud clusters with minimum developments, and the utility of existed large pre-trained models could be further promoted. We also introduced new state-of-the-art results on cross-lingual natural language inference tasks, which are generated based upon a novel learning framework and Nebula-I.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 20 pages, 10 figures, technical report

arXiv:2205.09162 [pdf, ps, other]

An Invariant Matching Property for Distribution Generalization under Intervened Response

Authors: Kang Du, Yu Xiang

Abstract: The task of distribution generalization concerns making reliable prediction of a response in unseen environments. The structural causal models are shown to be useful to model distribution changes through intervention. Motivated by the fundamental invariance principle, it is often assumed that the conditional distribution of the response given its predictors remains the same across environments. However, this assumption might be violated in practical settings when the response is intervened. In this work, we investigate a class of model with an intervened response. We identify a novel form of invariance by incorporating the estimates of certain features as additional predictors. Effectively, we show this invariance is equivalent to having a deterministic linear matching that makes the generalization possible. We provide an explicit characterization of the linear matching and present our simulation results under various intervention settings.

Submitted 10 June, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: Accepted to the European Signal Processing Conference (EUSIPCO) 2022

arXiv:2205.07186 [pdf, other]

Stochastic Continuum Models for High--Entropy Alloys with Short-range Order

Authors: Yahong Yang, Luchan Zhang, Yang Xiang

Abstract: High entropy alloys (HEAs) are a class of novel materials that exhibit superb engineering properties. It has been demonstrated by extensive experiments and first principles/atomistic simulations that short-range order in the atomic level randomness strongly influences the properties of HEAs. In this paper, we derive stochastic continuum models for HEAs with short-range order from atomistic models. A proper continuum limit is obtained such that the mean and variance of the atomic level randomness together with the short-range order described by a characteristic length are kept in the process from the atomistic interaction model to the continuum equation. The obtained continuum model with short-range order is in the form of an Ornstein--Uhlenbeck (OU) process. This validates the continuum model based on the OU process adopted phenomenologically by Zhang et al. [Acta Mater., 166 (2019), pp. 424--434] for HEAs with short-range order. We derive such stochastic continuum models with short-range order for both elasticity in HEAs without defects and HEAs with dislocations (line defects). The obtained stochastic continuum models are based on the energy formulations, whose variations lead to stochastic partial differential equations.

Submitted 15 May, 2022; originally announced May 2022.

arXiv:2205.07051 [pdf]

High-speed graphene-silicon-graphene waveguide PDs with high photo-to-dark-current ratio and large linear dynamic range

Authors: Jingshu Guo, Chaoyue Liu, Laiwen Yu, Hengtai Xiang, Yuluan Xiang, Daoxin Dai

Abstract: Two-dimensional materials (2DMs) meet the demand of broadband and low-cost photodetection on silicon for many applications. Currently, it is still very challenging to realize excellent silicon-2DM PDs. Here we demonstrate graphene-silicon-graphene waveguide PDs operating at the wavelength-bands of 1.55 μm and 2 μm, showing the potential for large-scale integration. For the fabricated PDs, the measured responsivities are respectively ~0.15 mA/W and ~0.015 mA/W for the wavelengths of 1.55 μm and 1.96μm. In particular, the PDs exhibit a high bandwidth of ~33 GHz, an ultra-low dark current of tens of pico-amperes, a high normalized photo-to-dark-current ratio (NPDR) of 1.63x10^6 W^-1, as well as a high linear dynamic range of 3 μW-1.86 mW (and beyond) at 1.55 μm. According to the measurement results for the wavelength-bands of 1.55/2.0 μm and the theoretical modeling for the silicon-graphene heterostructure, it is revealed that internal photo-emission and photo-assisted thermionic field emission dominantly contribute to the photoresponse in the graphene-silicon Schottky junctions, which helps the future work to further improve the performance.

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2205.05581 [pdf, other]

A deep representation learning speech enhancement method using $β$-VAE

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could be improved by deep representation learning (DRL). Based on our previous work, we in this paper propose to use $β$-VAE to further improve PVAE's ability of representation learning. More specifically, our $β$-VAE can improve PVAE's capacity of disentangling different latent variables from the observed signal without the trade-off problem between disentanglement and signal reconstruction. This trade-off problem widely exists in previous $β$-VAE algorithms. Unlike the previous $β$-VAE algorithms, the proposed $β$-VAE strategy can also be used to optimize the DNN's structure. This means that the proposed method can not only improve PVAE's SE performance but also reduce the number of PVAE training parameters. The experimental results show that the proposed method can acquire better speech and noise latent representation than PVAE. Meanwhile, it also obtains a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to Eurosipco

arXiv:2204.13847 [pdf]

CATNet: Cross-event Attention-based Time-aware Network for Medical Event Prediction

Authors: Sicen Liu, Xiaolong Wang, Yang Xiang, Hui Xu, Hui Wang, Buzhou Tang

Abstract: Medical event prediction (MEP) is a fundamental task in the medical domain, which needs to predict medical events, including medications, diagnosis codes, laboratory tests, procedures, outcomes, and so on, according to historical medical records. The task is challenging as medical data is a type of complex time series data with heterogeneous and temporal irregular characteristics. Many machine learning methods that consider the two characteristics have been proposed for medical event prediction. However, most of them consider the two characteristics separately and ignore the correlations among different types of medical events, especially relations between historical medical events and target medical events. In this paper, we propose a novel neural network based on attention mechanism, called cross-event attention-based time-aware network (CATNet), for medical event prediction. It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) modeling heterogeneous information and temporal information in a unified way and considering temporal irregular characteristics locally and globally respectively, 2) taking full advantage of correlations among different types of events via cross-event attention. Experiments on two public datasets (MIMIC-III and eICU) show CATNet can be adaptive with different MEP tasks and outperforms other state-of-the-art methods on various MEP tasks. The source code of CATNet will be released after this manuscript is accepted.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 15 pages, 4 figures

arXiv:2204.13325 [pdf, ps, other]

Weak solutions to an initial-boundary value problem for a continuum equation of motion of grain boundaries

Authors: Peicheng Zhu, Lei Yu, Yang Xiang

Abstract: We investigate an initial-(periodic-)boundary value problem for a continuum equation, which is a model for motion of grain boundaries based on the underlying microscopic mechanisms of line defects (disconnections) and integrated the effects of a diverse range of thermodynamic driving forces. We first prove the global-in-time existence and uniqueness of weak solution to this initial-boundary value problem in the case with positive equilibrium disconnection density parameter B, and then investigate the asymptotic behavior of the solutions as B goes to zero. The main difficulties in the proof of main theorems are due to the degeneracy of B=0, a non-local term with singularity, and a non-smooth coefficient of the highest derivative associated with the gradient of the unknown. The key ingredients in the proof are the energy method, an estimate for a singular integral of the Hilbert type, and a compactness lemma.

Submitted 28 April, 2022; originally announced April 2022.

arXiv:2204.11552 [pdf, ps, other]

doi 10.1103/PhysRevLett.128.200401

Experimental demonstration of remotely creating Wigner negativity via quantum steering

Authors: Shuheng Liu, Dongmei Han, Na Wang, Yu Xiang, Fengxiao Sun, Meihong Wang, Zhongzhong Qin, Qihuang Gong, Xiaolong Su, Qiongyi He

Abstract: Non-Gaussian states with Wigner negativity are of particular interest in quantum technology due to their potential applications in quantum computing and quantum metrology. However, how to create such states at a remote location remains a challenge, which is important for efficiently distributing quantum resource between distant nodes in a network. Here, we experimentally prepare optical non-Gaussian state with negative Wigner function at a remote node via local non-Gaussian operation and shared Gaussian entangled state existing quantum steering. By performing photon subtraction on one mode, Wigner negativity is created in the remote target mode. We show that the Wigner negativity is sensitive to loss on the target mode, but robust to loss on the mode performing photon subtraction. This experiment confirms the connection between the remotely created Wigner negativity and quantum steering. As an application, we present that the generated non-Gaussian state exhibits metrological power in quantum phase estimation.

Submitted 25 April, 2022; originally announced April 2022.

Comments: Phys. Rev. Lett. (Accepted)

Journal ref: Phys. Rev. Lett. 128, 200401 (2022)

Showing 151–200 of 553 results for author: Xiang, Y