MMGA: Multimodal Learning with Graph Alignment

Authors: Xuan Yang, Quanjin Tao, Xiao Feng, Donghong Cai, Xiang Ren, Yang Yang

Abstract: Multimodal pre-training breaks down the modality barriers and allows the individual modalities to be mutually augmented with information, resulting in significant advances in representation learning. However, the graph modality, as a very general and important form of data, cannot easily interact with other modalities because of its non-regular nature. In this paper, we propose MMGA (Multimodal learning with Graph Alignment), a novel multimodal pre-training framework to incorporate information from graph (social network), image and text modalities on social media to enhance user representation learning. In MMGA, a multi-step graph alignment mechanism is proposed to add the self-supervision from the graph modality to optimize the image and text encoders, while using the information from the image and text modalities to guide the graph encoder learning. We conduct experiments on a dataset crawled from Instagram. The experimental results show that MMGA works well on the dataset and improves performance on the fans prediction task. We release our dataset, the first social media multimodal dataset with graph, of 60,000 users labeled with specific topics based on 2 million posts to facilitate future research.

Submitted 31 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Please contact xuany@zju.edu.cn for the dataset

arXiv:2210.09773 [pdf, other]

Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation

Authors: Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, Wai Lam

Abstract: We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR). Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously. It also helps reduce surface variations across different expressions and languages. Unlike most prior work that only evaluates the ability to measure semantic similarity, we present a thorough evaluation of existing multilingual sentence embeddings and our improved versions, which include a collection of five transfer tasks in different downstream applications. Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic textual similarity and transfer tasks. Our codebase and evaluation scripts can be found at https://github.com/jcyk/MSE-AMR.

Submitted 18 October, 2022; originally announced October 2022.

Comments: EMNLP2022

arXiv:2210.02719 [pdf, other]

Continuous Diagnosis and Prognosis by Controlling the Update Process of Deep Neural Networks

Authors: Chenxi Sun, Hongyan Li, Moxian Song, Derun Cai, Baofeng Zhang, Shenda Hong

Abstract: Continuous diagnosis and prognosis are essential for intensive care patients. They can provide more opportunities for timely treatment and rational resource allocation, especially for sepsis, a main cause of death in the ICU, and COVID-19, a new worldwide epidemic. Although deep learning methods have shown their great superiority in many medical tasks, they tend to catastrophically forget, overfit, and get results too late when performing diagnosis and prognosis in the continuous mode. In this work, we summarized the three requirements of this task, proposed a new concept, continuous classification of time series (CCTS), and designed a novel model training method, restricted update strategy of neural networks (RU). In the context of continuous prognosis, our method outperformed all baselines and achieved average accuracies of 90%, 97%, and 85% on sepsis prognosis, COVID-19 mortality prediction, and eight diseases classification, respectively. Moreover, our method can also endow deep learning with interpretability, having the potential to explore disease mechanisms and provide a new horizon for medical research. We have achieved disease staging for sepsis and COVID-19, discovering four stages and three stages, respectively, with their typical biomarkers. Further, our method is a data-agnostic and model-agnostic plug-in; it can be used to continuously prognose other diseases with staging and even implement CCTS in other fields.

Submitted 6 October, 2022; originally announced October 2022.

Comments: 41 pages, 15 figures

arXiv:2210.01534 [pdf, other]

Multi-fidelity Monte Carlo: a pseudo-marginal approach

Authors: Diana Cai, Ryan P. Adams

Abstract: Markov chain Monte Carlo (MCMC) is an established approach for uncertainty quantification and propagation in scientific applications. A key challenge in applying MCMC to scientific domains is computation: the target density of interest is often a function of expensive computations, such as a high-fidelity physical simulation, an intractable integral, or a slowly-converging iterative algorithm. Thus, using an MCMC algorithm with an expensive target density becomes impractical, as these expensive computations need to be evaluated at each iteration of the algorithm. In practice, these computations are often approximated via a cheaper, low-fidelity computation, leading to bias in the resulting target density. Multi-fidelity MCMC algorithms combine models of varying fidelities in order to obtain an approximate target density with lower computational cost. In this paper, we describe a class of asymptotically exact multi-fidelity MCMC algorithms for the setting where a sequence of models of increasing fidelity can be computed that approximates the expensive target density of interest. We take a pseudo-marginal MCMC approach for multi-fidelity inference that utilizes a cheaper, randomized-fidelity unbiased estimator of the target fidelity constructed via random truncation of a telescoping series of the low-fidelity sequence of models. Finally, we discuss and evaluate the proposed multi-fidelity MCMC approach on several applications, including log-Gaussian Cox process modeling, Bayesian ODE system identification, PDE-constrained optimization, and Gaussian process regression parameter inference.
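Editor's illustration (not taken from the paper): the abstract's "random truncation of a telescoping series" can be sketched with a generic Russian-roulette style estimator. Everything named here is an assumption made for the example: the toy model sequence `fidelity`, the geometric truncation distribution, and the finite number of levels, which stands in for the high-fidelity target so that unbiasedness can be checked by exact enumeration. The sketch shows only the estimator construction, not the pseudo-marginal MCMC sampler itself.

```python
import random


def fidelity(k):
    """Hypothetical model sequence: k-th partial sum of sum_i 0.5**i (limit 2.0)."""
    return sum(0.5 ** i for i in range(k + 1))


def truncation_probs(n_levels, ratio=0.6):
    """P(K = k) for k = 1..n_levels: geometric weights normalized to sum to 1."""
    weights = [ratio ** k for k in range(1, n_levels + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_estimate(k_trunc, probs):
    """Estimate fidelity(len(probs)) while evaluating only levels 0..k_trunc.

    Each telescoping difference is reweighted by 1 / P(K >= k), which is what
    makes the estimator unbiased when k_trunc is drawn from probs.
    """
    estimate = fidelity(0)
    for k in range(1, k_trunc + 1):
        survival = sum(probs[k - 1:])  # P(K >= k)
        estimate += (fidelity(k) - fidelity(k - 1)) / survival
    return estimate


def sample_estimate(probs, rng):
    """Draw a random truncation level and return the corresponding estimate."""
    levels = list(range(1, len(probs) + 1))
    k_trunc = rng.choices(levels, weights=probs)[0]
    return roulette_estimate(k_trunc, probs)


N_LEVELS = 8  # assumed finite "highest fidelity" for this toy example
PROBS = truncation_probs(N_LEVELS)

# Exact expectation over the truncation level, by enumeration rather than sampling.
EXPECTED = sum(p * roulette_estimate(k, PROBS) for k, p in enumerate(PROBS, start=1))
```

In this toy setting, `EXPECTED` agrees with `fidelity(N_LEVELS)` up to floating-point error, while a typical draw from `sample_estimate` touches only the first few, cheaper levels.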

Submitted 4 October, 2022; originally announced October 2022.

Comments: 22 pages, 7 figures

arXiv:2209.12028 [pdf, other]

Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline

Authors: Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan

Abstract: Recently, 3D vision-and-language tasks have attracted increasing research interest. Compared to other vision-and-language tasks, the 3D visual question answering (VQA) task is less exploited and is more susceptible to language priors and co-reference ambiguity. Meanwhile, a couple of recently proposed 3D VQA datasets do not support the 3D VQA task well due to their limited scale and annotation methods. In this work, we formally define and address a 3D grounded VQA task by collecting a new 3D VQA dataset, referred to as FE-3DGQA, with diverse and relatively free-form question-answer pairs, as well as dense and completely grounded bounding box annotations. To achieve more explainable answers, we labelled the objects that appear in the complex QA pairs with different semantic types, including answer-grounded objects (both those that appear in the questions and those that do not), and contextual objects for answer-grounded objects. We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer. Extensive experiments verify that our newly collected benchmark datasets can be effectively used to evaluate various 3D VQA methods from different aspects and our newly proposed framework also achieves state-of-the-art performance on the new benchmark dataset. Both the newly collected dataset and our codes will be publicly available at http://github.com/zlccccc/3DGQA.

Submitted 24 September, 2022; originally announced September 2022.

Comments: 13 pages, 10 figures

arXiv:2209.11348 [pdf, other]

A Depth-Progressive Initialization Strategy for Quantum Approximate Optimization Algorithm

Authors: Xinwei Lee, Ningyi Xie, Yoshiyuki Saito, Dongsheng Cai, Nobuyoshi Asai

Abstract: The quantum approximate optimization algorithm (QAOA) is known for its capability and universality in solving combinatorial optimization problems on near-term quantum devices. The results yielded by QAOA depend strongly on its initial variational parameters. Hence, parameter selection for QAOA has become an active area of research, as bad initialization might deteriorate the quality of the results, especially at great circuit depths. We first discuss the patterns of optimal parameters in QAOA in two directions: the angle index and the circuit depth. Then, we discuss the symmetries and periodicity of the expectation that is used to determine the bounds of the search space. Based on the patterns in optimal parameters and the bounds restriction, we propose a strategy which predicts the new initial parameters by taking the difference between previous optimal parameters. Unlike most other strategies, the strategy we propose does not require multiple trials to ensure success. It only requires one prediction when progressing to the next depth. We compare this strategy with our previously proposed strategy and the layerwise strategy on solving the Max-cut problem, in terms of the approximation ratio and the optimization cost. We also address the non-optimality in previous parameters, which is seldom discussed in other works, despite its importance in explaining the behavior of variational quantum algorithms.

Submitted 27 September, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 10 pages, 4 figures

arXiv:2208.13433 [pdf, other]

Towards In-distribution Compatibility in Out-of-distribution Detection

Authors: Boxi Wu, Jie Jiang, Haidong Ren, Zifan Du, Wenxiao Wang, Zhifeng Li, Deng Cai, Xiaofei He, Binbin Lin, Wei Liu

Abstract: Deep neural network, despite its remarkable capability of discriminating targeted in-distribution samples, shows poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers are proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top-design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristic of in-distribution features. On several benchmarks, our method not only achieves the state-of-the-art out-of-distribution detection performance but also improves the in-distribution accuracy.

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2208.03624 [pdf, other]

Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph

Authors: Honghui Yang, Zili Liu, Xiaopei Wu, Wenxiao Wang, Wei Qian, Xiaofei He, Deng Cai

Abstract: Two-stage detectors have gained much popularity in 3D object detection. Most two-stage 3D detectors utilize grid points, voxel grids, or sampled keypoints for RoI feature extraction in the second stage. Such methods, however, are inefficient in handling unevenly distributed and sparse outdoor points. This paper solves this problem in three aspects. 1) Dynamic Point Aggregation. We propose the patch search to quickly search points in a local region for each 3D proposal. The dynamic farthest voxel sampling is then applied to evenly sample the points. In particular, the voxel size varies along the distance to accommodate the uneven distribution of points. 2) RoI-graph Pooling. We build local graphs on the sampled points to better model contextual information and mine point relations through iterative message passing. 3) Visual Features Augmentation. We introduce a simple yet effective fusion strategy to compensate for sparse LiDAR points with limited semantic cues. Based on these modules, we construct our Graph R-CNN as the second stage, which can be applied to existing one-stage detectors to consistently improve the detection performance. Extensive experiments show that Graph R-CNN outperforms the state-of-the-art 3D detection models by a large margin on both the KITTI and Waymo Open Dataset. We also rank first on the KITTI BEV car detection leaderboard. Code will be available at https://github.com/Nightmare-n/GraphRCNN.

Submitted 6 August, 2022; originally announced August 2022.

Comments: ECCV 2022, Oral

arXiv:2208.02129 [pdf, other]

SC6D: Symmetry-agnostic and Correspondence-free 6D Object Pose Estimation

Authors: Dingding Cai, Janne Heikkilä, Esa Rahtu

Abstract: This paper presents an efficient symmetry-agnostic and correspondence-free framework, referred to as SC6D, for 6D object pose estimation from a single monocular RGB image. SC6D requires neither the 3D CAD model of the object nor any prior knowledge of the symmetries. The pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification. SC6D is evaluated on three benchmark datasets, T-LESS, YCB-V, and ITODD, and results in state-of-the-art performance on the T-LESS dataset. Moreover, SC6D is computationally much more efficient than the previous state-of-the-art method SurfEmb. The implementation and pre-trained models are publicly available at https://github.com/dingdingcai/SC6D-pose.

Submitted 18 September, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

Comments: 3DV 2022

arXiv:2207.11456 [pdf, other]

doi 10.1109/TBDATA.2022.3192898

Accelerating Vertical Federated Learning

Authors: Dongqi Cai, Tao Fan, Yan Kang, Lixin Fan, Mengwei Xu, Shangguang Wang, Qiang Yang

Abstract: Privacy, security and data governance constraints rule out a brute force process in the integration of cross-silo data, which inherits the development of the Internet of Things. Federated learning is proposed to ensure that all parties can collaboratively complete the training task while the data does not leave its local site. Vertical federated learning is a specialization of federated learning for distributed features. To preserve privacy, homomorphic encryption is applied to enable encrypted operations without decryption. Nevertheless, together with a robust security guarantee, homomorphic encryption brings extra communication and computation overhead. In this paper, we analyze the current bottlenecks of vertical federated learning under homomorphic encryption comprehensively and numerically. We propose a straggler-resilient and computation-efficient accelerating system that reduces the communication overhead in heterogeneous scenarios by up to 65.26% and reduces the computation overhead caused by homomorphic encryption by up to 40.66%. Our system can improve the robustness and efficiency of the current vertical federated learning framework without loss of security.

Submitted 21 January, 2024; v1 submitted 23 July, 2022; originally announced July 2022.

arXiv:2207.10498 [pdf, other]

Towards Efficient Adversarial Training on Vision Transformers

Authors: Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu

Abstract: Vision Transformer (ViT), as a powerful alternative to Convolutional Neural Network (CNN), has received much attention. Recent work showed that ViTs are also vulnerable to adversarial examples like CNNs. To build robust ViTs, an intuitive way is to apply adversarial training since it has been shown as one of the most effective ways to accomplish robust CNNs. However, one major limitation of adversarial training is its heavy computational cost. The self-attention mechanism adopted by ViTs is a computationally intense operation whose expense increases quadratically with the number of input patches, making adversarial training on ViTs even more time-consuming. In this work, we first comprehensively study fast adversarial training on a variety of vision transformers and illustrate the relationship between the efficiency and robustness. Then, to expedite adversarial training on ViTs, we propose an efficient Attention Guided Adversarial Training mechanism. Specifically, relying on the specialty of self-attention, we actively remove certain patch embeddings of each layer with an attention-guided dropping strategy during adversarial training. The slimmed self-attention modules accelerate the adversarial training on ViTs significantly. With only 65% of the fast adversarial training time, we match the state-of-the-art results on the challenging ImageNet benchmark.
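Editor's illustration (not taken from the paper): one generic way to read "attention-guided dropping" is to keep only the patches that receive the highest attention scores before the next layer. The function name, the `keep_ratio` parameter, and the use of a single score per patch are assumptions made for this sketch; the abstract does not specify the actual selection rule.

```python
def drop_low_attention_patches(patch_embeddings, attention_scores, keep_ratio):
    """Keep the top `keep_ratio` fraction of patches, ranked by attention score.

    patch_embeddings: list of per-patch feature vectors (lists of floats).
    attention_scores: one hypothetical importance score per patch.
    Returns the kept embeddings and their original indices, in original order.
    """
    if len(patch_embeddings) != len(attention_scores):
        raise ValueError("need exactly one attention score per patch")
    n_keep = max(1, int(len(patch_embeddings) * keep_ratio))
    # Rank patch indices from most to least attended.
    ranked = sorted(
        range(len(patch_embeddings)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    # Restore the original patch order among the survivors.
    kept_indices = sorted(ranked[:n_keep])
    return [patch_embeddings[i] for i in kept_indices], kept_indices


patches = [[0.1], [0.2], [0.3], [0.4]]
scores = [0.1, 0.4, 0.2, 0.3]
kept, indices = drop_low_attention_patches(patches, scores, keep_ratio=0.5)
```

Because self-attention cost grows quadratically with the number of patches, keeping half of them would cut the pairwise attention computation in a layer to roughly a quarter, which is the intuition behind the speed-up the abstract reports.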

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.08531 [pdf, other]

DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

Authors: Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, Deng Cai

Abstract: Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity. It takes an RGB image as input and predicts 3D boxes in the 3D space. The most challenging sub-task lies in the instance depth estimation. Previous works usually use a direct estimation method. However, in this paper we point out that the instance depth on the RGB image is non-intuitive. It is coupled by visual depth clues and instance attribute clues, making it hard to be directly learned in the network. Therefore, we propose to reformulate the instance depth to the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). The visual depth is related to objects' appearances and positions on the image. By contrast, the attribute depth relies on objects' inherent attributes, which are invariant to the object affine transformation on the image. Correspondingly, we decouple the 3D location uncertainty into visual depth uncertainty and attribute depth uncertainty. By combining different types of depths and associated uncertainties, we can obtain the final instance depth. Furthermore, data augmentation in monocular 3D detection is usually limited due to the physical nature, hindering the boost of performance. Based on the proposed instance depth disentanglement strategy, we can alleviate this problem. Evaluated on KITTI, our method achieves new state-of-the-art results, and extensive ablation studies validate the effectiveness of each component in our method. The codes are released at https://github.com/SPengLiang/DID-M3D.

Submitted 22 July, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2207.08265

MLP-GAN for Brain Vessel Image Segmentation

Authors: Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

Abstract: Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases. One successful approach is to consider the segmentation as an image-to-image translation task and perform a conditional Generative Adversarial Network (cGAN) to learn a transformation between two distributions. In this paper, we present a novel multi-view approach, MLP-GAN, which splits a 3D volumetric brain vessel image into three different dimensional 2D images (i.e., sagittal, coronal, axial) and then feeds them into three different 2D cGANs. The proposed MLP-GAN not only alleviates the memory issue which exists in the original 3D neural networks but also retains 3D spatial information. Specifically, we utilize U-Net as the backbone for our generator and redesign the pattern of skip connection integrated with the MLP-Mixer, which has attracted lots of attention recently. Our model obtains the ability to capture cross-patch information to learn global information with the MLP-Mixer. Extensive experiments are performed on the public brain vessel dataset that show our MLP-GAN outperforms other state-of-the-art methods. We release our code at https://github.com/bxie9/MLP-GAN.

Submitted 26 October, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: Resubmit a conference

arXiv:2206.09103 [pdf, other]

Identifying Source Speakers for Voice Conversion based Spoofing Attacks on Speaker Verification Systems

Authors: Danwei Cai, Zexin Cai, Ming Li

Abstract: An automatic speaker verification system aims to verify the speaker identity of a speech signal. However, a voice conversion system could manipulate a person's speech signal to make it sound like another speaker's voice and deceive the speaker verification system. Most countermeasures for voice conversion-based spoofing attacks are designed to discriminate bona fide speech from spoofed speech for speaker verification systems. In this paper, we investigate the problem of source speaker identification -- inferring the identity of the source speaker given the voice converted speech. To perform source speaker identification, we simply add voice-converted speech data with the label of source speaker identity to the genuine speech dataset during speaker embedding network training. Experimental results show the feasibility of source speaker identification when training and testing with converted speeches from the same voice conversion model(s). In addition, our results demonstrate that having more converted utterances from various voice conversion models for training helps improve the source speaker identification performance on converted utterances from unseen voice conversion models.

Submitted 31 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.07956 [pdf, other]

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Authors: Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu

Abstract: Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability. However, the acquisition of prosodic boundary labels relies on manual annotation, which is costly and time-consuming. In this paper, we propose to automatically extract prosodic boundary labels from text-audio data via a neural text-speech model with pre-trained audio encoders. This model is pre-trained on text and speech data separately and jointly fine-tuned on TTS data in a triplet format: {speech, text, prosody}. The experimental results on both automatic evaluation and human evaluation demonstrate that: 1) the proposed text-speech prosody annotation framework significantly outperforms text-only baselines; 2) the quality of automatic prosodic boundary annotations is comparable to human annotations; 3) TTS systems trained with model-annotated boundaries are slightly better than systems that use manual ones.

Submitted 16 June, 2022; originally announced June 2022.

Comments: accepted by INTERSPEECH2022

arXiv:2206.02369 [pdf, other]

Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

Authors: Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, Jian Li

Abstract: While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) Language models have a preference to repeat the previous sentence; 2) The sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) The sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method DITTO (PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data. Although our method is motivated by mitigating repetitions, experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.

Submitted 9 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted by NeurIPS 2022. Code is released at https://github.com/Jxu-Thu/DITTO
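
Note: the abstract above quotes a rate of consecutive sentence-level repetitions in a corpus. As a rough illustration of how such a statistic could be computed, here is a minimal Python sketch. The sentence splitter, the exact-match criterion, and the function names are assumptions made for this example; the paper's own measurement procedure may differ.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def consecutive_repetition_rate(text):
    """Fraction of adjacent sentence pairs in which the second sentence exactly repeats the first."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return 0.0
    pairs = list(zip(sentences, sentences[1:]))
    repeats = sum(1 for prev, cur in pairs if prev == cur)
    return repeats / len(pairs)


# A looping generation: two of the three adjacent pairs are exact repeats.
looped = "I like tea. I like tea. I like tea. It is warm."
print(consecutive_repetition_rate(looped))
```

Applied to model output and to human-written text, a statistic of this kind makes the gap described in the abstract visible: greedy decoding tends to produce a much higher rate than the reference corpus.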

arXiv:2206.02102 [pdf, other]

AUTM Flow: Atomic Unrestricted Time Machine for Monotonic Normalizing Flows

Authors: Difeng Cai, Yuliang Ji, Huan He, Qiang Ye, Yuanzhe Xi

Abstract: Nonlinear monotone transformations are used extensively in normalizing flows to construct invertible triangular mappings from simple distributions to complex ones. In existing literature, monotonicity is usually enforced by restricting function classes or model parameters, and the inverse transformation is often approximated by root-finding algorithms as a closed-form inverse is unavailable. In this paper, we introduce a new integral-based approach termed "Atomic Unrestricted Time Machine (AUTM)", equipped with unrestricted integrands and an easy-to-compute explicit inverse. AUTM offers a versatile and efficient way to design normalizing flows with explicit inverse and unrestricted function classes or parameters. Theoretically, we present a constructive proof that AUTM is universal: all monotonic normalizing flows can be viewed as limits of AUTM flows. We provide a concrete example to show how to approximate any given monotonic normalizing flow using AUTM flows with guaranteed convergence. The result implies that AUTM can be used to transform an existing flow into a new one equipped with explicit inverse and unrestricted parameters. The performance of the new approach is evaluated on high dimensional density estimation, variational inference and image generation. Experiments demonstrate superior speed and memory efficiency of AUTM.

Submitted 5 June, 2022; originally announced June 2022.

Comments: 20 pages, 3 figures

MSC Class: 68T07

ACM Class: I.5.1; I.2.6

arXiv:2206.01885 [pdf, other]

Data-driven Construction of Hierarchical Matrices with Nested Bases

Authors: Difeng Cai, Hua Huang, Edmond Chow, Yuanzhe Xi

Abstract: Hierarchical matrices provide a powerful representation for significantly reducing the computational complexity associated with dense kernel matrices. For general kernel functions, interpolation-based methods are widely used for the efficient construction of hierarchical matrices. In this paper, we present a fast hierarchical data reduction (HiDR) procedure with $O(n)$ complexity for the memory-efficient construction of hierarchical matrices with nested bases where $n$ is the number of data points. HiDR aims to reduce the given data in a hierarchical way so as to obtain $O(1)$ representations for all nearfield and farfield interactions. Based on HiDR, a linear complexity $\mathcal{H}^2$ matrix construction algorithm is proposed. The use of data-driven methods enables better efficiency than other general-purpose methods and flexible computation without accessing the kernel function. Experiments demonstrate significantly improved memory efficiency of the proposed data-driven method compared to interpolation-based methods over a wide range of kernels. Though the method is not optimized for any special kernel, benchmark experiments for the Coulomb kernel show that the proposed general-purpose algorithm offers competitive performance for hierarchical matrix construction compared to several state-of-the-art algorithms for the Coulomb kernel.

Submitted 3 June, 2022; originally announced June 2022.

Comments: 26 pages, 20 figures

MSC Class: 15A23 (Primary); 68W25; 65D40 (Secondary)

arXiv:2205.10162 [pdf, other]

FedAdapter: Efficient Federated Learning for Modern NLP

Authors: Dongqi Cai, Yaozong Wu, Shangguang Wang, Felix Xiaozhu Lin, Mengwei Xu

Abstract: Transformer-based pre-trained models have revolutionized NLP with their superior performance and generality. Fine-tuning pre-trained models for downstream tasks often requires private data, for which federated learning is the de-facto approach (i.e., FedNLP). However, our measurements show that FedNLP is prohibitively slow due to the large model sizes and the resultant high network/computation cost. Towards practical FedNLP, we identify adapters, small bottleneck modules inserted at a variety of model layers, as the key building blocks. A key challenge is to properly configure the depth and width of adapters, to which the training speed and efficiency are highly sensitive. No silver-bullet configuration exists: the optimal choice varies across downstream NLP tasks, desired model accuracy, and mobile resources. To automate adapter configuration, we propose FedAdapter, a framework that enhances the existing FedNLP with two novel designs. First, FedAdapter progressively upgrades the adapter configuration throughout a training session; the principle is to quickly learn shallow knowledge by only training fewer and smaller adapters at the model's top layers, and incrementally learn deep knowledge by incorporating deeper and larger adapters. Second, FedAdapter continuously profiles future adapter configurations by allocating participant devices to trial groups. Extensive experiments show that FedAdapter can reduce FedNLP's model convergence delay to no more than several hours, which is up to 155.5$\times$ faster compared to vanilla FedNLP and 48$\times$ faster compared to strong baselines.

Submitted 8 May, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: Accepted by MobiCom 2023

arXiv:2204.08735 [pdf, other]

doi 10.1016/j.neucom.2023.01.023

Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning

Authors: Liang Xie, Yibo Yang, Deng Cai, Xiaofei He

Abstract: Class-imbalanced distributions are widespread in real-world engineering. However, the mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optima when facing extreme class imbalance. This seriously harms the classification precision, especially on the minor classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance with only one-stage training instead of the two-stage learning used by current SOTA methods.

Submitted 21 February, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: 25 pages, 5 figures, accepted by Neurocomputing

arXiv:2203.14957 [pdf, other]

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Authors: Minghao Chen, Fangyun Wei, Chong Li, Deng Cai

Abstract: Prior works on action representation learning mainly focus on designing various architectures to extract the global representations for short video clips. In contrast, many practical applications such as video alignment have strong demand for learning dense representations for long videos. In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner. Concretely, we introduce a simple yet efficient video encoder that considers spatio-temporal context to extract frame-wise representations. Inspired by the recent progress of self-supervised learning, we present a novel sequence contrastive loss (SCL) applied on two correlated views obtained through a series of spatio-temporal data augmentations. SCL optimizes the embedding space by minimizing the KL-divergence between the sequence similarity of two augmented views and a prior Gaussian distribution of timestamp distance. Experiments on FineGym, PennAction and Pouring datasets show that our method outperforms previous state-of-the-art by a large margin for downstream fine-grained action classification. Surprisingly, although without training on paired videos, our approach also shows outstanding performance on video alignment and fine-grained frame retrieval tasks. Code and models are available at https://github.com/minghchen/CARL_code.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022
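
Note: the abstract above describes the sequence contrastive loss only in words, as a KL divergence between cross-view frame similarities and a Gaussian prior over timestamp distance. The following Python sketch shows one way such a loss could look. The cosine similarity, the temperature `tau`, the prior width `sigma`, the direction of the KL term, and all function names are assumptions for illustration; this is not the authors' implementation (see their repository for that).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sequence_contrastive_loss(view1, times1, view2, times2, sigma=1.0, tau=0.1):
    """Mean over frames of KL(prior || predicted).

    prior:     Gaussian in the timestamp distance between a frame of view 1
               and every frame of view 2, normalized over view 2.
    predicted: softmax of temperature-scaled cosine similarities.
    """
    total = 0.0
    for u, s in zip(view1, times1):
        prior = softmax([-((s - t) ** 2) / (2 * sigma ** 2) for t in times2])
        pred = softmax([cosine(u, v) / tau for v in view2])
        total += sum(p * math.log(p / q) for p, q in zip(prior, pred) if p > 0)
    return total / len(view1)


# Toy check: embeddings that agree with the timestamps give a lower loss
# than embeddings whose order is swapped.
frames = [[1.0, 0.0], [0.0, 1.0]]
stamps = [0.0, 5.0]
aligned = sequence_contrastive_loss(frames, stamps, frames, stamps)
swapped = sequence_contrastive_loss(frames, stamps, frames[::-1], stamps)
print(aligned < swapped)
```

The loss is small when frames that are close in time are also close in embedding space, which is the behaviour the abstract attributes to SCL.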

arXiv:2203.12644 [pdf, other]

Linearizing Transformer with Key-Value Memory

Authors: Yizhe Zhang, Deng Cai

Abstract: Efficient transformer variants with linear time complexity have been developed to mitigate the quadratic computational overhead of the vanilla transformer. Among them are low-rank projection methods such as Linformer and kernel-based Transformers. Despite their unique merits, they usually suffer from a performance drop compared with the vanilla transformer on many sequence generation tasks, and often fail to obtain computation gain when the generation is short. We propose MemSizer, an approach towards closing the performance gap while improving the efficiency even with short generation. It projects the source sequences into lower dimension representations like Linformer, while enjoying efficient recurrent-style incremental computation similar to kernel-based transformers. This yields linear computation time and constant memory complexity at inference time. MemSizer also employs a lightweight multi-head mechanism which renders the computation as light as a single-head model. We demonstrate that MemSizer provides an improved balance between efficiency and accuracy over the vanilla transformer and other efficient transformer variants in three typical sequence generation tasks, including machine translation, abstractive text summarization, and language modeling.

Submitted 12 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: EMNLP2022. The two authors contributed equally

arXiv:2203.10350 [pdf, other]

CLRNet: Cross Layer Refinement Network for Lane Detection

Authors: Tu Zheng, Yifei Huang, Yang Liu, Wenjian Tang, Zheng Yang, Deng Cai, Xiaofei He

Abstract: Lanes are critical in the vision navigation system of intelligent vehicles. Naturally, a lane is a traffic sign with high-level semantics, yet it has a specific local pattern that needs detailed low-level features to be localized accurately. Using different feature levels is of great importance for accurate lane detection, but it is still under-explored. In this work, we present Cross Layer Refinement Network (CLRNet), aiming at fully utilizing both high-level and low-level features in lane detection. In particular, it first detects lanes with high-level semantic features, then performs refinement based on low-level features. In this way, we can exploit more contextual information to detect lanes while leveraging local detailed lane features to improve localization accuracy. We present ROIGather to gather global context, which further enhances the feature representation of lanes. In addition to our novel network design, we introduce Line IoU loss, which regresses the lane line as a whole unit to improve the localization accuracy. Experiments demonstrate that the proposed method greatly outperforms the state-of-the-art lane detection approaches.

Submitted 19 March, 2022; originally announced March 2022.

Comments: CVPR2022 Acceptance

arXiv:2203.09780 [pdf, other]

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

Authors: Xiaopei Wu, Liang Peng, Honghui Yang, Liang Xie, Chenxi Huang, Chengqi Deng, Haifeng Liu, Deng Cai

Abstract: Current LiDAR-only 3D detection methods inevitably suffer from the sparsity of point clouds. Many multi-modal methods have been proposed to alleviate this issue, but the different representations of images and point clouds make it difficult to fuse them, resulting in suboptimal performance. In this paper, we present a novel multi-modal framework SFD (Sparse Fuse Dense), which utilizes pseudo point clouds generated from depth completion to tackle the issues mentioned above. Different from prior works, we propose a new RoI fusion strategy 3D-GAF (3D Grid-wise Attentive Fusion) to make fuller use of information from different types of point clouds. Specifically, 3D-GAF fuses 3D RoI features from the pair of point clouds in a grid-wise attentive way, which is more fine-grained and more precise. In addition, we propose SynAugment (Synchronized Augmentation) to enable our multi-modal framework to utilize all data augmentation approaches tailored to LiDAR-only methods. Lastly, we customize an effective and efficient feature extractor CPConv (Color Point Convolution) for pseudo point clouds. It can explore 2D image features and 3D geometric features of pseudo point clouds simultaneously. Our method holds the highest entry on the KITTI car 3D object detection leaderboard, demonstrating the effectiveness of our SFD. Codes are available at https://github.com/LittlePey/SFD.

Submitted 4 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022 (Oral)

arXiv:2203.08332 [pdf, other]

WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection

Authors: Liang Peng, Senbo Yan, Boxi Wu, Zheng Yang, Xiaofei He, Deng Cai

Abstract: Monocular 3D object detection is one of the most challenging tasks in 3D scene understanding. Due to the ill-posed nature of monocular imagery, existing monocular 3D detection methods rely heavily on training with manually annotated 3D box labels on the LiDAR point clouds. This annotation process is very laborious and expensive. To dispense with the reliance on 3D box labels, in this paper we explore weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image. Then, we adopt the generated 2D boxes to select corresponding RoI LiDAR points as the weak supervision. Eventually, we adopt a network to predict 3D boxes which can tightly align with associated RoI LiDAR points. This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points. We will illustrate the potential challenges of the above learning problem and resolve these challenges by introducing several effective designs into our method. Codes will be available at https://github.com/SPengLiang/WeakM3D.

Submitted 15 March, 2022; originally announced March 2022.

Comments: Accepted by ICLR 2022

arXiv:2203.02309 [pdf, other]

doi 10.1088/1361-6471/ac841a

A Next-Generation Liquid Xenon Observatory for Dark Matter and Neutrino Physics

Authors: J. Aalbers, K. Abe, V. Aerne, F. Agostini, S. Ahmed Maouloud, D. S. Akerib, D. Yu. Akimov, J. Akshat, A. K. Al Musalhi, F. Alder, S. K. Alsum, L. Althueser, C. S. Amarasinghe, F. D. Amaro, A. Ames, T. J. Anderson, B. Andrieu, N. Angelides, E. Angelino, J. Angevaare, V. C. Antochi, D. Antón Martin, B. Antunovic, E. Aprile, H. M. Araújo, et al. (572 additional authors not shown)

Abstract: The nature of dark matter and properties of neutrinos are among the most pressing issues in contemporary particle physics. The dual-phase xenon time-projection chamber is the leading technology to cover the available parameter space for Weakly Interacting Massive Particles (WIMPs), while featuring extensive sensitivity to many alternative dark matter candidates. These detectors can also study neutrinos through neutrinoless double-beta decay and through a variety of astrophysical sources. A next-generation xenon-based detector will therefore be a true multi-purpose observatory to significantly advance particle physics, nuclear physics, astrophysics, solar physics, and cosmology. This review article presents the science cases for such a detector.

Submitted 4 March, 2022; originally announced March 2022.

Comments: 77 pages, 40 figures, 1262 references

Report number: INT-PUB-22-003

Journal ref: J. Phys. G: Nucl. Part. Phys. 50 (2023) 013001

arXiv:2203.01072 [pdf, other]

OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation

Authors: Dingding Cai, Janne Heikkilä, Esa Rahtu

Abstract: This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask. Our model is trained using purely synthetic data rendered from ShapeNet, and, unlike most of the existing methods, it generalizes well on new real-world objects without any fine-tuning. We achieve this by decomposing the 6D pose into viewpoint, in-plane rotation around the camera optical axis and translation, and introducing novel lightweight modules for estimating each component in a cascaded manner. The resulting network contains fewer than 4M parameters while demonstrating excellent performance on the challenging T-LESS and Occluded LINEMOD datasets without any dataset-specific training. We show that OVE6D outperforms some contemporary deep learning-based pose estimation methods specifically trained for individual objects or datasets with real-world training data. The implementation and the pre-trained model will be made publicly available.

Submitted 7 April, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.00825 [pdf, other]

Towards Effective Resource Procurement in MEC: a Resource Re-selling Framework

Authors: Marie Siew, Shikhar Sharma, Kun Guo, Desmond Cai, Wanli Wen, Carlee Joe-Wong, Tony Q. S. Quek

Abstract: On-demand and resource reservation pricing models have been widely used in cloud computing, catering to different user requirements. Nevertheless, in Multi-Access Edge Computing (MEC), as the edge has limited resources compared to the cloud, on-demand users may not get their jobs served on time, or at all, if too many resources were reserved by reservation plan users. Concurrently, reservation plan users may possess excess un-utilized quota. To optimize this resource mismatch scenario, we propose a Sharing Quota Model (SQM) where reservation plan users can re-sell unused resource quota to on-demand users, with the mobile network operator (MNO) taking a commission. To analyze the users' aggregate behavior at equilibrium and investigate the MNO's incentive for allowing re-selling, we formulate a 3-stage non-cooperative Stackelberg Game. Solving this game, we characterize the optimal strategies of buyers and re-sellers. We show that on aggregate, users' optimal strategies give rise to 4 disjoint regions, dependent on the MNO's prices and supply levels. Based on this, we characterize the MNO's optimal prices for on-demand users. Numerical results show that having both the sharing and on-demand pool gives the MNO an optimal revenue when the on-demand pool's supply is low, and when the MNO's commission is low.

Submitted 8 November, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Accepted at IEEE Transactions on Services Computing

arXiv:2202.02976 [pdf, other]

Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

Abstract: Recent advances in deep learning have led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it has received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP. First, we measure and analyze model update regression in different model update settings. Next, we explore and benchmark existing techniques for reducing model update regression including model ensemble and knowledge distillation. We further propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured prediction. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches.

Submitted 8 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: NeurIPS2022
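
Note: the abstract above defines model update regression as the new model failing on cases its predecessor handled correctly. A minimal Python sketch of that measurement follows. The function name, the exact-match comparison, and the choice to normalize by the total number of examples are assumptions for illustration; the paper's metric may be defined differently.

```python
def regression_rate(gold, old_pred, new_pred):
    """Share of all examples that the old model got right and the new model gets wrong."""
    if not (len(gold) == len(old_pred) == len(new_pred)):
        raise ValueError("all three sequences must have the same length")
    if not gold:
        return 0.0
    flips = sum(
        1
        for g, o, n in zip(gold, old_pred, new_pred)
        if o == g and n != g
    )
    return flips / len(gold)


# Each string stands in for a whole predicted structure (for example a parse),
# compared by exact match.
gold = ["A", "B", "C", "D"]
old = ["A", "B", "X", "D"]  # old model: 3 of 4 correct
new = ["A", "Y", "C", "D"]  # new model: 3 of 4 correct, but "B" has regressed
print(regression_rate(gold, old, new))  # 0.25
```

In this toy case both models have the same accuracy, yet one previously correct case is now wrong, which is why regression has to be tracked separately from accuracy.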

arXiv:2202.01110 [pdf, other]

A Survey on Retrieval-Augmented Text Generation

Authors: Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu

Abstract: Recently, retrieval-augmented text generation has attracted increasing attention from the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey about retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, and then it reviews notable approaches according to different tasks including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.

Submitted 13 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: all authors contributed equally

arXiv:2112.15141 [pdf, other]

doi 10.1103/PhysRevA.105.042812

Synthetic topology and Floquet dynamic quantum phase transition in a periodically driven Raman lattice

Authors: De-Huan Cai, Wei Yi

Abstract: Stimulated by the recent progress in engineering topological band structures in cold atomic gases, we study the dynamic topological phenomena for atoms loaded in a periodically driven optical lattice. When the frequency of the periodic modulation is low, the time-dependent Hamiltonian can be mapped to a two-dimensional topological insulator, with the discretized frequency components playing the role of an additional, synthetic dimension. In the high-frequency limit, we derive the effective Floquet Hamiltonian of the system, and reveal the occurrence of Floquet dynamic quantum phase transitions -- an emergent topological phenomenon in the micromotion of the Floquet dynamics. Addressing the relation between the topology of the effective Floquet Hamiltonian and the presence of dynamic topological phenomena, we demonstrate that the topologically non-trivial nature of the Floquet Hamiltonian is a sufficient but not necessary condition for the onset of the Floquet dynamic quantum phase transition. We further discuss the relation of the topology of the Floquet Hamiltonian with the existence of dynamic skyrmion structures in the emergent momentum-time manifold of the micromotion, as well as the fate of these dynamic topological phenomena when the modulation frequency decreases away from the high-frequency limit. Finally, making use of the rich level structures of $^{171}$Yb atoms, we show that the system under study can be implemented in a one-dimensional Raman lattice where states in the $^1S_0$ ground-state manifold are coupled by Raman beams with periodically modulated amplitudes.

Submitted 28 April, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Journal ref: Phys. Rev. A 105, 042812 (2022)

arXiv:2112.07259 [pdf, other]

doi 10.1145/3447548.3467410

TopNet: Learning from Neural Topic Model to Generate Long Stories

Authors: Yazheng Yang, Boyuan Pan, Deng Cai, Huan Sun

Abstract: Long story generation (LSG) is one of the coveted goals in natural language processing. Different from most text generation tasks, LSG requires outputting a long story of rich content based on a much shorter text input, and often suffers from information sparsity. In this paper, we propose TopNet to alleviate this problem, by leveraging the recent advances in neural topic modeling to obtain high-quality skeleton words to complement the short input. In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model). Based on this latent topic distribution, we can use the reconstruction decoder of the topic model to sample a sequence of inter-related words as a skeleton for the story. Experiments on two benchmark datasets show that our proposed framework is highly effective in skeleton word selection and significantly outperforms the state-of-the-art models in both automatic evaluation and human evaluation.

Submitted 14 December, 2021; originally announced December 2021.

Comments: KDD2021, 9 pages

Journal ref: Yang, Yazheng, Boyuan Pan, Deng Cai, and Huan Sun. "TopNet: Learning from Neural Topic Model to Generate Long Stories." In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1997-2005. 2021

arXiv:2112.02353 [pdf, other]

Label Hierarchy Transition: Delving into Class Hierarchies to Enhance Deep Classifiers

Authors: Renzhen Wang, De Cai, Kaiwen Xiao, Xixi Jia, Xiao Han, Deyu Meng

Abstract: Hierarchical classification aims to sort the object into a hierarchical structure of categories. For example, a bird can be categorized according to a three-level hierarchy of order, family, and species. Existing methods commonly address hierarchical classification by decoupling it into a series of multi-class classification tasks. However, such a multi-task learning strategy fails to fully exploit the correlation among various categories across different levels of the hierarchy. In this paper, we propose Label Hierarchy Transition (LHT), a unified probabilistic framework based on deep learning, to address the challenges of hierarchical classification. The LHT framework consists of a transition network and a confusion loss. The transition network focuses on explicitly learning the label hierarchy transition matrices, which has the potential to effectively encode the underlying correlations embedded within class hierarchies. The confusion loss encourages the classification network to learn correlations across different label hierarchies during training. The proposed framework can be readily adapted to any existing deep network with only minor modifications. We experiment with a series of public benchmark datasets for hierarchical classification problems, and the results demonstrate the superiority of our approach beyond current state-of-the-art methods. Furthermore, we extend our proposed LHT framework to the skin lesion diagnosis task and validate its great potential in computer-aided diagnosis. The code of our method is available at https://github.com/renzhenwang/label-hierarchy-transition.

Submitted 31 October, 2023; v1 submitted 4 December, 2021; originally announced December 2021.

arXiv:2111.15464 [pdf, other]

Energy-Efficient Design for a NOMA assisted STAR-RIS Network with Deep Reinforcement Learning

Authors: Yi Guo, Fang Fang, Donghong Cai, Zhiguo Ding

Abstract: Simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) have been considered promising auxiliary devices to enhance the performance of the wireless network, where users located at different sides of the surfaces can be simultaneously served by the transmitting and reflecting signals. In this paper, the energy efficiency (EE) maximization problem for a non-orthogonal multiple access (NOMA) assisted STAR-RIS downlink network is investigated. Due to the fractional form of the EE, it is challenging to solve the EE maximization problem by the traditional convex optimization solutions. In this work, a deep deterministic policy gradient (DDPG)-based algorithm is proposed to maximize the EE by jointly optimizing the transmission beamforming vectors at the base station and the coefficient matrices at the STAR-RIS. Simulation results demonstrate that the proposed algorithm can effectively maximize the system EE considering the time-varying channels.

Submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.10342 [pdf, other]

GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation

Authors: Desheng Cai, Jun Hu, Quan Zhao, Shengsheng Qian, Quan Fang, Changsheng Xu

Abstract: In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way. GRecX consists of core libraries for building GNN-based recommendation benchmarks, as well as the implementations of popular GNN-based recommendation models. The core libraries provide essential components for building efficient and unified benchmarks, including FastMetrics (efficient metrics computation libraries), VectorSearch (efficient similarity search libraries for dense vectors), BatchEval (efficient mini-batch evaluation libraries), and DataManager (unified dataset management libraries). In particular, to provide a unified benchmark for the fair comparison of different complex GNN-based recommendation models, we design a new metric GRMF-X and integrate it into the FastMetrics component. Based on a TensorFlow GNN library tf_geometric, GRecX carefully implements a variety of popular GNN-based recommendation models. We carefully implement these baseline models to reproduce the performance reported in the literature, and our implementations are usually more efficient and friendly. In conclusion, GRecX enables users to train and benchmark GNN-based recommendation baselines in an efficient and unified way. We conduct experiments with GRecX, and the experimental results show that GRecX allows us to train and benchmark GNN-based recommendation baselines in an efficient and unified way. The source code of GRecX is available at https://github.com/maenzhier/GRecX.

Submitted 22 February, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

arXiv:2110.06612 [pdf, other]

Exploring Dense Retrieval for Dialogue Response Selection

Authors: Tian Lan, Deng Cai, Yan Wang, Yixuan Su, Heyan Huang, Xian-Ling Mao

Abstract: Recent progress in deep learning has continuously improved the accuracy of dialogue response selection. In particular, sophisticated neural network architectures are leveraged to capture the rich interactions between dialogue context and response candidates. While remarkably effective, these models also bring in a steep increase in computational cost. Consequently, such models can only be used as a re-rank module in practice. In this study, we present a solution to directly select proper responses from a large corpus or even a nonparallel corpus that only consists of unpaired sentences, using a dense retrieval model. To push the limits of dense retrieval, we design an interaction layer upon the dense retrieval models and apply a set of tailor-designed learning strategies. Our model shows superiority over strong baselines on the conventional re-rank evaluation setting, which is remarkable given its efficiency. To verify the effectiveness of our approach in realistic scenarios, we also conduct full-rank evaluation, where the target is to select proper responses from a full candidate pool that may contain millions of candidates and evaluate them fairly through human annotations. Our proposed model notably outperforms pipeline baselines that integrate fast recall and expressive re-rank modules. Human evaluation results show that enlarging the candidate pool with nonparallel corpora improves response quality further.

Submitted 25 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: 11 pages, 4 figures, 6 tables

arXiv:2109.15196 [pdf, other]

Multilingual AMR Parsing with Noisy Knowledge Distillation

Authors: Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, Wai Lam

Abstract: We study multilingual AMR parsing from the perspective of knowledge distillation, where the aim is to learn and improve a multilingual AMR parser by using an existing English parser as its teacher. We constrain our exploration in a strict multilingual setting: there is but one model to parse all different languages including English. We identify that noisy input and precise output are the key to successful distillation. Together with extensive pre-training, we obtain an AMR parser whose performance surpasses all previously published results on four different foreign languages, including German, Spanish, Italian, and Chinese, by large margins (up to 18.8 Smatch points on Chinese and on average 11.3 Smatch points). Our parser also achieves comparable performance on English to the latest state-of-the-art English-only parser.

Submitted 13 October, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: EMNLP21 (findings)

arXiv:2109.14739 [pdf, other]

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, Yi Zhang

Abstract: Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves new state of the art on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent as judged by human annotators.

Submitted 1 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: Camera-ready for ACL2022 main conference

arXiv:2109.12309 [pdf, other]

doi 10.1109/TNSE.2023.3266381

Scaling properties of scale-free networks in degree-thresholding renormalization flows

Authors: Dan Chen, Defu Cai, Housheng Su

Abstract: We study the statistical properties of observables of scale-free networks in the degree-thresholding renormalization (DTR) flows. For BA scale-free networks with different sizes, we find that their structural and dynamical observables have similar scaling behavior in the DTR flow. The finite-size scaling analysis confirms this view and reveals a scaling function with a single scaling exponent that collectively captures the changes of these observables. Furthermore, for the scale-free network with a single initial size, we use its DTR snapshots as the original networks in the DTR flows, then perform a similar finite-size scaling analysis. Interestingly, the initial network and its snapshots share the same scaling exponent as the BA synthetic network. Our findings have important guiding significance for analyzing the structure and dynamic behavior of large-scale networks. For example, in large-scale simulation scenarios with high time complexity, the DTR snapshot could serve as a substitute or guide for the initial network, making it possible to quickly explore the scaling behavior of the initial network.

Submitted 28 November, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

Journal ref: 2023, IEEE Transactions on Network Science and Engineering

arXiv:2109.09059 [pdf]

A simple transcendental travelling wave solution and stability study for the thermophoretic motion with variable heat transmission factors on substrate-supported grapheme sheet

Authors: Yue Chan, Daoju Cai, Kaisheng Cai, Shern-Long Lee, Rumiao Lin, Yong Ren

Abstract: Manually tailored wrinkled graphene sheets hold great promise in fabricating smart solid-state devices. In this paper, we employ an energy method to transform the original third-order partial differential equation (PDE), i.e. Eq. (1), into the first-order PDE, i.e. Eq. (8), for the thermophoretic motion of substrate-supported graphene sheets, which can be solved in terms of semi-group and transcendental solutions. Unlike soliton solutions derived using other more sophisticated techniques [9, 23], the present transcendental solution can be easily solved numerically and provides physical insights. Most importantly, we verify that the formation of various forms of wrinkling wave solutions can be determined by the evolution of equilibrium points for Eq. (1). This sheds light on modifying the heat sources in order to control the configuration of wrinkle waves, which has not been previously addressed.

Submitted 19 September, 2021; originally announced September 2021.

Comments: 8 pages, 5 figures, conference

arXiv:2109.02905 [pdf, other]

Exploiting Reasoning Chains for Multi-hop Science Question Answering

Authors: Weiwen Xu, Yang Deng, Huihui Zhang, Deng Cai, Wai Lam

Abstract: We propose a novel Chain Guided Retriever-reader (CGR) framework to model the reasoning chain for multi-hop Science Question Answering. Our framework is capable of performing explainable reasoning without the need for any corpus-specific annotations, such as the ground-truth reasoning chain, or human-annotated entity mentions. Specifically, we first generate reasoning chains from a semantic graph constructed by Abstract Meaning Representation of retrieved evidence facts. A Chain-aware loss, concerning both local and global chain information, is also designed to enable the generated chains to serve as distant supervision signals for training the retriever, where reinforcement learning is also adopted to maximize the utility of the reasoning chains. Our framework allows the retriever to capture step-by-step clues of the entire reasoning process, which is not only shown to be effective on two challenging multi-hop Science QA tasks, namely OpenBookQA and ARC-Challenge, but also favors explainability.

Submitted 7 September, 2021; originally announced September 2021.

Comments: 14 pages, Findings of EMNLP 2021

arXiv:2109.02853 [pdf, other]

The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge

Authors: Danwei Cai, Ming Li

Abstract: This report describes the submission of the DKU-DukeECE team to the self-supervision speaker verification task of the 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC). Our method employs an iterative labeling framework to learn self-supervised speaker representation based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing agreement between different segments within an utterance via a contrastive loss. Taking advantage of DNN's ability to learn from data with label noise, we propose to cluster the speaker embedding obtained from the previous speaker network and use the subsequent class assignments as pseudo labels to train a new DNN. Moreover, we iteratively train the speaker network with pseudo labels generated from the previous step to bootstrap the discriminative power of a DNN. Also, visual modal data is incorporated in this self-labeling framework. The visual pseudo label and the audio pseudo label are fused with a cluster ensemble algorithm to generate a robust supervisory signal for representation learning. Our submission achieves an equal error rate (EER) of 5.58% and 5.59% on the challenge development and test set, respectively.

Submitted 7 September, 2021; originally announced September 2021.

Comments: arXiv admin note: text overlap with arXiv:2010.14751

arXiv:2109.02002 [pdf, other]

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

Authors: Weiqing Wang, Danwei Cai, Qingjian Lin, Lin Yang, Junjie Wang, Jin Wang, Ming Li

Abstract: This report describes the submission of the DKU-DukeECE-Lenovo team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021 track 4. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of 5 independent systems, achieves a DER of 5.07% on the challenge test set.

Submitted 6 September, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2108.13858 [pdf, other]

GRP-FED: Addressing Client Imbalance in Federated Learning via Global-Regularized Personalization

Authors: Yen-Hsiu Chou, Shenda Hong, Chenxi Sun, Derun Cai, Moxian Song, Hongyan Li

Abstract: Since real-world data is long-tailed, it is challenging for Federated Learning (FL) to train across decentralized clients in practical applications. We present Global-Regularized Personalization (GRP-FED) to tackle the data imbalance issue by considering a single global model and multiple local models for each client. With adaptive aggregation, the global model treats multiple clients fairly and mitigates the global long-tailed issue. Each local model is learned from the local data and aligns with its distribution for customization. To prevent the local model from just overfitting, GRP-FED applies an adversarial discriminator to regularize between the learned global-local features. Extensive results show that our GRP-FED improves under both global and local scenarios on real-world MIT-BIH and synthetic CIFAR-10 datasets, achieving comparable performance and addressing client imbalance.

Submitted 31 August, 2021; originally announced August 2021.

Comments: (FL-ICML'21) International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021

arXiv:2108.13048 [pdf, other]

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

Authors: Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan Wang

Abstract: Language understanding in speech-based systems has attracted much attention in recent years with the growing demand for voice interface applications. However, the robustness of natural language understanding (NLU) systems to errors introduced by automatic speech recognition (ASR) is under-examined. In this paper, we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error across 3 different levels of background noise and 6 speakers with various voice characteristics. Based on the proposed benchmark, we systematically investigate the effect of ASR error on NLU tasks in terms of noise intensity, error type and speaker variants. We further propose two ways, a correction-based method and a data augmentation-based method, to improve the robustness of NLU systems. Extensive experimental results and analyses show that the proposed methods are effective to some extent, but still far from human performance, demonstrating that NLU under ASR error is still very challenging and requires further research.

Submitted 16 March, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2108.07744 [pdf, other]

An Iterative Improvement Method for HHL algorithm for Solving Linear System of Equations

Authors: Yoshiyuki Saito, Xinwei Lee, Dongsheng Cai, Nobuyoshi Asai

Abstract: We propose an iterative improvement method for the Harrow-Hassidim-Lloyd (HHL) algorithm to solve a linear system of equations. This is a quantum-classical hybrid algorithm. Accuracy is essential when solving a linear system of equations. However, the accuracy of the HHL algorithm is limited by the number of quantum bits used to express the eigenvalues of the matrix. Our iterative method improves the accuracy of the HHL solutions, and gives higher accuracy which surpasses the accuracy limited by the number of quantum bits. In a practical HHL algorithm, a huge number of measurements is required to obtain good accuracy, even if we provide a sufficient number of quantum bits for the eigenvalue expression, since the solution is statistically processed from the measurements. Our improved iterative method can reduce the number of measurements. Moreover, the sign information for each eigenstate of the solution is lost once the measurement is made, although the sign is significant. Therefore, the naïve iterative method of the HHL algorithm may slow down, especially when the solution includes wrong signs. In this paper, we propose and evaluate an improved iterative method for the HHL algorithm that is robust against the sign information loss, in terms of the number of iterations and the computational accuracy.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 7 pages, 7 figures

arXiv:2108.05288 [pdf, other]

doi 10.1109/QCE52317.2021.00016

Parameters Fixing Strategy for Quantum Approximate Optimization Algorithm

Authors: Xinwei Lee, Yoshiyuki Saito, Dongsheng Cai, Nobuyoshi Asai

Abstract: The quantum approximate optimization algorithm (QAOA) has numerous promising applications in solving combinatorial optimization problems on near-term Noisy Intermediate-Scale Quantum (NISQ) devices. QAOA has a quantum-classical hybrid structure. Its quantum part consists of a parameterized alternating operator ansatz, and its classical part comprises an optimization algorithm, which optimizes the parameters to maximize the expectation value of the problem Hamiltonian. This expectation value depends highly on the parameters; this implies that a set of good parameters leads to an accurate solution. However, at large circuit depth of QAOA, it is difficult to achieve global optimization due to the multiple occurrences of local minima or maxima. In this paper, we propose a parameters fixing strategy which gives a high approximation ratio on average, even at large circuit depths, by initializing QAOA with the optimal parameters obtained from the previous depths. We test our strategy on the Max-cut problem of certain classes of graphs such as the 3-regular graphs and the Erdős-Rényi graphs.

Submitted 11 August, 2021; originally announced August 2021.

Comments: 7 pages, 5 figures, accepted in the IEEE International Conference on Quantum Computing and Engineering

arXiv:2108.00154 [pdf, other]

CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention

Authors: Wenxiao Wang, Lu Yao, Long Chen, Binbin Lin, Deng Cai, Xiaofei He, Wei Liu

Abstract: Transformers have made great progress in dealing with computer vision tasks. However, existing vision transformers do not yet possess the ability of building the interactions among features of different scales, which is perceptually important to visual inputs. The reasons are two-fold: (1) Input embeddings of each layer are equal-scale, so no cross-scale feature can be extracted; (2) to lower the computational cost, some vision transformers merge adjacent embeddings inside the self-attention module, thus sacrificing small-scale (fine-grained) features of the embeddings and also disabling the cross-scale interactions. To this end, we propose Cross-scale Embedding Layer (CEL) and Long Short Distance Attention (LSDA). On the one hand, CEL blends each embedding with multiple patches of different scales, providing the self-attention module itself with cross-scale features. On the other hand, LSDA splits the self-attention module into a short-distance one and a long-distance counterpart, which not only reduces the computational burden but also keeps both small-scale and large-scale features in the embeddings. Through the above two designs, we achieve cross-scale attention. Besides, we put forward a dynamic position bias for vision transformers to make the popular relative position bias apply to variable-sized images. Hinging on the cross-scale attention module, we construct a versatile vision architecture, dubbed CrossFormer, which accommodates variable-sized inputs. Extensive experiments show that CrossFormer outperforms the other vision transformers on image classification, object detection, instance segmentation, and semantic segmentation tasks. The code has been released: https://github.com/cheerss/CrossFormer.

Submitted 8 October, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

Comments: 15 pages, 4 figures, and 9 tables

arXiv:2107.06341 [pdf, ps, other]

Hybrid A Posteriori Error Estimators for Conforming Finite Element Approximations to Stationary Convection-Diffusion-Reaction equations

Authors: Difeng Cai, Zhiqiang Cai

Abstract: We consider the a posteriori error estimation for convection-diffusion-reaction equations in both diffusion-dominated and convection/reaction-dominated regimes. We present an explicit hybrid estimator, which, in each regime, is proved to be reliable and efficient with constants independent of the parameters in the underlying problem. For convection-dominated problems, the norm introduced by Verfürth is used to measure the approximation error. Various numerical experiments are performed to (1) demonstrate the robustness of the hybrid estimator; (2) show that the hybrid estimator is more accurate than the explicit residual estimator and is less sensitive to the size of reaction, even though both of them are robust.

Submitted 15 July, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.05517 [pdf, other]

Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification

Authors: Yang Liu, Weifeng Zhang, Chao Xiang, Tu Zheng, Deng Cai, Xiaofei He

Abstract: Few-shot learning (FSL) aims to learn a classifier that can be easily adapted to accommodate new tasks not seen during training, given only a few examples. To handle the limited-data problem in few-shot regimes, recent methods tend to collectively use a set of local features to densely represent an image instead of using a mixed global feature. They generally explore a unidirectional query-to-support paradigm in FSL, e.g., find the nearest/optimal support feature for each query feature and aggregate these local matches for a joint classification. In this paper, we propose a new method, Mutual Centralized Learning (MCL), to fully affiliate the two disjoint sets of dense features in a bidirectional paradigm. We associate each local feature with a particle that can bidirectionally random walk in a discrete feature space by the affiliations. To estimate the class probability, we propose the features' accessibility that measures the expected number of visits to the support features of that class in a Markov process. We relate our method to learning a centrality on an affiliation network and demonstrate its capability to be plugged into existing methods by highlighting centralized local features. Experiments show that our method achieves state-of-the-art results on both miniImageNet and tieredImageNet.

Submitted 18 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: CVPR 2022

Showing 101–150 of 338 results for author: Cai, D