Noisy Universal Domain Adaptation via Divergence Optimization for Visual Recognition

Authors: Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku

Abstract: To transfer the knowledge learned from a labeled source domain to an unlabeled target domain, many studies have worked on universal domain adaptation (UniDA), where there is no constraint on the label sets of the source domain and target domain. However, the existing UniDA methods rely on source samples with correct annotations. Due to the limited resources in the real world, it is difficult to obtain a large amount of perfectly clean labeled data in a source domain in some applications. As a result, we propose a novel realistic scenario named Noisy UniDA, in which classifiers are trained using noisy labeled data from the source domain as well as unlabeled domain data from the target domain that has an uncertain class distribution. A multi-head convolutional neural network framework is proposed in this paper to address all of the challenges faced in the Noisy UniDA at once. Our network comprises a single common feature generator and multiple classifiers with various decision boundaries. We can detect noisy samples in the source domain, identify unknown classes in the target domain, and align the distribution of the source and target domains by optimizing the divergence between the outputs of the various classifiers. The proposed method outperformed the existing methods in most of the settings after a thorough analysis of the various domain adaptation scenarios. The source code is available at https://github.com/YU1ut/Divergence-Optimization.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.04694 [pdf, other]

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

Authors: Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

Abstract: Video Panoptic Segmentation (VPS) aims to achieve comprehensive pixel-level scene understanding by segmenting all pixels and associating objects in a video. Current solutions can be categorized into online and near-online approaches. Evolving over time, each category has its own specialized designs, making it nontrivial to adapt models between different categories. To alleviate the discrepancy, in this work, we propose a unified approach for online and near-online VPS. The meta architecture of the proposed Video-kMaX consists of two components: a within-clip segmenter (for clip-level segmentation) and a cross-clip associater (for association beyond clips). We propose clip-kMaX (clip k-means mask transformer) and HiLA-MB (Hierarchical Location-Aware Memory Buffer) to instantiate the segmenter and associater, respectively. Our general formulation includes the online scenario as a special case by adopting a clip length of one. Without bells and whistles, Video-kMaX sets a new state-of-the-art on KITTI-STEP and VIPSeg for video panoptic segmentation, and VSPW for video semantic segmentation. Code will be made publicly available.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2304.04521 [pdf, other]

Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models

Authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa

Abstract: Extracting in-distribution (ID) images from noisy images scraped from the Internet is an important preprocessing step for constructing datasets, which has traditionally been done manually. Automating this preprocessing with deep learning techniques presents two key challenges. First, images should be collected using only the name of the ID class without training on the ID data. Second, as the motivation for creating COCO illustrates, it is crucial to treat as ID images not only those containing ID objects alone but also those containing both ID and out-of-distribution (OOD) objects, in order to create robust recognizers. In this paper, we propose a novel problem setting called zero-shot in-distribution (ID) detection, where we identify images containing ID objects as ID images (even if they contain OOD objects), and images lacking ID objects as OOD images without any training. To solve this problem, we leverage the powerful zero-shot capability of CLIP and present a simple and effective approach, Global-Local Maximum Concept Matching (GL-MCM), based on both global and local visual-text alignments of CLIP features. Extensive experiments demonstrate that GL-MCM outperforms comparison methods on both multi-object datasets and single-object ImageNet benchmarks. The code will be available via https://github.com/AtsuMiyai/GL-MCM.

Submitted 23 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: v3: I fixed some typos from v2

arXiv:2304.04052 [pdf, other]

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Authors: Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Abstract: The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most seq2seq tasks are handled by the Encoder-Decoder framework, which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis on the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors in the classical decoder-only language model but has an encoder and a decoder, making it easier to compare with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model, namely, as the generation step number grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model.

Submitted 8 April, 2023; originally announced April 2023.

arXiv:2304.01794 [pdf, other]

doi 10.1103/PhysRevA.107.043110

Partial measurements of the total field gradient and the field gradient tensor using an atomic magnetic gradiometer

Authors: Qianqian Yu, Siqi Liu, Xueke Wang, Dong Sheng

Abstract: Magnetic gradiometers have wide practical and academic applications, and two important types of field gradient observables are the total field gradient and field gradient tensor. However, measurements of the field gradient tensor have not been the focus of previous research on atomic magnetic gradiometers. In this work, we develop an atomic magnetic gradiometer based on two separately optically pumped atomic ensembles in a Herriott-cavity-assisted atomic cell. This gradiometer shows versatile operation modes and functions, and we demonstrate them in measurements of both types of field gradient observables.

Submitted 4 April, 2023; originally announced April 2023.

Comments: Accepted by Physical Review A

arXiv:2304.00479 [pdf, other]

Mixed-Integer Programming Approaches to Generalized Submodular Optimization and its Applications

Authors: Simge Küçükyavuz, Qimeng Yu

Abstract: Submodularity is an important concept in integer and combinatorial optimization. A classical submodular set function models the utility of selecting homogeneous items from a single ground set, and such selections can be represented by binary variables. In practice, many problem contexts involve choosing heterogeneous items from more than one ground set or selecting multiple copies of homogeneous items, which call for extensions of submodularity. We refer to the optimization problems associated with such generalized notions of submodularity as Generalized Submodular Optimization (GSO). GSO is found in wide-ranging applications, including infrastructure design, healthcare, online marketing, and machine learning. Due to the often highly nonlinear (even non-convex and non-concave) objective function and the mixed-integer decision space, GSO is a broad subclass of challenging mixed-integer nonlinear programming problems. In this tutorial, we first provide an overview of classical submodularity. Then we introduce two subclasses of GSO, for which we present polyhedral theory for the mixed-integer set structures that arise from these problem classes. Our theoretical results lead to efficient and versatile exact solution methods that demonstrate their effectiveness in practical problems using real-world datasets.

Submitted 4 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

arXiv:2303.17376 [pdf, other]

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Authors: Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetic, Andreas Steiner, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai

Abstract: There has been a recent explosion of computer vision models which perform many tasks and are composed of an image encoder (usually a ViT) and an autoregressive decoder (usually a Transformer). However, most of this work simply presents one system and its results, leaving many questions regarding design decisions and trade-offs of such systems unanswered. In this work, we aim to provide such answers. We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision, including classification, captioning, visual question answering, and optical character recognition. Through extensive systematic experiments, we study the effects of task and data mixture, training and regularization hyperparameters, conditioning type and specificity, modality combination, and more. Importantly, we compare these to well-tuned single-task baselines to highlight the cost incurred by multi-tasking. A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well. We call this setup locked-image tuning with decoder (LiT-decoder). It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.15790 [pdf, other]

doi 10.1007/s11467-023-1333-z

STCF Conceptual Design Report: Volume 1 -- Physics & Detector

Authors: M. Achasov, X. C. Ai, R. Aliberti, L. P. An, Q. An, X. Z. Bai, Y. Bai, O. Bakina, A. Barnyakov, V. Blinov, V. Bobrovnikov, D. Bodrov, A. Bogomyagkov, A. Bondar, I. Boyko, Z. H. Bu, F. M. Cai, H. Cai, J. J. Cao, Q. H. Cao, Z. Cao, Q. Chang, K. T. Chao, D. Y. Chen, H. Chen , et al. (413 additional authors not shown)

Abstract: The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R&D and physics case studies.

Submitted 5 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Journal ref: Front. Phys. 19(1), 14701 (2024)

arXiv:2303.14086 [pdf, other]

Finite Field Multiple Access

Authors: Qi-yue Yu, Jiang-xuan Li, Shu Lin

Abstract: In the past several decades, various techniques have been developed and used for multiple-access (MA) communications. With the new applications for 6G, it is desirable to find new resources, physical or virtual, to confront the fast development of MA communication systems. For binary source transmission, this paper proposes an element-pair (EP) coding scheme for supporting massive users with short packet traffic, which addresses the finite-blocklength (FBL) multiuser reliable transmission problem. Each user is assigned a unique EP, and the collection of EPs assigned to the users possesses the unique sum-pattern mapping (USPM) structural property. We present methods for constructing symbol-wise EP codes with the USPM structural property based on the prime field and the extension field of a prime field, respectively. Based on the orthogonal EP code constructed over GF($2^m$), we propose finite-field MA (FFMA) systems over a Gaussian multiple-access channel (GMAC), including both the sparse-form and diagonal-form structures. The proposed FFMA is then applied to the network layer and forms network FFMA systems for pure digital networks, in which an EP is viewed as a virtual resource block (VRB). Simulation results show that the error performance of the proposed FFMA over a GMAC can approach that of single-user transmission.

Submitted 26 May, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: 38 pages, 11 figures

arXiv:2303.13233 [pdf, other]

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

Authors: Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Abstract: Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding. Although recent works have made steady progress on SGG, they still suffer from long-tail distribution issues: tail predicates are more costly to train and harder to distinguish because they have far less annotated data than frequent predicates. Existing re-balancing strategies try to handle this via prior rules but are still confined to pre-defined conditions, which are not scalable for various models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion and automatically strengthen existing SGG to tackle the long-tailed problem. Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner. Comprehensive experiments on three benchmark datasets show that CaCao consistently boosts the performance of multiple scene graph generation models in a model-agnostic way. Moreover, our Epic achieves competitive performance on open-world predicate prediction. The data and code for this paper are publicly available.

Submitted 19 August, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted by ICCV 2023

arXiv:2303.13090 [pdf, other]

Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation

Authors: Heng Cai, Shumeng Li, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

Abstract: Recent trends in semi-supervised learning have significantly boosted the performance of 3D semi-supervised medical image segmentation. Compared with 2D images, 3D medical volumes involve information from different directions, e.g., transverse, sagittal, and coronal planes, so as to naturally provide complementary views. These complementary views and the intrinsic similarity among adjacent 3D slices inspire us to develop a novel annotation scheme and its corresponding semi-supervised model for effective segmentation. Specifically, we first propose the orthogonal annotation by only labeling two orthogonal slices in a labeled volume, which significantly relieves the burden of annotation. Then, we perform registration to obtain the initial pseudo labels for sparsely labeled volumes. Subsequently, by introducing unlabeled volumes, we propose a dual-network paradigm named Dense-Sparse Co-training (DeSCO) that exploits dense pseudo labels in the early stage and sparse labels in the later stage, and meanwhile forces consistent output of the two networks. Experimental results on three benchmark datasets validate the effectiveness of our method in performance and its efficiency in annotation. For example, with only 10 annotated slices, our method reaches a Dice of up to 86.93% on the KiTS19 dataset.

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2303.09273 [pdf, other]

Adaptive Modeling of Uncertainties for Traffic Forecasting

Authors: Ying Wu, Yongchao Ye, Adnan Zeb, James J. Q. Yu, Zheng Wang

Abstract: Deep neural networks (DNNs) have emerged as a dominant approach for developing traffic forecasting models. These models are typically trained to minimize error on averaged test cases and produce a single-point prediction, such as a scalar value for traffic speed or travel time. However, single-point predictions fail to account for prediction uncertainty that is critical for many transportation management scenarios, such as determining the best- or worst-case arrival time. We present QuanTraffic, a generic framework to enhance the capability of an arbitrary DNN model for uncertainty modeling. QuanTraffic requires little human involvement and does not change the base DNN architecture during deployment. Instead, it automatically learns a standard quantile function during the DNN model training to produce a prediction interval for the single-point prediction. The prediction interval defines a range where the true value of the traffic prediction is likely to fall. Furthermore, QuanTraffic develops an adaptive scheme that dynamically adjusts the prediction interval based on the location and prediction window of the test input. We evaluated QuanTraffic by applying it to five representative DNN models for traffic forecasting across seven public datasets. We then compared QuanTraffic against five uncertainty quantification methods. Compared to these baseline uncertainty modeling techniques, QuanTraffic with base DNN architectures delivers consistently better and more robust performance on the reported datasets.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 14 pages, 5 figures

arXiv:2303.07184 [pdf, other]

Traffic Prediction with Transfer Learning: A Mutual Information-based Approach

Authors: Yunjie Huang, Xiaozhuang Song, Yuanshao Zhu, Shiyao Zhang, James J. Q. Yu

Abstract: In modern traffic management, one of the most essential yet challenging tasks is predicting traffic accurately and in a timely manner. It is well established that deep learning-based spatio-temporal models have an edge when exploiting spatio-temporal relationships in traffic data. Typically, data-driven models require vast volumes of data, but gathering data in small cities can be difficult owing to constraints such as equipment deployment and maintenance costs. To resolve this problem, we propose TrafficTL, a cross-city traffic prediction approach that uses big data from other cities to aid data-scarce cities in traffic prediction. Utilizing a periodicity-based transfer paradigm, it identifies data similarity and reduces negative transfer caused by the disparity between two data distributions from distant cities. In addition, the proposed method employs graph reconstruction techniques to rectify defects in data from data-scarce cities. TrafficTL is evaluated by comprehensive case studies on three real-world datasets and outperforms the state-of-the-art baseline by around 8 to 25 percent.

Submitted 13 March, 2023; originally announced March 2023.

Comments: submitted to T-ITS, 16 pages, 13 figures in color

arXiv:2303.06095 [pdf, other]

HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction

Authors: Jie Zhou, Xianshuai Cao, Wenhao Li, Lin Bo, Kun Zhang, Chuan Luo, Qian Yu

Abstract: Multi-scenario & multi-task learning has been widely applied to many recommendation systems in industrial applications, wherein an effective and practical approach is to carry out multi-scenario transfer learning on the basis of the Mixture-of-Experts (MoE) architecture. However, the MoE-based method, which aims to project all information in the same feature space, cannot effectively deal with the complex relationships inherent among various scenarios and tasks, resulting in unsatisfactory performance. To tackle the problem, we propose a Hierarchical information extraction Network (HiNet) for multi-scenario and multi-task recommendation, which achieves hierarchical extraction based on a coarse-to-fine knowledge transfer scheme. The multiple extraction layers of the hierarchical network enable the model to enhance the capability of transferring valuable information across scenarios while preserving specific features of scenarios and tasks. Furthermore, a novel scenario-aware attentive network module is proposed to model correlations between scenarios explicitly. Comprehensive experiments conducted on real-world industrial datasets from the Meituan Meishi platform demonstrate that HiNet achieves a new state-of-the-art performance and significantly outperforms existing solutions. HiNet is currently fully deployed in two scenarios and has achieved 2.87% and 1.75% order quantity gains, respectively.

Submitted 13 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.05475 [pdf, other]

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Authors: Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

Abstract: Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the encoder, thus suffering from sub-optimal learned representations and long pre-training epochs. To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens with encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning. Different from those efforts, we propose to Mimic before Reconstruct for Masked Autoencoders, named MR-MAE, which jointly learns high-level and low-level representations without interference during pre-training. For high-level semantics, MR-MAE employs a mimic loss over 25% visible tokens from the encoder to capture the pre-trained patterns encoded in CLIP and DINO. For low-level structures, we inherit the reconstruction loss in MAE to predict RGB pixel values for 75% masked tokens after the decoder. As MR-MAE applies high-level and low-level targets respectively at different partitions, the learning conflicts between them can be naturally overcome and contribute to superior visual representations for various downstream tasks. On ImageNet-1K, the MR-MAE base pre-trained for only 400 epochs achieves 85.8% top-1 accuracy after fine-tuning, surpassing the 1600-epoch MAE base by +2.2% and the previous state-of-the-art BEiT V2 base by +0.3%. Code and pre-trained models will be released at https://github.com/Alpha-VL/ConvMAE.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 12 pages, 3 figures

arXiv:2303.04354 [pdf, other]

A sensitive and stable atomic vector magnetometer for weak field detections using double orthogonal multipass cavities

Authors: Siqi Liu, Qianqian Yu, Hao Zhou, Dong Sheng

Abstract: This paper presents a compact low-temperature atomic vector magnetometer for weak field measurements, using an atomic cell containing two orthogonal multipass cavities. At the working temperature of 75 $^\circ$C, the magnetic field sensitivities along all three axes are better than 45 fT/Hz$^{1/2}$ at 10 Hz, limited by photon noise, and 85 fT/Hz$^{1/2}$ at 0.1 Hz. This sensor also shows measurement stabilities better than 1.5 pT along three axes for an integration time of $10^4$ s, even with the laser frequency unlocked. The sensor response to a rotation is demonstrated, and it is further used to measure the effective gyromagnetic ratio of atoms in this sensor when the bias field is nulled. This magnetometer makes an important step towards long-term stable measurements and calibrations of ultra-low fields.

Submitted 7 March, 2023; originally announced March 2023.

Comments: Submitted to Physical Review Applied

arXiv:2303.04351 [pdf, other]

ElC-OIS: Ellipsoidal Clustering for Open-World Instance Segmentation on LiDAR Data

Authors: Wenbang Deng, Kaihong Huang, Qinghua Yu, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen

Abstract: Open-world Instance Segmentation (OIS) is a challenging task that aims to accurately segment every object instance appearing in the current observation, regardless of whether these instances have been labeled in the training set. This is important for safety-critical applications such as robust autonomous navigation. In this paper, we present a flexible and effective OIS framework for LiDAR point cloud that can accurately segment both known and unknown instances (i.e., seen and unseen instance categories during training). It first identifies points belonging to known classes and removes the background by leveraging close-set panoptic segmentation networks. Then, we propose a novel ellipsoidal clustering method that is more adapted to the characteristic of LiDAR scans and allows precise segmentation of unknown instances. Furthermore, a diffuse searching method is proposed to handle the common over-segmentation problem presented in the known instances. With the combination of these techniques, we are able to achieve accurate segmentation for both known and unknown instances. We evaluated our method on the SemanticKITTI open-world LiDAR instance segmentation dataset. The experimental results suggest that it outperforms current state-of-the-art methods, especially with a 10.0% improvement in association quality. The source code of our method will be publicly available at https://github.com/nubot-nudt/ElC-OIS.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.03817 [pdf, other]

doi 10.1109/ISBI53787.2023.10230619

Region and Spatial Aware Anomaly Detection for Fundus Images

Authors: Jingqi Niu, Shiwen Dong, Qinji Yu, Kang Dang, Xiaowei Ding

Abstract: Recently anomaly detection has drawn much attention in diagnosing ocular diseases. Most existing anomaly detection research in fundus images has relatively large anomaly scores in the salient retinal structures, such as blood vessels, optical cups and discs. In this paper, we propose a Region and Spatial Aware Anomaly Detection (ReSAD) method for fundus images, which obtains local region and long-range spatial information to reduce the false positives in the normal structure. ReSAD transfers a pre-trained model to extract the features of normal fundus images and applies the Region-and-Spatial-Aware feature Combination module (ReSC) for pixel-level features to build a memory bank. In the testing phase, ReSAD uses the memory bank to determine out-of-distribution samples as abnormalities. Our method significantly outperforms the existing anomaly detection methods for fundus images on two publicly benchmark datasets.

Submitted 7 March, 2023; originally announced March 2023.

Report number: 2303.03817

Journal ref: 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 2023, pp. 1-5

arXiv:2303.02299 [pdf, other]

doi 10.3389/fphy.2023.1215468

Qubit Energy Tuner Based on Single Flux Quantum Circuits

Authors: Xiao Geng, Rutian Huang, Yongcheng He, Kaiyong He, Genting Dai, Liangliang Yang, Xinyu Wu, Qing Yu, Mingjun Cheng, Guodong Chen, Jianshe Liu, Wei Chen

Abstract: A device called qubit energy tuner (QET) based on single flux quantum (SFQ) circuits is proposed for Z control of superconducting qubits. Created from the improvement of flux digital-to-analog converters (flux DACs), a QET is able to set the energy levels or the frequencies of qubits, especially flux-tunable transmons, and perform gate operations requiring Z control. The circuit structure of QET is elucidated, which consists of an inductor loop and flux bias units for coarse tuning or fine tuning. The key feature of a QET is analyzed to understand how SFQ pulses change the inductor loop current, which provides external flux for qubits. To verify the functionality of the QET, three simulations are carried out. The first one verifies the responses of the inductor loop current to SFQ pulses. The results show that there is about 4.2% relative deviation between analytical solutions of the inductor loop current and the solutions from WRSpice time-domain simulation. The second and the third simulations with QuTip show how a Z gate and an iSWAP gate can be performed by this QET, respectively, with corresponding fidelities 99.99884% and 99.93906% for only once gate operation to specific initial states. These simulations indicate that the SFQ-based QET could act as an efficient component of SFQ-based quantum-classical interfaces for digital Z control of large-scale superconducting quantum computers.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.14671 [pdf, other]

doi 10.1063/5.0141987

Semiconducting nonperovskite ferroelectric oxynitride designed ab initio

Authors: Qisheng Yu, Jiawei Huang, Changming Ke, Zhuang Qian, Liyang Ma, Shi Liu

Abstract: Recent discovery of HfO2-based and nitride-based ferroelectrics that are compatible to the semiconductor manufacturing process have revitalized the field of ferroelectric-based nanoelectronics. Guided by a simple design principle of charge compensation and density functional theory calculations, we discover HfO2-like mixed-anion materials, TaON and NbON, can crystallize in the polar Pca21 phase with a strong thermodynamic driving force to adopt anion ordering spontaneously. Both oxynitrides possess large remnant polarization, low switching barriers, and unconventional negative piezoelectric effect, making them promising piezoelectrics and ferroelectrics. Distinct from HfO2 that has a wide band gap, both TaON and NbON can absorb visible light and have high charge carrier mobilities, suitable for ferroelectric photovoltaic and photocatalytic applications. This new class of multifunctional nonperovskite oxynitride containing economical and environmentally benign elements offer a platform to design and optimize high-performing ferroelectric semiconductors for integrated systems.

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.11770 [pdf, other]

doi 10.1073/pnas.2303115120

Resolving the binding-kinase discrepancy in bacterial chemotaxis: A nonequilibrium allosteric model and the role of energy dissipation

Authors: David Hathcock, Qiwei Yu, Bernardo A. Mello, Divya N. Amin, Gerald L. Hazelbauer, Yuhai Tu

Abstract: The Escherichia coli chemotaxis signaling pathway has served as a model system for studying the adaptive sensing of environmental signals by large protein complexes. The chemoreceptors control the kinase activity of CheA in response to the extracellular ligand concentration and adapt across a wide concentration range by undergoing methylation and demethylation. Methylation shifts the kinase response curve by orders of magnitude in ligand concentration while incurring a much smaller change in the ligand binding curve. Here, we show that this asymmetric shift in binding and kinase response is inconsistent with equilibrium allosteric models regardless of parameter choices. To resolve this inconsistency, we present a nonequilibrium allosteric model that explicitly includes the dissipative reaction cycles driven by ATP hydrolysis. The model successfully explains all existing measurements for both aspartate and serine receptors. Our results suggest that while ligand binding controls the equilibrium balance between the ON and OFF states of the kinase, receptor methylation modulates the kinetic properties (e.g., the phosphorylation rate) of the ON state. Furthermore, sufficient energy dissipation is necessary for maintaining and enhancing the sensitivity range and amplitude of the kinase response. We demonstrate that the nonequilibrium allosteric model is broadly applicable to other sensor-kinase systems by successfully fitting previously unexplained data from the DosP bacterial oxygen-sensing system. Overall, this work provides a new perspective on cooperative sensing by large protein complexes and opens up new research directions for understanding their microscopic mechanisms through simultaneous measurements and modeling of ligand binding and downstream responses.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 12 (main text) + 4 (supplemental information) pages, 6+4 figures

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 120, e2303115120 (2023)

arXiv:2302.10473 [pdf, other]

Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey

Authors: Kun Wang, Zi Wang, Zhang Li, Ang Su, Xichao Teng, Minhao Liu, Qifeng Yu

Abstract: Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming to locate and classify objects with arbitrary orientations. Recent years have witnessed remarkable progress in oriented object detection using deep learning techniques. Given the rapid development of this field, this paper aims to provide a comprehensive survey of recent advances in oriented object detection. To be specific, we first review the technical evolution from horizontal object detection to oriented object detection and summarize the specific challenges, including feature misalignment, spatial misalignment, and periodicity of angle. Subsequently, we further categorize existing methods into detection framework, oriented bounding box (OBB) regression, and feature representations, and discuss how these methods address the above challenges in detail. In addition, we cover several publicly available datasets and performance evaluation protocols. Furthermore, we provide a comprehensive comparison and analysis of state-of-the-art oriented object detection methods. Toward the end of this paper, we discuss several future directions for oriented object detection.

Submitted 9 April, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.09821 [pdf, other]

doi 10.1088/1572-9494/ace4b3

Thermodynamics and Microstructures of Euler-Heisenberg Black Hole in a Cavity

Authors: Qin Yu, Qi Xu, Jun Tao

Abstract: The Euler-Heisenberg black holes with quantum electrodynamics (QED) correction are embraced by a cavity in this paper, which serves as a boundary of the black hole spacetime and contributes to the equilibrium of the system. We explore the thermodynamic properties of the black hole, including the phase transitions and phase structures. The small/large black hole phase transition occurs for a negative QED parameter, while the reentrant phase transition can be observed for a small positive QED parameter. Then the thermodynamic geometry is investigated to diagnose microscopic interactions of black hole thermodynamic systems. For the reentrant phase transition, the small black holes are dominated by repulsion for the first-order coexistence curve, while the interaction between the small black hole molecules could be attractive or repulsive for the small/large black hole phase transition.

Submitted 20 February, 2023; originally announced February 2023.

Comments: 26 pages, 10 figures

Journal ref: 2023 Commun. Theor. Phys. 75 095402

arXiv:2302.08888 [pdf, other]

Multimodal Federated Learning via Contrastive Representation Ensemble

Authors: Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu

Abstract: With the increasing amount of multimedia data on modern mobile systems and IoT infrastructures, harnessing these rich multimodal data without breaching user privacy becomes a critical issue. Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning. However, existing FL methods extended to multimodal data all rely on model aggregation on single modality level, which restrains the server and clients to have identical model architecture for each modality. This limits the global model in terms of both model complexity and data capacity, not to mention task diversity. In this work, we propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL), a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on public dataset. To achieve better multimodal representation fusion, we design a global-local cross-modal ensemble strategy to aggregate client representations. To mitigate local model drift caused by two unprecedented heterogeneous factors stemming from multimodal discrepancy (modality gap and task gap), we further propose two inter-modal and intra-modal contrasts to regularize local training, which complements information of the absent modality for uni-modal clients and regularizes local clients to head towards global consensus. Thorough evaluations and ablation studies on image-text retrieval and visual question answering tasks showcase the superiority of CreamFL over state-of-the-art FL methods and its practical value.

Submitted 5 May, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: ICLR 2023, update

arXiv:2302.08092 [pdf, other]

Product Question Answering in E-Commerce: A Survey

Authors: Yang Deng, Wenxuan Zhang, Qian Yu, Wai Lam

Abstract: Product question answering (PQA), aiming to automatically provide instant responses to customer's questions in E-Commerce platforms, has drawn increasing attention in recent years. Compared with typical QA problems, PQA exhibits unique challenges such as the subjectivity and reliability of user-generated contents in E-commerce platforms. Therefore, various problem settings and novel methods have been proposed to capture these special characteristics. In this paper, we aim to systematically review existing research efforts on PQA. Specifically, we categorize PQA studies into four problem settings in terms of the form of provided answers. We analyze the pros and cons, as well as present existing datasets and evaluation protocols for each setting. We further summarize the most significant challenges that characterize PQA from general QA applications and discuss their corresponding solutions. Finally, we conclude this paper by providing the prospect on several future directions.

Submitted 3 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: Accepted by ACL 2023 main conference

arXiv:2302.06077 [pdf, ps, other]

Derivative of self-intersection local time for multidimensional fractional Brownian motion

Authors: Qian Yu, Xianye Yu

Abstract: The existence condition $H<1/d$ for first-order derivative of self-intersection local time for $d\geq3$ dimensional fractional Brownian motion can be obtained in Yu (2021). In this paper, we show a limit theorem under the non-existence critical condition $H=1/d$.

Submitted 12 February, 2023; originally announced February 2023.

Comments: 15 pages. arXiv admin note: substantial text overlap with arXiv:2008.05633

arXiv:2302.05031 [pdf, other]

Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

Authors: Jie Zhou, Qian Yu, Chuan Luo, Jing Zhang

Abstract: In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS). However, in a recommender system, the correlations among the involved tasks are complex. Therefore, the existing MTL models designed for RS suffer from negative transfer to different degrees, which will injure optimization in MTL. We find that the root cause of negative transfer is feature redundancy that features learned for different tasks interfere with each other. To alleviate the issue of negative transfer, we propose a novel multi-task learning method termed Feature Decomposition Network (FDN). The key idea of the proposed FDN is reducing the phenomenon of feature redundancy by explicitly decomposing features into task-specific features and task-shared features with carefully designed constraints. We demonstrate the effectiveness of the proposed method on two datasets, a synthetic dataset and a public datasets (i.e., Ali-CCP). Experimental results show that our proposed FDN can outperform the state-of-the-art (SOTA) methods by a noticeable margin.

Submitted 9 February, 2023; originally announced February 2023.

Comments: This paper has been accepted by AAAI-23

arXiv:2302.02371 [pdf, other]

Model-free Quantum Gate Design and Calibration using Deep Reinforcement Learning

Authors: Omar Shindi, Qi Yu, Parth Girdhar, Daoyi Dong

Abstract: High-fidelity quantum gate design is important for various quantum technologies, such as quantum computation and quantum communication. Numerous control policies for quantum gate design have been proposed given a dynamical model of the quantum system of interest. However, a quantum system is often highly sensitive to noise, and obtaining its accurate modeling can be difficult for many practical applications. Thus, the control policy based on a quantum system model may be unpractical for quantum gate design. Also, quantum measurements collapse quantum states, which makes it challenging to obtain information through measurements during the control process. In this paper, we propose a novel training framework using deep reinforcement learning for model-free quantum control. The proposed framework relies only on the measurement at the end of the control process and offers the ability to find the optimal control policy without access to quantum systems during the learning process. The effectiveness of the proposed technique is numerically demonstrated for model-free quantum gate design and quantum gate calibration using off-policy reinforcement learning algorithms.

Submitted 7 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: 12 pages, 17 figures, accepted for publication in the IEEE Transactions on Artificial Intelligence, in press

arXiv:2302.01478 [pdf, other]

Clustered Embedding Learning for Recommender Systems

Authors: Yizhou Chen, Guangda Huzhang, Anxiang Zeng, Qingtao Yu, Hui Sun, Heng-yi Li, Jingyi Li, Yabo Ni, Han Yu, Zhiming Zhou

Abstract: In recent years, recommender systems have advanced rapidly, where embedding learning for users and items plays a critical role. A standard method learns a unique embedding vector for each user and item. However, such a method has two important limitations in real-world applications: 1) it is hard to learn embeddings that generalize well for users and items with rare interactions on their own; and 2) it may incur unbearably high memory costs when the number of users and items scales up. Existing approaches either can only address one of the limitations or have flawed overall performances. In this paper, we propose Clustered Embedding Learning (CEL) as an integrated solution to these two problems. CEL is a plug-and-play embedding learning framework that can be combined with any differentiable feature interaction model. It is capable of achieving improved performance, especially for cold users and items, with reduced memory cost. CEL enables automatic and dynamic clustering of users and items in a top-down fashion, where clustered entities jointly learn a shared embedding. The accelerated version of CEL has an optimal time complexity, which supports efficient online updates. Theoretically, we prove the identifiability and the existence of a unique optimal number of clusters for CEL in the context of nonnegative matrix factorization. Empirically, we validate the effectiveness of CEL on three public datasets and one business dataset, showing its consistently superior performance against current state-of-the-art methods. In particular, when incorporating CEL into the business model, it brings an improvement of $+0.6\%$ in AUC, which translates into a significant revenue gain; meanwhile, the size of the embedding table gets $2650$ times smaller.

Submitted 10 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.12291 [pdf, other]

CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans

Authors: Jieneng Chen, Yingda Xia, Jiawen Yao, Ke Yan, Jianpeng Zhang, Le Lu, Fakai Wang, Bo Zhou, Mingyan Qiu, Qihang Yu, Mingze Yuan, Wei Fang, Yuxing Tang, Minfeng Xu, Jian Zhou, Yuqian Zhao, Qifeng Wang, Xianghua Ye, Xiaoli Yin, Yu Shi, Xin Chen, Jingren Zhou, Alan Yuille, Zaiyi Liu, Ling Zhang

Abstract: Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence & location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction. We decouple the object queries into organ queries, tumor detection queries and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically-inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end using a curated large-scale CT images of 10,042 patients including eight major types of cancers and occurring non-cancer tumors (all are pathology-confirmed with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT has demonstrated strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer towards a universal high performance cancer screening tool.

Submitted 6 October, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: ICCV 2023 Camera Ready Version

arXiv:2301.07085 [pdf, other]

Are Language Models Worse than Humans at Following Prompts? It's Complicated

Authors: Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick

Abstract: Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Our findings caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically.

Submitted 11 November, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: EMNLP 2023

arXiv:2301.05931 [pdf, other]

Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning

Authors: Zhihang Hu, Qinze Yu, Yucheng Guo, Taifeng Wang, Irwin King, Xin Gao, Le Song, Yu Li

Abstract: Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. However, identifying novel drug combinations through wet-lab experiments is resource intensive due to the vast combinatorial search space. Recently, computational approaches, specifically deep learning models have emerged as an efficient way to discover synergistic combinations. While previous methods reported fair performance, their models usually do not take advantage of multi-modal data and they are unable to handle new drugs or cell lines. In this study, we collected data from various datasets covering various drug-related aspects. Then, we take advantage of large-scale pre-training models to generate informative representations and features for drugs, proteins, and diseases. Based on that, a message-passing graph is built on top to propagate information together with graph structure learning flexibility. This is first introduced in the biological networks and enables us to generate pseudo-relations in the graph. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods on synergistic prediction benchmark datasets. We are also capable of inferencing new drug combination data in a test on an independent set released by AstraZeneca, where 10% of improvement over previous methods is observed. In addition, we're robust against unseen drugs and surpass almost 15% AU ROC compared to the second-best model. We believe our framework contributes to both the future wet-lab discovery of novel drugs and the building of promising guidance for precise combination medicine.

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2301.04195 [pdf, other]

doi 10.1109/LRA.2023.3270034

Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments

Authors: Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, Ajay Mandlekar, Buck Babich, Gavriel State, Marco Hutter, Animesh Garg

Abstract: We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim. It offers a modular design to easily and efficiently create robotic environments with photo-realistic scenes and high-fidelity rigid and deformable body simulation. With Orbit, we provide a suite of benchmark tasks of varying difficulty -- from single-stage cabinet opening and cloth folding to multi-stage tasks such as room reorganization. To support working with diverse observations and action spaces, we include fixed-arm and mobile manipulators with different physically-based sensors and motion generators. Orbit allows training reinforcement learning policies and collecting large demonstration datasets from hand-crafted or expert solutions in a matter of minutes by leveraging GPU-based parallelization. In summary, we offer an open-sourced framework that readily comes with 16 robotic platforms, 4 sensor modalities, 10 motion generators, more than 20 benchmark tasks, and wrappers to 4 learning libraries. With this framework, we aim to support various research areas, including representation learning, reinforcement learning, imitation learning, and task and motion planning. We hope it helps establish interdisciplinary collaborations in these communities, and its modularity makes it easily extensible for more tasks and applications in the future.

Submitted 16 February, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: Project website: https://isaac-orbit.github.io/

Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 6, June 2023)

arXiv:2212.12103 [pdf, other]

Bridging the Domain Gap in Satellite Pose Estimation: a Self-Training Approach based on Geometrical Constraints

Authors: Zi Wang, Minglin Chen, Yulan Guo, Zhang Li, Qifeng Yu

Abstract: Recently, unsupervised domain adaptation in satellite pose estimation has gained increasing attention, aiming at alleviating the annotation cost for training deep models. To this end, we propose a self-training framework based on the domain-agnostic geometrical constraints. Specifically, we train a neural network to predict the 2D keypoints of a satellite and then use PnP to estimate the pose. The poses of target samples are regarded as latent variables to formulate the task as a minimization problem. Furthermore, we leverage fine-grained segmentation to tackle the information loss issue caused by abstracting the satellite as sparse keypoints. Finally, we iteratively solve the minimization problem in two steps: pseudo-label generation and network training. Experimental results show that our method adapts well to the target domain. Moreover, our method won the 1st place on the sunlamp task of the second international Satellite Pose Estimation Competition.

Submitted 22 December, 2022; originally announced December 2022.

Comments: 11 pages, 5 figures. Submitted to IEEE TAES, major revision

arXiv:2212.10537 [pdf, other]

Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

Authors: Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick

Abstract: Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating ''cube behind sphere'' from ''sphere behind cube''). In order to inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level.

Submitted 29 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.10441 [pdf, other]

First CE Matters: On the Importance of Long Term Properties on Memory Failure Prediction

Authors: Jasmin Bogatinovski, Qiao Yu, Jorge Cardoso, Odej Kao

Abstract: Dynamic random access memory failures are a threat to the reliability of data centres as they lead to data loss and system crashes. Timely predictions of memory failures allow for taking preventive measures such as server migration and memory replacement. Thereby, memory failure prediction prevents failures from externalizing, and it is a vital task to improve system reliability. In this paper, we revisited the problem of memory failure prediction. We analyzed the correctable errors (CEs) from hardware logs as indicators for a degraded memory state. As memories do not always work with full occupancy, access to faulty memory parts is time distributed. Following this intuition, we observed that important properties for memory failure prediction are distributed through long time intervals. In contrast, related studies, to fit practical constraints, frequently only analyze the CEs from the last fixed-size time interval while ignoring the predating information. Motivated by the observed discrepancy, we study the impact of including the overall (long-range) CE evolution and propose novel features that are calculated incrementally to preserve long-range properties. By coupling the extracted features with machine learning methods, we learn a predictive model to anticipate upcoming failures three hours in advance while improving the average relative precision and recall by 21% and 19%, respectively. We evaluated our methodology on real-world memory failures from the server fleet of a large cloud provider, justifying its validity and practicality.

Submitted 21 November, 2022; originally announced December 2022.

Comments: This paper is accepted to appear in the proceedings of IEEE Big Data 2022. All publishing licenses belong to IEEE

arXiv:2212.09613 [pdf, other]

Model Predictive Spherical Image-Based Visual Servoing On $SO(3)$ for Aggressive Aerial Tracking

Authors: Chao Qin, Qiuyu Yu, Hugh H. T. Liu

Abstract: This paper presents an image-based visual servo control (IBVS) method for a first-person-view (FPV) quadrotor to conduct aggressive aerial tracking. There are three major challenges to maneuvering an underactuated vehicle using IBVS: (i) finding a visual feature representation that is robust to large rotations and is suited to be an optimization variable; (ii) keeping the target visible without sacrificing the robot's agility; and (iii) compensating for the rotational effects in the detected features. We propose a complete design framework to address these problems. First, we employ a rotation on $SO(3)$ to represent a spherical image feature on $S^{2}$ to gain singularity-free and second-order differentiable properties. To ensure target visibility, we formulate the IBVS as a nonlinear model predictive control (NMPC) problem with three constraints taken into account: the robot's physical limits, target visibility, and time-to-collision (TTC). Furthermore, we propose a novel attitude-compensation scheme to enable formulating the visibility constraint in the actual image plane instead of a virtual fix-orientation image plane. It guarantees that the visibility constraint is valid under large rotations. Extensive experimental results show that our method can track a fast-moving target stably and aggressively without the aid of a localization system.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.00131 [pdf, other]

Evidential Conditional Neural Processes

Authors: Deep Shankar Pandey, Qi Yu

Abstract: The Conditional Neural Process (CNP) family of models offers a promising direction to tackle few-shot problems by achieving better scalability and competitive predictive performance. However, the current CNP models only capture the overall uncertainty for the prediction made on a target data point. They lack a systematic fine-grained quantification on the distinct sources of uncertainty that are essential for model training and decision-making under the few-shot setting. We propose Evidential Conditional Neural Processes (ECNP), which replace the standard Gaussian distribution used by CNP with a much richer hierarchical Bayesian structure through evidential learning to achieve epistemic-aleatoric uncertainty decomposition. The evidential hierarchical structure also leads to a theoretically justified robustness over noisy training tasks. Theoretical analysis on the proposed ECNP establishes the relationship with CNP while offering deeper insights on the roles of the evidential parameters. Extensive experiments conducted on both synthetic and real-world data demonstrate the effectiveness of our proposed model in various few-shot settings.

Submitted 30 November, 2022; originally announced December 2022.

Comments: To appear in AAAI2023 Conference

arXiv:2211.15425 [pdf]

FAF: A novel multimodal emotion recognition approach integrating face, body and text

Authors: Zhongyu Fang, Aoyun He, Qihui Yu, Baopeng Gao, Weiping Ding, Tong Zhang, Lei Ma

Abstract: Multimodal emotion analysis performs better in emotion recognition when it draws on more comprehensive emotional clues and multimodal emotion datasets. In this paper, we develop a large multimodal emotion dataset, named the "HED" dataset, to facilitate the emotion recognition task, and accordingly propose a multimodal emotion recognition method. To promote recognition accuracy, the "Feature After Feature" framework is used to explore crucial emotional information from the aligned face, body and text samples. We employ various benchmarks to evaluate the "HED" dataset and compare the performance with our method. The results show that the five-class classification accuracy of the proposed multimodal fusion method is about 83.75%, and the performance is improved by 1.83%, 9.38%, and 21.62% respectively compared with that of individual modalities. The complementarity between each channel is effectively used to improve the performance of emotion recognition. We have also established a multimodal online emotion prediction platform, aiming to provide free emotion prediction to more users.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.15242 [pdf, other]

Ising Model on Locally Tree-like Graphs: Uniqueness of Solutions to Cavity Equations

Authors: Qian Yu, Yury Polyanskiy

Abstract: In the study of Ising models on large locally tree-like graphs, in both rigorous and non-rigorous methods one is often led to understanding the so-called belief propagation distributional recursions and their fixed points. We prove that there is at most one non-trivial fixed point for Ising models with zero or certain random external fields. Previously this was only known for sufficiently ``low-temperature'' models. Our main innovation is in applying information-theoretic ideas of channel comparison leading to a new metric (degradation index) between binary-input-symmetric (BMS) channels under which the Belief Propagation (BP) operator is a strict contraction (albeit non-multiplicative). A key ingredient of our proof is a strengthening of the classical stringy tree lemma of (Evans-Kenyon-Peres-Schulman'00). Our result simultaneously closes the following 6 conjectures in the literature: 1) independence of robust reconstruction accuracy to leaf noise in broadcasting on trees (Mossel-Neeman-Sly'16); 2) uselessness of global information for a labeled 2-community stochastic block model, or 2-SBM (Kanade-Mossel-Schramm'16); 3) optimality of local algorithms for 2-SBM under noisy side information (Mossel-Xu'16); 4) uniqueness of BP fixed point in broadcasting on trees in the Gaussian (large degree) limit (ibid); 5) boundary irrelevance in broadcasting on trees (Abbe-Cornacchia-Gu-Polyanskiy'21); 6) characterization of entropy (and mutual information) of community labels given the graph in 2-SBM (ibid).

Submitted 31 July, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.13702 [pdf, other]

CasFusionNet: A Cascaded Network for Point Cloud Semantic Scene Completion by Dense Feature Fusion

Authors: Jinfeng Xu, Xianzhi Li, Yuan Tang, Qiao Yu, Yixue Hao, Long Hu, Min Chen

Abstract: Semantic scene completion (SSC) aims to complete a partial 3D scene and predict its semantics simultaneously. Most existing works adopt the voxel representations, thus suffering from the growth of memory and computation cost as the voxel resolution increases. Though a few works attempt to solve SSC from the perspective of 3D point clouds, they have not fully exploited the correlation and complementarity between the two tasks of scene completion and semantic segmentation. In our work, we present CasFusionNet, a novel cascaded network for point cloud semantic scene completion by dense feature fusion. Specifically, we design (i) a global completion module (GCM) to produce an upsampled and completed but coarse point set, (ii) a semantic segmentation module (SSM) to predict the per-point semantic labels of the completed points generated by GCM, and (iii) a local refinement module (LRM) to further refine the coarse completed points and the associated labels from a local perspective. We organize the above three modules via dense feature fusion in each level, and cascade a total of four levels, where we also employ feature fusion between each level for sufficient information usage. Both quantitative and qualitative results on our two compiled point-based datasets validate the effectiveness and superiority of our CasFusionNet compared to state-of-the-art methods in terms of both scene completion and semantic segmentation. The codes and datasets are available at: https://github.com/JinfengX/CasFusionNet.

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2211.11324 [pdf, other]

doi 10.1109/TCSVT.2022.3201540

Slow Motion Matters: A Slow Motion Enhanced Network for Weakly Supervised Temporal Action Localization

Authors: Weiqi Sun, Rui Su, Qian Yu, Dong Xu

Abstract: Weakly supervised temporal action localization (WTAL) aims to localize actions in untrimmed videos with only weak supervision information (e.g. video-level labels). Most existing models handle all input videos with a fixed temporal scale. However, such models are not sensitive to actions whose pace of the movements is different from the ``normal'' speed, especially slow-motion action instances, which complete the movements with a much slower speed than their counterparts with a normal speed. Here arises the slow-motion blurred issue: It is hard to explore salient slow-motion information from videos at ``normal'' speed. In this paper, we propose a novel framework termed Slow Motion Enhanced Network (SMEN) to improve the ability of a WTAL network by compensating its sensitivity on slow-motion action segments. The proposed SMEN comprises a Mining module and a Localization module. The mining module generates a mask to mine slow-motion-related features by utilizing the relationships between the normal motion and slow motion, while the localization module leverages the mined slow-motion features as complementary information to improve the temporal action localization results. Our proposed framework can be easily adapted by existing WTAL networks and enable them to be more sensitive to slow-motion actions. Extensive experiments on three benchmarks are conducted, which demonstrate the high performance of our proposed framework.

Submitted 21 November, 2022; originally announced November 2022.

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2022

arXiv:2211.07726 [pdf, other]

On Constrained Mixed-Integer DR-Submodular Minimization

Authors: Qimeng Yu, Simge Küçükyavuz

Abstract: DR-submodular functions encompass a broad class of functions which are generally non-convex and non-concave. We study the problem of minimizing any DR-submodular function, with continuous and general integer variables, under box constraints and possibly additional monotonicity constraints. We propose valid linear inequalities for the epigraph of any DR-submodular function under the constraints. We further provide the complete convex hull of such an epigraph, which, surprisingly, turns out to be polyhedral. We propose a polynomial-time exact separation algorithm for our proposed valid inequalities, with which we first establish the polynomial-time solvability of this class of mixed-integer nonlinear optimization problems.

Submitted 5 September, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2210.16152 [pdf, ps, other]

Limit laws for functionals of self-intersection symmetric alpha-stable processes

Authors: Minhao Hong, Qian Yu

Abstract: In this paper, we prove two limit laws for functionals of self-intersection symmetric alpha-stable processes with $\alpha\in(1,2)$. The results are obtained based on the method of moments; the sample configuration and the chaining argument introduced in (Nualart and Xu 2013) are employed.

Submitted 28 October, 2022; originally announced October 2022.

Comments: 18 pages

arXiv:2210.12681 [pdf, other]

Rethinking Rotation in Self-Supervised Contrastive Learning: Adaptive Positive or Negative Data Augmentation

Authors: Atsuyuki Miyai, Qing Yu, Daiki Ikami, Go Irie, Kiyoharu Aizawa

Abstract: Rotation is frequently listed as a candidate for data augmentation in contrastive learning but seldom provides satisfactory improvements. We argue that this is because the rotated image is always treated as either positive or negative. The semantics of an image can be rotation-invariant or rotation-variant, so whether the rotated image is treated as positive or negative should be determined based on the content of the image. Therefore, we propose a novel augmentation strategy, adaptive Positive or Negative Data Augmentation (PNDA), in which an original and its rotated image are a positive pair if they are semantically close and a negative pair if they are semantically different. To achieve PNDA, we first determine whether rotation is positive or negative on an image-by-image basis in an unsupervised way. Then, we apply PNDA to contrastive learning frameworks. Our experiments showed that PNDA improves the performance of contrastive learning. The code is available at \url{https://github.com/AtsuMiyai/rethinking_rotation}.

Submitted 24 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2210.04379 [pdf, other]

Unsupervised Domain Adaptive Fundus Image Segmentation with Few Labeled Source Data

Authors: Qianbi Yu, Dongnan Liu, Chaoyi Zhang, Xinwen Zhang, Weidong Cai

Abstract: Deep learning-based segmentation methods have been widely employed for automatic glaucoma diagnosis and prognosis. In practice, fundus images obtained by different fundus cameras vary significantly in terms of illumination and intensity. Although recent unsupervised domain adaptation (UDA) methods enhance the models' generalization ability on the unlabeled target fundus datasets, they always require sufficient labeled data from the source domain, bringing auxiliary data acquisition and annotation costs. To further facilitate the data efficiency of the cross-domain segmentation methods on the fundus images, we explore UDA optic disc and cup segmentation problems using few labeled source data in this work. We first design a Searching-based Multi-style Invariant Mechanism to diversify the source data style as well as increase the data amount. Next, a prototype consistency mechanism on the foreground objects is proposed to facilitate the feature alignment for each kind of tissue under different image styles. Moreover, a cross-style self-supervised learning stage is further designed to improve the segmentation performance on the target images. Our method has outperformed several state-of-the-art UDA segmentation methods under the UDA fundus segmentation with few labeled source data.

Submitted 9 October, 2022; originally announced October 2022.

Comments: Accepted by The 33rd British Machine Vision Conference (BMVC) 2022

arXiv:2210.01820 [pdf, other]

MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

Authors: Chenglin Yang, Siyuan Qiao, Qihang Yu, Xiaoding Yuan, Yukun Zhu, Alan Yuille, Hartwig Adam, Liang-Chieh Chen

Abstract: This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. Unlike the current works that stack separate mobile convolution and transformer blocks, we effectively merge them into a MOAT block. Starting with a standard Transformer block, we replace its multi-layer perceptron with a mobile convolution block, and further reorder it before the self-attention operation. The mobile convolution block not only enhances the network representation capacity, but also produces better downsampled features. Our conceptually simple MOAT networks are surprisingly effective, achieving 89.1% / 81.5% top-1 accuracy on ImageNet-1K / ImageNet-1K-V2 with ImageNet22K pretraining. Additionally, MOAT can be seamlessly applied to downstream tasks that require large resolution inputs by simply converting the global attention to window attention. Thanks to the mobile convolution that effectively exchanges local information between pixels (and thus cross-windows), MOAT does not need the extra window-shifting mechanism. As a result, on COCO object detection, MOAT achieves 59.2% box AP with 227M model parameters (single-scale inference, and hard NMS), and on ADE20K semantic segmentation, MOAT attains 57.6% mIoU with 496M model parameters (single-scale inference). Finally, the tiny-MOAT family, obtained by simply reducing the channel sizes, also surprisingly outperforms several mobile-specific transformer-based models on ImageNet.
The tiny-MOAT family is also benchmarked on downstream tasks, serving as a baseline for the community. We hope our simple yet effective MOAT will inspire more seamless integration of convolution and self-attention. Code is publicly available.

Submitted 30 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: ICLR 2023. arXiv v2: add ImageNet-1K-V2, tiny-MOAT on COCO detection and ADE20K segmentation

arXiv:2209.15296 [pdf, other]

Wake Word Detection Based on Res2Net

Authors: Qiuchen Yu, Ruohua Zhou

Abstract: This letter proposes a new wake word detection system based on Res2Net. As a variant of ResNet, Res2Net was first applied to object detection. Res2Net realizes multiple feature scales by increasing possible receptive fields. This multiple scaling mechanism significantly improves the detection ability of wake words with different durations. Compared with the ResNet-based model, Res2Net also significantly reduces the model size and is more suitable for detecting wake words. The proposed system can determine the positions of wake words from the audio stream without any additional assistance. The proposed method is verified on the Mobvoi dataset containing two wake words. At a false alarm rate of 0.5 per hour, the system reduced the false rejection of the two wake words by more than 12% over prior works.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.13947 [pdf, ps, other]

$^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) Reaction Cross Section Measurements using Laser-Driven Ultra-Intense $γ$-Ray Source

Authors: D. Wu, H. Y. Lan, J. Y. Zhang, J. X. Liu, H. G. Lu, J. F. Lv, X. Z. Wu, H. Zhang, J. Cai, Q. Y. Ma, Y. H. Xia, Z. N. Wang, M. Z. Wang, Z. Y. Yang, X. L. Xu, Y. X. Geng, Y. Y. Zhao, C. Lin, W. J. Ma, J. Q. Yu, H. R. Wang, F. L. Liu, C. Y. He, B. Guo, P. Zhu, et al. (4 additional authors not shown)

Abstract: We present a new method for the measurements of photonuclear reaction flux-weighted average cross sections and isomeric ratios using a laser-driven bremsstrahlung $γ$-ray source. An ultra-bright ultra-fast 60$\,\thicksim\,$250 MeV bremsstrahlung $γ$-ray source was established using the 200 TW laser facility in the Compact Laser Plasma Accelerator Laboratory, Peking University, which could cover the energy range from knocking out neutrons to producing pions. Stable quasi-monoenergetic electron beams were generated via laser wakefield acceleration with a charge of 300$\,\thicksim\,$600 pC per shot. The averaged $γ$-ray intensities ($\geqslant$8 MeV) were higher than 10$^{8}$ per shot and the instantaneous intensities can reach above 10$^{19}$ s$^{-1}$ with a duration time about 6.7 ps. $^{65}$Cu($γ,\,n$)$^{64}$Cu and $^{27}$Al($γ,\,x$)$^{24}$Na reactions were used as $γ$-ray flux monitors in the experiments. The flux-weighted average cross sections and isomeric ratios of $^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) reactions were analyzed through activation measurements. The results showed good agreement with previous works and proved this method to be accurate. The $^{197}$Au($γ,\,xn;\,x\,=\,7\thicksim\,9$) reaction cross sections were first achieved with the highest threshold energy of 71.410 MeV. Theoretical cross sections of TALYS 1.9 were calculated to compare with experiment results. This method offered a unique way of gaining insight into photonuclear reaction research, especially for short-lived isomers which extremely lack experimental data.

Submitted 23 November, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2209.12141 [pdf, other]

doi 10.1038/s41550-022-01766-0

A dynamically discovered and characterized non-accreting neutron star -- M dwarf binary candidate

Authors: Tuan Yi, Wei-Min Gu, Zhi-Xiang Zhang, Ling-Lin Zheng, Mouyuan Sun, Junfeng Wang, Zhongrui Bai, Pei Wang, Jianfeng Wu, Yu Bai, Song Wang, Haotong Zhang, Yize Dong, Yong Shao, Xiang-Dong Li, Jia Zhang, Yang Huang, Fan Yang, Qingzheng Yu, Hui-Jun Mu, Jin-Bo Fu, Senyu Qi, Jing Guo, Xuan Fang, Chuanjie Zheng, et al. (4 additional authors not shown)

Abstract: Optical time-domain surveys can unveil and characterize exciting but less-explored non-accreting and/or non-beaming neutron stars (NS) in binaries. Here we report the discovery of such a NS candidate using the LAMOST spectroscopic survey. The candidate, designated LAMOST J112306.9+400736 (hereafter J1123), is in a single-lined spectroscopic binary containing an optically visible M star. The star's large radial velocity variation and ellipsoidal variations indicate a relatively massive unseen companion. Utilizing follow-up spectroscopy from the Palomar 200-inch telescope and high-precision photometry from TESS, we measure a companion mass of $1.24_{-0.03}^{+0.03}~M_{\odot}$. Main-sequence stars with this mass are ruled out, leaving a NS or a massive white dwarf (WD). Although a massive WD cannot be ruled out, the lack of UV excess radiation from the companion supports the NS hypothesis. Deep radio observations with FAST yielded no detections of either pulsed or persistent emission. J1123 is not detected in numerous X-ray and gamma-ray surveys. These non-detections suggest that the NS candidate is not presently accreting and pulsing. Our work exemplifies the capability of discovering compact objects in non-accreting close binaries by synergizing the optical time-domain spectroscopy and high-cadence photometry.

Submitted 25 September, 2022; originally announced September 2022.

Comments: 53 pages, 15 figures, publication in Nature Astronomy

Showing 151–200 of 615 results for author: Yu, Q