-
Social-DualCVAE: Multimodal Trajectory Forecasting Based on Social Interactions Pattern Aware and Dual Conditional Variational Auto-Encoder
Authors:
Jiashi Gao,
Xinming Shi,
James J. Q. Yu
Abstract:
Pedestrian trajectory forecasting is a fundamental task in many application areas, such as self-driving, autonomous robots, and surveillance systems. Future trajectories are multi-modal, influenced by physical interaction with scene contexts and intricate social interactions among pedestrians. Most existing literature learns representations of social interactions with deep learning networks, while explicit interaction patterns are not utilized. Different interaction patterns, such as following or collision avoidance, generate different trends in subsequent movement; thus, awareness of social interaction patterns is important for trajectory forecasting. Moreover, social interaction patterns are privacy-sensitive or lack labels. To jointly address the above issues, we present a social-dual conditional variational auto-encoder (Social-DualCVAE) for multi-modal trajectory forecasting, which is based on a generative model conditioned not only on the past trajectories but also on the unsupervised classification of interaction patterns. After generating the category distribution of the unlabeled social interaction patterns, DualCVAE, conditioned on the past trajectories and the social interaction pattern, is proposed for multi-modal trajectory prediction by estimating latent variables. A variational bound is derived as the minimization objective during training. The proposed model is evaluated on widely used trajectory benchmarks and outperforms the prior state-of-the-art methods.
Submitted 8 February, 2022;
originally announced February 2022.
-
LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm
Authors:
Meng Ai,
Biao Li,
Heyang Gong,
Qingwei Yu,
Shengjie Xue,
Yuan Zhang,
Yunzhou Zhang,
Peng Jiang
Abstract:
Offering incentives (e.g., coupons at Amazon, discounts at Uber and video bonuses at Tiktok) to users is a common strategy used by online platforms to increase user engagement and platform revenue. Despite their proven effectiveness, these marketing incentives incur an inevitable cost and might result in a low ROI (Return on Investment) if not used properly. On the other hand, different users respond differently to these incentives; for instance, some users never buy certain products without coupons, while others do anyway. Thus, how to select the right amount of incentives (i.e., treatment) for each user under budget constraints is an important research problem with great practical implications. In this paper, we call this problem the budget-constrained treatment selection (BTS) problem.
The challenge is how to efficiently solve the BTS problem on a large-scale dataset and achieve improved results over the existing techniques. We propose a novel tree-based treatment selection technique under budget constraints, called the Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm, which is also an efficient treatment selection algorithm suitable for modern distributed computing systems. A novel offline evaluation method is also proposed to overcome an intrinsic challenge in assessing solutions' performance for the BTS problem on randomized control trial (RCT) data. We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses in order to increase users' campaign engagement duration. The simulation analysis and the offline and online experiments all show that our method outperforms various tree-based state-of-the-art baselines. The proposed approach currently serves hundreds of millions of users on the platform and has delivered one of the largest improvements in recent months.
Submitted 18 February, 2022; v1 submitted 29 January, 2022;
originally announced January 2022.
-
Alleviating Cold-start Problem in CTR Prediction with A Variational Embedding Learning Framework
Authors:
Xiaoxiao Xu,
Chen Yang,
Qian Yu,
Zhiwei Fang,
Jiaxing Wang,
Chaosheng Fan,
Yang He,
Changping Peng,
Zhangang Lin,
Jingping Shao
Abstract:
We propose a general Variational Embedding Learning Framework (VELF) for alleviating the severe cold-start problem in CTR prediction. VELF addresses the cold-start problem by alleviating the overfitting caused by data sparsity in two ways: learning probabilistic embeddings, and incorporating trainable and regularized priors which utilize the rich side information of cold-start users and advertisements (Ads). The two techniques are naturally integrated into a variational inference framework, forming an end-to-end training process. Extensive empirical tests on benchmark datasets demonstrate the advantages of our proposed VELF. In addition, extended experiments confirm that our parameterized and regularized priors provide more generalization capability than traditional fixed priors.
Submitted 17 January, 2022;
originally announced January 2022.
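The two ingredients named in the VELF abstract above (a probabilistic embedding and a trainable prior built from side information) can be illustrated with a small sketch. This is not the authors' implementation: the linear map `W_prior`, the feature vector `side_features`, the embedding size, and the choice of diagonal Gaussians are all assumptions made for illustration. It shows only a reparameterized sample and the closed-form KL term that would regularize the posterior toward the prior.

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over embedding dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def sample_embedding(mu, logvar, rng):
    """Reparameterized sample: mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(0)
dim = 4  # assumed embedding size

# Hypothetical posterior parameters for one cold-start ad ID.
mu_q = rng.normal(size=dim)
logvar_q = np.full(dim, -1.0)

# Hypothetical prior: its mean is a linear function of side information
# (assumed here; the abstract only says the prior uses side information).
side_features = rng.normal(size=3)
W_prior = 0.1 * rng.normal(size=(dim, 3))
mu_p = W_prior @ side_features
logvar_p = np.zeros(dim)

z = sample_embedding(mu_q, logvar_q, rng)            # embedding fed to a CTR model
kl = kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)  # regularization term
```

In a full training loop the KL term would be added to the click-prediction loss, so that IDs with few observations stay close to the prior implied by their side information.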
-
Tensor factorization based method for low rank matrix completion and its application on tensor completion
Authors:
Quan Yu,
Xinzhen Zhang
Abstract:
Low-rank matrix and tensor completion problems aim to recover incomplete two- and higher-order data by using their low-rank structures. The essential problem in matrix and tensor completion is how to improve efficiency. To this end, we first establish the relationship between matrix rank and tensor tubal rank, and then reformulate the matrix completion problem as a tensor completion problem. For the reformulated tensor completion problem, we adopt a two-stage strategy based on a tensor factorization algorithm. In this way, a large matrix completion problem can be solved via matrix computations of smaller sizes. For a third-order tensor completion problem, to fully exploit the low-rank structures, we introduce the double tubal rank, which combines the tubal rank and the rank of the mode-3 unfolding matrix. For the mode-3 unfolding matrix rank, we follow the idea of matrix completion. Based on this, we establish a novel model and modify the tensor-factorization-based algorithm for third-order tensor completion. Extensive numerical experiments demonstrate that the proposed methods outperform state-of-the-art methods in terms of both accuracy and running time.
Submitted 23 January, 2022;
originally announced January 2022.
-
State-space renormalization group theory of nonequilibrium reaction networks: Exact solutions for hypercubic lattices in arbitrary dimensions
Authors:
Qiwei Yu,
Yuhai Tu
Abstract:
Nonequilibrium reaction networks (NRNs) underlie most biological functions. Despite their diverse dynamic properties, NRNs share the signature characteristics of persistent probability fluxes and continuous energy dissipation even in the steady state. Dynamics of NRNs can be described at different coarse-grained levels. Our previous work showed that the apparent energy dissipation rate at a coarse-grained level follows an inverse power law dependence on the scale of coarse-graining. The scaling exponent is determined by the network structure and correlation of stationary probability fluxes. However, it remains unclear whether and how the (renormalized) flux correlation varies with coarse-graining. Following Kadanoff's real space renormalization group (RG) approach for critical phenomena, we address this question by developing a State-Space Renormalization Group (SSRG) theory for NRNs, which leads to an iterative RG equation for the flux correlation function. In square and hypercubic lattices, we solve the RG equation exactly and find two types of fixed point solutions: a family of nontrivial fixed points where the correlation exhibits power-law decay and a trivial fixed point where the correlation vanishes beyond the nearest neighbors. The power-law fixed point is stable if and only if the power exponent is less than the lattice dimension $n$. Consequently, the correlation function converges to the power-law fixed point only when the correlation in the fine-grained network decays slower than $r^{-n}$ and to the trivial fixed point otherwise. If the flux correlation in the fine-grained network contains multiple stable solutions with different exponents, the RG iteration dynamics select the fixed point solution with the smallest exponent. We also discuss a possible connection between the RG flows of the flux correlation and those of the Kosterlitz-Thouless transition.
Submitted 26 May, 2022; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Control of electron beam polarization in the bubble regime of laser-wakefield acceleration
Authors:
H. C. Fan,
X. Y. Liu,
X. F. Li,
J. F. Qu,
Q. Yu,
Q. Kong,
S. M. Weng,
M. Chen,
M. Büscher,
P. Gibbon,
S. Kawata,
Z. M. Sheng
Abstract:
Electron beam polarization in the bubble regime of the interaction between a high-intensity laser and a longitudinally pre-polarized plasma is investigated by means of the Thomas-Bargmann-Michel-Telegdi equation. Using a test-particle model, the dependence of the accelerated electron polarization on the bubble geometry is analyzed in detail. Tracking the polarization dynamics of individual electrons reveals that although the spin direction changes during both the self-injection process and the acceleration phase, the former has the biggest impact. For nearly spherical bubbles, the polarization of the electron beam persists after capture and acceleration in the bubble. By contrast, for aspherical bubble shapes, the electron beam becomes rapidly depolarized, and the net polarization direction can even reverse in the case of an oblate spheroidal bubble. These findings are confirmed via particle-in-cell simulations.
Submitted 9 January, 2022;
originally announced January 2022.
-
A Multi-Behavior Planning Framework for Robot Guide
Authors:
Muhan Hou,
Zonghao Mu,
Jing Li,
Qizhi Yu,
Jason Gu
Abstract:
The guiding task of a mobile robot requires not only human-aware navigation, but also appropriate and timely interaction for active instruction. State-of-the-art tour-guide models limit their socially-aware consideration to adapting to users' motion, ignoring the interactive behavior planning needed to fulfill communicative demands. We propose a multi-behavior planning framework based on Monte Carlo Tree Search to better help users understand confusing scene contexts, select proper paths, and arrive at the destination on time. To provide proactive guidance, we construct a sampling-based probability model of human motion to consider the interrelated effects between robots and humans. We validate our method in both simulation and real-world experiments, along with a performance comparison with state-of-the-art models.
Submitted 7 January, 2022;
originally announced January 2022.
-
A Joint Beamforming Design and Integrated CPM-LFM Signal for Dual-functional Radar-communication Systems
Authors:
Yu Cao,
QiYue Yu
Abstract:
The dual-functional radar-communication (DFRC) system is an attractive technique, since it can support both wireless communications and radar on a unified hardware platform with real-time cooperation. Considering the appealing feature of multiple beams, this paper proposes a precoding scheme that simultaneously supports multiuser transmission and target detection, with an integrated continuous phase modulation (CPM) and linear frequency modulation (LFM) signal, based on the designed dual-mode framework. By analogy with the concept of communication rate, this paper defines a radar rate to unify the DFRC system. The maximum sum-rate, which includes both the communication and radar rates, is then set as the objective function. Because the optimization problem is non-convex, it is divided into two subproblems: user selection, and joint beamforming design and power allocation. A successive maximum iteration (SMI) algorithm is presented for the former, which balances sum-rate performance against complexity; a maximum minimization Lagrange multiplier (MMLM) iteration algorithm is used to solve the latter. Moreover, we derive the spectrum characteristic, bit error rate (BER) and ambiguity function (AF) for the proposed system. Simulation results show that our proposed system provides a higher sum-rate than the classical schemes, validating the efficiency of the proposed system.
Submitted 17 December, 2021;
originally announced December 2021.
-
Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection
Authors:
Zekun Li,
Runyu Guan,
Qianmu Yu,
Yao-Yi Chiang,
Craig A. Knoblock
Abstract:
Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up map interpretation and help generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-of-domain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-intensive and time-consuming. On the other hand, existing geographic data sources, such as OpenStreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles of OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.
Submitted 11 December, 2021;
originally announced December 2021.
-
Feasibility study of quantum computing using trapped electrons
Authors:
Qian Yu,
Alberto M. Alonso,
Jackie Caminiti,
Kristin M. Beck,
R. Tyler Sutherland,
Dietrich Leibfried,
Kayla J. Rodriguez,
Madhav Dhital,
Boerge Hemmerling,
Hartmut Häffner
Abstract:
We investigate the feasibility of using electrons in a linear Paul trap as qubits in a future quantum computer. We discuss the necessary experimental steps to realize such a device through a concrete design proposal, including trapping, cooling, electronic detection, spin readout and single and multi-qubit gate operations. Numeric simulations indicate that two-qubit Bell-state fidelities of order 99.99% can be achieved assuming reasonable experimental parameters.
Submitted 7 December, 2021;
originally announced December 2021.
-
Low Noise Frequency Domain Multiplexing of TES Bolometers using Sub-kelvin SQUIDs
Authors:
Tucker Elleflot,
Aritoki Suzuki,
Kam Arnold,
Chris Bebek,
Robin H. Cantor,
Kevin T. Crowley,
John Groh,
Tijmen de Haan,
Amber Hornsby,
John Joseph,
Adrian T. Lee,
Tiffany Liu,
Joshua Montgomery,
Megan Russell,
Qingyang Yu
Abstract:
Digital Frequency-Domain Multiplexing (DfMux) is a technique that uses MHz superconducting resonators and Superconducting Quantum Interference Device (SQUID) arrays to read out sets of Transition Edge Sensors. DfMux has been used by several Cosmic Microwave Background experiments, including most recently POLARBEAR-2 and SPT-3G with multiplexing factors as high as 68, and is the baseline readout technology for the planned satellite mission LiteBIRD. Here, we present recent work focused on improving DfMux readout noise, reducing parasitic impedance, and improving sensor operation. We have achieved a substantial reduction in stray impedance by integrating the sensors, resonators, and SQUID array onto a single carrier board operated at 250 mK. This also drastically simplifies the packaging of the cryogenic components and leads to better-controlled crosstalk. We demonstrate a low readout noise level of 8.6 pA/Hz$^{1/2}$, which was made possible by operating the SQUID array at a reduced temperature and with a low dynamic impedance. This is a factor of two improvement compared to the achieved readout noise level in currently operating Cosmic Microwave Background experiments using DfMux and represents a critical step toward maturation of the technology for the next generation of instruments.
Submitted 4 December, 2021;
originally announced December 2021.
-
Permutationally invariant polynomial regression for energies and gradients, using reverse differentiation, achieves orders of magnitude speed-up with high precision compared to other machine learning methods
Authors:
Paul L. Houston,
Chen Qu,
Apurba Nandi,
Riccardo Conte,
Qi Yu,
Joel M. Bowman
Abstract:
Permutationally invariant polynomial (PIP) regression has been used to obtain machine-learned (ML) potential energy surfaces, including analytical gradients, for many molecules and chemical reactions. Recently, the approach has been extended to moderate size molecules and applied to systems up to 15 atoms. The algorithm, including "purification of the basis", is computationally efficient for energies; however, we found that the recent extension to obtain analytical gradients, despite being a remarkable advance over previous methods, could be further improved. Here we report developments to compact further a purified basis and, more significantly, to use the reverse gradient approach to greatly speed up gradient evaluation. We demonstrate this for our recent 4-body water interaction potential. Comparisons of training and testing precision on the MD17 database of energies and gradients (forces) for ethanol against GP-SOAP, ANI, sGDML, PhysNet, pKREG, KRR, and other methods, which were recently assessed by Dral and co-workers, are given. The PIP fits are as precise as those using these methods, but the PIP computation time for energy and force evaluation is shown to be 10 to 1000 times faster. Finally, a new PIP PES is reported for ethanol based on a more extensive dataset of energies and gradients than in the MD17 database. Diffusion Monte Carlo calculations which fail on MD17-based PESs are successful using the new PES.
Submitted 3 December, 2021;
originally announced December 2021.
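As a rough illustration of what "permutationally invariant" means in the PIP abstract above, the sketch below builds one basis function by summing a monomial of Morse variables over all permutations of identical atoms. It is a toy under stated assumptions, not the authors' software: the function names, the range parameter `a`, the water-like geometry, and the brute-force symmetrization are all illustrative, and no basis purification or reverse differentiation is attempted.

```python
import itertools
import numpy as np

def morse_variables(coords, a=2.0):
    """Map coordinates of shape (N, 3) to Morse variables exp(-r_ij / a), i < j."""
    n = len(coords)
    return {
        (i, j): np.exp(-np.linalg.norm(coords[i] - coords[j]) / a)
        for i in range(n)
        for j in range(i + 1, n)
    }

def symmetrized_monomial(coords, exponents, like_atoms, a=2.0):
    """Sum one monomial over every permutation of the identical atoms."""
    total = 0.0
    for perm in itertools.permutations(like_atoms):
        relabel = list(range(len(coords)))
        for src, dst in zip(like_atoms, perm):
            relabel[src] = dst
        y = morse_variables(coords[relabel], a)
        total += np.prod([y[pair] ** power for pair, power in exponents.items()])
    return total

# Hypothetical, deliberately asymmetric water-like geometry: atom 0 is O,
# atoms 1 and 2 are the two H atoms (the identical atoms).
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.3, 1.1, 0.0]])
exponents = {(0, 1): 2, (0, 2): 1}  # monomial y_01^2 * y_02

value = symmetrized_monomial(coords, exponents, like_atoms=[1, 2])
swapped = symmetrized_monomial(coords[[0, 2, 1]], exponents, like_atoms=[1, 2])
```

Swapping the two hydrogen atoms leaves the basis function unchanged, which is the property a linear regression on such functions inherits.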
-
Novel and self-consistency analysis of the QCD running coupling $α_s(Q)$ in both the perturbative and nonperturbative domains
Authors:
Qing Yu,
Hua Zhou,
Xu-Dong Huang,
Jian-Ming Shen,
Xing-Gang Wu
Abstract:
The QCD coupling $α_s$ is the most important parameter for achieving precise QCD predictions. By using the well-measured effective coupling $α^{g_1}_{s}(Q)$ defined from the Bjorken sum rules as a basis, we suggest a novel and self-consistent way to fix $α_s$ at all scales: the QCD light-front holographic model is adopted for its infrared behavior, and the fixed-order pQCD prediction under the principle of maximum conformality (PMC) is used for its high-energy behavior. Using the PMC scheme-and-scale independent perturbative series, and by transforming it into the one under the physical $V$-scheme, we observe that a precise $α_s$ running behavior in both the perturbative and nonperturbative domains, with a smooth transition from small to large scales, can be achieved.
Submitted 13 June, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Consensus Graph Representation Learning for Better Grounded Image Captioning
Authors:
Wenqiao Zhang,
Haochen Shi,
Siliang Tang,
Jun Xiao,
Qiang Yu,
Yueting Zhuang
Abstract:
Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors, which results in semantic inconsistency between the visual information and the target lexical words. The most common remedy is to encourage the captioning model to dynamically link generated object words or phrases to appropriate regions of the image, i.e., grounded image captioning (GIC). However, GIC utilizes an auxiliary task (grounding objects) that has not solved the key issue of object hallucination, i.e., the semantic inconsistency. In this paper, we take a novel perspective on the issue above - exploiting the semantic coherency between the visual and language modalities. Specifically, we propose the Consensus Graph Representation Learning framework (CGRL) for GIC, which incorporates a consensus representation into the grounded captioning pipeline. The consensus is learned by aligning the visual graph (e.g., scene graph) to the language graph, considering both the nodes and edges in a graph. With the aligned consensus, the captioning model can capture both the correct linguistic characteristics and visual relevance, and then further ground appropriate image regions. We validate the effectiveness of our model, with a significant decline in object hallucination (-9% CHAIRi) on the Flickr30k Entities dataset. In addition, our CGRL is evaluated by several automatic metrics and human evaluation; the results indicate that the proposed approach can simultaneously improve the performance of image captioning (+2.9 Cider) and grounding (+2.3 F1LOC).
Submitted 12 April, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
PartImageNet: A Large, High-Quality Dataset of Parts
Authors:
Ju He,
Shuo Yang,
Shaokang Yang,
Adam Kortylewski,
Xiaoding Yuan,
Jie-Neng Chen,
Shuai Liu,
Cheng Yang,
Qihang Yu,
Alan Yuille
Abstract:
It is natural to represent objects in terms of their parts. This has the potential to improve the performance of algorithms for object recognition and segmentation but can also help for downstream tasks like activity recognition. Research on part-based models, however, is hindered by the lack of datasets with per-pixel part annotations. This is partly due to the difficulty and high cost of annotating object parts so it has rarely been done except for humans (where there exists a big literature on part-based models). To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations. It consists of $158$ classes from ImageNet with approximately $24,000$ images. PartImageNet is unique because it offers part-level annotations on a general set of classes including non-rigid, articulated objects, while having an order of magnitude larger size compared to existing part datasets (excluding datasets of humans). It can be utilized for many vision tasks including Object Segmentation, Semantic Part Segmentation, Few-shot Learning and Part Discovery. We conduct comprehensive experiments which study these tasks and set up a set of baselines. The dataset and scripts are released at https://github.com/TACJu/PartImageNet.
Submitted 16 December, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning
Authors:
Jun Lv,
Qiaojun Yu,
Lin Shao,
Wenhai Liu,
Wenqiang Xu,
Cewu Lu
Abstract:
Building general-purpose robots to perform a diverse range of tasks in a large variety of environments in the physical world at the human level is extremely challenging. It requires the robot learning to be sample-efficient, generalizable, compositional, and incremental. In this work, we introduce a systematic learning framework called SAGCI-system towards achieving these above four requirements. Our system first takes the raw point clouds gathered by the camera mounted on the robot's wrist as the inputs and produces initial modeling of the surrounding environment represented as a file of Unified Robot Description Format (URDF). Our system adopts a learning-augmented differentiable simulation that loads the URDF. The robot then utilizes the interactive perception to interact with the environment to online verify and modify the URDF. Leveraging the differentiable simulation, we propose a model-based learning algorithm combining object-centric and robot-centric stages to efficiently produce policies to accomplish manipulation tasks. We apply our system to perform articulated object manipulation tasks, both in the simulation and the real world. Extensive experiments demonstrate the effectiveness of our proposed learning framework. Supplemental materials and videos are available on https://sites.google.com/view/egci.
Submitted 2 March, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Action based Network for Conversation Question Reformulation
Authors:
Zheyu Ye,
Jiangning Liu,
Qian Yu,
Jianxun Ju
Abstract:
Conversation question answering requires the ability to interpret a question correctly. Current models, however, are still unsatisfactory due to the difficulty of understanding the co-references and ellipsis in daily conversation. Even though generative approaches achieved remarkable progress, they are still trapped by semantic incompleteness. This paper presents an action-based approach to recover the complete expression of the question. Specifically, we first locate the positions of co-reference or ellipsis in the question while assigning the corresponding action to each candidate span. We then look for matching phrases related to the candidate clues in the conversation context. Finally, according to the predicted action, we decide whether to replace the co-reference or supplement the ellipsis with the matched information. We demonstrate the effectiveness of our method on both English and Chinese utterance rewrite tasks, improving the state-of-the-art EM (exact match) by 3.9% and ROUGE-L by 1.0% respectively on the Restoration-200K dataset.
Submitted 29 November, 2021;
originally announced November 2021.
-
Deep Reinforced Attention Regression for Partial Sketch Based Image Retrieval
Authors:
Dingrong Wang,
Hitesh Sapkota,
Xumin Liu,
Qi Yu
Abstract:
Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims at finding a specific image from a large gallery given a query sketch. Despite the widespread applicability of FG-SBIR in many critical domains (e.g., crime activity tracking), existing approaches still suffer from low accuracy while being sensitive to external noise such as unnecessary strokes in the sketch. The retrieval performance further deteriorates under a more practical on-the-fly setting, where only a partially complete sketch with a few (noisy) strokes is available to retrieve corresponding images. We propose a novel framework that leverages a uniquely designed deep reinforcement learning model that performs a dual-level exploration to deal with partial sketch training and attention region selection. By enforcing the model's attention on the important regions of the original sketches, it remains robust to unnecessary stroke noise and improves the retrieval accuracy by a large margin. To sufficiently explore partial sketches and locate the important regions to attend to, the model performs bootstrapped policy gradient for global exploration while adjusting a standard deviation term that governs a locator network for local exploration. The training process is guided by a hybrid loss that integrates a reinforcement loss and a supervised loss. A dynamic ranking reward is developed to fit the on-the-fly image retrieval process using partial sketches. Extensive experiments on three public datasets show that our proposed approach achieves state-of-the-art performance on partial sketch based image retrieval.
Submitted 21 November, 2021;
originally announced November 2021.
-
Uncertainty-Aware Multiple Instance Learning from Large-Scale Long Time Series Data
Authors:
Yuansheng Zhu,
Weishi Shi,
Deep Shankar Pandey,
Yang Liu,
Xiaofan Que,
Daniel E. Krutz,
Qi Yu
Abstract:
We propose a novel framework to classify large-scale time series data with long duration. Long time series classification (L-TSC) is a challenging problem because the data often contains a large amount of irrelevant information to the classification target. The irrelevant period degrades the classification performance while the relevance is unknown to the system. This paper proposes an uncertainty-aware multiple instance learning (MIL) framework to identify the most relevant period automatically. The predictive uncertainty enables designing an attention mechanism that forces the MIL model to learn from the possibly discriminant period. Moreover, the predicted uncertainty yields a principled estimator to identify whether a prediction is trustworthy or not. We further incorporate another modality to accommodate unreliable predictions by training a separate model based on its availability and conduct uncertainty aware fusion to produce the final prediction. Systematic evaluation is conducted on the Automatic Identification System (AIS) data, which is collected to identify and track real-world vessels. Empirical results demonstrate that the proposed method can effectively detect the types of vessels based on the trajectory and the uncertainty-aware fusion with other available data modality (Synthetic-Aperture Radar or SAR imagery is used in our experiments) can further improve the detection accuracy.
Submitted 20 November, 2021; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Simultaneous estimation of parameters and the state of an optical parametric oscillator system
Authors:
Qi Yu,
Shota Yokoyama,
Daoyi Dong,
David McManus,
Hidehiro Yonezawa
Abstract:
In this paper, we consider the filtering problem of an optical parametric oscillator (OPO). The OPO pump power may fluctuate due to environmental disturbances, resulting in uncertainty in the system modeling. Thus, both the state and the unknown parameter may need to be estimated simultaneously. We formulate this problem using a state-space representation of the OPO dynamics. Under the assumption of Gaussianity and proper constraints, the dual Kalman filter method and the joint extended Kalman filter method are employed to simultaneously estimate the system state and the pump power. Numerical examples demonstrate the effectiveness of the proposed algorithms.
Submitted 14 November, 2021;
originally announced November 2021.
-
HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides
Authors:
Qinze Yu,
Zhihang Dong,
Xingyu Fan,
Licheng Zong,
Yu Li
Abstract:
Identifying the targets of an antimicrobial peptide is a fundamental step in studying the innate immune response and combating antibiotic resistance, and more broadly, precision medicine and public health. There have been extensive studies on the statistical and computational approaches to identify (i) whether a peptide is an antimicrobial peptide (AMP) or a non-AMP and (ii) which targets these sequences are effective against (Gram-positive, Gram-negative, etc.). Despite the existing deep learning methods for this problem, most of them are unable to handle the small AMP classes (anti-insect, anti-parasite, etc.). More importantly, some AMPs can have multiple targets, which the previous methods fail to consider. In this study, we build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acids from various AMP databases. To generate efficient representations and features for the small classes dataset, we take advantage of a protein language model trained on 250 million protein sequences. Based on that, we develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMP comprehensively. After identifying an AMP, it further predicts what targets the AMP can effectively kill from eleven available classes. Extensive experiments suggest that our framework outperforms state-of-the-art models in both the binary classification task and the multi-label classification task, especially on the minor classes. The model is robust against reduced features and small perturbations and produces promising results. We believe HMD-AMP contributes to future wet-lab investigations of the innate structural properties of different antimicrobial peptides and builds promising empirical underpinnings for precision medicine with antibiotics.
Submitted 10 November, 2021;
originally announced November 2021.
-
Non-Binary Polar Coded System for the Two-User Multiple-Access Channel
Authors:
Guan-Chen Liu,
Qi-Yue Yu
Abstract:
This paper presents non-binary polar codes for the two-user multiple-access channel (MAC). The bit error rate (BER) performances of the non-binary polar codes with different kernel factors have been investigated in detail to select a proper parameter from GF(q) for the generator matrix. Furthermore, the successive cancellation decoding for the non-binary polar codes in the two-user MAC is introduced in detail. Simulation results show that the choice of the kernel factors has a significant impact on the block error rate (BLER) performance; moreover, the non-binary polar codes provide a better BLER performance than their binary counterpart in the two-user MAC.
Submitted 6 November, 2021;
originally announced November 2021.
-
Spatio-Temporal Urban Knowledge Graph Enabled Mobility Prediction
Authors:
Huandong Wang,
Qiaohong Yu,
Yu Liu,
Depeng Jin,
Yong Li
Abstract:
With the rapid development of the mobile communication technology, mobile trajectories of humans are massively collected by Internet service providers (ISPs) and application service providers (ASPs). On the other hand, the rising paradigm of knowledge graph (KG) provides us a promising solution to extract structured "knowledge" from massive trajectory data. In this paper, we focus on modeling user…
▽ More
With the rapid development of mobile communication technology, mobile trajectories of humans are massively collected by Internet service providers (ISPs) and application service providers (ASPs). On the other hand, the rising paradigm of knowledge graph (KG) provides us a promising solution to extract structured "knowledge" from massive trajectory data. In this paper, we focus on modeling users' spatio-temporal mobility patterns based on knowledge graph techniques, and predicting users' future movement based on the "knowledge" extracted from multiple sources in a cohesive manner. Specifically, we propose a new type of knowledge graph, i.e., spatio-temporal urban knowledge graph (STKG), where mobility trajectories, category information of venues, and temporal information are jointly modeled by the facts with different relation types in STKG. The mobility prediction problem is converted to the knowledge graph completion problem in STKG. Further, a complex embedding model with elaborately designed scoring functions is proposed to measure the plausibility of facts in STKG to solve the knowledge graph completion problem, which considers temporal dynamics of the mobility patterns and utilizes PoI categories as the auxiliary information and background knowledge. Extensive evaluations confirm the high accuracy of our model in predicting users' mobility, i.e., improving the accuracy by 5.04% compared with the state-of-the-art algorithms. In addition, PoI categories as the background knowledge and auxiliary information are confirmed to be helpful by improving the performance by 3.85% in terms of accuracy. Additionally, experiments show that our proposed method is time-efficient by reducing the computational time by over 43.12% compared with existing methods.
Submitted 10 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
One- and two-qubit gate infidelities due to motional errors in trapped ions and electrons
Authors:
R. Tyler Sutherland,
Qian Yu,
Kristin M. Beck,
Hartmut Häffner
Abstract:
In this work, we derive analytic formulae that determine the effect of error mechanisms on one- and two-qubit gates in trapped ions and electrons. First, we analyze, and derive expressions for, the effect of driving field inhomogeneities on one-qubit gate fidelities. Second, we derive expressions for two-qubit gate errors, including static motional frequency shifts, trap anharmonicities, field inhomogeneities, heating, and motional dephasing. We show that, for small errors, each of our expressions for infidelity converges to its respective numerical simulation; this shows our formulae are sufficient for determining error budgets for high-fidelity gates, obviating numerical simulations in future projects. All of the derivations are general to any internal qubit state, and any mixed state of the ion crystal's motion that is diagonal in the Fock state basis. Our treatment of static motional frequency shifts, trap anharmonicities, heating, and motional dephasing applies to both laser-based and laser-free gates, while our treatment of field inhomogeneities applies to laser-free systems.
Submitted 10 February, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
Authors:
Yonggan Fu,
Qixuan Yu,
Yang Zhang,
Shang Wu,
Xu Ouyang,
David Cox,
Yingyan Lin
Abstract:
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that there exist subnetworks with inborn robustness, matching or surpassing the robust accuracy of the adversarially trained networks with comparable model sizes, within randomly initialized networks without any model training, indicating that adversarial training on model weights is not indispensable towards adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs), which are also by nature efficient. Distinct from the popular lottery ticket hypothesis, neither the original dense networks nor the identified RSTs need to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights regarding the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify the poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a Random RST Switch (R2S) technique, which randomly switches between different RSTs, as a novel defense method built on top of RSTs. We believe our findings about RSTs have opened up a new perspective to study model robustness and extend the lottery ticket hypothesis.
Submitted 2 February, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Uniquely Decodable Multi-Amplitude Sequence for Grant-Free Multiple-Access Adder Channels
Authors:
Qi-Yue Yu,
Ke-Xun Song
Abstract:
Grant-free multiple-access (GFMA) is a valuable research topic, since it can support multiuser transmission with low latency. This paper constructs novel uniquely-decodable multi-amplitude sequence (UDAS) sets for GFMA systems, which can provide high spectrum efficiency (SE) with a low-complexity active user detection (AUD) algorithm. First, we propose a UDAS-based multi-dimensional bit interleaving coded modulation (MD-BICM) transmitter; we then introduce the definition of UDAS and construct two kinds of UDAS sets based on cyclic and quasi-cyclic matrix modes. In addition, we present a statistic of UDAS feature based AUD algorithm (SoF-AUD), and a joint multiuser detection and improved message passing algorithm for the proposed system. Finally, the active user error rate (AUER) and Shannon limits of the proposed system are derived in detail. Simulation results show that our proposed system can simultaneously support four users without additional redundancy, and the AUER can reach an extremely low value of $10^{-5}$ when $E_b/N_0$ is $0$ dB and the length of the transmit block is larger than a given value, i.e., 784, verifying the validity and flexibility of the proposed UDAS sets.
Submitted 8 April, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Noisy Annotation Refinement for Object Detection
Authors:
Jiafeng Mao,
Qing Yu,
Yoko Yamakata,
Kiyoharu Aizawa
Abstract:
Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as crowdsourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem setting of training object detectors on datasets with entangled noise in the annotations of class labels and bounding boxes. Our proposed method efficiently decouples the entangled noise, corrects the noisy annotations, and subsequently trains the detector using the corrected annotations. We verified the effectiveness of our proposed method and compared it with the baseline on noisy datasets with different noise levels. The experimental results show that our proposed method significantly outperforms the baseline.
Submitted 7 December, 2021; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Bilateral-ViT for Robust Fovea Localization
Authors:
Sifan Song,
Kang Dang,
Qinji Yu,
Zilong Wang,
Frans Coenen,
Jionglong Su,
Xiaowei Ding
Abstract:
The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retinal diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two network branches: a transformer-based main network branch for integrating global context across the entire fundus image and a vessel branch for explicitly incorporating the structure of blood vessels. The encoded features from both network branches are subsequently merged with a customized Multi-scale Feature Fusion (MFF) module. Our comprehensive experiments demonstrate that the proposed approach is significantly more robust for diseased images and establishes a new state of the art on the Messidor and PALM datasets.
Submitted 3 March, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Inconsistency-aware Uncertainty Estimation for Semi-supervised Medical Image Segmentation
Authors:
Yinghuan Shi,
Jian Zhang,
Tong Ling,
Jiwen Lu,
Yefeng Zheng,
Qian Yu,
Lei Qi,
Yang Gao
Abstract:
In semi-supervised medical image segmentation, most previous works draw on the common assumption that higher entropy means higher uncertainty. In this paper, we investigate a novel method of estimating uncertainty. We observe that, when different misclassification costs are assigned to a certain degree, if the segmentation result of a pixel becomes inconsistent, this pixel shows a relative uncertainty in its segmentation. Therefore, we present a new semi-supervised segmentation model, namely, conservative-radical network (CoraNet for short), based on our uncertainty estimation and separate self-training strategy. In particular, our CoraNet model consists of three major components: a conservative-radical module (CRM), a certain region segmentation network (C-SN), and an uncertain region segmentation network (UC-SN) that can be alternately trained in an end-to-end manner. We have extensively evaluated our method on various segmentation tasks with publicly available benchmark datasets, including CT pancreas, MR endocardium, and MR multi-structures segmentation on the ACDC dataset. Compared with the current state of the art, our CoraNet has demonstrated superior performance. In addition, we have also analyzed its connection with and difference from conventional methods of uncertainty estimation in semi-supervised medical image segmentation.
Submitted 17 October, 2021;
originally announced October 2021.
-
Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis
Authors:
Yi Liu,
Yuanshao Zhu,
James J. Q. Yu
Abstract:
Efficient collaboration between collaborative machine learning and wireless communication technology, forming a Federated Edge Learning (FEEL), has spawned a series of next-generation intelligent applications. However, due to the openness of network connections, the FEEL framework generally involves hundreds of remote devices (or clients), resulting in expensive communication costs, which is not friendly to resource-constrained FEEL. To address this issue, we propose a distributed approximate Newton-type algorithm with fast convergence speed to alleviate the problem of FEEL resource (in terms of communication resources) constraints. Specifically, the proposed algorithm is improved based on distributed L-BFGS algorithm and allows each client to approximate the high-cost Hessian matrix by computing the low-cost Fisher matrix in a distributed manner to find a "better" descent direction, thereby speeding up convergence. Second, we prove that the proposed algorithm has linear convergence in strongly convex and non-convex cases and analyze its computational and communication complexity. Similarly, due to the heterogeneity of the connected remote devices, FEEL faces the challenge of heterogeneous data and non-IID (Independent and Identically Distributed) data. To this end, we design a simple but elegant training scheme, namely FedOVA, to solve the heterogeneous statistical challenge brought by heterogeneous data. In this way, FedOVA first decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning. In particular, the scheme can be well integrated with our communication efficient algorithm to serve FEEL. Numerical results verify the effectiveness and superiority of the proposed algorithm.
Submitted 14 October, 2021;
originally announced October 2021.
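
The one-vs-all decomposition that the abstract above attributes to FedOVA can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `one_vs_all_labels` and `combine_scores` and the confidence values are hypothetical, and the federated training of each binary model is left out entirely. The sketch only shows how a multi-class label set is split into binary problems and how the binary outputs might be combined by picking the most confident class.

```python
def one_vs_all_labels(labels, classes):
    """Build one binary label list per class: 1 where the sample belongs to it."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def combine_scores(scores_per_class):
    """For each sample, pick the class whose binary scorer is most confident."""
    classes = list(scores_per_class)
    n_samples = len(scores_per_class[classes[0]])
    return [
        max(classes, key=lambda c: scores_per_class[c][i])
        for i in range(n_samples)
    ]


labels = ["cat", "dog", "bird", "dog"]
binary = one_vs_all_labels(labels, ["cat", "dog", "bird"])

# Hypothetical confidences from three independently trained binary models.
scores = {
    "cat": [0.9, 0.2, 0.1, 0.3],
    "dog": [0.1, 0.7, 0.2, 0.6],
    "bird": [0.0, 0.1, 0.8, 0.1],
}
predictions = combine_scores(scores)
```

Each binary problem only needs positive and negative examples of one class, which is presumably why such a decomposition is attractive when clients hold skewed label distributions.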
-
Investigating the existence of gravitomagnetic monopole in M87*
Authors:
M. Ghasemi-Nodehi,
Chandrachur Chakraborty,
Qingjuan Yu,
Youjun Lu
Abstract:
We examine the possibility for the existence of gravitomagnetic monopole ($n_*$) in M87* by using the results obtained from its first Event Horizon Telescope image. By numerically deducing the shadow sizes in Kerr-Taub-NUT (KTN) spacetime, we show that the shadow size increases with increasing $|n_*|$ for a fixed Kerr parameter $|a_*|$ in case of the KTN black hole, whereas for a KTN naked singularity it increases with increasing $n_*$ for a fixed $a_* > 0$ if $n_* > -\cot 17^{\circ}$ . In general, the asymmetry of shadow shape increases if the central dark object in M87 is a KTN/Kerr naked singularity instead of a KTN/Kerr black hole. We find that a non-zero gravitomagnetic monopole is still compatible with the current EHT observations, in which case the upper limit of $n_*$ cannot be greater than $1.1$, i.e., $n_* \lesssim 1.1$ for the prograde rotation ($a_* > 0$), and the lower limit of $n_*$ cannot be less than $-1.1$, i.e., $ n_* \gtrsim -1.1$ for the retrograde rotation ($a_* < 0$). Moreover, if the circularity of the shadow can be measured on a precision of $\lesssim 1\%$, the Kerr and KTN naked singularities can be falsified for M87*.
Submitted 27 October, 2021; v1 submitted 30 September, 2021;
originally announced September 2021.
-
LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning
Authors:
Jinhyun So,
Chaoyang He,
Chien-Sheng Yang,
Songze Li,
Qian Yu,
Ramy E. Ali,
Basak Guler,
Salman Avestimehr
Abstract:
Secure model aggregation is a key component of federated learning (FL) that aims at protecting the privacy of each user's individual model while allowing for their global aggregation. It can be applied to any aggregation-based FL approach for training a global or personalized model. Model aggregation also needs to be resilient against likely user dropouts in FL systems, making its design substantially more complex. State-of-the-art secure aggregation protocols rely on secret sharing of the random seeds used for mask generation at the users to enable the reconstruction and cancellation of those belonging to the dropped users. The complexity of such approaches, however, grows substantially with the number of dropped users. We propose a new approach, named LightSecAgg, to overcome this bottleneck by changing the design from "random-seed reconstruction of the dropped users" to "one-shot aggregate-mask reconstruction of the active users via mask encoding/decoding". We show that LightSecAgg achieves the same privacy and dropout-resiliency guarantees as the state-of-the-art protocols while significantly reducing the overhead for resiliency against dropped users. We also demonstrate that, unlike existing schemes, LightSecAgg can be applied to secure aggregation in the asynchronous FL setting. Furthermore, we provide a modular system design and optimized on-device parallelization for scalable implementation, by enabling computational overlapping between model training and on-device encoding, as well as improving the speed of concurrent receiving and sending of chunked masks. We evaluate LightSecAgg via extensive experiments for training diverse models on various datasets in a realistic FL system with a large number of users and demonstrate that LightSecAgg significantly reduces the total training time.
Submitted 1 February, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
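
The idea of cancelling an aggregate mask, as described in the LightSecAgg abstract above, can be sketched with plain additive masking over a finite field. This is a toy illustration only: the field size `PRIME`, the helper names, and the example models are assumptions, and the mask encoding/decoding that lets the server reconstruct the aggregate mask from shares held by active users is omitted. Here the aggregate mask is simply summed directly to show why removing it once recovers the sum of the models.

```python
import random

PRIME = 2**31 - 1  # field size; an arbitrary choice for this sketch


def mask_model(model, mask):
    """Hide a model by adding a random mask element-wise in the field."""
    return [(w + m) % PRIME for w, m in zip(model, mask)]


def aggregate(vectors):
    """Element-wise sum of several vectors in the field."""
    return [sum(column) % PRIME for column in zip(*vectors)]


def unmask(masked_sum, aggregate_mask):
    """Remove the aggregate mask in one shot to recover the sum of models."""
    return [(s - m) % PRIME for s, m in zip(masked_sum, aggregate_mask)]


rng = random.Random(0)
models = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # toy integer models
masks = [[rng.randrange(PRIME) for _ in model] for model in models]

masked_models = [mask_model(w, z) for w, z in zip(models, masks)]
masked_sum = aggregate(masked_models)

# Assumption: a real protocol would reconstruct this from encoded mask shares
# sent by the active users; here it is computed directly for illustration.
aggregate_mask = aggregate(masks)
recovered = unmask(masked_sum, aggregate_mask)
```

The server in this sketch only ever needs the sum of the masks, never an individual mask, which is the property the "one-shot aggregate-mask reconstruction" phrasing points at.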
-
RSI-Net: Two-Stream Deep Neural Network for Remote Sensing Images based Semantic Segmentation
Authors:
Shuang He,
Xia Lu,
Jason Gu,
Haitong Tang,
Qin Yu,
Kaiyue Liu,
Haozhou Ding,
Chunqi Chang,
Nizhuan Wang
Abstract:
For semantic segmentation of remote sensing images (RSI), the trade-off between representation power and location accuracy is quite important. How to achieve this trade-off effectively is an open question, as current approaches that rely on very deep models result in complex models with large memory consumption. In contrast to previous work that utilizes dilated convolutions or deep models, we propose a novel two-stream deep neural network for semantic segmentation of RSI (RSI-Net) to obtain improved performance through modeling and propagating spatial contextual structure effectively and a decoding scheme with image-level and graph-level combination. The first component explicitly models correlations between adjacent land covers and conducts flexible convolution on arbitrarily irregular image regions by using a graph convolutional network, while a densely connected atrous convolution network (DenseAtrousCNet) with multi-scale atrous convolution can expand the receptive fields and obtain global image information. Extensive experiments are conducted on the Vaihingen, Potsdam and Gaofen RSI datasets, where the comparison results demonstrate the superior performance of RSI-Net in terms of overall accuracy (91.83%, 93.31% and 93.67% on the three datasets, respectively), F1 score (90.3%, 91.49% and 89.35% on the three datasets, respectively) and kappa coefficient (89.46%, 90.46% and 90.37% on the three datasets, respectively) when compared with six state-of-the-art RSI semantic segmentation methods.
Submitted 31 March, 2022; v1 submitted 19 September, 2021;
originally announced September 2021.
-
Application of integral invariants to apictorial jigsaw puzzle assembly
Authors:
Peter Illig,
Robert Thompson,
Qimeng Yu
Abstract:
We present a method for the automatic assembly of apictorial jigsaw puzzles. This method relies on integral area invariants for shape matching and an optimization process to aggregate shape matches into a final puzzle assembly. Assumptions about individual piece shape or arrangement are not necessary. We illustrate our method by solving example puzzles of various shapes and sizes.
Submitted 1 October, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency
Authors:
Yonggan Fu,
Yang Zhao,
Qixuan Yu,
Chaojian Li,
Yingyan Lin
Abstract:
The recent breakthroughs of deep neural networks (DNNs) and the advent of billions of Internet of Things (IoT) devices have excited an explosive demand for intelligent IoT devices equipped with domain-specific DNN accelerators. However, the deployment of DNN accelerator enabled intelligent functionality into real-world IoT devices still remains particularly challenging. First, powerful DNNs often come at prohibitive complexities, whereas IoT devices often suffer from stringent resource constraints. Second, while DNNs are vulnerable to adversarial attacks especially on IoT devices exposed to complex real-world environments, many IoT applications require strict security. Existing DNN accelerators mostly tackle only one of the two aforementioned challenges (i.e., efficiency or adversarial robustness) while neglecting or even sacrificing the other. To this end, we propose a 2-in-1 Accelerator, an integrated algorithm-accelerator co-design framework aiming at winning both the adversarial robustness and efficiency of DNN accelerators. Specifically, we first propose a Random Precision Switch (RPS) algorithm that can effectively defend DNNs against adversarial attacks by enabling random DNN quantization as an in-situ model switch. Furthermore, we propose a new precision-scalable accelerator featuring (1) a new precision-scalable MAC unit architecture which spatially tiles the temporal MAC units to boost both the achievable efficiency and flexibility and (2) a systematically optimized dataflow that is searched by our generic accelerator optimizer. Extensive experiments and ablation studies validate that our 2-in-1 Accelerator can not only aggressively boost both the adversarial robustness and efficiency of DNN accelerators under various attacks, but also naturally support instantaneous robustness-efficiency trade-offs adapting to varied resources without the necessity of DNN retraining.
Submitted 20 September, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
-
A durable and efficient electrocatalyst for saline water splitting with current density exceeding 2000 mA cm$^{-2}$
Authors:
Fengning Yang,
Yuting Luo,
Qiangmin Yu,
Zhiyuan Zhang,
Shuo Zhang,
Zhibo Liu,
Wencai Ren,
Hui-Ming Cheng,
Jiong Li,
Bilu Liu
Abstract:
Water electrolysis is promising for industrial hydrogen production to achieve a sustainable and green hydrogen economy, but the high cost of the technology limits its market share. Developing efficient yet economic electrocatalysts is crucial to decrease the cost of electricity and electrolytic cells. Meanwhile, electrolysis in seawater electrolyte can further reduce feedstock cost. Here we synthesize a type of electrocatalyst where trace precious metals are strongly anchored on a corrosion-resistive matrix. As an example, the produced Pt/Ni-Mo electrocatalyst only needs an overpotential of 113 mV to reach an ultrahigh current density of 2000 mA cm$^{-2}$ in saline-alkaline electrolyte, standing as the best performance so far. It shows high activity and long durability in various electrolytes and under harsh conditions, including strong alkaline and simulated seawater electrolytes, and under elevated temperatures up to 80 degrees Celsius. This electrocatalyst is produced on a large scale at low cost and shows good performance in a commercial membrane electrode assembly stack, demonstrating its feasibility for practical water electrolysis.
Submitted 30 August, 2021;
originally announced August 2021.
-
Glue-Assisted Grinding Exfoliation of Large-Size 2D Materials for Insulating Thermal Conduction and Large-Current-Density Hydrogen Evolution
Authors:
Liusi Yang,
Dashuai Wang,
Minsu Liu,
Heming Liu,
Junyang Tan,
Heyuan Zhou,
Zhongyue Wang,
Qiangmin Yu,
Jingyun Wang,
Junhao Lin,
Xiaolong Zou,
Ling Qiu,
Hui-Ming Cheng,
Bilu Liu
Abstract:
Two-dimensional (2D) materials have many promising applications, but their scalable production remains challenging. Herein, we develop a glue-assisted grinding exfoliation (GAGE) method in which the adhesive polymer acts as a glue to massively produce 2D materials with large lateral sizes, high quality, and high yield. Density functional theory simulation shows that the exfoliation mechanism involves the competition between the binding energy of selected polymers and the 2D materials which is larger than the exfoliation energy of the layered materials. Taking h-BN as an example, the GAGE produces 2D h-BN with an average lateral size of 2.18 μm and thickness of 3.91 nm. The method is also extended to produce various other 2D materials, including graphene, MoS2, Bi2O2Se, vermiculite, and montmorillonite. Two representative applications of thus-produced 2D materials have been demonstrated, including h-BN/polymer composites for insulating thermal conduction and MoS2 electrocatalysts for large-current-density hydrogen evolution, indicating the great potential of massively produced 2D materials.
Submitted 30 August, 2021;
originally announced August 2021.
-
Exploiting Different Levels of Parallelism in the Quantum Control Microarchitecture for Superconducting Qubits
Authors:
Mengyu Zhang,
Lei Xie,
Zhenxing Zhang,
Qiaonian Yu,
Guanglei Xi,
Huangliang Zhang,
Fuming Liu,
Yarui Zheng,
Yicong Zheng,
Shengyu Zhang
Abstract:
As current Noisy Intermediate Scale Quantum (NISQ) devices suffer from decoherence errors, any delay in the instruction execution of quantum control microarchitecture can lead to the loss of quantum information and incorrect computation results. Hence, it is crucial for the control microarchitecture to issue quantum operations to the Quantum Processing Unit (QPU) in time. As in classical microarchitecture, parallelism in quantum programs needs to be exploited for speedup. However, three challenges emerge in the quantum scenario: 1) the quantum feedback control can introduce significant pipeline stall latency; 2) timing control is required for all quantum operations; 3) QPU requires a deterministic operation supply to prevent the accumulation of quantum errors.
In this paper, we propose a novel control microarchitecture design to exploit Circuit Level Parallelism (CLP) and Quantum Operation Level Parallelism (QOLP). Firstly, we develop a Multiprocessor architecture to exploit CLP, which supports dynamic scheduling of different sub-circuits. This architecture can handle parallel feedback control and minimize the potential overhead that disrupts the timing control. Secondly, we propose a Quantum Superscalar approach that exploits QOLP by efficiently executing massive quantum instructions in parallel. Both methods issue quantum operations to the QPU deterministically. In the benchmark test of a Shor syndrome measurement, a six-core implementation of our proposal achieves up to 2.59$\times$ speedup compared with a single core. For various canonical quantum computing algorithms, our superscalar approach achieves an average of 4.04$\times$ improvement over a baseline design. Finally, we perform a simultaneous randomized benchmarking (simRB) experiment on a real QPU using the proposed microarchitecture for validation.
Submitted 26 August, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Modal-Adaptive Gated Recoding Network for RGB-D Salient Object Detection
Authors:
Jinchao Zhu,
Xiaoyu Zhang,
Xian Fang,
Feng Dong,
Qiu Yu
Abstract:
The multi-modal salient object detection model based on RGB-D information has better robustness in the real world. However, it remains nontrivial to adaptively balance effective multi-modal information in the feature fusion phase. In this letter, we propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modes and balance their influence. Our framework is divided into three phases: a perception phase, a recoding mixing phase and a feature integration phase. First, a perception encoder is adopted to extract multi-level single-modal features, which lays the foundation for multi-modal semantic comparative analysis. Then, a modal-adaptive gate unit (MGU) is proposed to suppress the invalid information and transfer the effective modal features to the recoding mixer and the hybrid branch decoder. The recoding mixer is responsible for recoding and mixing the balanced multi-modal information. Finally, the hybrid branch decoder completes the multi-level feature integration under the guidance of an optional edge guidance stream (OEGS). Experiments and analysis on eight popular benchmarks verify that our framework performs favorably against 9 state-of-the-art methods.
Submitted 9 November, 2021; v1 submitted 13 August, 2021;
originally announced August 2021.
-
CSC-Unet: A Novel Convolutional Sparse Coding Strategy Based Neural Network for Semantic Segmentation
Authors:
Haitong Tang,
Shuang He,
Mengduo Yang,
Xia Lu,
Qin Yu,
Kaiyue Liu,
Hongjie Yan,
Nizhuan Wang
Abstract:
It is a challenging task to accurately perform semantic segmentation due to the complexity of real picture scenes. Many semantic segmentation methods based on traditional deep learning insufficiently capture the semantic and appearance information of images, which limits their generality and robustness for various application scenes. In this paper, we proposed a novel strategy that reformulated the popularly-used convolution operation as a multi-layer convolutional sparse coding block to ease the aforementioned deficiency. This strategy can possibly be used to significantly improve the segmentation performance of any semantic segmentation model that involves convolutional operations. To prove the effectiveness of our idea, we chose the widely-used U-Net model for demonstration purposes, and we designed the CSC-Unet model series based on U-Net. Through extensive analysis and experiments, we provided credible evidence showing that the multi-layer convolutional sparse coding block enables a semantic segmentation model to converge faster, extract finer semantic and appearance information of images, and improve its ability to recover spatial detail information. The best CSC-Unet model significantly outperforms the results of the original U-Net on three public datasets with different scenarios, i.e., 87.14% vs. 84.71% on the DeepCrack dataset, 68.91% vs. 67.09% on the Nuclei dataset, and 53.68% vs. 48.82% on the CamVid dataset, respectively.
Submitted 11 March, 2024; v1 submitted 1 August, 2021;
originally announced August 2021.
-
The Robustness of Graph k-shell Structure under Adversarial Attacks
Authors:
B. Zhou,
Y. Q. Lv,
Y. C. Mao,
J. H. Wang,
S. Q. Yu,
Q. Xuan
Abstract:
The k-shell decomposition plays an important role in unveiling the structural properties of a network, i.e., it is widely adopted to find the densest part of a network across a broad range of scientific fields, including the Internet, biological networks, social networks, etc. However, there arises concern about the robustness of the k-shell structure when networks suffer from adversarial attacks. Here, we introduce and formalize the problem of the k-shell attack and develop an efficient strategy to attack the k-shell structure by rewiring a small number of links. To the best of our knowledge, this is the first study of the robustness of graph k-shell structure under adversarial attacks. In particular, we propose a Simulated Annealing (SA) based k-shell attack method and test it on four real-world social networks. The extensive experiments validate that the k-shell structure of a network is robust under random perturbation, but it is quite vulnerable under adversarial attack, e.g., in Dolphin and Throne networks, more than 40% of nodes change their k-shell values when only 10% of links are changed based on our SA-based k-shell attack. Such results suggest that a single structural feature could also be significantly disturbed when only a small fraction of links are changed purposefully in a network. Therefore, it could be an interesting topic to improve the robustness of various network properties against adversarial attack in the future.
Submitted 29 July, 2021;
originally announced July 2021.
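The quantity attacked in the abstract above, the k-shell value of each node, can be computed by repeatedly peeling off the lowest-degree nodes. The sketch below is a minimal illustration of that standard peeling procedure, not the authors' SA-based attack; the function name `k_shell` and the four-node toy graph are assumptions made for this example. It also shows how rewiring a single link can change the shell values of several nodes.

```python
from collections import defaultdict


def k_shell(edges):
    """Return {node: k-shell value} for an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest degree left in the pruned graph.
        k = max(k, min(degree[n] for n in remaining))
        queue = [n for n in remaining if degree[n] <= k]
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(node)
            shell[node] = k
            for nbr in adj[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        queue.append(nbr)
    return shell


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
original = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
# Rewire one link: drop (b, c), add (c, d).
rewired = [("a", "b"), ("a", "c"), ("c", "d"), ("a", "d")]

before = k_shell(original)
after = k_shell(rewired)
changed = sorted(n for n in before if before[n] != after[n])
print(changed)  # prints ['b', 'd']
```

In this toy case one rewired link moves two of the four nodes to a different shell, which is the kind of sensitivity the abstract reports at larger scale.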
-
CI-Net: Contextual Information for Joint Semantic Segmentation and Depth Estimation
Authors:
Tianxiao Gao,
Wu Wei,
Zhongbin Cai,
Zhun Fan,
Shane Xie,
Xinmei Wang,
Qiuda Yu
Abstract:
Monocular depth estimation and semantic segmentation are two fundamental goals of scene understanding. Due to the advantages of task interaction, many works study the joint task learning algorithm. However, most existing methods fail to fully leverage the semantic labels, ignoring the provided context structures and only using them to supervise the prediction of segmentation split, which limits the performance of both tasks. In this paper, we propose a network injected with contextual information (CI-Net) to solve the problem. Specifically, we introduce a self-attention block in the encoder to generate an attention map. With supervision from the ideal attention map created from the semantic labels, the network is embedded with contextual information so that it can understand the scene better and utilize correlated features to make accurate predictions. Besides, a feature sharing module is constructed to make the task-specific features deeply fused, and a consistency loss is devised to make the features mutually guided. We evaluate the proposed CI-Net on the NYU-Depth-v2 and SUN-RGBD datasets. The experimental results validate that our proposed CI-Net could effectively improve the accuracy of semantic segmentation and depth estimation.
Submitted 1 September, 2021; v1 submitted 29 July, 2021;
originally announced July 2021.
-
MHD analysis on the physical designs of CFETR and HFRC
Authors:
Ping Zhu,
Li Li,
Yu Fang,
Yuling He,
Shuo Wang,
Rui Han,
Yue Liu,
Xiaojing Wang,
Yang Zhang,
Xiaodong Zhang,
Qingquan Yu,
Liqun Hu,
Huihui Wang,
Youwen Sun,
Lai Wei,
Weikang Tang,
Tong Liu,
Zhengxiong Wang,
Xingting Yan,
Wenlong Huang,
Yawei Hou,
Xiaoquan Ji,
Shiyong Zeng,
Zafar Abdullah,
Zhongyong Chen
, et al. (10 additional authors not shown)
Abstract:
The China Fusion Engineering Test Reactor (CFETR) and the Huazhong Field Reversed Configuration (HFRC), currently both under intensive physical and engineering designs in China, are the two major projects representative of the low-density steady-state and high-density pulsed pathways to fusion. One of the primary tasks of the physics designs for both CFETR and HFRC is the assessment and analysis of the magnetohydrodynamic (MHD) stability of the proposed design schemes. Comprehensive efforts on the assessment of MHD stability of CFETR and HFRC baseline scenarios have led to preliminary progress that may further benefit engineering designs.
Submitted 25 July, 2021;
originally announced July 2021.
-
Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions
Authors:
Qian Yu,
Lei Qi,
Luping Zhou,
Lei Wang,
Yilong Yin,
Yinghuan Shi,
Wuzhang Wang,
Yang Gao
Abstract:
Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.
Submitted 23 July, 2021;
originally announced July 2021.
-
DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation
Authors:
Wentao Bao,
Qi Yu,
Yu Kong
Abstract:
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a future accident from dashcam videos, which is vital for a safety-guaranteed self-driving system. To encourage an early and accurate decision, existing approaches typically focus on capturing the cues of spatial and temporal context before a future accident occurs. However, their decision-making lacks visual explanation and ignores the dynamic interaction with the environment. In this paper, we propose Deep ReInforced accident anticipation with Visual Explanation, named DRIVE. The method simulates both the bottom-up and top-down visual attention mechanism in a dashcam observation environment so that the decision from the proposed stochastic multi-task agent can be visually explained by attentive regions. Moreover, the proposed dense anticipation reward and sparse fixation reward are effective in training the DRIVE model with our improved reinforcement learning algorithm. Experimental results show that the DRIVE model achieves state-of-the-art performance on multiple real-world traffic accident datasets. Code and pre-trained model are available at https://www.rit.edu/actionlab/drive.
Submitted 6 September, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Evidential Deep Learning for Open Set Action Recognition
Authors:
Wentao Bao,
Qi Yu,
Yu Kong
Abstract:
In a real-world scenario, human actions are typically out of the distribution from training data, which requires a model to both recognize the known actions and reject the unknown. Different from image data, video actions are more challenging to be recognized in an open-set setting due to the uncertain temporal dynamics and static bias of human actions. In this paper, we propose a Deep Evidential Action Recognition (DEAR) method to recognize actions in an open testing set. Specifically, we formulate the action recognition problem from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize the EDL training. Besides, to mitigate the static bias of video representation, we propose a plug-and-play module to debias the learned representation through contrastive learning. Experimental results show that our DEAR method achieves consistent performance gain on multiple mainstream action recognition models and benchmarks. Code and pre-trained models are available at https://www.rit.edu/actionlab/dear.
Submitted 18 August, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Gravitational Wave From Axion-like Particle Inflation
Authors:
Wei Cheng,
Tao Qian,
Qing Yu,
Hua Zhou,
Rui-Yu Zhou
Abstract:
In this paper, we investigate the Axion-like Particle inflation by applying the multi-nature inflation model, where the end of inflation is achieved through the phase transition (PT). The events of PT should not be less than $200$, which results in the free parameter $n\geq404$. Under the latest CMB restrictions, we found that the inflation energy is fixed at $10^{15} \rm{GeV}$. Then, we deeply discussed the corresponding stochastic background of the primordial gravitational wave (GW) during inflation. We study the two kinds of $n$ cases, i.e., $n=404, 2000$. We observe that the magnitude of $n$ is negligible for the physical observations, such as $n_s$, $r$, $Λ$, and $Ω_{\rm{GW}}h^2$. In the low-frequency regions, the GW is dominated by the quantum fluctuations, and this GW can be detected by Decigo at $10^{-1}~\rm{Hz}$. However, GW generated by PT dominates the high-frequency regions, which is expected to be detected by future 3DSR detector.
Submitted 9 July, 2021;
originally announced July 2021.
-
Feature Cross Search via Submodular Optimization
Authors:
Lin Chen,
Hossein Esfandiari,
Gang Fu,
Vahab S. Mirrokni,
Qian Yu
Abstract:
In this paper, we study feature cross search as a fundamental primitive in feature engineering. The importance of feature cross search especially for the linear model has been known for a while, with well-known textbook examples. In this problem, the goal is to select a small subset of features, combine them to form a new feature (called the crossed feature) by considering their Cartesian product, and find feature crosses to learn an accurate model. In particular, we study the problem of maximizing a normalized Area Under the Curve (AUC) of the linear model trained on the crossed feature column.
First, we show that it is not possible to provide an $n^{1/\log\log n}$-approximation algorithm for this problem unless the exponential time hypothesis fails. This result also rules out the possibility of solving this problem in polynomial time unless $\mathsf{P}=\mathsf{NP}$. On the positive side, by assuming the naive assumption, we show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem. This result is established by relating the AUC to the total variation of the commutator of two probability measures and showing that the total variation of the commutator is monotone and submodular. To show this, we relate the submodularity of this function to the positive semi-definiteness of a corresponding kernel matrix. Then, we use Bochner's theorem to prove the positive semi-definiteness by showing that its inverse Fourier transform is non-negative everywhere. Our techniques and structural results might be of independent interest.
Submitted 5 July, 2021;
originally announced July 2021.
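The greedy $(1-1/e)$-approximation mentioned in the abstract above follows the classical pattern for maximizing a monotone submodular function under a cardinality constraint: at each step, add the element with the largest marginal gain. The sketch below illustrates only that generic pattern. The set-coverage objective and the names `greedy_max`, `sets`, and `coverage` are assumptions chosen for this example, standing in for the paper's AUC-based objective, which is not reproduced here.

```python
def greedy_max(ground_set, value, budget):
    """Greedily pick up to `budget` items, each time taking the largest marginal gain."""
    chosen = []
    current = value(chosen)
    for _ in range(budget):
        best_gain, best_item = 0, None
        for item in ground_set:
            if item in chosen:
                continue
            gain = value(chosen + [item]) - current
            if gain > best_gain:
                best_gain, best_item = gain, item
        if best_item is None:  # no remaining item improves the objective
            break
        chosen.append(best_item)
        current += best_gain
    return chosen, current


# Hypothetical toy objective: number of distinct elements covered by the chosen sets.
# Coverage is monotone and submodular, so the greedy guarantee applies to it.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}


def coverage(chosen):
    return len(set().union(*(sets[name] for name in chosen)))


picked, covered = greedy_max(list(sets), coverage, budget=2)
print(picked, covered)  # prints ['C', 'A'] 7
```

With a budget of two, the greedy rule first takes the largest set and then the set that adds the most new elements, covering all seven elements in this toy instance.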
-
Do grid codes afford generalization and flexible decision-making?
Authors:
Linda Q. Yu,
Seongmin A. Park,
Sarah C. Sweigart,
Erie D. Boorman,
Matthew R. Nassar
Abstract:
Behavioral flexibility is learning from previous experiences and planning appropriate actions in a changing or novel environment. Successful behavioral adaptation depends on internal models the brain builds to represent the relational structure of an abstract task. Emerging evidence suggests that the well-known roles of the hippocampus and entorhinal cortex (HC-EC) in integrating spatial relationships into cognitive maps can be extended to map the transition structure between states in non-spatial abstract tasks. However, what the EC grid-codes actually compute to afford generalization remains elusive. We introduce two non-exclusive ideas regarding what grid-codes may represent to afford higher-level cognition. One idea is that grid-codes are eigenvectors of the successor representation (SR) learned online during a task. This view assumes that the grid codes serve as an efficient basis function for learning and representing experienced relationships between entities. Subsequently, the grid codes facilitate generalization in novel contexts such as when the goal changes. The second idea is that the grid-codes reflect the inferred global task structure. This view assumes that the grid-code represents a structural code that is factorized from specific sensory content, enabling structural information to be transferred across tasks. Subsequently, the brain could afford one-shot inferences without requiring experience. The ability to generalize experiences and make appropriate decisions in novel situations is critical for both animals and machines. Here we review proposed computations of the grid-code in the brain, which is potentially critical to behavioral flexibility.
Submitted 30 June, 2021;
originally announced June 2021.
-
Seeking celestial Positronium with an OH-suppressed diffraction-limited spectrograph
Authors:
J. Gordon Robertson,
Simon Ellis,
Qingshan Yu,
Joss Bland-Hawthorn,
Christopher Betters,
Martin Roth,
Sergio Leon-Saval
Abstract:
Celestially, positronium (Ps) has only been observed through gamma-ray emission produced by its annihilation. However, in its triplet state, a Ps atom has a mean lifetime long enough for electronic transitions to occur between quantum states. This produces a recombination spectrum observable in principle at near-infrared (NIR) wavelengths, where angular resolution greatly exceeding that of the gamma-ray observations is possible. However, the background in the NIR is dominated by extremely bright atmospheric hydroxyl (OH) emission lines. In this paper we present the design of a diffraction-limited spectroscopic system using novel photonic components - a photonic lantern, OH Fiber Bragg Grating filters, and a photonic TIGER 2-dimensional pseudo-slit - to observe the Ps Balmer alpha line at 1.3122 microns for the first time.
Submitted 18 June, 2021;
originally announced June 2021.
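The 1.3122 micron wavelength quoted in the abstract above can be checked with a short worked calculation. This is a sketch that assumes the simple Bohr-model scaling and neglects fine-structure and QED corrections. The reduced mass of positronium is $m_e/2$, so its effective Rydberg constant is $R_\infty/2$, and the Balmer alpha transition ($n = 3 \to 2$) gives a vacuum wavelength close to twice that of hydrogen H-alpha.

```latex
\frac{1}{\lambda_{\mathrm{Ps}\,\alpha}}
  = \frac{R_\infty}{2}\left(\frac{1}{2^2} - \frac{1}{3^2}\right)
  = \frac{5 R_\infty}{72},
\qquad
\lambda_{\mathrm{Ps}\,\alpha}
  = \frac{72}{5 R_\infty}
  \approx \frac{72}{5 \times 1.0973732 \times 10^{7}\ \mathrm{m^{-1}}}
  \approx 1.3122 \times 10^{-6}\ \mathrm{m}
  = 1.3122\ \mu\mathrm{m}.
```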