arXiv:2006.10893 [pdf]

Nearly-incompressible transverse isotropy (NITI) of cornea elasticity: model and experiments with acoustic micro-tapping OCE

Authors: John J Pitre Jr, Mitchell A Kirby, David S Li, Tueng T Shen, Ruikang K Wang, Matthew O'Donnell, Ivan Pelivanov

Abstract: The cornea provides the largest refractive power for the human visual system. Its stiffness, along with intraocular pressure (IOP), is linked to several pathologies, including keratoconus and glaucoma. Although mechanical tests can quantify corneal elasticity ex vivo, they cannot be used clinically. Optical coherence elastography (OCE), which launches and tracks shear waves to estimate stiffness, provides an attractive non-contact probe of corneal elasticity. To date, however, OCE studies report corneal moduli around tens of kPa, orders of magnitude less than those (a few MPa) obtained by tensile/inflation testing. This large discrepancy impedes OCE's clinical adoption. Based on corneal microstructure, we introduce and fully characterize a nearly-incompressible transversely isotropic (NITI) model depicting corneal biomechanics. We show that the cornea must be described by two shear moduli, contrary to current single-modulus models, decoupling tensile and shear responses. We measure both as a function of IOP in ex vivo porcine cornea, obtaining values consistent with both tensile and shear tests. At pressures above 30 mmHg, the model begins to fail, consistent with non-linear changes in cornea at high IOP.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 41 pages, 14 figures, including supplementary notes, JJP and MAK contributed equally to this work

arXiv:2006.10516 [pdf, other]

Self-Attention Enhanced Patient Journey Understanding in Healthcare System

Authors: Xueping Peng, Guodong Long, Tao Shen, Sen Wang, Jing Jiang

Abstract: Understanding patients' journeys in the healthcare system is a fundamental prerequisite task for a broad range of AI-based healthcare applications. This task aims to learn an informative representation that comprehensively encodes hidden dependencies among medical events and their inner entities, so that the encoding outputs can greatly benefit downstream application-driven tasks. A patient journey is a sequence of electronic health records (EHRs) over time that is organized at multiple levels: patient, visits and medical codes. The key challenge of patient journey understanding is to design an effective encoding mechanism that can properly handle this multi-level structured patient journey data, with its temporal sequence of visits and sets of medical codes. This paper proposes a novel self-attention mechanism that can simultaneously capture the contextual and temporal relationships hidden in patient journeys. A multi-level self-attention network (MusaNet) is specifically designed to learn representations of patient journeys, which are typically long sequences of activities. MusaNet is trained in an end-to-end manner using training data derived from EHRs. We evaluated the efficacy of our method on two medical application tasks with real-world benchmark datasets. The results demonstrate that the proposed MusaNet produces higher-quality representations than state-of-the-art baseline methods. The source code is available at https://github.com/xueping/MusaNet.

Submitted 18 June, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 16 pages, 6 figures, accepted by ECML/PKDD 2020

arXiv:2006.04018

Projecting and comparing non-pharmaceutical interventions to contain COVID-19 in major economies

Authors: Jingjing He, Xuefei Guan, Xiaochang Duan, Tian Shen, Jing Lin

Abstract: Non-pharmaceutical interventions (NPIs) such as quarantine, self-isolation, social distancing, and virus-contact tracing can greatly reduce the spread of the virus during a pandemic. In the wave of the COVID-19 pandemic, many countries have implemented various NPIs for infection control and mitigation. However, the stringency of the NPIs and the resulting impact among different countries remain unclear due to the lack of quantitative factors. In this study we took a further step to incorporate the effect of the NPIs into the pandemic dynamics model using the concept of policy intensity factor (PIF). This idea enables us to characterize the transition rates as time-varying quantities instead of constant values, and thus to capture the dynamical behavior of the basic reproduction number during the pandemic. By leveraging a large amount of data reported by governments and the World Health Organization, we projected the dynamics of the pandemic for the major economies in the world, including the numbers of infected, susceptible, and recovered cases, as well as the pandemic durations. It is observed that the proposed variable-rate susceptible-exposed-infected-recovered (VR-SEIR) model fits and projects the pandemic dynamics very well. We further showed that the resulting PIFs correlate with the stringency of NPIs, which allows us to project the final numbers of affected people in those countries when their current NPIs have been imposed for 90, 180, 360 days. It provides a quantitative insight into the effectiveness of the implemented NPIs, and sheds new light on minimizing both the number of people affected by COVID-19 and the economic impact.

Submitted 14 December, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: The results in this study projects the pandemic will end in one half to one year, which apparently is meaningless. Therefore, it is considered not accurate. To avoid unnecessary ambiguity, the authors would like to withdraw this draft. Thank you

arXiv:2005.02153 [pdf, other]

Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

Authors: Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, Heng Tao Shen

Abstract: Embodied artificial intelligence (AI) tasks shift from tasks focusing on internet images to active settings involving embodied agents that perceive and act within 3D environments. In this paper, we investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes, where the navigation task aims to train an agent that can intelligently make a series of decisions to arrive at a pre-specified target location from any possible starting position based only on egocentric views. However, most navigation methods currently struggle with several challenging problems, such as data efficiency, automatic obstacle avoidance, and generalization. The generalization problem means that the agent cannot transfer navigation skills learned from previous experience to unseen targets and scenes. To address these issues, we incorporate two designs into the classic DRL framework: attention on a 3D knowledge graph (KG) and a target skill extension (TSE) module. On the one hand, our proposed method combines visual features and 3D spatial representations to learn the navigation policy. On the other hand, the TSE module is used to generate sub-targets which allow the agent to learn from failures. Specifically, our 3D spatial relationships are encoded through the recently popular graph convolutional network (GCN). To reflect real-world settings, our work also considers open action and adds actionable targets to conventional navigation situations. Those more difficult settings are applied to test whether the DRL agent really understands its task and navigation environment, and can carry out reasoning. Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics, and improves generalization ability across targets and scenes.

Submitted 29 April, 2020; originally announced May 2020.

Comments: 12 pages, 9 figures

arXiv:2005.01026 [pdf, other]

doi 10.1007/s11280-022-01046-x

Multi-Center Federated Learning: Clients Clustering for Better Personalization

Authors: Guodong Long, Ming Xie, Tao Shen, Tianyi Zhou, Xianzhi Wang, Jing Jiang, Chengqi Zhang

Abstract: Federated learning has received great attention for its capability to train a large-scale model in a decentralized manner without needing to access user data directly. It helps protect the users' private data from centralized collecting. Unlike distributed machine learning, federated learning aims to tackle non-IID data from heterogeneous sources in various real-world applications, such as those on smartphones. Existing federated learning approaches usually adopt a single global model to capture the shared knowledge of all users by aggregating their gradients, regardless of the discrepancy between their data distributions. However, due to the diverse nature of user behaviors, assigning users' gradients to different global models (i.e., centers) can better capture the heterogeneity of data distributions across users. Our paper proposes a novel multi-center aggregation mechanism for federated learning, which learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers. We formulate the problem as a joint optimization that can be efficiently solved by a stochastic expectation maximization (EM) algorithm. Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.

Submitted 5 February, 2023; v1 submitted 3 May, 2020; originally announced May 2020.

Comments: This paper has two duplicated versions: 2005.01026 and 2108.08647. The first one 2005.01026 is the right one, and the second one 2108.08647 should be deleted because it always causes misoperating

Journal ref: World Wide Web 26, 481-500 (2023)

arXiv:2004.14781 [pdf, other]

doi 10.1145/3442381.3450043

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

Authors: Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Yi Chang

Abstract: Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks, but these graphs are usually incomplete, urging auto-completion of them. Prevalent graph embedding approaches, e.g., TransE, learn structured knowledge via representing graph elements into dense embeddings and capturing their triple-level relationship with spatial distance. However, they are hardly generalizable to the elements never visited in training and are intrinsically vulnerable to graph incompleteness. In contrast, textual encoding approaches, e.g., KG-BERT, resort to graph triple's text and triple-level contextualized representations. They are generalizable enough and robust to the incompleteness, especially when coupled with pre-trained encoders. But two major drawbacks limit the performance: (1) high overheads due to the costly scoring of all possible triples in inference, and (2) a lack of structured knowledge in the textual encoder. In this paper, we follow the textual encoding paradigm and aim to alleviate its drawbacks by augmenting it with graph embedding techniques -- a complementary hybrid of both paradigms. Specifically, we partition each triple into two asymmetric parts as in translation-based graph embedding approach, and encode both parts into contextualized representations by a Siamese-style textual encoder. Built upon the representations, our model employs both deterministic classifier and spatial measurement for representation and structure learning respectively. Moreover, we develop a self-adaptive ensemble scheme to further improve the performance by incorporating triple scores from an existing graph embedding model. In experiments, we achieve state-of-the-art performance on three benchmarks and a zero-shot dataset for link prediction, with highlights of inference costs reduced by 1-2 orders of magnitude compared to a textual encoding method.

Submitted 23 February, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: 12 pages, WWW'21, April 19-23, 2021, Ljubljana, Slovenia

arXiv:2004.14224 [pdf, other]

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

Authors: Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

Abstract: In this work, we aim at equipping pre-trained language models with structured knowledge. We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs. Building upon entity-level masked language models, our first contribution is an entity masking scheme that exploits relational knowledge underlying the text. This is fulfilled by using a linked knowledge graph to select informative entities and then masking their mentions. In addition we use knowledge graphs to obtain distractors for the masked entities, and propose a novel distractor-suppressed ranking objective which is optimized jointly with masked language model. In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text. It is more efficient than retrieval-based methods that perform entity linking and integration during finetuning and inference, and generalizes more effectively than the methods that directly learn from concatenated graph triples. Experiments show that our proposed model achieves improved performance on five benchmark datasets, including question answering and knowledge base completion tasks.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.08752 [pdf, other]

Zeus: A System Description of the Two-Time Winner of the Collegiate SAE AutoDrive Competition

Authors: Keenan Burnett, Jingxing Qian, Xintong Du, Linqiao Liu, David J. Yoon, Tianchang Shen, Susan Sun, Sepehr Samavi, Michael J. Sorocky, Mollie Bianchi, Kaicheng Zhang, Arkady Arkhangorodsky, Quinlan Sykora, Shichen Lu, Yizhou Huang, Angela P. Schoellig, Timothy D. Barfoot

Abstract: The SAE AutoDrive Challenge is a three-year collegiate competition to develop a self-driving car by 2020. The second year of the competition was held in June 2019 at MCity, a mock town built for self-driving car testing at the University of Michigan. Teams were required to autonomously navigate a series of intersections while handling pedestrians, traffic lights, and traffic signs. Zeus is aUToronto's winning entry in the AutoDrive Challenge. This article describes the system design and development of Zeus as well as many of the lessons learned along the way. This includes details on the team's organizational structure, sensor suite, software components, and performance at the Year 2 competition. With a team of mostly undergraduates and minimal resources, aUToronto has made progress towards a functioning self-driving vehicle, in just two years. This article may prove valuable to researchers looking to develop their own self-driving platform.

Submitted 18 April, 2020; originally announced April 2020.

Comments: Submitted to the Journal of Field Robotics

arXiv:2004.07684 [pdf, other]

Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts

Authors: Mingmin Zhen, Jinglu Wang, Lei Zhou, Shiwei Li, Tianwei Shen, Jiaxiang Shang, Tian Fang, Quan Long

Abstract: In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection. The critical component in the framework is the iterative pyramid context module (PCM), which couples two tasks and stores the shared latent semantics to interact between the two tasks. For semantic boundary detection, we propose the novel spatial gradient fusion to suppress nonsemantic edges. As semantic boundary detection is the dual task of semantic segmentation, we introduce a loss function with boundary consistency constraint to improve the boundary pixel accuracy for semantic segmentation. Our extensive experiments demonstrate superior performance over state-of-the-art works, not only in semantic segmentation but also in semantic boundary detection. In particular, a mean IoU score of 81.8% on Cityscapes test set is achieved without using coarse data or any external data for semantic segmentation. For semantic boundary detection, we improve over previous state-of-the-art works by 9.9% in terms of AP and 6.8% in terms of MF(ODS).

Submitted 16 April, 2020; originally announced April 2020.

arXiv:2004.01194 [pdf, other]

doi 10.1021/acs.jctc.0c00288

Unveiling the Finite Temperature Physics of Hydrogen Chains via Auxiliary Field Quantum Monte Carlo

Authors: Yuan Liu, Tong Shen, Hang Zhang, Brenda Rubenstein

Abstract: The ability to accurately predict the finite temperature properties of realistic quantum solids is central to uncovering new phases and engineering materials with novel properties. Nonetheless, there remain comparatively few many-body techniques capable of elucidating the finite temperature physics of solids from first principles. In this work, we take a significant step towards developing such a technique by generalizing our previous, fully ab initio finite temperature Auxiliary Field Quantum Monte Carlo (FT-AFQMC) method to model periodic solids and employing it to uncover the finite temperature physics of periodic hydrogen chains. Based upon our calculations of these chains' many-body thermodynamic quantities and correlation functions, we outline their metal-insulator and magnetic ordering as a function of both H-H bond distance and temperature. At low temperatures approaching the ground state, we observe both metal-insulator and ferromagnetic-antiferromagnetic crossovers at bond lengths between 0.5 and 0.75 Å. We then demonstrate how this low-temperature ordering evolves into a metallic phase with decreasing magnetic order at higher temperatures. By comparing the features we observe to those previously seen in one-dimensional, half-filled Hubbard models at finite temperature and in ground state hydrogen chains, interestingly, we identify signatures of the Pomeranchuk effect in hydrogen chains for the first time and show that spin and charge excitations that typically arise at distinct temperatures in the Hubbard model are indistinguishably coupled in these systems. Beyond qualitatively revealing the many-body phase behavior of hydrogen chains, our efforts shed light on the further theoretical developments that will be required to construct the phase diagrams of the more complex transition metal, lanthanide, and actinide solids of longstanding interest to physicists.

Submitted 20 July, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: 52 pages, 12 figures

Journal ref: J. Chem. Theory Comput. 2020, 16, 7, 4298-4314

arXiv:2003.12704 [pdf, other]

doi 10.1103/PhysRevB.102.045117

The influence of high-energy local orbitals and electron-phonon interactions on the band gaps and optical spectra of hexagonal boron nitride

Authors: Tong Shen, Xiao-Wei Zhang, Honghui Shang, Min-Ye Zhang, Xinqiang Wang, En-Ge Wang, Hong Jiang, Xin-Zheng Li

Abstract: We report $ab$ $initio$ band diagram and optical absorption spectra of hexagonal boron nitride ($h$-BN), focusing on unravelling how the completeness of the basis set for $GW$ calculations and electron-phonon interactions (EPIs) affect them. The completeness of the basis set, an issue which was seldom discussed in previous optical spectra calculations of $h$-BN, is found crucial in providing converged quasiparticle band gaps. In the comparison among three different codes, we demonstrate that by including high-energy local orbitals in the all-electron linearized augmented plane waves based $GW$ calculations, the quasiparticle direct and fundamental indirect band gaps are widened by $\sim$0.2 eV, giving values of 6.81 eV and 6.25 eV respectively at the $GW_0$ level. EPIs, on the other hand, reduce them to 6.62 eV and 6.03 eV respectively at 0 K, and 6.60 eV and 5.98 eV respectively at 300 K. With clamped crystal structure, the first peak of the absorption spectrum is at 6.07 eV, originating from the direct exciton contributed by electron transitions around $K$ in the Brillouin zone. After including the EPIs-renormalized quasiparticles in the Bethe-Salpeter equation, the exciton-phonon coupling shifts the first peak to 5.83 eV at 300 K, lower than the experimental value of $\sim$6.00 eV. This accuracy is acceptable for an $ab$ $initio$ description of excited states with no fitting parameter.

Submitted 23 May, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

Journal ref: Phys. Rev. B 102, 045117 (2020)

arXiv:2003.10629 [pdf, other]

KFNet: Learning Temporal Camera Relocalization using Kalman Filtering

Authors: Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan

Abstract: Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image. Even though the time dependency has been taken into account, current temporal relocalization methods still generally underperform the state-of-the-art one-shot approaches in terms of accuracy. In this work, we improve the temporal relocalization method by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization. In particular, KFNet extends the scene coordinate regression problem to the time domain in order to recursively establish 2D and 3D correspondences for the pose determination. The network architecture design and the loss formulation are based on Kalman filtering in the context of Bayesian learning. Extensive experiments on multiple relocalization benchmarks demonstrate the high accuracy of KFNet at the top of both one-shot and temporal relocalization approaches. Our codes are released at https://github.com/zlthinker/KFNet.

Submitted 23 March, 2020; originally announced March 2020.

Comments: An oral paper of CVPR 2020

arXiv:2003.10211 [pdf, other]

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

Authors: Xia Li, Yibo Yang, Qijie Zhao, Tiancheng Shen, Zhouchen Lin, Hong Liu

Abstract: The convolution operation suffers from a limited receptive field, while global modeling is fundamental to dense prediction tasks, such as semantic segmentation. In this paper, we apply graph convolution to the semantic segmentation task and propose an improved Laplacian. The graph reasoning is directly performed in the original feature space organized as a spatial pyramid. Different from existing methods, our Laplacian is data-dependent and we introduce an attention diagonal matrix to learn a better distance metric. It gets rid of projecting and re-projecting processes, which makes our proposed method a light-weight module that can be easily plugged into current computer vision architectures. More importantly, performing graph reasoning directly in the feature space retains spatial relationships and makes it possible for the spatial pyramid to explore multiple long-range contextual patterns from different scales. Experiments on Cityscapes, COCO Stuff, PASCAL Context and PASCAL VOC demonstrate the effectiveness of our proposed methods on semantic segmentation. We achieve comparable performance with advantages in computational and memory overhead.

Submitted 23 March, 2020; originally announced March 2020.

Comments: CVPR 2020

arXiv:2002.03079 [pdf, other]

Blank Language Models

Authors: Tianxiao Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola

Abstract: We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. The blanks control which part of the sequence to expand, making BLM ideal for a variety of text editing and rewriting tasks. The model can start from a single blank or partially completed text with blanks at specified locations. It iteratively determines which word to place in a blank and whether to insert new blanks, and stops generating when no blanks are left to fill. BLM can be efficiently trained using a lower bound of the marginal data likelihood. On the task of filling missing text snippets, BLM significantly outperforms all other baselines in terms of both accuracy and fluency. Experiments on style transfer and damaged ancient text restoration demonstrate the potential of this framework for a wide range of applications.

Submitted 16 November, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: EMNLP 2020 camera-ready

arXiv:1912.05790 [pdf, other]

Zooming into Face Forensics: A Pixel-level Analysis

Authors: Jia Li, Tong Shen, Wei Zhang, Hui Ren, Dan Zeng, Tao Mei

Abstract: The stunning progress in face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society. Face forensics techniques to distinguish those tampered images are urgently needed. A large-scale dataset, "FaceForensics++", has provided enormous training data generated from prominent face manipulation methods to facilitate anti-fake research. However, previous works focus more on casting it as a classification problem by only considering a global prediction. Through investigation of the problem, we find that training a classification network often fails to capture high-quality features, which might lead to sub-optimal solutions. In this paper, we zoom in on the problem by conducting a pixel-level analysis, i.e. formulating it as a pixel-level segmentation task. By evaluating multiple architectures on both segmentation and classification tasks, we show the superiority of viewing the problem from a segmentation perspective. Different ablation studies are also performed to investigate what makes an effective and efficient anti-fake model. Strong baselines are also established, which, we hope, could shed some light on the field of face forensics.

Submitted 12 December, 2019; originally announced December 2019.

arXiv:1912.03653 [pdf, ps, other]

Compactified Jacobians as Mumford models

Authors: Karl Christ, Sam Payne, Tif Shen

Abstract: We show that relative compactified Jacobians of one-parameter smoothings of a nodal curve of genus g are Mumford models of the generic fiber. Each such model is given by an admissible polytopal decomposition of the skeleton of the Jacobian. We describe the decompositions corresponding to compactified Jacobians explicitly in terms of the auxiliary stability data and find, in particular, that in degree g there is a unique compactified Jacobian encoding slope stability, and it is induced by the tropical break divisor decomposition.

Submitted 28 September, 2022; v1 submitted 8 December, 2019; originally announced December 2019.

Comments: 25 pages. Final version, to appear in Trans. Amer. Math. Soc

MSC Class: 14H40; 14G22; 14T15; 14T20

arXiv:1911.11899 [pdf, other]

Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction

Authors: Yang Li, Guodong Long, Tao Shen, Tianyi Zhou, Lina Yao, Huan Huo, Jing Jiang

Abstract: Distantly supervised relation extraction intrinsically suffers from noisy labels due to the strong assumption of distant supervision. Most prior works adopt a selective attention mechanism over sentences in a bag to denoise from wrongly labeled data, which however could be incompetent when there is only one sentence in a bag. In this paper, we propose a brand-new light-weight neural framework to address the distantly supervised relation extraction problem and alleviate the defects in previous selective attention framework. Specifically, in the proposed framework, 1) we use an entity-aware word embedding method to integrate both relative position information and head/tail entity embeddings, aiming to highlight the essence of entities for this task; 2) we develop a self-attention mechanism to capture the rich contextual dependencies as a complement for local dependencies captured by piecewise CNN; and 3) instead of using selective attention, we design a pooling-equipped gate, which is based on rich contextual representations, as an aggregator to generate bag-level representation for final relation classification. Compared to selective attention, one major advantage of the proposed gating mechanism is that, it performs stably and promisingly even if only one sentence appears in a bag and thus keeps the consistency across all training examples. The experiments on NYT dataset demonstrate that our approach achieves a new state-of-the-art performance in terms of both AUC and top-n precision metrics.

Submitted 26 November, 2019; originally announced November 2019.

Comments: Accepted to appear at AAAI 2020

arXiv:1911.10305 [pdf, other]

Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families

Authors: Yibo Yang, Jianlong Wu, Hongyang Li, Xia Li, Tiancheng Shen, Zhouchen Lin

Abstract: The correspondence between residual networks and dynamical systems motivates researchers to unravel the physics of ResNets with well-developed tools in numerical methods of ODE systems. The Runge-Kutta-Fehlberg method is an adaptive time stepping that renders a good trade-off between the stability and efficiency. Can we also have an adaptive time stepping for ResNets to ensure both stability and performance? In this study, we analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets with step sizes and weight parameters, and point out the effects of step sizes on the stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that is dependent on the parameters of the current step, and aware of previous steps. The controller is jointly optimized with the network training so that variable step sizes and evolution time can be adaptively adjusted. We conduct experiments on ImageNet and CIFAR to demonstrate the effectiveness. It is shown that our proposed method is able to improve both stability and accuracy without introducing additional overhead in inference phase.

Submitted 22 November, 2019; originally announced November 2019.

Comments: AAAI-20

arXiv:1911.07158 [pdf, other]

Unsupervised Domain Adaptation for Object Detection via Cross-Domain Semi-Supervised Learning

Authors: Fuxun Yu, Di Wang, Yinpeng Chen, Nikolaos Karianakis, Tong Shen, Pei Yu, Dimitrios Lymberopoulos, Sidi Lu, Weisong Shi, Xiang Chen

Abstract: Current state-of-the-art object detectors can have significant performance drop when deployed in the wild due to domain gaps with training data. Unsupervised Domain Adaptation (UDA) is a promising approach to adapt models for new domains/environments without any expensive label cost. However, without ground truth labels, most prior works on UDA for object detection tasks can only perform coarse image-level and/or feature-level adaptation by using adversarial learning methods. In this work, we show that such adversarial-based methods can only reduce the domain style gap, but cannot address the domain content distribution gap that is shown to be important for object detectors. To overcome this limitation, we propose the Cross-Domain Semi-Supervised Learning (CDSSL) framework by leveraging high-quality pseudo labels to learn better representations from the target domain directly. To enable SSL for cross-domain object detection, we propose fine-grained domain transfer, progressive-confidence-based label sharpening and imbalanced sampling strategy to address two challenges: (i) non-identical distribution between source and target domain data, (ii) error amplification/accumulation due to noisy pseudo labeling on the target domain. Experiment results show that our proposed approach consistently achieves new state-of-the-art performance (2.2% - 9.5% better than prior best work on mAP) under various domain gap scenarios. The code will be released.

Submitted 4 August, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: Accepted in WACV'2022

arXiv:1910.13174 [pdf, other]

Autonomous UAV Landing System Based on Visual Navigation

Authors: Zhixin Wu, Peng Han, Ruiwen Yao, Lei Qiao, Weidong Zhang, Tielong Shen, Min Sun, Yilong Zhu, Ming Liu, Rui Fan

Abstract: In this paper, we present an autonomous unmanned aerial vehicle (UAV) landing system based on visual navigation. We design the landmark as a topological pattern in order to enable the UAV to distinguish the landmark from the environment easily. In addition, a dynamic thresholding method is developed for image binarization to improve detection efficiency. The relative distance in the horizontal plane is calculated according to effective image information, and the relative height is obtained using a linear interpolation method. The landing experiments are performed on a static and a moving platform, respectively. The experimental results illustrate that our proposed landing system performs robustly and accurately.

Submitted 29 October, 2019; originally announced October 2019.

Comments: 6 pages, 13 figures, 2019 IEEE International Conference on Imaging Systems and Techniques (IST)

arXiv:1910.12535 [pdf, ps, other]

Low-Complexity Leakage-based Secure Precise Wireless Transmission with Hybrid Beamforming

Authors: Tong Shen, Yan Lin, Jun Zou, Yongpeng Wu, Feng Shu, Jiangzhou Wang

Abstract: In conventional secure precise wireless transmission (SPWT), fully digital beamforming (FDB) has a high secrecy performance in transmit antenna system, but results in a huge RF-chain circuit budget for medium-scale and large-scale systems. To reduce the complexity, this letter considers a hybrid digital and analog (HDA) structure with random frequency mapped into the RF-chains to achieve SPWT. Then, a hybrid SPWT scheme based on maximizing signal-to-leakage-and-noise ratio (SLNR) and artificial-noise-to-leakage-and-noise ratio (ANLNR) (M-SLNR-ANLNR) is proposed. Compared to the FDB scheme, the proposed scheme reduces the circuit budget with low computational complexity and comparable secrecy performance.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.09688 [pdf, other]

Learning to Make Generalizable and Diverse Predictions for Retrosynthesis

Authors: Benson Chen, Tianxiao Shen, Tommi S. Jaakkola, Regina Barzilay

Abstract: We propose a new model for making generalizable and diverse retrosynthetic reaction predictions. Given a target compound, the task is to predict the likely chemical reactants to produce the target. This generative task can be framed as a sequence-to-sequence problem by using the SMILES representations of the molecules. Building on top of the popular Transformer architecture, we propose two novel pre-training methods that construct relevant auxiliary tasks (plausible reactions) for our problem. Furthermore, we incorporate a discrete latent variable model into the architecture to encourage the model to produce a diverse set of alternative predictions. On the 50k subset of reaction examples from the United States patent literature (USPTO-50k) benchmark dataset, our model greatly improves performance over the baseline, while also generating predictions that are more diverse.

Submitted 21 October, 2019; originally announced October 2019.

arXiv:1910.08762 [pdf]

doi 10.1103/PhysRevB.101.165306

A realistic dimension-independent approach for charged defect calculations in semiconductors

Authors: Jin Xiao, Kaike Yang, Dan Guo, Tao Shen, Jun-Wei Luo, Shu-Shen Li, Su-Huai Wei, Hui-Xiong Deng

Abstract: First-principles calculations of charged defects have become a cornerstone of research in semiconductors and insulators by providing insights into their fundamental physical properties. But current standard approach using the so-called jellium model has encountered both conceptual ambiguity and computational difficulty, especially for low-dimensional semiconducting materials. In this Communication, we propose a physical, straightforward, and dimension-independent universal model to calculate the formation energies of charged defects in both three-dimensional (3D) bulk and low-dimensional semiconductors. Within this model, the ionized electrons or holes are placed on the realistic host band-edge states instead of the virtual jellium state, therefore, rendering it not only naturally keeps the supercell charge neutral, but also has clear physical meaning. This realistic model reproduces the same accuracy as the traditional jellium model for most of the 3D semiconducting materials, and remarkably, for the low-dimensional structures, it is able to cure the divergence caused by the artificial long-range electrostatic energy introduced in the jellium model, and hence gives meaningful formation energies of defects in charged state and transition energy levels of the corresponding defects. Our realistic method, therefore, will have significant impact for the study of defect physics in all low-dimensional systems including quantum dots, nanowires, surfaces, interfaces, and 2D materials.

Submitted 19 October, 2019; originally announced October 2019.

Journal ref: Phys. Rev. B 101, 165306 (2020)

arXiv:1910.05069 [pdf, other]

Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base

Authors: Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, Daxin Jiang

Abstract: We consider the problem of conversational question answering over a large-scale knowledge base. To handle huge entity vocabulary of a large-scale knowledge base, recent neural semantic parsing based approaches usually decompose the task into several subtasks and then solve them sequentially, which leads to following issues: 1) errors in earlier subtasks will be propagated and negatively affect downstream ones; and 2) each subtask cannot naturally share supervision signals with others. To tackle these issues, we propose an innovative multi-task learning framework where a pointer-equipped semantic parsing model is designed to resolve coreference in conversations, and naturally empower joint learning with a novel type-aware entity detection model. The proposed framework thus enables shared supervisions and alleviates the effect of error propagation. Experiments on a large-scale conversational question answering dataset containing 1.6M question answering pairs over 12.8M entities show that the proposed framework improves overall F1 score from 67% to 79% compared with previous state-of-the-art work.

Submitted 11 October, 2019; originally announced October 2019.

Comments: Accepted to appear at EMNLP-IJCNLP 2019

arXiv:1909.12860 [pdf, other]

doi 10.1126/sciadv.aax3793

Measurement of the cosmic-ray proton spectrum from 40 GeV to 100 TeV with the DAMPE satellite

Authors: Q. An, R. Asfandiyarov, P. Azzarello, P. Bernardini, X. J. Bi, M. S. Cai, J. Chang, D. Y. Chen, H. F. Chen, J. L. Chen, W. Chen, M. Y. Cui, T. S. Cui, H. T. Dai, A. D'Amone, A. De Benedittis, I. De Mitri, M. Di Santo, M. Ding, T. K. Dong, Y. F. Dong, Z. X. Dong, G. Donvito, D. Droz, J. L. Duan, et al. (129 additional authors not shown)

Abstract: The precise measurement of the spectrum of protons, the most abundant component of the cosmic radiation, is necessary to understand the source and acceleration of cosmic rays in the Milky Way. This work reports the measurement of the cosmic ray proton fluxes with kinetic energies from 40 GeV to 100 TeV, with two and a half years of data recorded by the DArk Matter Particle Explorer (DAMPE). This is the first time an experiment directly measures the cosmic ray protons up to ~100 TeV with a high statistics. The measured spectrum confirms the spectral hardening found by previous experiments and reveals a softening at ~13.6 TeV, with the spectral index changing from ~2.60 to ~2.85. Our result suggests the existence of a new spectral feature of cosmic rays at energies lower than the so-called knee, and sheds new light on the origin of Galactic cosmic rays.

Submitted 30 September, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: 37 pages, 5 figures, published in Science Advances

Journal ref: Science Advances, Vol. 5, no. 9, eaax3793 (2019)

arXiv:1909.09115 [pdf, other]

Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency

Authors: Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan

Abstract: The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames as it depends much less on the ground-truth data. In this paper, we address the issue when previous assumptions of the self-supervised approaches are violated due to the dynamic nature of real-world scenes. Different from handling the noise as uncertainty, our key idea is to incorporate more robust geometric quantities and enforce internal consistency in the temporal image sequence. As demonstrated on commonly used benchmark datasets, the proposed method substantially improves the state-of-the-art methods on both depth and relative pose estimation for monocular image sequences, without adding inference overhead.

Submitted 19 September, 2019; originally announced September 2019.

Comments: International Conference on Computer Vision (ICCV) Workshop 2019

arXiv:1909.07159 [pdf, other]

RVH: Range-Vector Hash for Fast Online Packet Classification

Authors: Tong Shen, Gaogang Xie, Xin Wang, Zhenyu Li, Xinyi Zhang, Penghao Zhang, Dafang Zhang

Abstract: Packet classification according to multi-field ruleset is a key component for many network applications. Emerging software defined networking and cloud computing need to update the rulesets frequently for flexible policy configuration. Their success depends on the availability of the new generation of classifiers that can support both fast ruleset updating and high-speed packet classification. However, existing packet classification approaches focus either on high-speed packet classification or fast rule update, but no known scheme meets both requirements. In this paper, we propose Range-vector Hash (RVH) to effectively accelerate the packet classification with a hash-based algorithm while ensuring the fast rule update. RVH is built on our key observation that the number of distinct combinations of each field prefix lengths is not evenly distributed. To reduce the number of hash tables for fast classification, we introduce a novel concept range-vector with each specified the length range of each field prefix of the projected rules. RVH can overcome the major obstacle that hinders hash-based packet classification by balancing the number of hash tables and the probability of hash collision. Experimental results demonstrate that RVH can achieve the classification speed up to 15.7 times and the update speed up to 2.3 times that of the state-of-the-art algorithms on average, while only consuming 44% less memory.

Submitted 16 September, 2019; originally announced September 2019.

arXiv:1909.06886 [pdf, other]

doi 10.1109/ICDM.2019.00060

Temporal Self-Attention Network for Medical Concept Embedding

Authors: Xueping Peng, Guodong Long, Tao Shen, Sen Wang, Jing Jiang, Michael Blumenstein

Abstract: In longitudinal electronic health records (EHRs), the event records of a patient are distributed over a long period of time and the temporal relations between the events reflect sufficient domain knowledge to benefit prediction tasks such as the rate of inpatient mortality. Medical concept embedding as a feature extraction method that transforms a set of medical concepts with a specific time stamp into a vector, which will be fed into a supervised learning algorithm. The quality of the embedding significantly determines the learning performance over the medical data. In this paper, we propose a medical concept embedding method based on applying a self-attention mechanism to represent each medical concept. We propose a novel attention mechanism which captures the contextual information and temporal relationships between medical concepts. A light-weight neural net, "Temporal Self-Attention Network (TeSAN)", is then proposed to learn medical concept embedding based solely on the proposed attention mechanism. To test the effectiveness of our proposed methods, we have conducted clustering and prediction tasks on two public EHRs datasets comparing TeSAN against five state-of-the-art embedding methods. The experimental results demonstrate that the proposed TeSAN model is superior to all the compared methods. To the best of our knowledge, this work is the first to exploit temporal self-attentive relations between medical events.

Submitted 15 September, 2019; originally announced September 2019.

Comments: 10 pages, 7 figures, accepted at IEEE ICDM 2019

MSC Class: 68T30

ACM Class: I.2.1

arXiv:1909.02762 [pdf, other]

Effective Search of Logical Forms for Weakly Supervised Knowledge-Based Question Answering

Authors: Tao Shen, Xiubo Geng, Tao Qin, Guodong Long, Jing Jiang, Daxin Jiang

Abstract: Many algorithms for Knowledge-Based Question Answering (KBQA) depend on semantic parsing, which translates a question to its logical form. When only weak supervision is provided, it is usually necessary to search valid logical forms for model training. However, a complex question typically involves a huge search space, which creates two main problems: 1) the solutions limited by computation time and memory usually reduce the success rate of the search, and 2) spurious logical forms in the search results degrade the quality of training data. These two problems lead to a poorly-trained semantic parsing model. In this work, we propose an effective search method for weakly supervised KBQA based on operator prediction for questions. With search space constrained by predicted operators, sufficient search paths can be explored, more valid logical forms can be derived, and operators possibly causing spurious logical forms can be avoided. As a result, a larger proportion of questions in a weakly supervised training set are equipped with logical forms, and fewer spurious logical forms are generated. Such high-quality training data directly contributes to a better semantic parsing model. Experimental results on one of the largest KBQA datasets (i.e., CSQA) verify the effectiveness of our approach: improving the precision from 67% to 72% and the recall from 67% to 72% in terms of the overall score.

Submitted 6 September, 2019; originally announced September 2019.

arXiv:1908.10136 [pdf, other]

Cooperative Cross-Stream Network for Discriminative Action Representation

Authors: Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

Abstract: Spatial and temporal stream model has gained great success in video action recognition. Most existing works pay more attention to designing effective features fusion methods, which train the two-stream model in a separate way. However, it's hard to ensure discriminability and explore complementary information between different streams in existing works. In this work, we propose a novel cooperative cross-stream network that investigates the conjoint information in multiple different modalities. The jointly spatial and temporal stream networks feature extraction is accomplished by an end-to-end learning manner. It extracts this complementary information of different modality from a connection block, which aims at exploring correlations of different stream features. Furthermore, different from the conventional ConvNet that learns the deep separable features with only one cross-entropy loss, our proposed model enhances the discriminative power of the deeply learned features and reduces the undesired modality discrepancy by jointly optimizing a modality ranking constraint and a cross-entropy loss for both homogeneous and heterogeneous modalities. The modality ranking constraint constitutes intra-modality discriminative embedding and inter-modality triplet constraint, and it reduces both the intra-modality and cross-modality feature variations. Experiments on three benchmark datasets demonstrate that by cooperating appearance and motion feature extraction, our method can achieve state-of-the-art or competitive performance compared with existing results.

Submitted 27 August, 2019; originally announced August 2019.

Comments: 10 pages, 6 figures

arXiv:1908.10059 [pdf, other]

MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning

Authors: Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen

Abstract: MixUp is an effective data augmentation method to regularize deep neural networks via random linear interpolations between pairs of samples and their labels. It plays an important role in model regularization, semi-supervised learning and domain adaption. However, despite its empirical success, its deficiency of randomly mixing samples has poorly been studied. Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks. To overcome the underfitting by corrupted samples, inspired by Meta-learning (learning to learn), we propose a novel technique of learning to mixup in this work, namely, MetaMixUp. Unlike the vanilla MixUp that samples interpolation policy from a predefined distribution, this paper introduces a meta-learning based online optimization approach to dynamically learn the interpolation policy in a data-adaptive way. The validation set performance via meta-learning captures the underfitting issue, which provides more information to refine interpolation policy. Furthermore, we adapt our method for pseudo-label based semisupervised learning (SSL) along with a refined pseudo-labeling strategy. In our experiments, our method achieves better performance than vanilla MixUp and its variants under supervised learning configuration. In particular, extensive experiments show that our MetaMixUp adapted SSL greatly outperforms MixUp and many state-of-the-art methods on CIFAR-10 and SVHN benchmarks under SSL configuration.

Submitted 27 August, 2019; originally announced August 2019.

arXiv:1908.09995 [pdf, other]

Temporal Reasoning Graph for Activity Recognition

Authors: Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

Abstract: Despite great success has been achieved in activity analysis, it still has many challenges. Most existing work in activity recognition pay more attention to design efficient architecture or video sampling strategy. However, due to the property of fine-grained action and long term structure in video, activity recognition is expected to reason temporal relation between video sequences. In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and temporal relation between video sequences at multiple time scales. Specifically, we construct learnable temporal relation graphs to explore temporal relation on the multi-scale range. Additionally, to facilitate multi-scale temporal relation extraction, we design a multi-head temporal adjacent matrix to represent multi-kinds of temporal relations. Eventually, a multi-head temporal relation aggregator is proposed to extract the semantic meaning of those features convolving through the graphs. Extensive experiments are performed on widely-used large-scale datasets, such as Something-Something and Charades, and the results show that our model can achieve state-of-the-art performance. Further analysis shows that temporal relation reasoning with our TRG can extract discriminative features for activity recognition.

Submitted 26 August, 2019; originally announced August 2019.

Comments: 14 pages, 8 figures

arXiv:1908.09244 [pdf, ps, other]

Two High-Performance Amplitude Beamforming Schemes for Secure Precise Communication and Jamming with Phase Alignment

Authors: Lingling Zhu, Feng Shu, Tong Shen

Abstract: To severely weaken the eavesdropper's ability to intercept confidential message (CM), a precise jamming (PJ) idea is proposed by making use of the concept of secure precise wireless transmission (SPWT). Its basic idea is to focus the transmit energy of artificial noise (AN) onto the neighborhood of eavesdropper (Eve) by using random subcarrier selection (RSS), directional modulation, phase alignment (PA), and amplitude beamforming (AB). By doing so, Eve will be seriously interfered with AN. Here, the conventional joint optimization of phase and amplitude is converted into two independent phase and amplitude optimization problems. Considering PJ and SPWT require PA, the joint optimization problem reduces to an amplitude optimization problem. Then, two efficient AB schemes are proposed: leakage and maximizing receive power (Max-RP). With existing equal AB (EAB) as a performance reference, simulation results show that the proposed Max-RP and leakage AB methods perform much better than conventional method in terms of both bit-error-rate (BER) and secrecy rate (SR) at medium and high signal-to-noise ratio regions. The performance difference between the two proposed leakage and Max-RP amplitude beamformers is trivial. Additionally, we also find the fact that all three AB schemes EA, Max-RP, and leakage can form two main peaks of AN and CM around Eve and the desired receiver (Bob), respectively. This is what we call PJ and SPWT.

Submitted 5 May, 2020; v1 submitted 24 August, 2019; originally announced August 2019.

arXiv:1908.04964 [pdf, other]

Learning Two-View Correspondences and Geometry Using Order-Aware Network

Authors: Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao

Abstract: Establishing correspondences between two images requires both local and global spatial context. Given putative correspondences of feature points in two views, in this paper, we propose Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. Specifically, this proposed network is built hierarchically and comprises three novel operations. First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix. These clusters are in a canonical order and invariant to input permutations. Next, the clusters are spatially correlated to form the global context of correspondences. After that, the context-encoded clusters are recovered back to the original size through a proposed upsampling operator. We intensively experiment on both outdoor and indoor datasets. The accuracy of the two-view geometry and correspondences are significantly improved over the state-of-the-arts. Code will be available at https://github.com/zjhthu/OANet.git.

Submitted 14 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019, and Winner solution to both tracks of CVPR IMW 2019 Challenge. Code will be available soon at https://github.com/zjhthu/OANet.git

arXiv:1908.04011 [pdf, other]

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

Authors: Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song

Abstract: A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding space for distance measuring, and the second one regarding image-text matching as a binary classification problem. Neither of these approaches can, however, balance the matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves the state-of-the-art matching performance with much less time complexity. The implementation code is available at https://github.com/Wangt-CN/MTFN-RR-PyTorch-Code.

Submitted 29 July, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

Comments: ACM Multimedia 2019 Oral

arXiv:1907.12282 [pdf, other]

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

Authors: Tong Shen, Dong Gong, Wei Zhang, Chunhua Shen, Tao Mei

Abstract: Training a semantic segmentation model requires a large amount of pixel-level annotation, hampering its application at scale. With computer graphics, we can generate almost unlimited training data with precise annotation. However, a deep model trained with synthetic data usually cannot directly generalize well to realistic images due to domain shift. It has been observed that highly confident labels for the unlabeled real images may be predicted relying on the labeled synthetic data. To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data. Specifically, we propose a novel proxy-based method using multi-adversarial training. We first train the model using synthetic data (source domain). Multiple discriminators are used to align the features between the source and target domain (real images) at different levels. Then we focus on obtaining and selecting high-quality proxy labels by incorporating both the confidence of the class predictor and that from the adversarial discriminators. Our discriminators not only work as a regularizer to encourage feature alignment but also provide an alternative confidence measure for generating proxy labels. Relying on the generated high-quality proxies, our model can be trained in a "supervised manner" on the target domain. On two major tasks, GTA5->Cityscapes and SYNTHIA->Cityscapes, our method achieves state-of-the-art results, outperforming the previous by a large margin.

Submitted 29 July, 2019; originally announced July 2019.

arXiv:1907.12070 [pdf, ps, other]

Two Efficient Beamformers for Secure Precise Jamming and Communication with Phase Alignment

Authors: Feng Shu, Lingling Zhu, Wenlong Cai, Tong Shen, Jinyong Lin, Shuo Zhang, Jiangzhou Wang

Abstract: To achieve a better effect of interference on eavesdropper with an enhanced security, a secure precise jamming (PJ) and communication (SPJC) is proposed and its basic idea is to force the transmit energy of artificial noise (AN) and confidential message into the neighborhoods of Eve and Bob by using random subcarrier selection (RSS), directional modulation, and beamforming under phase alignment (PA) constraint (PAC). Here, we propose two high-performance beamforming schemes: minimum transmit power (Min-TP) and minimum regularized transmit power (Min-RTP) to achieve SPJC under PAC and orthogonal constraint (OC), where OC means that AN and CM are projected onto the null-spaces of the desired and eavesdropping channels, respectively. Simulation results show that the proposed Min-TP and Min-RTP methods perform much better than existing equal amplitude (EA) method in terms of both bit-error-rate (BER) and secrecy rate (SR) at medium and high signal-to-noise ratio regions. The SR performance difference between the proposed two methods becomes trivial as the number of transmit antennas approaches large-scale. More importantly, we also find the fact that all three schemes including EA, Min-TP, and Min-RTP can form two main peaks of AN and CM around Eve and Bob, respectively. This achieves both PJ and secure precise wireless transmission (SPWT), called SPJC.

Submitted 31 July, 2019; v1 submitted 28 July, 2019; originally announced July 2019.

arXiv:1907.02173 [pdf, other]

doi 10.1016/j.astropartphys.2018.10.006

The on-orbit calibration of DArk Matter Particle Explorer

Authors: G. Ambrosi, Q. An, R. Asfandiyarov, P. Azzarello, P. Bernardini, M. S. Cai, M. Caragiulo, J. Chang, D. Y. Chen, H. F. Chen, J. L. Chen, W. Chen, M. Y. Cui, T. S. Cui, H. T. Dai, A. D'Amone, A. De Benedittis, I. De Mitri, M. Ding, M. Di Santo, J. N. Dong, T. K. Dong, Y. F. Dong, Z. X. Dong, D. Droz , et al. (133 additional authors not shown)

Abstract: The DArk Matter Particle Explorer (DAMPE), a satellite-based cosmic ray and gamma-ray detector, was launched on December 17, 2015, and began its on-orbit operation on December 24, 2015. In this work we document the on-orbit calibration procedures used by DAMPE and report the calibration results of the Plastic Scintillator strip Detector (PSD), the Silicon-Tungsten tracKer-converter (STK), the BGO imaging calorimeter (BGO), and the Neutron Detector (NUD). The results are obtained using Galactic cosmic rays, bright known GeV gamma-ray sources, and charge injection into the front-end electronics of each sub-detector. The determination of the boundary of the South Atlantic Anomaly (SAA), the measurement of the live time, and the alignments of the detectors are also introduced. The calibration results demonstrate the stability of the detectors in almost two years of the on-orbit operation.

Submitted 3 July, 2019; originally announced July 2019.

Journal ref: Astroparticle Physics, Volume 106, p. 18-34 (2019)

arXiv:1906.06699 [pdf, other]

doi 10.24963/ijcai.2019

Deep Recurrent Quantization for Generating Sequential Binary Codes

Authors: Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

Abstract: Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirement of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost, and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To the end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor is designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state-of-the-art for image retrieval. But it requires significantly less number of parameters and training times. Our code is published online: https://github.com/cfm-uestc/DRQ.

Submitted 4 December, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

ACM Class: H.3.1

Journal ref: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 1 (2019) 912-918

arXiv:1906.06698 [pdf, other]

doi 10.24963/ijcai.2019

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

Authors: Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

Abstract: Product Quantization (PQ) has long been a mainstream for generating an exponentially large codebook at very low memory/time cost. Despite its success, PQ is still tricky for the decomposition of high-dimensional vector space, and the retraining of model is usually unavoidable when the code length changes. In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval. DPQ learns the quantization codes sequentially and approximates the original feature space progressively. Therefore, we can train the quantization codes with different code lengths simultaneously. Specifically, we first utilize the label information for guiding the learning of visual features, and then apply several quantization blocks to progressively approach the visual features. Each quantization block is designed to be a layer of a convolutional neural network, and the whole framework can be trained in an end-to-end manner. Experimental results on the benchmark datasets show that our model significantly outperforms the state-of-the-art for image retrieval. Our model is trained once for different code lengths and therefore requires less computation time. Additional ablation study demonstrates the effect of each component of our proposed model. Our code is released at https://github.com/cfm-uestc/DPQ.

Submitted 4 December, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

ACM Class: H.3.1

Journal ref: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 1 (2019) 723-729

arXiv:1905.12777 [pdf, other]

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Authors: Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Abstract: Generative autoencoders offer a promising approach for controllable text generation by leveraging their latent sentence representations. However, current models struggle to maintain coherent latent spaces required to perform meaningful text manipulations via latent vector operations. Specifically, we demonstrate by example that neural encoders do not necessarily map similar sentences to nearby latent vectors. A theoretical explanation for this phenomenon establishes that high capacity autoencoders can learn an arbitrary mapping between sequences and associated latent representations. To remedy this issue, we augment adversarial autoencoders with a denoising objective where original sentences are reconstructed from perturbed versions (referred to as DAAE). We prove that this simple modification guides the latent space geometry of the resulting model by encouraging the encoder to map similar texts to similar latent representations. In empirical comparisons with various types of autoencoders, our model provides the best trade-off between generation quality and reconstruction capacity. Moreover, the improved geometry of the DAAE latent space enables zero-shot text style transfer via simple latent vector arithmetic.

Submitted 7 July, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: ICML 2020 camera-ready

arXiv:1905.04729 [pdf, other]

One-Shot Image-to-Image Translation via Part-Global Learning with a Multi-adversarial Framework

Authors: Ziqiang Zheng, Zhibin Yu, Haiyong Zheng, Yang Yang, Heng Tao Shen

Abstract: It is well known that humans can learn and recognize objects effectively from several limited image samples. However, learning from just a few images is still a tremendous challenge for existing main-stream deep neural networks. Inspired by analogical reasoning in the human mind, a feasible strategy is to translate the abundant images of a rich source domain to enrich the relevant yet different target domain with insufficient image data. To achieve this goal, we propose a novel, effective multi-adversarial framework (MA) based on part-global learning, which accomplishes one-shot cross-domain image-to-image translation. In specific, we first devise a part-global adversarial training scheme to provide an efficient way for feature extraction and prevent discriminators being over-fitted. Then, a multi-adversarial mechanism is employed to enhance the image-to-image translation ability to unearth the high-level semantic representation. Moreover, a balanced adversarial loss function is presented, which aims to balance the training data and stabilize the training process. Extensive experiments demonstrate that the proposed approach can obtain impressive results on various datasets between two extremely imbalanced image domains and outperform state-of-the-art methods on one-shot image-to-image translation.

Submitted 12 May, 2019; originally announced May 2019.

Comments: 9 pages, 13 figures

arXiv:1905.04016 [pdf, other]

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

Authors: Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu

Abstract: In this work, we study the robustness of a CNN+RNN based image captioning system being subjected to adversarial noises. We propose to fool an image captioning system to generate some targeted partial captions for an image polluted by adversarial noises, even the targeted captions are totally irrelevant to the image content. A partial caption indicates that the words at some locations in this caption are observed, while words at other locations are not restricted. It is the first work to study exact adversarial attacks of targeted partial captions. Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables. Both the generalized expectation maximization algorithm and structural SVMs with latent variables are then adopted to optimize the problem. The proposed methods generate very successful attacks to three popular CNN+RNN based image captioning models. Furthermore, the proposed attack methods are used to understand the inner mechanism of image captioning systems, providing the guidance to further improve automatic image captioning systems towards human captioning.

Submitted 10 May, 2019; originally announced May 2019.

Comments: Accepted to CVPR 2019. Yan Xu and Baoyuan Wu are co-first authors

arXiv:1904.12615 [pdf, other]

Everyone is a Cartoonist: Selfie Cartoonization with Attentive Adversarial Networks

Authors: Xinyu Li, Wei Zhang, Tong Shen, Tao Mei

Abstract: Selfie and cartoon are two popular artistic forms that are widely presented in our daily life. Despite the great progress in image translation/stylization, few techniques focus specifically on selfie cartoonization, since cartoon images usually contain artistic abstraction (e.g., large smoothing areas) and exaggeration (e.g., large/delicate eyebrows). In this paper, we address this problem by proposing a selfie cartoonization Generative Adversarial Network (scGAN), which mainly uses an attentive adversarial network (AAN) to emphasize specific facial regions and ignore low-level details. More specifically, we first design a cycle-like architecture to enable training with unpaired data. Then we design three losses from different aspects. A total variation loss is used to highlight important edges and contents in cartoon portraits. An attentive cycle loss is added to lay more emphasis on delicate facial areas such as eyes. In addition, a perceptual loss is included to eliminate artifacts and improve robustness of our method. Experimental results show that our method is capable of generating different cartoon styles and outperforms a number of state-of-the-art methods.

Submitted 20 April, 2019; originally announced April 2019.

arXiv:1904.11207 [pdf, other]

Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval

Authors: Lei Zhu, Zi Huang, Zhihui Li, Liang Xie, Heng Tao Shen

Abstract: Unsupervised hashing can desirably support scalable content-based image retrieval (SCBIR) for its appealing advantages of semantic label independence, memory and search efficiency. However, the learned hash codes are embedded with limited discriminative semantics due to the intrinsic limitation of image representation. To address the problem, in this paper, we propose a novel hashing approach, dubbed as Discrete Semantic Transfer Hashing (DSTH). The key idea is to directly augment the semantics of discrete image hash codes by exploring auxiliary contextual modalities. To this end, a unified hashing framework is formulated to simultaneously preserve visual similarities of images and perform semantic transfer from contextual modalities. Further, to guarantee direct semantic transfer and avoid information loss, we explicitly impose the discrete constraint, bit-uncorrelation constraint and bit-balance constraint on hash codes. A novel and effective discrete optimization method based on augmented Lagrangian multiplier is developed to iteratively solve the optimization problem. The whole learning process has linear computation complexity and desirable scalability. Experiments on three benchmark datasets demonstrate the superiority of DSTH compared with several state-of-the-art approaches.

Submitted 25 April, 2019; originally announced April 2019.

arXiv:1904.10681 [pdf, other]

A Large-scale Varying-view RGB-D Action Dataset for Arbitrary-view Human Action Recognition

Authors: Yanli Ji, Feixiang Xu, Yang Yang, Fumin Shen, Heng Tao Shen, Wei-Shi Zheng

Abstract: Current researches of action recognition mainly focus on single-view and multi-view recognition, which can hardly satisfies the requirements of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of datasets also sets up barriers. To provide data for arbitrary-view action recognition, we newly collect a large-scale RGB-D action dataset for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The dataset includes action samples captured in 8 fixed viewpoints and varying-view sequences which covers the entire 360 degree view angles. In total, 118 persons are invited to act 40 action categories, and 25,600 video samples are collected. Our dataset involves more participants, more viewpoints and a large number of samples. More importantly, it is the first dataset containing the entire 360 degree varying-view sequences. The dataset provides sufficient data for multi-view, cross-view and arbitrary-view action analysis. Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experiment results show that the VS-CNN achieves superior performance.

Submitted 24 April, 2019; originally announced April 2019.

Comments: Original version has been published by ACMMM 2018

arXiv:1904.04500 [pdf, ps, other]

Regional Robust Secure Precise Wireless Transmission Design for Multi-user UAV Broadcasting System

Authors: Tong Shen, Tingting Liu, Yan Lin, Yongpeng Wu, Feng Shu, Zhu Han

Abstract: In this paper, two regional robust secure precise wireless transmission (SPWT) schemes for multi-user unmanned aerial vehicle (UAV): 1) regional signal-to-leakage-and-noise ratio (SLNR) and artificial-noise-to-leakage-and-noise ratio (ANLNR) (R-SLNR-ANLNR) maximization and 2) point SLNR and ANLNR (P-SLNR-ANLNR) maximization, are proposed to tackle with the estimation errors of the target users' location. In SPWT system, the estimation error for SPWT can not be ignored. However the conventional robust methods in secure wireless communications optimize the beamforming vector in the desired positions only in statistical means and can not guarantee the security for each symbol. Proposed regional robust schemes are designed for optimizing the secrecy performance in the whole error region around the estimated location. Specifically, with known maximal estimation error, we define target region and wiretap region. Then design an optimal beamforming vector and an artificial noise projection matrix, which achieve the confidential signal in the target area having the maximal power while only few signal power is conserved in the potential wiretap region. Instead of considering the statistical distributions of the estimated errors into optimization, we optimize the SLNR and ANLNR of the whole target area, which significantly decreases the complexity. Moreover, the proposed schemes can ensure that the desired users are located in the optimized region, which are more practical than conventional methods. Simulation results show that our proposed regional robust SPWT design is capable of substantially improving the secrecy rate compared to the conventional non-robust method. The P-SLNR-ANLNR maximization-based method has the comparable secrecy performance with a lower complexity than that of the R-SLNR-ANLNR maximization-based method.

Submitted 8 June, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

arXiv:1904.04084 [pdf, other]

ContextDesc: Local Descriptor Augmentation with Cross-Modality Context

Authors: Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan

Abstract: Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints, whereas neglecting the spatial relations established from their keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, meanwhile improves remarkably on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.

Submitted 8 April, 2019; originally announced April 2019.

Comments: Accepted to CVPR 2019 (oral), supplementary materials included. (https://github.com/lzx551402/contextdesc)

arXiv:1903.04855 [pdf, other]

Parallel Medical Imaging for Intelligent Medical Image Analysis: Concepts, Methods, and Applications

Authors: Chao Gou, Tianyu Shen, Wenbo Zheng, Huadan Xue, Hui Yu, Qiang Ji, Zhengyu Jin, Fei-Yue Wang

Abstract: There has been much progress in data-driven artificial intelligence technology for medical image analysis in recent decades. However, it still remains challenging due to the distinctive complexity of acquiring and annotating image data, extracting medical domain knowledge, and explaining the diagnostic decision for medical image analysis. In this paper, we propose a data-knowledge-driven framework termed Parallel Medical Imaging (PMI) for intelligent medical image analysis based on the methodology of interactive ACP-based parallel intelligence. In the PMI framework, computational experiments with predictive learning in a data-driven way are conducted to extract medical knowledge for diagnostic decision support. Artificial imaging systems are introduced to select and prescriptively generate medical image data in a knowledge-driven way to utilize medical domain knowledge. Through the closed-loop optimization based on parallel execution, our proposed PMI framework can boost the generalization ability and alleviate the limitation of medical interpretation for diagnostic decisions. Furthermore, we illustrate the preliminary implementation of the PMI method through the case studies of mammogram analysis and skin lesion image analysis. Experimental results on several public medical image datasets demonstrate the effectiveness of the proposed PMI.

Submitted 29 June, 2021; v1 submitted 12 March, 2019; originally announced March 2019.

arXiv:1902.10556 [pdf, other]

Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference

Authors: Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan

Abstract: Deep learning has recently demonstrated its excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is the scalability: the memory-consuming cost volume regularization makes the learned MVS hard to apply to high-resolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This dramatically reduces the memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on the recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to the memory constraint. Code is available at https://github.com/YoYo000/MVSNet.

Submitted 27 February, 2019; originally announced February 2019.

Comments: Accepted by CVPR2019

Showing 251–300 of 396 results for author: Shen, T