-
General Proximal Incremental Aggregated Gradient Algorithms: Better and Novel Results under General Scheme
Authors:
Tao Sun,
Yuejiao Sun,
Dongsheng Li,
Qing Liao
Abstract:
The incremental aggregated gradient algorithm is popular in network optimization and machine learning research. However, current convergence results require the objective function to be strongly convex, and the existing convergence rates are limited to linear convergence. Because of the mathematical techniques used, the stepsize in the algorithm is restricted by the strong convexity constant, which may force the stepsize to be very small (since the strong convexity constant may be small).
In this paper, we propose a general proximal incremental aggregated gradient algorithm, which contains various existing algorithms including the basic incremental aggregated gradient method. Better and new convergence results are proved even with the general scheme. The novel results presented in this paper, which have not appeared in previous literature, include: a general scheme, nonconvex analysis, the sublinear convergence rates of the function values, much larger stepsizes that guarantee the convergence, the convergence when noise exists, the line search strategy of the proximal incremental aggregated gradient algorithm and its convergence.
Submitted 11 October, 2019;
originally announced October 2019.
-
Ultrahigh Responsivity Photodetectors of Two-dimensional Covalent Organic Frameworks Integrated on Graphene
Authors:
Yi-feng Xiong,
Qiao-bo Liao,
Zheng-ping Huang,
Xin Huang,
Can Ke,
Heng-tian Zhu,
Chen-yu Dong,
Hao-shang Wang,
Kai Xi,
Peng Zhan,
Fei Xu,
Yan-qing Lu
Abstract:
Two dimensional (2D) materials exhibit superior properties in electronic and optoelectronic fields. The wide demand for high performance optoelectronic devices promotes the exploration of diversified 2D materials. Recently, 2D covalent organic frameworks (COFs) have emerged as next-generation layered materials with predesigned pi electronic skeletons and highly ordered topological structures, which are promising for tailoring their optoelectronic properties. However, COFs are usually produced as solid powders due to anisotropic growth, making them unreliable to integrate into devices. Here, by selecting tetraphenylethylene (TPE) monomers with photoelectric activity, we designed and synthesized photosensitive 2D COFs with highly ordered topologies and grew 2D COFs in situ on graphene to form well ordered COF graphene heterostructures. Ultrasensitive photodetectors were successfully fabricated with the COFETBC TAPT graphene heterostructure and exhibited an excellent overall performance. Moreover, due to the high surface area and the polarity selectivity of COFs, the photosensing properties of the photodetectors can be reversibly regulated by specific target molecules. Our research provides new strategies for building advanced functional devices with programmable material structures and diversified regulation methods, paving the way for a generation of high performance applications in optoelectronics and many other fields.
Submitted 22 October, 2019; v1 submitted 6 September, 2019;
originally announced October 2019.
-
D3M: A deep domain decomposition method for partial differential equations
Authors:
Ke Li,
Kejun Tang,
Tianfan Wu,
Qifeng Liao
Abstract:
A state-of-the-art deep domain decomposition method (D3M) based on the variational principle is proposed for partial differential equations (PDEs). The solution of PDEs can be formulated as the solution of a constrained optimization problem, and we design a multi-fidelity neural network framework to solve this optimization problem. Our contribution is to develop a systematic computational procedure for the underlying problem in parallel with domain decomposition. Our analysis shows that the D3M approximate solution converges to the exact solution of the underlying PDEs. Our proposed framework establishes a foundation for using variational deep learning in large-scale engineering problems and designs. We present a general mathematical framework of D3M, validate its accuracy and demonstrate its efficiency with numerical experiments.
Submitted 24 September, 2019;
originally announced September 2019.
-
LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution
Authors:
Wenming Yang,
Xuechen Zhang,
Yapeng Tian,
Wei Wang,
Jing-Hao Xue,
Qingmin Liao
Abstract:
In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution. Compared with two representative network architectures with skip connections, ResNet and DenseNet, a linear compressing layer is designed in LCSCNet for skip connection, which connects former feature maps and distinguishes them from newly explored feature maps. In this way, the proposed LCSCNet enjoys the merits of the distinguishing feature treatment of DenseNet and the parameter-economic form of ResNet. Moreover, to better exploit hierarchical information from both low and high levels of various receptive fields in deep models, inspired by gate units in LSTM, we also propose an adaptive element-wise fusion strategy with multi-supervised training. Experimental results in comparison with state-of-the-art algorithms validate the effectiveness of LCSCNet.
Submitted 8 September, 2019;
originally announced September 2019.
-
One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques
Authors:
Vijay Arya,
Rachel K. E. Bellamy,
Pin-Yu Chen,
Amit Dhurandhar,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Q. Vera Liao,
Ronny Luss,
Aleksandra Mojsilović,
Sami Mourad,
Pablo Pedemonte,
Ramya Raghavendra,
John Richards,
Prasanna Sattigeri,
Karthikeyan Shanmugam,
Moninder Singh,
Kush R. Varshney,
Dennis Wei,
Yunfeng Zhang
Abstract:
As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (http://aix360.mybluemix.net/), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.
Submitted 14 September, 2019; v1 submitted 6 September, 2019;
originally announced September 2019.
-
Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization
Authors:
Tomaso Poggio,
Andrzej Banburski,
Qianli Liao
Abstract:
While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning, however, is beginning to emerge. It covers the following questions: 1) representation power of deep networks, 2) optimization of the empirical risk, and 3) generalization properties of gradient descent techniques --- why does the expected error not suffer, despite the absence of explicit regularization, when the networks are overparametrized? In this review we discuss recent advances in the three areas. In approximation theory both shallow and deep networks have been shown to approximate any continuous function on a bounded domain at the expense of an exponential number of parameters (exponential in the dimensionality of the function). However, for a subset of compositional functions, deep networks of the convolutional type can have a linear dependence on dimensionality, unlike shallow networks. In optimization we discuss the loss landscape for the exponential loss function and show that stochastic gradient descent will find the global minima with high probability. To address the question of generalization for classification tasks, we use classical uniform convergence results to justify minimizing a surrogate exponential-type loss function under a unit norm constraint on the weight matrix at each layer -- since the interesting variables for classification are the weight directions rather than the weights. Our approach, which is supported by several independent new results, offers a solution to the puzzle about generalization performance of deep overparametrized ReLU networks, uncovering the origin of the underlying hidden complexity control.
Submitted 25 August, 2019;
originally announced August 2019.
-
Excited heavy quarkonium production in Higgs boson decays
Authors:
Qi-Li Liao,
Jun Jiang
Abstract:
The rare decay channels of the Higgs boson to heavy quarkonium offer vital opportunities to explore the coupling of the Higgs to heavy quarks. We study the semi-exclusive decay channels of the Higgs boson to heavy quarkonia, i.e., $H^0\to |(Q\bar{Q^{\prime}})[n]\rangle+\bar{Q}Q^{\prime}$ ($Q^{(\prime)}=c~\text{or}~b$ quark) within the NRQCD framework. In addition to the lower-level Fock states $|(Q\bar{Q'})[1S]\rangle$ component, contributions of highly excited states $|(Q\bar{Q'})[2S]\rangle$, $|(Q\bar{Q'})[3S]\rangle$, $|(Q\bar{Q'})[4S]\rangle$, $|(Q\bar{Q'})[1P]\rangle$, $|(Q\bar{Q'})[2P]\rangle$, $|(Q\bar{Q'})[3P]\rangle$ and $|(Q\bar{Q'})[4P]\rangle$ are also studied. According to our study, the contributions of highly excited Fock states should be considered seriously. Differential distributions of total decay width with respect to invariant mass and angles, as well as uncertainties caused by non-perturbative hadronic matrix elements, are discussed. If all excited heavy quarkonium states decay to the ground spin-singlet state through electromagnetic or hadronic interactions, we obtain the decay widths for $|(Q\bar{Q'})\rangle$ quarkonium production through $H^0$ semi-exclusive decays: $25.10^{+11.6\%}_{-51.6\%}$ keV for $|(b\bar{c})[n]\rangle$ meson, $3.23^{+0\%}_{-62.2\%}$ keV for $|(c\bar{c})[n]\rangle$ and $2.36^{+0\%}_{-57.1\%}$ keV for $|(b\bar{b})[n]\rangle$, where uncertainties are caused by adopting different non-perturbative potential models. At the future high energy LHC ($\sqrt{s}=27$ TeV), numerical results show that sizable numbers of events for those highly excited states can be produced, which implies that one could also consider exploring the coupling properties of the Higgs to heavy quarks in these highly excited state channels, especially for the charmonium and bottomonium.
Submitted 21 August, 2019; v1 submitted 4 August, 2019;
originally announced August 2019.
-
A hierarchical neural hybrid method for failure probability estimation
Authors:
Ke Li,
Kejun Tang,
Jinglai Li,
Tianfan Wu,
Qifeng Liao
Abstract:
Failure probability evaluation for complex physical and engineering systems governed by partial differential equations (PDEs) is computationally intensive, especially when high-dimensional random parameters are involved. Since standard numerical schemes for solving these complex PDEs are expensive, traditional Monte Carlo methods, which require repeatedly solving PDEs, are infeasible. Alternative approaches, typically surrogate-based methods, suffer from the so-called ``curse of dimensionality'', which limits their application to problems with high-dimensional parameters. To address this, we develop a novel hierarchical neural hybrid (HNH) method to efficiently compute failure probabilities of these challenging high-dimensional problems. Specifically, multifidelity surrogates are constructed based on neural networks with different levels of layers, such that expensive high-fidelity surrogates are adapted only when the parameters are in the suspicious domain. The efficiency of our new HNH method is theoretically analyzed and is demonstrated with numerical experiments. From numerical results, we show that to achieve a given accuracy in estimating a rare failure probability (e.g., $10^{-5}$), the traditional Monte Carlo method needs to solve PDEs more than a million times, while our HNH only requires solving them a few thousand times.
Submitted 3 August, 2019;
originally announced August 2019.
-
Teasing out the overall survival benefit with adjustment for treatment switching to other therapies
Authors:
Yuqing Xu,
Meijing Wu,
Weili He,
Qiming Liao,
Yabing Mai
Abstract:
In oncology clinical trials, the long-term overall survival (OS) benefit of an experimental drug or treatment regimen (experimental group) is often unobservable if some patients in the control group switch to drugs in the experimental group and/or other cancer treatments after disease progression. A key question often raised by payers and reimbursement agencies is how to estimate the true benefit of the experimental drug group on overall survival that would have been estimated if there were no treatment switches. Several commonly used statistical methods are available to estimate overall survival benefit while adjusting for treatment switching, ranging from naive exclusion or censoring approaches to more advanced methods including inverse probability of censoring weighting (IPCW), the iterative parameter estimation (IPE) algorithm, and rank-preserving structural failure time models (RPSFTM). However, many clinical trials now have patients switching to treatment regimens other than the test drugs, and the existing methods cannot handle these more complicated scenarios. To address this challenge, we propose two additional methods: stratified RPSFTM and random-forest-based prediction. A simulation study is conducted to assess the properties of the existing methods along with the two newly proposed approaches.
Submitted 1 August, 2019;
originally announced August 2019.
-
An End-to-End Performance Analysis for Service Chaining in a Virtualized Network
Authors:
Emmanouil Fountoulakis,
Qi Liao,
Nikolaos Pappas
Abstract:
Future mobile networks supporting the Internet of Things are expected to provide both high throughput and low latency to user-specific services. One way to overcome this challenge is to adopt Network Function Virtualization (NFV) and Multi-access Edge Computing (MEC). Besides latency constraints, these services may have strict function chaining requirements. The distribution of network functions over different hosts and the more flexible routing caused by service function chaining raise new challenges for end-to-end performance analysis. In this paper, as a first step, we analyze an end-to-end communications system that consists of both MEC servers and a server at the core network hosting different types of virtual network functions. We develop a queueing model for the performance analysis of the system consisting of both processing and transmission flows. We propose a method to derive analytical expressions of the performance metrics of interest. Then, we show how to apply a similar method to an extended larger system and derive a stochastic model for such systems. We observe that the simulation and analytical results coincide. By evaluating the system under different scenarios, we provide insights for decision making on traffic flow control and its impact on critical performance metrics.
Submitted 24 June, 2019;
originally announced June 2019.
-
Multi-label learning for improving discretely-modulated continuous-variable quantum key distribution
Authors:
Qin Liao,
Hai Zhong,
Ying Guo
Abstract:
Discretely-modulated continuous-variable quantum key distribution (CVQKD) is more suitable for long-distance transmission than its Gaussian-modulated counterpart. However, its security can only be guaranteed when the modulation variance is very small, which limits its further development. To solve this problem, we propose a novel scheme for discretely-modulated CVQKD using multi-label learning technology, called multi-label learning-based CVQKD (ML-CVQKD). In particular, the proposed scheme divides the whole quantum system into state learning and state prediction. The former is used for training and estimating the quantum classifier, and the latter is used for generating the final secret key. A quantum multi-label classification (QMLC) algorithm is also designed as an embedded classifier for distinguishing coherent states. Feature extraction for coherent states and related machine learning-based metrics for the quantum classifier are then suggested. Security analysis shows that QMLC-embedded ML-CVQKD is immune to the intercept-resend attack, so that a small modulation variance is no longer required, thereby improving the performance of the discretely-modulated CVQKD system.
Submitted 19 January, 2020; v1 submitted 9 June, 2019;
originally announced June 2019.
-
Tell Me About Yourself: Using an AI-Powered Chatbot to Conduct Conversational Surveys with Open-ended Questions
Authors:
Ziang Xiao,
Michelle X. Zhou,
Q. Vera Liao,
Gloria Mark,
Changyan Chi,
Wenxi Chen,
Huahai Yang
Abstract:
The rise of increasingly powerful chatbots offers a new way to collect information through conversational surveys, where a chatbot asks open-ended questions, interprets a user's free-text responses, and probes answers whenever needed. To investigate the effectiveness and limitations of such a chatbot in conducting surveys, we conducted a field study involving about 600 participants. In this study with mostly open-ended questions, half of the participants took a typical online survey on Qualtrics and the other half interacted with an AI-powered chatbot to complete a conversational survey. Our detailed analysis of over 5200 free-text responses revealed that the chatbot drove a significantly higher level of participant engagement and elicited significantly better quality responses, as measured by Gricean Maxims in terms of their informativeness, relevance, specificity, and clarity. Based on our results, we discuss design implications for creating AI-powered chatbots to conduct effective surveys and beyond.
Submitted 20 March, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
-
Automatic Calibration of Multiple 3D LiDARs in Urban Environments
Authors:
Jianhao Jiao,
Yang Yu,
Qinghai Liao,
Haoyang Ye,
Ming Liu
Abstract:
Multiple LiDARs have progressively emerged on autonomous vehicles for rendering a wide field of view and dense measurements. However, the lack of precise calibration negatively affects their potential applications in localization and perception systems. In this paper, we propose a novel system that enables automatic multi-LiDAR calibration without any calibration target, prior environmental information, or initial values of the extrinsic parameters. Our approach starts with a hand-eye calibration for automatic initialization by aligning the estimated motions of each sensor. The resulting parameters are then refined with an appearance-based method by minimizing a cost function constructed from point-plane correspondences. Experimental results on simulated and real-world data sets demonstrate the reliability and accuracy of our calibration approach. The proposed approach can calibrate a multi-LiDAR system on a mobile platform with rotation and translation errors of less than 0.04 rad and 0.1 m, respectively.
Submitted 13 May, 2019;
originally announced May 2019.
-
A Novel Dual-Lidar Calibration Algorithm Using Planar Surfaces
Authors:
Jianhao Jiao,
Qinghai Liao,
Yilong Zhu,
Tianyu Liu,
Yang Yu,
Rui Fan,
Lujia Wang,
Ming Liu
Abstract:
Multiple lidars are prevalently used on mobile vehicles for rendering a broad view to enhance the performance of localization and perception systems. However, precise calibration of multiple lidars is challenging since the feature correspondences in scan points cannot always provide enough constraints. To address this problem, existing methods require fixed calibration targets in scenes or rely exclusively on additional sensors. In this paper, we present a novel method that enables automatic lidar calibration without these restrictions. Three linearly independent planar surfaces appearing in the surroundings are utilized to find correspondences. Two components are developed to ensure that the extrinsic parameters can be found: a closed-form solver for initialization and an optimizer for refinement by minimizing a nonlinear cost function. Simulation and experimental results demonstrate the high accuracy of our calibration approach, with rotation and translation errors smaller than 0.05 rad and 0.1 m, respectively.
Submitted 27 April, 2019;
originally announced April 2019.
-
Boundary Aware Multi-Focus Image Fusion Using Deep Neural Network
Authors:
Haoyu Ma,
Juncheng Zhang,
Shaojun Liu,
Qingmin Liao
Abstract:
Since it is usually difficult to capture an all-in-focus image of a 3D scene directly, various multi-focus image fusion methods are employed to generate it from several images focused at different depths. However, the performance of existing methods is barely satisfactory and often degrades for areas near the focused/defocused boundary (FDB). In this paper, a boundary aware method using a deep neural network is proposed to overcome this problem. (1) Aiming to acquire improved fusion images, a 2-channel deep network is proposed to better extract the relative defocus information of the two source images. (2) After analyzing the different situations for patches far away from and near the FDB, we use two networks to handle them respectively. (3) To simulate reality more precisely, a new approach to dataset generation is designed. Experiments demonstrate that the proposed method outperforms the state-of-the-art methods, both qualitatively and quantitatively.
Submitted 30 March, 2019;
originally announced April 2019.
-
Connections between spectral properties of asymptotic mappings and solutions to wireless network problems
Authors:
Renato Luís Garrido Cavalcante,
Qi Liao,
Slawomir Stańczak
Abstract:
In this study we establish connections between asymptotic functions and properties of solutions to important problems in wireless networks. We start by introducing a class of self-mappings (called asymptotic mappings) constructed with asymptotic functions, and we show that spectral properties of these mappings explain the behavior of solutions to some max-min utility optimization problems. For example, in a common family of max-min utility power control problems, we prove that the optimal utility as a function of the power available to transmitters is approximately linear in the low power regime. However, as we move away from this regime, there exists a transition point, easily computed from the spectral radius of an asymptotic mapping, from which gains in utility become increasingly marginal. From these results we derive analogous properties of the transmit energy efficiency. In this study we also generalize and unify existing approaches for feasibility analysis in wireless networks. Feasibility problems often reduce to determining the existence of the fixed point of a standard interference mapping, and we show that the spectral radius of an asymptotic mapping provides a necessary and sufficient condition for the existence of such a fixed point. We further present a result that determines whether the fixed point satisfies a constraint given in terms of a monotone norm.
Submitted 25 July, 2022; v1 submitted 23 March, 2019;
originally announced March 2019.
-
Theory III: Dynamics and Generalization in Deep Networks
Authors:
Andrzej Banburski,
Qianli Liao,
Brando Miranda,
Lorenzo Rosasco,
Fernanda De La Torre,
Jack Hidary,
Tomaso Poggio
Abstract:
The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that a classical form of norm control -- but kind of hidden -- is present in deep networks trained with gradient descent techniques on exponential-type losses. In particular, gradient descent induces a dynamics of the normalized weights which converge for $t \to \infty$ to an equilibrium which corresponds to a minimum norm (or maximum margin) solution. For sufficiently large but finite $\rho$ -- and thus finite $t$ -- the dynamics converges to one of several margin maximizers, with the margin monotonically increasing towards a limit stationary point of the flow. In the usual case of stochastic gradient descent, most of the stationary points are likely to be convex minima corresponding to a constrained minimizer -- the network with normalized weights -- which corresponds to vanishing regularization. The solution has zero generalization gap, for fixed architecture, asymptotically for $N \to \infty$, where $N$ is the number of training examples. Our approach extends some of the original results of Srebro from linear networks to deep networks and provides a new perspective on the implicit bias of gradient descent. We believe that the elusive complexity control we describe is responsible for the puzzling empirical finding of good predictive performance by deep networks, despite overparametrization.
Submitted 10 April, 2020; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Lightweight Feature Fusion Network for Single Image Super-Resolution
Authors:
Wenming Yang,
Wei Wang,
Xuechen Zhang,
Shuifa Sun,
Qingmin Liao
Abstract:
Single image super-resolution (SISR) has witnessed great progress as convolutional neural networks (CNNs) get deeper and wider. However, their enormous number of parameters hinders application to real-world problems. In this letter, we propose a lightweight feature fusion network (LFFN) that can fully explore multi-scale contextual information and greatly reduce network parameters while maximizing SISR results. LFFN is built on spindle blocks and a softmax feature fusion module (SFFM). Specifically, a spindle block is composed of a dimension extension unit, a feature exploration unit and a feature refinement unit. The dimension extension layer expands low dimension to high dimension and implicitly learns the feature maps which are suitable for the next unit. The feature exploration unit performs linear and nonlinear feature exploration aimed at different feature maps. The feature refinement layer is used to fuse and refine features. SFFM fuses the features from different modules in a self-adaptive learning manner with a softmax function, making full use of hierarchical information with a small parameter cost. Both qualitative and quantitative experiments on benchmark datasets show that LFFN achieves favorable performance against state-of-the-art methods with similar parameters.
Submitted 13 April, 2019; v1 submitted 15 February, 2019;
originally announced February 2019.
-
Explaining Models: An Empirical Study of How Explanations Impact Fairness Judgment
Authors:
Jonathan Dodge,
Q. Vera Liao,
Yunfeng Zhang,
Rachel K. E. Bellamy,
Casey Dugan
Abstract:
Ensuring fairness of machine learning systems is a human-in-the-loop process. It relies on developers, users, and the general public to identify fairness problems and make improvements. To facilitate the process we need effective, unbiased, and user-friendly explanations that people can confidently rely on. Towards that end, we conducted an empirical study with four types of programmatically generated explanations to understand how they impact people's fairness judgments of ML systems. With an experiment involving more than 160 Mechanical Turk workers, we show that: 1) Certain explanations are considered inherently less fair, while others can enhance people's confidence in the fairness of the algorithm; 2) Different fairness problems--such as model-wide fairness issues versus case-specific fairness discrepancies--may be more effectively exposed through different styles of explanation; 3) Individual differences, including prior positions and judgment criteria of algorithmic fairness, impact how people react to different styles of explanation. We conclude with a discussion on providing personalized and adaptive explanations to support fairness judgments of ML systems.
Submitted 22 January, 2019;
originally announced January 2019.
-
Bootstrapping Conversational Agents With Weak Supervision
Authors:
Neil Mallinar,
Abhishek Shah,
Rajendra Ugrani,
Ayush Gupta,
Manikandan Gurusankar,
Tin Kam Ho,
Q. Vera Liao,
Yunfeng Zhang,
Rachel K. E. Bellamy,
Robert Yates,
Chris Desmarais,
Blake McGregor
Abstract:
Many conversational agents in the market today follow a standard bot development framework which requires training intent classifiers to recognize user input. The need to create a proper set of training examples is often the bottleneck in the development process. On many occasions agent developers have access to historical chat logs that can provide a good quantity as well as coverage of training examples. However, the cost of labeling them with tens to hundreds of intents often prohibits taking full advantage of these chat logs. In this paper, we present a framework called search, label, and propagate (SLP) for bootstrapping intents from existing chat logs using weak supervision. The framework reduces hours to days of labeling effort down to minutes of work by using a search engine to find examples, then relies on a data programming approach to automatically expand the labels. We report on a user study that shows positive user feedback for this new approach to building conversational agents, and demonstrates the effectiveness of using data programming for auto-labeling. While the system is developed for training conversational agents, the framework has broader application in significantly reducing labeling effort for training text classifiers.
Submitted 14 December, 2018;
originally announced December 2018.
-
Rank adaptive tensor recovery based model reduction for partial differential equations with high-dimensional random inputs
Authors:
Kejun Tang,
Qifeng Liao
Abstract:
This work proposes a systematic model reduction approach based on rank adaptive tensor recovery for partial differential equation (PDE) models with high-dimensional random parameters. Since the standard outputs of interest of these models are discrete solutions on given physical grids which are high-dimensional, we use kernel principal component analysis to construct stochastic collocation approximations in reduced dimensional spaces of the outputs. To address the issue of high-dimensional random inputs, we develop a new efficient rank adaptive tensor recovery approach to compute the collocation coefficients. Novel efficient initialization strategies for non-convex optimization problems involved in tensor recovery are also developed in this work. We present a general mathematical framework of our overall model reduction approach, analyze its stability, and demonstrate its efficiency with numerical experiments.
Submitted 13 February, 2019; v1 submitted 11 December, 2018;
originally announced December 2018.
-
Continuous-variable quantum key distribution with non-Gaussian quantum catalysis
Authors:
Ying Guo,
Wei Ye,
Hai Zhong,
Qin Liao
Abstract:
The non-Gaussian operation can be used not only to enhance and distill the entanglement between Gaussian entangled states, but also to improve quantum communications. In this paper, we propose a non-Gaussian continuous-variable quantum key distribution (CVQKD) scheme using quantum catalysis (QC), which is an intriguing non-Gaussian operation in essence that can be implemented with current technologies. We perform quantum catalysis on both ends of the Einstein-Podolsky-Rosen (EPR) pair prepared by a sender, Alice, and find that for the single-photon QC-CVQKD, the bilateral symmetrical quantum catalysis (BSQC) performs better than the single-side quantum catalysis (SSQC). Owing to the characteristics of the integral within an ordered product (IWOP) of operators, we find that the quantum catalysis operation can improve the entanglement property of Gaussian entangled states by enhancing the success probability of non-Gaussian operation, leading to the improvement of the QC-CVQKD system. As a comparison, the QC-CVQKD system involving zero-photon and single-photon quantum catalysis outperforms the previous non-Gaussian CVQKD scheme via photon subtraction in terms of secret key rate, maximal transmission distance and tolerable excess noise.
Submitted 3 December, 2018; v1 submitted 16 November, 2018;
originally announced November 2018.
-
An adaptive reduced basis ANOVA method for high-dimensional Bayesian inverse problems
Authors:
Qifeng Liao,
Jinglai Li
Abstract:
In Bayesian inverse problems sampling the posterior distribution is often a challenging task when the underlying models are computationally intensive. To this end, surrogates or reduced models are often used to accelerate the computation. However, in many practical problems, the parameter of interest can be of high dimensionality, which renders standard model reduction techniques infeasible. In this paper, we present an approach that employs the ANOVA decomposition method to reduce the model with respect to the unknown parameters, and the reduced basis method to reduce the model with respect to the physical parameters. Moreover, we provide an adaptive scheme within the MCMC iterations, to perform the ANOVA decomposition with respect to the posterior distribution. With numerical examples, we demonstrate that the proposed model reduction method can significantly reduce the computational cost of Bayesian inverse problems, without sacrificing much accuracy.
Submitted 13 November, 2018;
originally announced November 2018.
-
Biologically-plausible learning algorithms can scale to large datasets
Authors:
Will Xiao,
Honglin Chen,
Qianli Liao,
Tomaso Poggio
Abstract:
The backpropagation (BP) algorithm is often thought to be biologically implausible in the brain. One of the main reasons is that BP requires symmetric weight matrices in the feedforward and feedback pathways. To address this "weight transport problem" (Grossberg, 1987), two more biologically plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP's weight symmetry requirements and demonstrate comparable learning capabilities to that of BP on small datasets. However, a recent study by Bartunov et al. (2018) evaluates variants of target-propagation (TP) and feedback alignment (FA) on MNIST, CIFAR, and ImageNet datasets, and finds that although many of the proposed algorithms perform well on MNIST and CIFAR, they perform significantly worse than BP on ImageNet. Here, we additionally evaluate the sign-symmetry algorithm (Liao et al., 2016), which differs from both BP and FA in that the feedback and feedforward weights share signs but not magnitudes. We examine the performance of sign-symmetry and feedback alignment on ImageNet and MS COCO datasets using different network architectures (ResNet-18 and AlexNet for ImageNet, RetinaNet for MS COCO). Surprisingly, networks trained with sign-symmetry can attain classification performance approaching that of BP-trained networks. These results complement the study by Bartunov et al. (2018), and establish a new benchmark for future biologically plausible learning algorithms on more difficult datasets and more complex architectures.
Submitted 20 December, 2018; v1 submitted 8 November, 2018;
originally announced November 2018.
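The distinction the abstract above draws between BP and sign-symmetry can be illustrated for a single linear layer. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `backward_signal`, the `mode` strings, and the toy weights are hypothetical, and the only property taken from the abstract is that the feedback weights share the signs of the feedforward weights but not their magnitudes.

```python
import numpy as np

def backward_signal(W, delta, mode="bp"):
    """Send the output error `delta` back through a linear layer y = W @ x.

    Hypothetical helper for illustration only.
    mode="bp":            feedback matrix is W.T (exact weight symmetry).
    mode="sign_symmetry": feedback matrix is sign(W).T (signs shared,
                          magnitudes discarded).
    """
    if mode == "bp":
        feedback = W.T
    elif mode == "sign_symmetry":
        feedback = np.sign(W).T
    else:
        raise ValueError(f"unknown mode: {mode}")
    return feedback @ delta

# Toy 2x2 layer (made-up numbers).
W = np.array([[2.0, -3.0],
              [0.5,  4.0]])
delta = np.array([1.0, 1.0])

bp_out = backward_signal(W, delta, "bp")             # uses W.T
ss_out = backward_signal(W, delta, "sign_symmetry")  # uses sign(W).T
```

In this toy case the two feedback signals differ in magnitude, which is the relaxation of weight symmetry that the abstract describes.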
-
Traversing Virtual Network Functions from the Edge to the Core: An End-to-End Performance Analysis
Authors:
Emmanouil Fountoulakis,
Qi Liao,
Manuel Stein,
Nikolaos Pappas
Abstract:
Future mobile networks supporting the Internet of Things are expected to provide both high throughput and low latency to user-specific services. One way to overcome this challenge is to adopt network function virtualization and multi-access edge computing (MEC). In this paper, we analyze an end-to-end communication system that consists of both MEC servers and a server at the core network hosting different types of virtual network functions. We develop a queueing model for the performance analysis of the system consisting of both processing and transmission flows. The system is decomposed into subsystems which are independently analyzed in order to approximate the behaviour of the original system. We provide closed-form expressions of the performance metrics such as system drop rate and average number of tasks in the system. Simulation results show that our approximation performs quite well. By evaluating the system under different scenarios, we provide insights for the decision making on traffic flow control and its impact on critical performance metrics.
Submitted 3 July, 2019; v1 submitted 6 November, 2018;
originally announced November 2018.
-
Hierarchical landscape of hard disk glasses
Authors:
Qinyi Liao,
Ludovic Berthier
Abstract:
We numerically analyse the landscape governing the evolution of the vibrational dynamics of hard disk glasses as the density increases towards jamming. We find that the dynamics becomes slow, spatially correlated, and starts to display aging dynamics across an avoided Gardner transition, with a phenomenology that resembles three-dimensional observations. We carefully analyse the behaviour of single glass samples, and find that the emergence of aging dynamics is controlled by the appearance of a complex organisation of the landscape that splits into a remarkable hierarchy of minima as jamming is approached. Our results show that the mean-field prediction of a Gardner phase characterized by an ultrametric structure of the landscape provides a useful description of finite-dimensional systems, even when the Gardner transition is avoided.
Submitted 21 March, 2019; v1 submitted 24 October, 2018;
originally announced October 2018.
-
Dynamic Power Control for Packets with Deadlines
Authors:
Emmanouil Fountoulakis,
Nikolaos Pappas,
Qi Liao,
Anthony Ephremides,
Vangelis Angelakis
Abstract:
Wireless devices need to adapt their transmission power according to the fluctuating wireless channel in order to meet constraints of delay-sensitive applications. In this paper, we consider delay sensitivity in the form of strict packet deadlines arriving in a transmission queue. Packets missing the deadline while in the queue are dropped from the system. We aim at minimizing the packet drop rate under average power constraints. We utilize tools from Lyapunov optimization to find an approximate solution by selecting power allocation. We evaluate the performance of the proposed algorithm and show that it achieves the same packet drop rate as Earliest Deadline First (EDF) when the available power is sufficient. However, our algorithm outperforms EDF regarding the trade-off between packet drop rate and average power consumption.
Submitted 20 September, 2018;
originally announced September 2018.
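The Earliest Deadline First (EDF) baseline named in the abstract above, together with its rule that packets missing their deadline are dropped, can be sketched as a slotted queue simulation. This is a simplified illustration and not the paper's model: it assumes one packet can be served per slot and ignores channel state and power entirely, and the function name `edf_schedule` and the `(arrival_slot, deadline_slot)` packet format are hypothetical.

```python
import heapq

def edf_schedule(packets, horizon):
    """Simulate EDF over `horizon` time slots, serving at most one packet per slot.

    packets: list of (arrival_slot, deadline_slot); a packet can only be
             served in a slot t with arrival_slot <= t < deadline_slot.
    Returns (served, dropped) counts.
    """
    pending = sorted(packets)   # ordered by arrival slot
    queue = []                  # min-heap keyed by deadline
    served = dropped = 0
    i = 0
    for t in range(horizon):
        # Admit packets that have arrived by slot t.
        while i < len(pending) and pending[i][0] <= t:
            heapq.heappush(queue, pending[i][1])
            i += 1
        # Drop packets whose deadline has already passed.
        while queue and queue[0] <= t:
            heapq.heappop(queue)
            dropped += 1
        # Serve the packet with the earliest remaining deadline.
        if queue:
            heapq.heappop(queue)
            served += 1
    return served, dropped

# Three packets arrive in slot 0; two of them must be served before slot 1.
result = edf_schedule([(0, 1), (0, 1), (0, 3)], horizon=4)
```

With one service opportunity per slot, only one of the two tight-deadline packets can be served, so the other is dropped; the drop rate in this toy setting is the quantity the abstract's algorithm trades off against power.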
-
Optical Flow Super-Resolution Based on Image Guidence Using Convolutional Neural Network
Authors:
Liping Zhang,
Zongqing Lu,
Qingmin Liao
Abstract:
The convolutional neural network model for optical flow estimation usually outputs a low-resolution (LR) optical flow field. To obtain the corresponding full image resolution, interpolation and variational approaches are the most common options, but they do not effectively improve the results. Motivated by the success of various convolutional neural network (CNN) structures in the single image super-resolution (SISR) task, we propose an end-to-end convolutional neural network to reconstruct the high-resolution (HR) optical flow field from the initial LR optical flow with the guidance of the first frame used in optical flow estimation. Our optical flow super-resolution (OFSR) problem differs from the general SISR problem in two main aspects. Firstly, the optical flow includes less texture information than an image, so the SISR CNN structures cannot be directly used in our OFSR problem. Secondly, the initial LR optical flow data contains estimation error, while the LR image data for SISR is generally a bicubic downsampled, blurred, and noisy version of the HR ground truth. We evaluate the proposed approach on two different optical flow estimation methods and show that it can not only obtain the full image resolution, but also generate a more accurate optical flow field (accuracy improves by 15% on FlyingChairs and 13% on MPI Sintel) with sharper edges than the estimation result of the original method.
Submitted 3 September, 2018;
originally announced September 2018.
-
Heavy $P$-wave quarkonium production via Higgs decays
Authors:
Qi-Li Liao,
Ya Deng,
Yan Yu,
Guang-Chuan Wang,
Guo-Ya Xie
Abstract:
The production of the heavy quarkonium, i.e., $|(c\bar{b})[n]\rangle$ (or $|(b\bar{c})[n]\rangle$), $|(c\bar{c})[n]\rangle$, and $|(b\bar{b})[n]\rangle$-quarkonium [$|(Q\bar{Q'})[n]\rangle$-quarkonium for short], through Higgs $H^{0}$ boson semiexclusive decays is evaluated within the NRQCD framework, where $[n]$ stands for the production of the two color-singlet $S$-wave states, $|(Q\bar{Q'})[^1S_0]_{\textbf{1}} \rangle$ and $|(Q\bar{Q'})[^3S_1]_{\textbf{1}} \rangle$, and the production of the four color-singlet $P$-wave states, i.e., $|(Q\bar{Q'})[^1P_0]_{\textbf{1}}\rangle$, $|(Q\bar{Q'})[^3P_J]_{\textbf{1}}\rangle$ (with $J =[0, 1, 2]$). Moreover, according to the velocity scaling rule of the NRQCD, the production of the two color-octet components, $|(Q\bar{Q'})g[^1S_0]_{\textbf{8}} \rangle$ and $|(Q\bar{Q'})g[^3S_1]_{\textbf{8}} \rangle$, is also taken into account. The "improved trace technology" is adopted to derive simplified analytic expressions at the amplitude level, which shall be useful for dealing with these decay channels. If all higher heavy quarkonium states decay completely to the ground states, we obtain $Γ{(H^0\to |(c\bar{b})[^1S_0]_{\textbf{1}}\rangle)}=15.14$ keV, $Γ{(H^0\to |(c\bar{c})[^1S_0]_{\textbf{1}}\rangle)}=1.547$ keV, and $Γ{(H^0\to |(b\bar{b})[^1S_0]_{\textbf{1}}\rangle)}=1.311$ keV. About $5.6\times10^{5}$ $B_c$ mesons, $4.7\times10^{4}$ charmonium mesons, and $4.9\times10^{4}$ bottomonium mesons per year can be produced in Higgs decays at the HE/HL-LHC.
Submitted 23 August, 2018; v1 submitted 29 July, 2018;
originally announced July 2018.
-
A Surprising Linear Relationship Predicts Test Performance in Deep Networks
Authors:
Qianli Liao,
Brando Miranda,
Andrzej Banburski,
Jack Hidary,
Tomaso Poggio
Abstract:
Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that have the same architecture, the same meta parameters and the same training error: one can either pretrain the networks with different levels of "corrupted" data or simply initialize the networks with weights of different Gaussian standard deviations. A corollary of recent theoretical results on overfitting shows that these effects are due to an intrinsic problem of measuring test performance with a cross-entropy/exponential-type loss, which can be decomposed into two components both minimized by SGD -- one of which is not related to expected classification performance. However, if we factor out this component of the loss, a linear relationship emerges between training and test losses. Under this transformation, classical generalization bounds are surprisingly tight: the empirical/training loss is very close to the expected/test loss. Furthermore, the empirical relation between classification error and normalized cross-entropy loss seems to be approximately monotonic.
Submitted 25 July, 2018;
originally announced July 2018.
-
Theory IIIb: Generalization in Deep Networks
Authors:
Tomaso Poggio,
Qianli Liao,
Brando Miranda,
Andrzej Banburski,
Xavier Boix,
Jack Hidary
Abstract:
A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution of the puzzle for linear networks used in binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy and the exp-loss yields asymptotic, "slow" convergence to the maximum margin solution for linearly separable datasets, independently of the initial conditions. Here we prove a similar result for nonlinear multilayer DNNs near zero minima of the empirical loss. The result holds for exponential-type losses but not for the square loss. In particular, we prove that the weight matrix at each layer of a deep network converges to a minimum norm solution up to a scale factor (in the separable case). Our analysis of the dynamical system corresponding to gradient descent of a multilayer network suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss.
Submitted 29 June, 2018;
originally announced June 2018.
-
Novel Force Estimation-based Bilateral Teleoperation applying Type-2 Fuzzy logic and Moving Horizon Estimation
Authors:
Qianfang Liao,
Da Sun,
Hongliang Ren
Abstract:
This paper develops a novel force observer for bilateral teleoperation systems. Type-2 fuzzy logic is used to describe the overall dynamic system, and Moving Horizon Estimation (MHE) is employed to assess clean states as well as the values of dynamic uncertainties, and simultaneously filter out the measurement noise, which guarantees a high degree of accuracy for the observed forces. Compared with the existing methods, the proposed force observer can run without knowing exact mathematical dynamic functions and is robust to different kinds of noise. A force-reflection four-channel teleoperation control law is also proposed that involves the observed environmental and human forces to provide highly accurate force tracking between the master and the slave in the presence of time delays. Finally, experiments based on two haptic devices demonstrate the superiority of the proposed method through comparisons with multiple state-of-the-art force observers.
Submitted 17 May, 2018;
originally announced May 2018.
-
Positioning of Transparent Targets Using Defocusing Method in a Laser Proton Accelerator
Authors:
Yinren Shou,
Dahui Wang,
Pengjie Wang,
Jianbo Liu,
Zhengxuan Cao,
Zhusong Mei,
Yixing Geng,
Jungao Zhu,
Qing Liao,
Yanying Zhao,
Chen Lin,
Haiyang Lu,
Wenjun Ma,
Xueqing Yan
Abstract:
We report a positioning method for transparent targets with an accuracy of 2 μm for a compact laser proton accelerator. The positioning system consists of two light-emitting diodes (LEDs), a long working distance objective and two charge-coupled devices (CCDs) for illumination, imaging and detection, respectively. We developed a defocusing method making transparent targets visible as phase objects and applied it to our system. Precise positioning of transparent targets can be realized by means of minimizing the image contrast of the phase objects. Fast positioning based on the relationship between the radius of spherical aberration ring and defocusing distance is also realized. Laser proton acceleration experiments have been performed to demonstrate the reliability of this positioning system.
Submitted 27 April, 2018;
originally announced April 2018.
-
Dual-phase-modulated plug-and-play measurement-device-independent continuous-variable quantum key distribution
Authors:
Qin Liao,
Ying Guo,
Yijun Wang,
Duan Huang
Abstract:
We suggest an improved plug-and-play measurement-device-independent (MDI) continuous-variable quantum key distribution (CVQKD) via the dual-phase modulation (DPM), aiming to solve an implementation problem with no extra performance penalty. The synchronous loophole of different lasers from Alice and Bob can be elegantly eliminated in the plug-and-play configuration, which makes implementation more convenient than for the Gaussian-modulated coherent-state protocol. While the local oscillator (LO) can be locally generated by the trusted part Charlie, the LO-aimed attacks can be accurately detected in the data post-processing. We derive the security bounds of the DPM-based MDI-CVQKD against optimal Gaussian collective attacks. Taking the finite-size effect into account, the secret key rate can be increased due to the fact that almost all raw keys of the MDI-CVQKD system can be fully exploited for the final secret key generation without sacrificing raw keys in parameter estimation. Moreover, we give an experimental concept of the proposed scheme which can serve as a guideline for final implementation.
Submitted 16 April, 2018; v1 submitted 18 March, 2018;
originally announced March 2018.
-
Laser acceleration of highly energetic carbon ions using a double-layer target composed of slightly underdense plasma and ultrathin foil
Authors:
W. J. Ma,
I Jong Kim,
J. Q. Yu,
Il Woo Choi,
P. K. Singh,
Hwang Woon Lee,
Jae Hee Sung,
Seong Ku Lee,
C. Lin,
Q. Liao,
J. G. Zhu,
H. Y. Lu,
B. Liu,
H. Y. Wang,
R. F. Xu,
X. T. He,
J. E. Chen,
M. Zepf,
J. Schreiber,
X. Q. Yan,
Chang Hee Nam
Abstract:
We report the experimental generation of highly energetic carbon ions up to 48 MeV per nucleon by shooting double-layer targets composed of well-controlled slightly underdense plasma (SUP) and ultrathin foils with ultra-intense femtosecond laser pulses. Particle-in-cell simulations reveal that carbon ions residing in the ultrathin foils undergo radiation pressure acceleration and long-time sheath field acceleration in sequence due to the existence of the SUP in front of the foils. Such an acceleration scheme is especially suited for heavy ion acceleration with femtosecond laser pulses. The breakthrough of heavy ion energy up to multi-tens of MeV/u at high-repetition-rate would be able to trigger significant advances in nuclear physics, high energy density physics, and medical physics.
Submitted 31 January, 2018;
originally announced January 2018.
-
Theory of Deep Learning IIb: Optimization Properties of SGD
Authors:
Chiyuan Zhang,
Qianli Liao,
Alexander Rakhlin,
Brando Miranda,
Noah Golowich,
Tomaso Poggio
Abstract:
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume, "flat" minima, selecting flat minimizers which are with very high probability also global minimizers.
Submitted 7 January, 2018;
originally announced January 2018.
-
Resource Optimization with Flexible Numerology and Frame Structure for Heterogeneous Services
Authors:
Lei You,
Qi Liao,
Nikolaos Pappas,
Di Yuan
Abstract:
We explore the potential of optimizing resource allocation with flexible numerology in frequency domain and variable frame structure in time domain, in presence of services with different types of requirements. We analyze the computational complexity and propose a scalable optimization algorithm based on searching in both the primal space and dual space that are complementary to each other. Numerical results show significant advantages of adopting flexibility in both time and frequency domains for capacity enhancement and meeting the requirements of mission critical services.
Submitted 10 August, 2018; v1 submitted 6 January, 2018;
originally announced January 2018.
-
Theory of Deep Learning III: explaining the non-overfitting puzzle
Authors:
Tomaso Poggio,
Kenji Kawaguchi,
Qianli Liao,
Brando Miranda,
Lorenzo Rosasco,
Xavier Boix,
Jack Hidary,
Hrushikesh Mhaskar
Abstract:
A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or crossentropy loss) Hessian. The proposition depends on the qualitative theory of dynamical systems and is supported by numerical results. Our main propositions extend to deep nonlinear networks two properties of gradient descent for linear networks, that have been recently established (1) to be key to their generalization properties: 1. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converges to the minimum norm solution for appropriate initial conditions of gradient descent. This implies that there is usually an optimum early stopping that avoids overfitting of the loss. This property, valid for the square loss and many other loss functions, is relevant especially for regression. 2. For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution which guarantees good classification error for "low noise" datasets. This property holds for loss functions such as the logistic and cross-entropy loss independently of the initial conditions. The robustness to overparametrization has suggestive implications for the robustness of the architecture of deep convolutional networks with respect to the curse of dimensionality.
Submitted 16 January, 2018; v1 submitted 30 December, 2017;
originally announced January 2018.
-
Resource Scheduling for Mixed Traffic Types with Scalable TTI in Dynamic TDD Systems
Authors:
Qi Liao,
Paolo Baracca,
David Lopez-Perez,
Lorenzo Galati Giordano
Abstract:
This paper analyses the performance benefits of a user-centric scheduling approach, exploiting the flexibility of both dynamic time division duplex (TDD) and a variable transmission time interval (TTI), where the downlink to uplink ratio and TTI duration can be adapted to the traffic load. The formulation of the joint optimisation problem takes into consideration the individual requirements of each single user in terms of sustainable latency and desired throughput, thus implementing a real user-centric scheduling approach. Moreover, the developed solution is evaluated in a scenario with mixed traffic types, mobile broadband (MBB) and mission critical communications (MCC), showing remarkable performance enhancement of the proposed scheme over baseline dynamic TDD schemes with a fixed TTI in terms of both achievable throughput of the MBB users and guaranteed latency for the MCC users.
Submitted 11 September, 2017;
originally announced October 2017.
-
Improving Resource Efficiency with Partial Resource Muting for Future Wireless Networks
Authors:
Qi Liao,
R. L. G. Cavalcante
Abstract:
We propose novel resource allocation algorithms that have the objective of finding a good tradeoff between resource reuse and interference avoidance in wireless networks. To this end, we first study properties of functions that relate the resource budget available to network elements to the optimal utility and to the optimal resource efficiency obtained by solving max-min utility optimization problems. From the asymptotic behavior of these functions, we obtain a transition point that indicates whether a network is operating in an efficient noise-limited regime or in an inefficient interference-limited regime for a given resource budget. For networks operating in the inefficient regime, we propose a novel partial resource muting scheme to improve the efficiency of the resource utilization. The framework is very general. It can be applied not only to the downlink of 4G networks, but also to 5G networks equipped with flexible duplex mechanisms. Numerical results show significant performance gains of the proposed scheme compared to the solution to the max-min utility optimization problem with full frequency reuse.
Submitted 18 January, 2018; v1 submitted 11 September, 2017;
originally announced October 2017.
-
Dynamic Uplink/Downlink Resource Management in Flexible Duplex-Enabled Wireless Networks
Authors:
Qi Liao
Abstract:
Flexible duplex is proposed to adapt to the channel and traffic asymmetry for future wireless networks. In this paper, we propose two novel algorithms within the flexible duplex framework for joint uplink and downlink resource allocation in multi-cell scenario, named SAFP and RMDI, based on the awareness of interference coupling among wireless links. Numerical results show significant performance gain over the baseline system with fixed uplink/downlink resource configuration, and over the dynamic TDD scheme that independently adapts the configuration to time-varying traffic volume in each cell. The proposed algorithms achieve two-fold increase when compared with the baseline scheme, measured by the worst-case quality of service satisfaction level, under a low level of traffic asymmetry. The gain is more significant when the traffic is highly asymmetric, as it achieves three-fold increase.
Submitted 11 September, 2017;
originally announced October 2017.
-
Characterization of a RS-LiDAR for 3D Perception
Authors:
Zhe Wang,
Yang Liu,
Qinghai Liao,
Haoyang Ye,
Ming Liu,
Lujia Wang
Abstract:
High-precision 3D LiDARs are still expensive and hard to acquire. This paper presents the characteristics of RS-LiDAR, a model of low-cost LiDAR with sufficient supplies, in comparison with VLP-16. The paper also provides a set of evaluations to analyze the characteristics and performance of LiDAR sensors. This work analyzes multiple properties, such as drift effects, distance effects, color effects and sensor orientation effects, in the context of 3D perception. By comparing with Velodyne LiDAR, we found RS-LiDAR to be a cheaper and more easily acquired substitute for VLP-16 with similar efficiency.
Submitted 22 September, 2017;
originally announced September 2017.
-
Feature-Fused SSD: Fast Detection for Small Objects
Authors:
Guimei Cao,
Xuemei Xie,
Wenzhe Yang,
Quan Liao,
Guangming Shi,
Jinjian Wu
Abstract:
Small object detection is a challenging task in computer vision because small objects have limited resolution and information. To solve this problem, the majority of existing methods sacrifice speed for improvement in accuracy. In this paper, we aim to detect small objects at a fast speed, using the Single Shot Multibox Detector (SSD), the best object detector with respect to the accuracy-vs-speed trade-off, as the base architecture. We propose a multi-level feature fusion method for introducing contextual information in SSD, in order to improve the accuracy for small objects. In the detailed fusion operation, we design two feature fusion modules, a concatenation module and an element-sum module, which differ in the way they add contextual information. Experimental results show that these two fusion modules obtain higher mAP on PASCAL VOC2007 than baseline SSD by 1.6 and 1.7 points respectively, with 2-3 points of improvement on some small-object categories in particular. Their testing speeds are 43 and 40 FPS respectively, faster than the state-of-the-art Deconvolutional Single Shot Detector (DSSD) by 29.4 and 26.4 FPS. Code is available at https://github.com/wnzhyee/Feature-Fused-SSD. Keywords: small object detection, feature fusion, real-time, single shot multi-box detector
Submitted 26 November, 2018; v1 submitted 15 September, 2017;
originally announced September 2017.
-
A Measure for Dialog Complexity and its Application in Streamlining Service Operations
Authors:
Q Vera Liao,
Biplav Srivastava,
Pavan Kapanipathi
Abstract:
Dialog is a natural modality for interaction between customers and businesses in the service industry. As customers call up the service provider, their interactions may be routine or extraordinary. We believe that these interactions, when seen as dialogs, can be analyzed to obtain a better understanding of customer needs and how to efficiently address them. We introduce the idea of a dialog complexity measure to characterize multi-party interactions, propose a general data-driven method to calculate it, use it to discover insights in public and enterprise dialog datasets, and demonstrate its beneficial usage in facilitating better handling of customer requests and evaluating service agents.
Submitted 3 August, 2017;
originally announced August 2017.
-
Long-distance continuous-variable quantum key distribution using non-Gaussian state-discrimination detection
Authors:
Qin Liao,
Ying Guo,
Duang Huang,
Peng Huang,
Guihua Zeng
Abstract:
We propose a long-distance continuous-variable quantum key distribution (CVQKD) with four-state protocol using non-Gaussian state-discrimination detection. A photon subtraction operation, which is deployed at the transmitter, is used for splitting the signal required for generating the non-Gaussian operation to lengthen the maximum transmission distance of CVQKD. An improved state-discrimination detector, which can be deemed an optimized quantum measurement that allows the discrimination of nonorthogonal coherent states beating the standard quantum limit, is applied at the receiver to codetermine the measurement result with a conventional coherent detector. By tactfully exploiting a multiplexing technique, the resulting signals can be simultaneously transmitted through an untrusted quantum channel, and subsequently sent to the state-discrimination detector and coherent detector respectively. Security analysis shows that the proposed scheme can lengthen the maximum transmission distance up to hundreds of kilometers. Furthermore, by taking the finite-size effect and composable security into account we obtain the tightest bound of the secure distance, which is more practical than that obtained in the asymptotic limit.
Submitted 14 August, 2017;
originally announced August 2017.
-
Composable security of unidimensional continuous-variable quantum key distribution
Authors:
Qin Liao,
Ying Guo,
Cailang Xie,
Duang Huang,
Peng Huang,
Guihua Zeng
Abstract:
We investigate the composable security of unidimensional continuous variable quantum key distribution (UCVQKD), which is based on the Gaussian modulation of a single quadrature of the coherent-state of light, aiming to provide a simple implementation of key distribution compared to the symmetrically modulated Gaussian coherent-state protocols. This protocol neglects the necessity in one of the quadrature modulation in coherent-states and hence reduces the system complexity. To clarify the influence of finite-size effect and the cost of performance degeneration, we establish the relationship of the balanced parameters of the unmodulated quadrature and estimate the precise secure region. Subsequently, we illustrate the composable security of the UCVQKD protocol against collective attacks and achieve the tightest bound of the UCVQKD protocol. Numerical simulations show the asymptotic secret key rate of the UCVQKD protocol, together with the symmetrically modulated Gaussian coherent-state protocols.
Submitted 7 August, 2017;
originally announced August 2017.
-
Theory II: Landscape of the Empirical Risk in Deep Learning
Authors:
Qianli Liao,
Tomaso Poggio
Abstract:
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. However, the practical observation is that, at least in the case of the most successful Deep Convolutional Neural Networks (DCNNs), practitioners can always increase the network size to fit the training data (an extreme example would be [1]). The most successful DCNNs such as VGG and ResNets are best used with a degree of "overparametrization". In this work, we characterize with a mix of theory and experiments, the landscape of the empirical risk of overparametrized DCNNs. We first prove in the regression framework the existence of a large number of degenerate global minimizers with zero empirical error (modulo inconsistent equations). The argument that relies on the use of Bezout theorem is rigorous when the RELUs are replaced by a polynomial nonlinearity (which empirically works as well). As described in our Theory III [2] paper, the same minimizers are degenerate and thus very likely to be found by SGD that will furthermore select with higher probability the most robust zero-minimizer. We further experimentally explored and visualized the landscape of empirical risk of a DCNN on CIFAR-10 during the entire training process and especially the global minima. Finally, based on our theoretical and experimental results, we propose an intuitive model of the landscape of DCNN's empirical loss surface, which might not be as complicated as people commonly believe.
Submitted 22 June, 2017; v1 submitted 28 March, 2017;
originally announced March 2017.
-
Extrinsic Calibration of 3D Range Finder and Camera without Auxiliary Object or Human Intervention
Authors:
Qinghai Liao,
Ming Liu,
Lei Tai,
Haoyang Ye
Abstract:
Fusion of heterogeneous exteroceptive sensors is the most efficient and effective way to represent the environment precisely, as it overcomes various defects of each homogeneous sensor. The rigid transformation (a.k.a. extrinsic parameters) of heterogeneous sensory systems should be available before precisely fusing the multisensor information. Researchers have proposed several approaches to estimating the extrinsic parameters. These approaches require either auxiliary objects, like chessboards, or extra help from humans to select correspondences. In this paper, we propose a novel approach for the extrinsic calibration of range and image sensors. As far as we know, it is the first automatic approach with no requirement of auxiliary objects or any human intervention. First, we estimate the initial extrinsic parameters from the individual motion of the range finder and the camera. Then we extract lines in the image and point-cloud pairs, to refine the line feature associations by the initial extrinsic parameters. Finally, we discuss the degenerate case which may lead to algorithm failure and validate our approach by simulation. The results indicate high-precision extrinsic calibration against the ground truth.
Submitted 2 March, 2017;
originally announced March 2017.
-
An Examination of the Benefits of Scalable TTI for Heterogeneous Traffic Management in 5G Networks
Authors:
Emmanouil Fountoulakis,
Nikolaos Pappas,
Qi Liao,
Vinay Suryaprakash,
Di Yuan
Abstract:
The rapid growth in the number and variety of connected devices requires 5G wireless systems to cope with a very heterogeneous traffic mix. As a consequence, the use of a fixed TTI during transmission is not necessarily the most efficacious method when heterogeneous traffic types need to be simultaneously serviced. This work analyzes the benefits of scheduling based on exploiting scalable TTI, where the channel assignment and the TTI duration are adapted to the deadlines and requirements of different services. We formulate an optimization problem by taking individual service requirements into consideration. We then prove that the optimization problem is NP-hard and provide a heuristic algorithm, which provides an effective solution to the problem. Numerical results show that our proposed algorithm is capable of finding near-optimal solutions to meet the latency requirements of mission critical communication services, while providing a good throughput performance for mobile broadband services.
Submitted 4 May, 2017; v1 submitted 20 February, 2017;
originally announced February 2017.
-
Compression of Deep Neural Networks for Image Instance Retrieval
Authors:
Vijay Chandrasekhar,
Jie Lin,
Qianli Liao,
Olivier Morère,
Antoine Veillard,
Lingyu Duan,
Tomaso Poggio
Abstract:
Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompressed deep neural network models require hundreds of megabytes of storage, making them inconvenient to deploy in mobile applications or in custom hardware. In this work, we study the problem of neural network model compression focusing on the image instance retrieval task. We study quantization, coding, pruning and weight sharing techniques for reducing model size for the instance retrieval problem. We provide extensive experimental results on the trade-off between retrieval performance and model size for different types of networks on several data sets, providing the most comprehensive study on this topic. We compress models to the order of a few MBs: two orders of magnitude smaller than the uncompressed models while achieving negligible loss in retrieval performance.
Submitted 17 January, 2017;
originally announced January 2017.