-
Optical Transient Object Classification in Wide Field Small Aperture Telescopes with Neural Networks
Authors:
Peng Jia,
Yifei Zhao,
Gang Xue,
Dongmei Cai
Abstract:
Wide field small aperture telescopes are workhorses for fast sky surveying. Transient discovery is one of their main tasks. Classifying candidate transient images as real sources or artifacts with high accuracy is an important step in transient discovery. In this paper, we propose two transient classification methods based on neural networks. The first method uses a convolutional neural network without pooling layers to classify transient images with low sampling rates. The second method treats transient images as one-dimensional signals and is based on recurrent neural networks with long short-term memory and a leaky ReLU activation function in each detection layer. Testing with real observation data, we find that although both methods can achieve more than 94% classification accuracy, they have different classification properties for different targets. Based on this result, we propose using ensemble learning to further increase the classification accuracy to more than 97%.
Submitted 29 April, 2019;
originally announced April 2019.
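The abstract does not say which ensemble scheme is used, so the following is only a minimal sketch of one common choice, weighted soft voting over two classifiers. The function name `soft_vote`, the equal weighting, and the 0.5 threshold are assumptions made for illustration, and the scores are made-up numbers, not results from the paper.

```python
def soft_vote(prob_cnn, prob_rnn, weight_cnn=0.5, threshold=0.5):
    """Combine two per-candidate 'real source' probabilities by weighted averaging.

    prob_cnn / prob_rnn are hypothetical outputs of the CNN-based and the
    LSTM-based classifier for the same list of candidate images.
    Returns 1 (real source) or 0 (artifact) per candidate.
    """
    if len(prob_cnn) != len(prob_rnn):
        raise ValueError("both classifiers must score the same candidates")
    labels = []
    for p_c, p_r in zip(prob_cnn, prob_rnn):
        combined = weight_cnn * p_c + (1.0 - weight_cnn) * p_r
        labels.append(1 if combined >= threshold else 0)
    return labels


# Illustrative scores only: the two models disagree on candidates 2 and 4.
cnn_scores = [0.9, 0.4, 0.2, 0.7]
rnn_scores = [0.8, 0.7, 0.1, 0.2]
print(soft_vote(cnn_scores, rnn_scores))
```

Averaging helps only when the two models make different mistakes, which is the property the abstract reports for these two classifiers.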
-
Utterance-level end-to-end language identification using attention-based CNN-BLSTM
Authors:
Weicheng Cai,
Danwei Cai,
Shen Huang,
Ming Li
Abstract:
In this paper, we present an end-to-end language identification framework, the attention-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM). The model operates at the utterance level, which means the utterance-level decision can be obtained directly from the output of the neural network. To handle speech utterances of arbitrary and potentially long duration, we combine the CNN-BLSTM model with a self-attentive pooling layer. The front-end CNN-BLSTM module acts as a local pattern extractor for the variable-length inputs, and the self-attentive pooling layer built on top produces the fixed-dimensional utterance-level representation. We conducted experiments on the NIST LRE07 closed-set task, and the results reveal that the proposed attention-based CNN-BLSTM model achieves error reduction comparable to other state-of-the-art utterance-level neural network approaches on all of the 3-second, 10-second, and 30-second duration tasks.
Submitted 19 February, 2019;
originally announced February 2019.
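To make the idea of self-attentive pooling concrete, here is a minimal sketch of how a variable-length sequence of frame features can be reduced to one fixed-dimensional vector. The dot-product scoring against a single learned `query` vector is an assumption; the abstract does not specify the scoring function used in the paper.

```python
import math


def self_attentive_pool(frames, query):
    """Pool a variable-length list of frame vectors into one vector.

    frames: list of equal-length feature vectors (e.g. CNN-BLSTM outputs).
    query:  hypothetical learned attention vector of the same length.
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame.
    scores = [sum(q * x for q, x in zip(query, frame)) for frame in frames]
    # Numerically stable softmax over time.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum over time gives a fixed-size representation.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
query = [0.3, -0.1]
print(len(self_attentive_pool(short_utt, query)[0]))
print(len(self_attentive_pool(long_utt, query)[0]))
```

Both calls return a vector of the same dimension regardless of how many frames the utterance has, which is what lets a classifier operate directly at the utterance level.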
-
Chinese Word Segmentation: Another Decade Review (2007-2017)
Authors:
Hai Zhao,
Deng Cai,
Changning Huang,
Chunyu Kit
Abstract:
This paper reviews the development of Chinese word segmentation (CWS) in the most recent decade, 2007-2017. Special attention is paid to the deep learning technologies that have already permeated most areas of natural language processing (NLP). The basic view we have arrived at is that, compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.
Submitted 17 January, 2019;
originally announced January 2019.
-
Ultrasensitive hybrid optical skin
Authors:
Lei Zhang,
Jing Pan,
Zhang Zhang,
Hao Wu,
Ni Yao,
Dawei Cai,
Yingxin Xu,
Jin Zhang,
Guofei Sun,
Liqiang Wang,
Weidong Geng,
Wenguang Jin,
Wei Fang,
Dawei Di,
Limin Tong
Abstract:
Electronic skin, a class of wearable electronic sensors that mimic the functionalities of human skin, has achieved remarkable success in applications including health monitoring, human-machine interaction and electronic-biological interfaces. While electronic skin continues to achieve higher sensitivity and faster response, its ultimate performance is fundamentally limited by the nature of low-frequency AC currents in electronic circuitry. Here we demonstrate highly sensitive optical skin (O-skin) in which the primary sensory elements are optically driven. The simple construction of the sensors is achieved by embedding glass micro/nanofibers (MNFs) in thin layers of polydimethylsiloxane (PDMS). Enabled by the highly sensitive power-leakage response of the guided modes from the MNF upon external stimuli, our optical sensors show ultrahigh sensitivity (1870/kPa), low detection limit (7 mPa) and fast response (10 microseconds) for pressure sensing, significantly exceeding the performance metrics of state-of-the-art electronic skins. Electromagnetic interference (EMI)-free detection of high-frequency vibrations, wrist pulse and human voice is realized. Moreover, a five-sensor optical data glove and a 2x2-MNF tactile sensor are demonstrated. Our results pave the way toward wearable optical devices ranging from ultrasensitive flexible sensors to optical skins.
Submitted 25 October, 2018;
originally announced December 2018.
-
Swift Two-sample Test on High-dimensional Neural Spiking Data
Authors:
Zhi-Qin John Xu,
Douglas Zhou,
David Cai
Abstract:
To understand how neural networks process information, it is important to investigate how neural network dynamics vary with respect to different stimuli. One challenging task is to design efficient statistical approaches to analyze multiple spike train data obtained from a short recording time. Thanks to the development of high-dimensional statistical methods, it is now possible to deal with data whose dimension is much larger than the sample size. However, these methods often require statistically independent samples to start with, while neural data are correlated over consecutive sampling time bins. We develop an approach to pretreat neural data so that they become independent samples over time, by transferring the correlation of dynamics for each neuron in different sampling time bins into the correlation of dynamics among different dimensions within each sampling time bin. We verify the method using simulation data generated within a short time, e.g., a few seconds, from integrate-and-fire neuron network models and a large-scale network model of primary visual cortex. Our method may allow experimenters to take advantage of advances in statistical methods to analyze high-dimensional neural data.
Submitted 11 November, 2018;
originally announced November 2018.
-
Translating a Math Word Problem to an Expression Tree
Authors:
Lei Wang,
Yan Wang,
Deng Cai,
Dongxiang Zhang,
Xiaojiang Liu
Abstract:
Sequence-to-sequence (SEQ2SEQ) models have been successfully applied to automatic math word problem solving. Despite their simplicity, a drawback remains: a math word problem can be correctly solved by more than one equation. This non-deterministic transduction harms the performance of maximum likelihood estimation. In this paper, by considering the uniqueness of the expression tree, we propose an equation normalization method to normalize the duplicated equations. Moreover, we analyze the performance of three popular SEQ2SEQ models on math word problem solving. We find that each model has its own specialty in solving problems; consequently, an ensemble model is proposed to combine their advantages. Experiments on the Math23K dataset show that the ensemble model with equation normalization significantly outperforms the previous state-of-the-art methods.
Submitted 14 November, 2018; v1 submitted 13 November, 2018;
originally announced November 2018.
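The abstract does not list the normalization rules, so the sketch below shows only one plausible canonicalization: ordering the operands of commutative operators so that equivalent equations map to the same expression tree. The tuple-based tree encoding, the `normalize` function, and the number tokens `n1`, `n2`, `n3` are assumptions made for illustration.

```python
COMMUTATIVE = {"+", "*"}


def normalize(tree):
    """Return a canonical form of a binary expression tree.

    A tree is either a leaf token (str) or a tuple (op, left, right).
    Operands of commutative operators are put in a fixed order, so
    'n2 + n3 * n1' and 'n1 * n3 + n2' end up as the same tree.
    """
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    left, right = normalize(left), normalize(right)
    # repr() gives a total order over both leaves and subtrees.
    if op in COMMUTATIVE and repr(right) < repr(left):
        left, right = right, left
    return (op, left, right)


eq_a = ("+", "n2", ("*", "n3", "n1"))
eq_b = ("+", ("*", "n1", "n3"), "n2")
print(normalize(eq_a) == normalize(eq_b))
```

With a single target tree per problem, a maximum likelihood objective no longer has to split probability mass across equivalent surface forms of the same solution.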
-
Dial2Desc: End-to-end Dialogue Description Generation
Authors:
Haojie Pan,
Junpei Zhou,
Zhou Zhao,
Yan Liu,
Deng Cai,
Min Yang
Abstract:
We first propose a new task named Dialogue Description (Dial2Desc). Unlike other existing dialogue summarization tasks such as meeting summarization, we do not maintain the natural flow of a conversation but describe an object or an action of what people are talking about. The Dial2Desc system takes a dialogue text as input, then outputs a concise description of the object or the action involved in this conversation. After reading this short description, one can quickly extract the main topic of a conversation and build a clear picture in his mind, without reading or listening to the whole conversation. Based on the existing dialogue dataset, we build a new dataset, which has more than one hundred thousand dialogue-description pairs. As a step forward, we demonstrate that one can get more accurate and descriptive results using a new neural attentive model that exploits the interaction between utterances from different speakers, compared with other baselines.
Submitted 31 October, 2018;
originally announced November 2018.
-
Textually Guided Ranking Network for Attentional Image Retweet Modeling
Authors:
Zhou Zhao,
Hanbing Zhan,
Lingtao Meng,
Jun Xiao,
Jun Yu,
Min Yang,
Fei Wu,
Deng Cai
Abstract:
Retweet prediction is a challenging problem in social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, which predicts the image sharing behavior in which users repost image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from users' past retweeted image tweets in SMS. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations in SMS and the preferences of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with textually guided multi-modal neural networks for the proposed heterogeneous IRM network to learn the joint image tweet representations and user preference representations for the prediction task. Extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.
Submitted 24 October, 2018;
originally announced October 2018.
-
Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory
Authors:
Deng Cai,
Yan Wang,
Victoria Bi,
Zhaopeng Tu,
Xiaojiang Liu,
Wai Lam,
Shuming Shi
Abstract:
For dialogue response generation, traditional generative models generate responses solely from input queries. Such models rely on insufficient information for generating a specific response, since a given query could be answered in multiple ways. Consequently, those models tend to output generic and dull responses, impeding the generation of informative utterances. Recently, researchers have attempted to fill the information gap by exploiting information retrieval techniques. When generating a response for a current query, similar dialogues retrieved from the entire training data are considered as an additional knowledge source. While this may harvest massive information, the generative models could be overwhelmed, leading to undesirable performance. In this paper, we propose a new framework which exploits retrieval results via a skeleton-then-response paradigm. First, a skeleton is generated by revising the retrieved responses. Then, a novel generative model uses both the generated skeleton and the original query for response generation. Experimental results show that our approaches significantly improve the diversity and informativeness of the generated responses.
Submitted 28 February, 2020; v1 submitted 14 September, 2018;
originally announced September 2018.
-
End-to-end Language Identification using NetFV and NetVLAD
Authors:
Jinkun Chen,
Weicheng Cai,
Danwei Cai,
Zexin Cai,
Haibin Zhong,
Ming Li
Abstract:
In this paper, we apply the NetFV and NetVLAD layers to the end-to-end language identification task. NetFV and NetVLAD layers are differentiable implementations of the standard Fisher Vector and Vector of Locally Aggregated Descriptors (VLAD) methods, respectively. Both can encode a sequence of feature vectors into a fixed-dimensional vector, which is very important for processing variable-length utterances. We first present the connections and differences between the classical i-vector and the aforementioned encoding schemes. Then, we construct a flexible end-to-end framework including a convolutional neural network (CNN) architecture and an encoding layer (NetFV or NetVLAD) for the language identification task. Experimental results on the NIST LRE 2007 closed-set task show that the proposed system achieves significant EER reductions against both the conventional i-vector baseline and the CNN temporal average pooling system.
Submitted 8 September, 2018;
originally announced September 2018.
-
Maximum Entropy Principle Analysis in Network Systems with Short-time Recordings
Authors:
Zhi-Qin John Xu,
Jennifer Crodelle,
Douglas Zhou,
David Cai
Abstract:
In many realistic systems, maximum entropy principle (MEP) analysis provides an effective characterization of the probability distribution of network states. However, implementing the MEP analysis often requires a sufficiently long data recording, e.g., hours of spiking recordings of neurons in neuronal networks. The issue of whether the MEP analysis can be successfully applied to network systems with data from short recordings has yet to be fully addressed. In this work, we investigate relationships underlying the probability distributions, moments, and effective interactions in the MEP analysis and then show that, with short recordings of network dynamics, the MEP analysis can be applied to reconstructing probability distributions of network states under the condition of asynchronous activity of nodes in the network. Using spike trains obtained from both Hodgkin-Huxley neuronal networks and electrophysiological experiments, we verify our results and demonstrate that MEP analysis provides a tool to investigate neuronal population coding properties, even for short recordings.
Submitted 30 August, 2018;
originally announced August 2018.
-
Dynamical and Coupling Structure of Pulse-Coupled Networks in Maximum Entropy Analysis
Authors:
Zhi-Qin John Xu,
Douglas Zhou,
David Cai
Abstract:
Maximum entropy principle (MEP) analysis with few non-zero effective interactions successfully characterizes the distribution of dynamical states of pulse-coupled networks in many experiments, e.g., in neuroscience. To better understand the underlying mechanism, we derive a relation between the dynamical structure, i.e., the effective interactions in MEP analysis, and the coupling structure of pulse-coupled networks, which explains how a sparse coupling structure can lead to a sparse coding by effective interactions. This relation shows quantitatively how closely the dynamical structure is related to the coupling structure.
Submitted 13 August, 2018;
originally announced August 2018.
-
Language Style Transfer from Sentences with Arbitrary Unknown Styles
Authors:
Yanpeng Zhao,
Wei Bi,
Deng Cai,
Xiaojiang Liu,
Kewei Tu,
Shuming Shi
Abstract:
Language style transfer is the problem of migrating the content of a source sentence to a target style. In many of its applications, parallel training data are not available and source sentences to be transferred may have arbitrary and unknown styles. First, each sentence is encoded into its content and style latent representations. Then, by recombining the content with the target style, we decode a sentence aligned in the target domain. To adequately constrain the encoding and decoding functions, we couple them with two loss functions. The first is a style discrepancy loss, enforcing that the style representation accurately encodes the style information guided by the discrepancy between the sentence style and the target style. The second is a cycle consistency loss, which ensures that the transferred sentence should preserve the content of the original sentence disentangled from its style. We validate the effectiveness of our model in three tasks: sentiment modification of restaurant reviews, dialog response revision with a romantic style, and sentence rewriting with a Shakespearean style.
Submitted 13 August, 2018;
originally announced August 2018.
-
On the Performance of NOMA with Hybrid ARQ
Authors:
Donghong Cai,
Zhiguo Ding,
Pingzhi Fan,
Zheng Yang
Abstract:
In this paper, we investigate the outage performance of hybrid automatic repeat request with chase combining (HARQ-CC) assisted downlink non-orthogonal multiple access (NOMA) systems. Closed-form expressions for the individual outage probability and the diversity gain are first obtained. Based on the developed analytical outage probability, a tradeoff between the minimum number of retransmissions and the transmit power allocation coefficient is then provided for a given target rate. The provided simulation results demonstrate the accuracy of the developed analytical results. Moreover, it is shown that NOMA combined with HARQ-CC can achieve a significant advantage when only average channel state information is known at the transmitter. In particular, the performance of the user with less transmit power in NOMA systems can be efficiently improved by utilizing HARQ-CC.
Submitted 10 July, 2018;
originally announced July 2018.
-
Learning Visual Knowledge Memory Networks for Visual Question Answering
Authors:
Zhou Su,
Chen Zhu,
Yinpeng Dong,
Dongqi Cai,
Yurong Chen,
Jianguo Li
Abstract:
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions can't be directly or clearly answered from visual content but require reasoning from structured human knowledge with confirmation from visual content. This paper proposes the visual knowledge memory network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared to existing methods for leveraging external knowledge to support VQA, this paper places more emphasis on two missing mechanisms. The first is the mechanism for integrating visual contents with knowledge facts. VKMN handles this issue by embedding knowledge triples (subject, relation, target) and deep visual features jointly into the visual knowledge features. The second is the mechanism for handling multiple knowledge facts expanding from question and answer pairs. VKMN stores joint embeddings using a key-value pair structure in the memory networks so that it is easy to handle multiple facts. Experiments show that the proposed method achieves promising results on both the VQA v1.0 and v2.0 benchmarks, while outperforming state-of-the-art methods on the knowledge-reasoning related questions.
Submitted 13 June, 2018;
originally announced June 2018.
-
Addressing the Item Cold-start Problem by Attribute-driven Active Learning
Authors:
Yu Zhu,
Jinhao Lin,
Shibi He,
Beidou Wang,
Ziyu Guan,
Haifeng Liu,
Deng Cai
Abstract:
In recommender systems, cold-start issues are situations where no previous events, e.g. ratings, are known for certain users or items. In this paper, we focus on the item cold-start problem. Both content information (e.g. item attributes) and initial user ratings are valuable for capturing users' preferences on a new item. However, previous methods for the item cold-start problem either 1) incorporate content information into collaborative filtering to perform hybrid recommendation, or 2) actively select users to rate the new item without considering content information and then do collaborative filtering. In this paper, we propose a novel recommendation scheme for the item cold-start problem by leveraging both active learning and items' attribute information. Specifically, we design useful user selection criteria based on items' attributes and users' rating history, and combine the criteria in an optimization framework for selecting users. By exploiting the feedback ratings, users' previous ratings and items' attributes, we then generate accurate rating predictions for the other unselected users. Experimental results on two real-world datasets show the superiority of our proposed method over traditional methods.
Submitted 23 May, 2018;
originally announced May 2018.
-
A Brand-level Ranking System with the Customized Attention-GRU Model
Authors:
Yu Zhu,
Junxiong Zhu,
Jie Hou,
Yongliang Li,
Beidou Wang,
Ziyu Guan,
Deng Cai
Abstract:
In e-commerce websites like Taobao, brand is playing a more important role in influencing users' click/purchase decisions, partly because users are now attaching more importance to the quality of products and brand is an indicator of quality. However, existing ranking systems are not specifically designed to satisfy this kind of demand. Some design tricks may partially alleviate this problem, but still cannot provide satisfactory results or may create additional interaction cost. In this paper, we design the first brand-level ranking system to address this problem. The key challenge of this system is how to sufficiently exploit users' rich behavior in e-commerce websites to rank the brands. In our solution, we first conduct feature engineering specifically tailored to the personalized brand ranking problem and then rank the brands by an adapted Attention-GRU model containing three important modifications. Note that our proposed modifications can also apply to many other machine learning models on various tasks. We conduct a series of experiments to evaluate the effectiveness of our proposed ranking model and test the response to the brand-level ranking system from real users on a large-scale e-commerce platform, i.e. Taobao.
Submitted 11 August, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
PixelLink: Detecting Scene Text via Instance Segmentation
Authors:
Dan Deng,
Haifeng Liu,
Xuelong Li,
Deng Cai
Abstract:
Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text classification and location regression. Regression plays a key role in the acquisition of bounding boxes in these methods, but it is not indispensable because text/non-text prediction can also be considered as a kind of semantic segmentation that contains full location information in itself. However, text instances in scene images often lie very close to each other, making them very difficult to separate via semantic segmentation. Therefore, instance segmentation is needed to address this problem. In this paper, PixelLink, a novel scene text detection algorithm based on instance segmentation, is proposed. Text instances are first segmented out by linking pixels within the same instance together. Text bounding boxes are then extracted directly from the segmentation result without location regression. Experiments show that, compared with regression-based methods, PixelLink can achieve better or comparable performance on several benchmarks, while requiring many fewer training iterations and less training data.
Submitted 4 January, 2018;
originally announced January 2018.
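The linking step described in the abstract can be illustrated with a union-find pass over predicted text pixels. This is a simplified sketch, not the authors' implementation: it assumes already-thresholded binary maps, uses only right and down links instead of a full neighbourhood, and returns axis-aligned boxes. The names `link_pixels`, `text_mask`, `link_right` and `link_down` are hypothetical.

```python
def link_pixels(text_mask, link_right, link_down):
    """Group text pixels into instances and return one box per instance.

    text_mask[r][c]  : 1 if pixel (r, c) is predicted as text.
    link_right[r][c] : 1 if (r, c) is linked to (r, c + 1).
    link_down[r][c]  : 1 if (r, c) is linked to (r + 1, c).
    Boxes are (min_row, min_col, max_row, max_col), sorted.
    """
    rows, cols = len(text_mask), len(text_mask[0])
    parent = {(r, c): (r, c)
              for r in range(rows) for c in range(cols) if text_mask[r][c]}

    def find(p):
        # Follow parents to the root, compressing the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge neighbouring text pixels only when a positive link joins them.
    for (r, c) in list(parent):
        if c + 1 < cols and text_mask[r][c + 1] and link_right[r][c]:
            union((r, c), (r, c + 1))
        if r + 1 < rows and text_mask[r + 1][c] and link_down[r][c]:
            union((r, c), (r + 1, c))

    instances = {}
    for p in list(parent):
        instances.setdefault(find(p), []).append(p)

    boxes = []
    for pixels in instances.values():
        rs = [p[0] for p in pixels]
        cs = [p[1] for p in pixels]
        boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return sorted(boxes)


# Four touching text pixels; the missing link between columns 1 and 2
# splits them into two instances even though the mask is one blob.
mask = [[1, 1, 1, 1]]
right = [[1, 0, 1, 0]]
down = [[0, 0, 0, 0]]
print(link_pixels(mask, right, down))
```

The example shows why links matter: a purely semantic mask would merge the two adjacent words, while the link map keeps them apart without any box regression.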
-
On the Diversity of Realistic Image Synthesis
Authors:
Zichen Yang,
Haifeng Liu,
Deng Cai
Abstract:
Many image processing tasks can be formulated as translating images between two image domains, such as colorization, super resolution and conditional image synthesis. In most of these tasks, an input image may correspond to multiple outputs. However, existing approaches show only very minor diversity in their outputs. In this paper, we present a novel approach to synthesize diverse realistic images corresponding to a semantic layout. We introduce a diversity loss objective, which maximizes the distance between synthesized image pairs and links the input noise to the semantic segments in the synthesized images. Thus, our approach can not only produce diverse images, but also allow users to manipulate the output images by adjusting the noise manually. Experimental results show that images synthesized by our approach are significantly more diverse than those of existing works, and that adding our diversity loss does not degrade the realism of the base networks.
△ Less
Submitted 20 December, 2017;
originally announced December 2017.
-
A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval
Authors:
Deng Cai,
Xiuye Gu,
Chaoqi Wang
Abstract:
There is a growing trend in studying deep hashing methods for content-based image retrieval (CBIR), where hash functions and binary codes are learnt using deep convolutional neural networks and the binary codes are then used for approximate nearest neighbor (ANN) search. All the existing deep hashing papers report superior performance over traditional hashing methods according to their experimental results. However, there are serious flaws in the evaluations of existing deep hashing papers: (1) The datasets they use are too small and simple to simulate the real CBIR situation. (2) They do not correctly include the search time in their evaluation criteria, although search time is crucial in real CBIR systems. (3) The performance of some unsupervised hashing algorithms (e.g., LSH) can easily be boosted by using multiple hash tables, an important factor that should be considered in the evaluation but that most deep hashing papers fail to consider.
We re-evaluate several state-of-the-art deep hashing methods with a carefully designed experimental setting. Empirical results reveal that the performance of these deep hashing methods is inferior to that of multi-table IsoH, a very simple unsupervised hashing method. Thus, the conclusions in all the deep hashing papers should be carefully re-examined.
Submitted 16 November, 2017;
originally announced November 2017.
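Point (3) of the deep hashing abstract above, that multiple hash tables can boost an unsupervised method such as LSH, can be illustrated with a small sketch. It assumes the sign-random-projection variant of LSH, which the abstract does not specify; all function names and parameters here are hypothetical and are not taken from the paper's evaluation code.

```python
import random


def make_tables(dim, n_tables, n_bits, seed=0):
    """Draw n_bits random hyperplanes per table (sign-random-projection LSH)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        for _ in range(n_tables)
    ]


def hash_code(planes, vector):
    """Binary code: one bit per hyperplane, set by the side the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )


def build_index(tables, data):
    """One bucket dictionary per table, mapping code -> list of item ids."""
    index = [{} for _ in tables]
    for item_id, vector in enumerate(data):
        for buckets, planes in zip(index, tables):
            buckets.setdefault(hash_code(planes, vector), []).append(item_id)
    return index


def query(tables, index, vector):
    """Union of the matching buckets over all tables."""
    candidates = set()
    for planes, buckets in zip(tables, index):
        candidates.update(buckets.get(hash_code(planes, vector), []))
    return candidates
```

Because the candidate set is a union over tables, adding tables can only add candidates. Recall therefore never decreases, at the cost of more memory and more candidates to re-rank, which is why search time belongs in the comparison.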
-
Dialogue Act Recognition via CRF-Attentive Structured Network
Authors:
Zheqian Chen,
Rongqin Yang,
Zhou Zhao,
Deng Cai,
Xiaofei He
Abstract:
Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation that aims to attach semantic labels to utterances and characterize the speaker's intention. Existing approaches formulate the DAR problem in ways ranging from multi-class classification to structured prediction, and suffer from handcrafted feature extensions and attentive contextual structural dependencies. In this paper, we consider the problem of DAR from the viewpoint of extending richer Conditional Random Field (CRF) structural dependencies without abandoning end-to-end training. We incorporate hierarchical semantic inference with a memory mechanism into the utterance modeling. We then extend the structured attention network to the linear-chain conditional random field layer, which takes into account both contextual utterances and corresponding dialogue acts. Extensive experiments on two major benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder Dialogue Act (MRDA), show that our method achieves better performance than other state-of-the-art solutions to the problem. Notably, our method comes within 2% of the human annotators' performance on SWDA.
Submitted 15 November, 2017;
originally announced November 2017.
-
Keyword-based Query Comprehending via Multiple Optimized-Demand Augmentation
Authors:
Boyuan Pan,
Hao Li,
Zhou Zhao,
Deng Cai,
Xiaofei He
Abstract:
In this paper, we consider the machine reading task when the questions are in the form of keywords rather than natural language. In recent years, researchers have achieved significant success on machine reading comprehension tasks, such as SQuAD and TriviaQA. These datasets provide a natural language question sentence and a pre-selected passage, and the goal is to answer the question according to the passage. However, when interacting with machines by means of text, people are more likely to raise a query in the form of several keywords rather than a complete sentence. Keyword-based query comprehension is a new challenge, because small variations to a question may completely change its semantic information and thus yield different answers. In this paper, we propose a novel neural network system that consists of a Demand Optimization Model based on passage-attention neural machine translation and a Reader Model that can find the answer given the optimized question. The Demand Optimization Model optimizes the original query and outputs multiple reconstructed questions; the Reader Model then takes the new questions as input and locates the answers in the passage. To make predictions robust, an evaluation mechanism scores the reconstructed questions so that the final answer strikes a good balance between the quality of the Demand Optimization Model and that of the Reader Model. Experimental results on several datasets show that our framework significantly improves on multiple strong baselines on this challenging task.
Submitted 31 October, 2017;
originally announced November 2017.
-
Emergence of a Balanced Core through Dynamical Competition in Heterogeneous Neuronal Networks
Authors:
Qing-long L. Gu,
Songting Li,
Wei P. Dai,
Douglas Zhou,
David Cai
Abstract:
The balance between excitation and inhibition is crucial for neuronal computation. The balanced state of neuronal networks is observed in many experiments, yet its underlying mechanism remains to be fully clarified. Theoretical studies of the balanced state mainly focus on the analysis of the homogeneous Erdős-Rényi network. However, neuronal networks have been found to be inhomogeneous in many cortical areas. In particular, the connectivity of neuronal networks can be scale-free, small-world, or even contain specific motifs. In this work, we examine whether the balanced state is universal with respect to network topology and what characteristics the balanced state possesses in inhomogeneous networks such as scale-free and small-world networks. We discover that, for a sparsely but strongly connected inhomogeneous network, even though the whole network receives external inputs, there is a small active subnetwork (active core) inherently embedded within it. The neurons in this active core have relatively high firing rates while the neurons in the rest of the network are quiescent. Surprisingly, the active core possesses a balanced state, and this state is independent of the model of single-neuron dynamics. The dynamics of the active core can be well predicted using the Fokker-Planck equation with the mean-field assumption. Our results suggest that, in the presence of inhomogeneous network connectivity, the balanced state may be ubiquitous in the brain, and that the network connectivity in the active core is essentially close to the Erdős-Rényi structure. The existence of the small active core embedded in a large network may provide a potential dynamical scenario underlying sparse coding in neuronal networks.
Submitted 14 October, 2017;
originally announced October 2017.
-
A New Framework for Determination of Excitatory and Inhibitory Conductances Using Somatic Clamp
Authors:
Songting Li,
Xiaohui Zhang,
Douglas Zhou,
David Cai
Abstract:
The interaction between excitation and inhibition is crucial for brain computation. To understand synaptic mechanisms underlying brain function, it is important to separate excitatory and inhibitory inputs to a target neuron. In the traditional method, after applying somatic current or voltage clamp, the excitatory and inhibitory conductances are determined from the synaptic current-voltage (I-V) relation: the slope corresponds to the total conductance and the intercept corresponds to the reversal current. Because of the space clamp effect, the measured conductance in general deviates substantially from the local conductance on the dendrite. Therefore, the interpretation of the conductance measured by the traditional method remains to be clarified. In this work, based on the investigation of an idealized ball-and-stick neuron model and a biologically realistic pyramidal neuron model, we first demonstrate both analytically and numerically that the conductance determined by the traditional method has no clear biological interpretation, because it neglects a nonlinear interaction between the clamp current and the synaptic current across the spatial dendrites. As a consequence, the traditional method can induce an arbitrarily large error in conductance measurement and sometimes even leads to unphysically negative conductance. To circumvent the difficulty of elucidating synaptic impact on neuronal computation using the traditional method, we then propose a framework to determine the effective conductance that directly reflects the functional impact of synaptic inputs on action potential initiation and thereby neuronal information processing. Our framework has been further verified in realistic neuron simulations, and it thus greatly improves upon the traditional approach by providing a reliable and accurate assessment of the role of synaptic activity in neuronal computation.
Submitted 13 October, 2017;
originally announced October 2017.
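For orientation, the traditional I-V analysis that the abstract above critiques can be sketched as follows. The sketch assumes the textbook point-neuron relation I = g_E (V - E_E) + g_I (V - E_I), which the abstract does not spell out; the function names and the reversal potentials (0 mV and -80 mV) are illustrative assumptions, not values from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def traditional_conductances(voltages, currents, e_exc=0.0, e_inh=-80.0):
    """Split the fitted I-V line into excitatory and inhibitory conductances.

    Under the assumed point-neuron relation, the slope is the total
    conductance g_E + g_I and the intercept is -(g_E * E_E + g_I * E_I).
    Units must be consistent (e.g. mV, nS, pA); reversal potentials are
    illustrative defaults.
    """
    g_total, intercept = fit_line(voltages, currents)
    g_exc = -(intercept + g_total * e_inh) / (e_exc - e_inh)
    g_inh = g_total - g_exc
    return g_exc, g_inh
```

Nothing in this algebra prevents `g_exc` or `g_inh` from coming out negative when the measured I-V line departs from the point-neuron assumption, which is consistent with the unphysical negative conductances the abstract reports.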
-
Determination of Effective Synaptic Conductances Using Somatic Voltage Clamp
Authors:
Songting Li,
Nan Liu,
Xiaohui Zhang,
Douglas Zhou,
David Cai
Abstract:
The interplay between excitatory and inhibitory neurons imparts rich functions of the brain. To understand the underlying synaptic mechanisms, a fundamental approach is to study the dynamics of excitatory and inhibitory conductances of each neuron. The traditional method of determining conductance employs the synaptic current-voltage (I-V) relation obtained via voltage clamp. Using theoretical analysis, electrophysiological experiments, and realistic simulations, here we demonstrate that the traditional method conceptually fails to measure the conductance due to the neglect of a nonlinear interaction between the clamp current and the synaptic current. Consequently, it incurs substantial measurement error, even giving rise to unphysically negative conductance as observed in experiments. To elucidate synaptic impact on neuronal information processing, we introduce the concept of effective conductance and propose a framework to determine it accurately. Our work suggests re-examination of previous studies involving conductance measurement and provides a reliable approach to assess synaptic influence on neuronal computation.
Submitted 13 October, 2017;
originally announced October 2017.
-
Smarnet: Teaching Machines to Read and Comprehend Like Human
Authors:
Zheqian Chen,
Rongqin Yang,
Bin Cao,
Zhou Zhao,
Deng Cai,
Xiaofei He
Abstract:
Machine Comprehension (MC) is a challenging task in the field of Natural Language Processing that aims to guide a machine to comprehend a passage and answer a given question. Many existing approaches to the MC task suffer from bottlenecks such as insufficient lexical understanding, complex question-passage interaction, and incorrect answer extraction. In this paper, we address these problems from the viewpoint of how humans deal with reading tests in a scientific way. Specifically, we first propose a novel lexical gating mechanism to dynamically combine word and character representations. We then guide the machine to read in an interactive way with an attention mechanism and a memory network. Finally, we add a checking layer to refine the answer as a safeguard. Extensive experiments on two popular datasets, SQuAD and TriviaQA, show that our method achieves considerably better performance than most state-of-the-art solutions at the time of submission.
Submitted 7 October, 2017;
originally announced October 2017.
-
Some Inequalities Related to Ricci Curvatures for Lagrangian Submanifolds of Kahler QCH-manifolds
Authors:
Liang Zhang,
Xudong Liu,
Dandan Cai
Abstract:
By establishing two general quadratic inequalities, we obtain some inequalities related to Ricci curvatures for Lagrangian submanifolds of Kähler QCH-manifolds, which generalize some results for Lagrangian submanifolds of complex space forms.
Submitted 28 September, 2017;
originally announced September 2017.
-
Learning Graph-Level Representation for Drug Discovery
Authors:
Junying Li,
Deng Cai,
Xiaofei He
Abstract:
Predicting the macroscopic influences of drugs on the human body, such as efficacy and toxicity, is a central problem of small-molecule based drug discovery. Molecules can be represented as undirected graphs, and graph convolutional networks can be used to predict molecular properties. However, graph convolutional networks and other graph neural networks all focus on learning node-level representations rather than graph-level representations. Previous works simply sum the feature vectors of all nodes in the graph to obtain the graph feature vector for drug prediction. In this paper, we introduce a dummy super node that is connected with all nodes in the graph by a directed edge as the representation of the graph, and we modify the graph operation to help the dummy super node learn graph-level features. Thus, we can handle graph-level classification and regression in the same way as node-level classification and regression. In addition, we apply focal loss to address class imbalance in drug datasets. Experiments on MoleculeNet show that our method can effectively improve the performance of molecular property prediction.
Submitted 15 September, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.
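The focal loss mentioned in the drug discovery abstract above can be written down compactly. The sketch below shows the commonly used binary form, FL(p_t) = -(1 - p_t)^gamma * log(p_t), without the optional class-weighting term; the abstract does not state which variant or hyperparameters the authors use, so the function name and the default `gamma=2.0` are assumptions.

```python
import math


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one example.

    p:     predicted probability of the positive class, strictly in (0, 1).
    y:     ground-truth label, 0 or 1.
    gamma: focusing parameter (assumed default); gamma = 0 gives cross-entropy.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Well-classified examples (p_t near 1) are down-weighted by (1 - p_t)^gamma.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With `gamma = 0` the expression reduces to ordinary cross-entropy. Larger values shrink the contribution of easy examples, so the abundant, easily classified majority class dominates the total loss less.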
-
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
Authors:
Boyuan Pan,
Hao Li,
Zhou Zhao,
Bin Cao,
Deng Cai,
Xiaofei He
Abstract:
Machine comprehension (MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on improving the encoding layer, especially the embedding of syntactic information and named entities of the words, which are crucial to the quality of encoding. Moreover, existing attention methods either represent each query word as a vector or use a single vector to represent the whole query sentence; neither can properly weight the key words in the query sentence. In this paper, we introduce a novel neural network architecture called Multi-layer Embedding with Memory Network (MEMEN) for the machine reading task. In the encoding layer, we apply the classic skip-gram model to the syntactic and semantic information of the words to train a new kind of embedding layer. We also propose a memory network of full-orientation matching of the query and passage to capture more pivotal information. Experiments show that our model achieves competitive results in both precision and efficiency on the Stanford Question Answering Dataset (SQuAD) among all published results and achieves state-of-the-art results on the TriviaQA dataset.
Submitted 27 July, 2017;
originally announced July 2017.
-
Nonlinear dance motion analysis and motion editing using Hilbert-Huang transform
Authors:
Ran Dong,
Dongsheng Cai,
Nobuyoshi Asai
Abstract:
Human motions (especially dance motions) are very noisy, and it is hard to analyze and edit them. To resolve this problem, we propose a new method to decompose and modify the motions using the Hilbert-Huang transform (HHT). First, HHT decomposes a chromatic signal into "monochromatic" signals, the so-called Intrinsic Mode Functions (IMFs), using Empirical Mode Decomposition (EMD) [6]. After applying the Hilbert transform to each IMF, the instantaneous frequencies of the "monochromatic" signals can be obtained. Compared with the FFT or the wavelet transform, the HHT has the advantage of being able to analyze non-stationary and nonlinear signals such as human joint motions.
In the present paper, we propose a new framework to analyze and extract new features from the dance motions of "Perfume", a famous Japanese three-member pop group, and compare them with Waltz and Salsa dance. Using EMD, their dance motions can be decomposed into motion (choreographic) primitives, or IMFs. We can therefore scale, combine, subtract, exchange, and modify those IMFs, and blend them into new dance motions self-consistently. Our analysis and framework can lead to a motion editing and blending method that creates a new dance motion from different dance motions.
Submitted 6 July, 2017;
originally announced July 2017.
-
Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
Authors:
Cong Fu,
Chao Xiang,
Changxu Wang,
Deng Cai
Abstract:
Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still can not scale to billion-node databases. In this paper, to further improve the search-efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the E-commercial search scenario of Taobao (Alibaba Group) and has been integrated into their search engine at billion-node scale.
Submitted 11 December, 2018; v1 submitted 1 July, 2017;
originally announced July 2017.
-
"Synchronize" to VR Body: Full Body Illusion in VR Space
Authors:
Peikun Xiong,
Chen Sun,
Dongsheng Cai
Abstract:
Virtual Reality (VR) has now become accessible enough to mimic a "real-like" world. People who have a VR experience are usually impressed by the immersive feeling and may feel that they actually exist in the VR space. Self-consciousness is important for people to identify their own characters in VR space, and illusory ownership can help people to "build" their "bodies". Under certain circumstances, the rubber hand illusion can convince us that a fake hand made of rubber is a part of our body. Research on autoscopic phenomena extends this illusion to the so-called full body illusion. We conducted three types of experiments to study illusory ownership in VR space, as shown in Figure 1, and we learned the following: the human body must receive the visual signal and the somatosensory stimulus synchronously; the visual signal must be from the first-person perspective; and the subject and the virtual body need to be as close to the same height as possible. All these illusory ownerships are accompanied by a decrease in body temperature where the body is stimulated.
Submitted 20 June, 2017;
originally announced June 2017.
-
Understanding the Inefficiency of Security-Constrained Economic Dispatch
Authors:
Mohammad H. Hajiesmaili,
Desmond Cai,
Enrique Mallada
Abstract:
The security-constrained economic dispatch (SCED) problem tries to maintain the reliability of a power network by ensuring that a single failure does not lead to a global outage. Previous research has mainly investigated SCED by formulating the problem in different modalities, e.g., preventive or corrective, and devising efficient solutions for SCED. In this paper, we tackle a novel and important direction and analyze the economic cost of incorporating security constraints in economic dispatch. Inspired by existing inefficiency metrics in game theory and computer science, we introduce the notion of the price of security, a metric that formally characterizes the economic inefficiency of security-constrained economic dispatch compared to the original problem without security constraints. Then, we focus on the preventive approach in a simple topology comprising two buses and two lines, and investigate the impact of generation availability and demand distribution on the price of security. Moreover, we explicitly derive the worst-case input instance that leads to the maximum price of security. Through an extensive experimental study on two test cases, we verify the analytical results and provide insights for characterizing the price of security in general networks.
Submitted 2 June, 2017;
originally announced June 2017.
-
Deep Rotation Equivariant Network
Authors:
Junying Li,
Zichen Yang,
Haifeng Liu,
Deng Cai
Abstract:
Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations that can be inserted into a convolutional neural network to learn deep representations equivariant to rotation. However, in their approach feature maps must be copied and rotated four times in each layer, which incurs substantial running time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speed-up of more than 2 times with even less memory overhead. We evaluate DRENs on the Rotated MNIST and CIFAR-10 datasets and demonstrate that they can improve the performance of state-of-the-art architectures.
Submitted 28 February, 2018; v1 submitted 24 May, 2017;
originally announced May 2017.
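The idea in the abstract above of rotating filters instead of feature maps rests on a simple identity: correlating a rotated input with an equally rotated filter gives the rotated output. The sketch below checks that identity on small integer arrays. It is not the paper's cycle, isotonic or decycle layers; the helper names `rot90` and `correlate_valid` are hypothetical, and square inputs and filters are assumed.

```python
def rot90(m):
    """Rotate a 2D list counterclockwise by 90 degrees."""
    rows, cols = len(m), len(m[0])
    return [[m[j][cols - 1 - i] for j in range(rows)] for i in range(cols)]


def correlate_valid(x, k):
    """Valid cross-correlation of a square input x with a square filter k."""
    n, size = len(x), len(k)
    out = n - size + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(size) for b in range(size))
            for j in range(out)
        ]
        for i in range(out)
    ]


def rotated_responses(x, k):
    """Responses of x to the four 90-degree rotations of the filter k.

    Only the small filter is rotated; the input feature map is never copied
    or rotated.
    """
    responses = []
    for _ in range(4):
        responses.append(correlate_valid(x, k))
        k = rot90(k)
    return responses
```

Since filters are usually far smaller than feature maps, rotating only the filter avoids materialising rotated copies of the maps, which is the kind of overhead the abstract says the method reduces.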
-
SMASH: Structured matrix approximation by separation and hierarchy
Authors:
Difeng Cai,
Edmond Chow,
Yousef Saad,
Yuanzhe Xi
Abstract:
This paper presents an efficient method to perform Structured Matrix Approximation by Separation and Hierarchy (SMASH), when the original dense matrix is associated with a kernel function. Given points in a domain, a tree structure is first constructed based on an adaptive partitioning of the computational domain to facilitate subsequent approximation procedures. In contrast to existing schemes based on either analytic or purely algebraic approximations, SMASH takes advantage of both approaches and greatly improves the efficiency. The algorithm follows a bottom-up traversal of the tree and is able to perform the operations associated with each node on the same level in parallel. A strong rank-revealing factorization is applied to the initial analytic approximation in the separation regime so that a special structure is incorporated into the final nested bases. As a consequence, the storage is significantly reduced on one hand and a hierarchy of the original grid is constructed on the other hand. Due to this hierarchy, nested bases at upper levels can be computed in a similar way as the leaf level operations but on coarser grids. The main advantages of SMASH include its simplicity of implementation, its flexibility to construct various hierarchical rank structures and a low storage cost. Rigorous error analysis and complexity analysis are conducted to show that this scheme is fast and stable. The efficiency and robustness of SMASH are demonstrated through various test problems arising from integral equations, structured matrices, etc.
Submitted 15 May, 2017;
originally announced May 2017.
-
The Forgettable-Watcher Model for Video Question Answering
Authors:
Hongyang Xue,
Zhou Zhao,
Deng Cai
Abstract:
A number of visual question answering approaches have been proposed recently, aiming at understanding the visual scenes by answering the natural language questions. While the image question answering has drawn significant attention, video question answering is largely unexplored.
Video-QA is different from Image-QA since the information and the events are scattered among multiple frames. In order to better utilize the temporal structure of the videos and the phrasal structures of the answers, we propose two mechanisms: the re-watching and the re-reading mechanisms and combine them into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video question answering with the help of automatic question generation. Finally, we evaluate the models on our dataset. The experimental results show the effectiveness of our proposed models.
Submitted 3 May, 2017;
originally announced May 2017.
-
Fast and Accurate Neural Word Segmentation for Chinese
Authors:
Deng Cai,
Hai Zhao,
Zhisong Zhang,
Yuan Xin,
Yongjian Wu,
Feiyue Huang
Abstract:
Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both the training and working procedures of current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets.
Submitted 24 April, 2017;
originally announced April 2017.
-
Convolutional Low-Resolution Fine-Grained Classification
Authors:
Dingding Cai,
Ke Chen,
Yanlin Qian,
Joni-Kristian Kämäräinen
Abstract:
Successful fine-grained image classification methods learn subtle details between visually similar (sub-)classes, but the problem becomes significantly more challenging if the details are missing due to low resolution. Encouraged by the recent success of Convolutional Neural Network (CNN) architectures in image classification, we propose a novel resolution-aware deep model which combines convolutional image super-resolution and convolutional fine-grained classification into a single model in an end-to-end manner. Extensive experiments on the Stanford Cars and Caltech-UCSD Birds 200-2011 benchmarks demonstrate that the proposed model consistently performs better than conventional convolutional networks at classifying fine-grained object classes in low-resolution images.
Submitted 16 October, 2017; v1 submitted 15 March, 2017;
originally announced March 2017.
-
On the Role of a Market Maker in Networked Cournot Competition
Authors:
Desmond Cai,
Subhonmesh Bose,
Adam Wierman
Abstract:
We study Cournot competition among firms in a networked marketplace that is centrally managed by a market maker. In particular, we study a situation in which a market maker facilitates trade between geographically separate markets via a constrained transport network. Our focus is on understanding the consequences of the design of the market maker and on providing tools for optimal design. To that end we provide a characterization of the equilibrium outcomes of the game between the firms and the market maker. Our results highlight that the equilibrium structure is impacted dramatically by the market maker's objective - depending on the objective there may be a unique equilibrium, multiple equilibria, or no equilibria. Further, the game may be a potential game (as in the case of classical Cournot competition) or not. Beyond characterizing the equilibria of the game, we provide an approach for designing the market maker in order to optimize a design objective (e.g., social welfare) at the equilibrium of the game. Additionally, we use our results to explore the value of transport (trade) and the efficiency of the market maker (as compared to a single, aggregate market).
Submitted 19 April, 2019; v1 submitted 30 January, 2017;
originally announced January 2017.
-
A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search
Authors:
Deng Cai
Abstract:
Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform other state-of-the-art hashing methods. However, the evaluation in these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is to return the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report performance with code lengths shorter than 128. In this paper, we carefully revisit the problem of search with a hash index, and analyze the pros and cons of two popular hash index search procedures. We then propose a very simple but effective two-level index structure and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based Locality Sensitive Hashing (LSH) is the best-performing algorithm, which contradicts the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been greatly underestimated. For the sake of reproducibility, all the code used in the paper is released on GitHub and can be used as a testing platform for a fair comparison between various hashing algorithms.
Submitted 18 June, 2019; v1 submitted 22 December, 2016;
originally announced December 2016.
-
Edge-exchangeable graphs and sparsity (NIPS 2016)
Authors:
Diana Cai,
Trevor Campbell,
Tamara Broderick
Abstract:
Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2015), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.
Submitted 3 February, 2017; v1 submitted 16 December, 2016;
originally announced December 2016.
-
Question Retrieval for Community-based Question Answering via Heterogeneous Network Integration Learning
Authors:
Zheqian Chen,
Chi Zhang,
Zhou Zhao,
Deng Cai
Abstract:
Community-based question answering (CQA) platforms have attracted a substantial number of users who share knowledge and learn from each other. With the rapid growth of CQA platforms, large quantities of overlapping questions emerge, which makes it hard for users to select a proper reference. Effective automated algorithms are urgently needed to reuse historical questions with their corresponding answers. In this paper we focus on the problem of question retrieval, which aims to match historical questions that are relevant or semantically equivalent to a user's query so that the query can be resolved directly. The challenges in this task are the lexical gaps between questions caused by word ambiguity and word mismatch. Furthermore, the limited number of words in queried sentences causes sparsity of word features. To alleviate these challenges, we propose a novel framework named HNIL, which encodes not only the question contents but also the askers' social interactions to enhance the question embedding performance. More specifically, we apply a random-walk-based learning method with a recurrent neural network to match the similarities between an asker's question and historical questions proposed by other users. Extensive experiments on a large-scale dataset from a real-world CQA site show that employing the heterogeneous social network information outperforms other state-of-the-art solutions in this task.
Submitted 24 November, 2016;
originally announced November 2016.
-
User Personalized Satisfaction Prediction via Multiple Instance Deep Learning
Authors:
Zheqian Chen,
Ben Gao,
Huimin Zhang,
Zhou Zhao,
Deng Cai
Abstract:
Community-based question answering services have arisen as a popular knowledge-sharing pattern for netizens. With abundant interactions among users, individuals are capable of obtaining satisfactory information. However, obtaining answers within minutes is not practical for users, who have to check the progress over time until satisfying answers are submitted. We address this problem as a user personalized satisfaction prediction task. Existing methods usually exploit manual feature selection, which is undesirable because it requires careful design and is labor intensive. In this paper, we settle this issue by developing a new multiple instance deep learning framework. Specifically, in our setting, each question follows a weakly supervised multiple instance learning assumption, where its obtained answers are regarded as an instance set and a question is defined as resolved if it has at least one satisfactory answer. We thus design an efficient framework that exploits the multiple instance learning property with deep learning to model the question-answer pairs. Extensive experiments on large-scale datasets from Stack Exchange demonstrate the feasibility of our proposed framework in predicting askers' personalized satisfaction. Our framework can be extended to numerous applications such as UI satisfaction prediction, the multi-armed bandit problem, expert finding and so on.
Submitted 24 November, 2016;
originally announced November 2016.
-
Relational Multi-Manifold Co-Clustering
Authors:
Ping Li,
Jiajun Bu,
Chun Chen,
Zhanying He,
Deng Cai
Abstract:
Co-clustering aims at grouping the samples (e.g., documents, users) and the features (e.g., words, ratings) simultaneously. It employs the dual relation and the bilateral information between the samples and features. In many real-world applications, data usually reside on a submanifold of the ambient Euclidean space, but it is nontrivial to estimate the intrinsic manifold of the data space in a principled way. In this study, we focus on improving the co-clustering performance via manifold ensemble learning, which is able to maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel co-clustering algorithm called Relational Multi-manifold Co-clustering (RMC) based on symmetric nonnegative matrix tri-factorization, which decomposes the relational data matrix into three submatrices. This method considers the inter-type relationship revealed by the relational data matrix, and also the intra-type information reflected by the affinity matrices encoded on the sample and feature data distributions. Specifically, we assume the intrinsic manifold of the sample or feature space lies in a convex hull of some pre-defined candidate manifolds. We want to learn a convex combination of them to maximally approach the desired intrinsic manifold. To optimize the objective function, multiplicative rules are used to update the submatrices alternately. In addition, both the entropic mirror descent algorithm and the coordinate descent algorithm are exploited to learn the manifold coefficient vector. Extensive experiments on document, image and gene expression data sets have demonstrated the superiority of the proposed algorithm compared to other well-established methods.
Submitted 16 November, 2016;
originally announced November 2016.
-
Constrained Low-Rank Learning Using Least Squares-Based Regularization
Authors:
Ping Li,
Jun Yu,
Meng Wang,
Luming Zhang,
Deng Cai,
Xuelong Li
Abstract:
Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing a low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of the original data can be paired with some informative structure by imposing an appropriate constraint, e.g., a Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.
Submitted 15 November, 2016;
originally announced November 2016.
-
Automated scalable segmentation of neurons from multispectral images
Authors:
Uygar Sümbül,
Douglas Roussien Jr.,
Fei Chen,
Nicholas Barry,
Edward S. Boyden,
Dawen Cai,
John P. Cunningham,
Liam Paninski
Abstract:
Reconstruction of neuroanatomy is a fundamental problem in neuroscience. Stochastic expression of colors in individual cells is a promising tool, although its use in the nervous system has been limited due to various sources of variability in expression. Moreover, the intermingled anatomy of neuronal trees is challenging for existing segmentation algorithms. Here, we propose a method to automate the segmentation of neurons in such (potentially pseudo-colored) images. The method uses spatio-color relations between the voxels, generates supervoxels to reduce the problem size by four orders of magnitude before the final segmentation, and is parallelizable over the supervoxels. To quantify performance and gain insight, we generate simulated images, where the noise level and characteristics, the density of expression, and the number of fluorophore types are variable. We also present segmentations of real Brainbow images of the mouse hippocampus, which reveal many of the dendritic segments.
Submitted 21 January, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Exchangeable Trait Allocations
Authors:
Trevor Campbell,
Diana Cai,
Tamara Broderick
Abstract:
Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering, a special case of trait allocation, exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence "dust" groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently developed models that captures realistic sparse graph sequences.
Submitted 5 July, 2018; v1 submitted 28 September, 2016;
originally announced September 2016.
-
EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Authors:
Cong Fu,
Deng Cai
Abstract:
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing-based methods usually lack efficiency in practice. Recently, graph-based methods have drawn considerable attention. The main idea is that a neighbor of a neighbor is also likely to be a neighbor, which we refer to as NN-expansion. These methods construct a $k$-nearest neighbor ($k$NN) graph offline. At the online search stage, they find candidate neighbors of a query point in some way (e.g., random selection), and then iteratively check the neighbors of these candidates for closer ones. Despite some promising results, these approaches have two main problems: 1) they tend to converge to local optima; 2) constructing a $k$NN graph is time consuming. We find that both problems can be solved when we provide a good initialization for NN-expansion. In this paper, we propose EFANNA, an extremely fast approximate nearest neighbor search algorithm based on the $k$NN graph. EFANNA combines the advantages of hierarchical structure based methods and nearest-neighbor-graph based methods. Extensive experiments have shown that EFANNA outperforms state-of-the-art algorithms on both approximate nearest neighbor search and approximate nearest neighbor graph construction. To the best of our knowledge, EFANNA is the fastest algorithm so far for both approximate nearest neighbor graph construction and approximate nearest neighbor search. A library, EFANNA, based on this research is released on GitHub.
Submitted 3 December, 2016; v1 submitted 23 September, 2016;
originally announced September 2016.
-
Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
Authors:
Weizhong Zhang,
Bin Hong,
Wei Liu,
Jieping Ye,
Deng Cai,
Xiaofei He,
Jie Wang
Abstract:
Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great success in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.
Submitted 18 July, 2019; v1 submitted 24 July, 2016;
originally announced July 2016.
-
On the Inefficiency of Forward Markets in Leader-Follower Competition
Authors:
Desmond Cai,
Anish Agarwal,
Adam Wierman
Abstract:
Motivated by electricity markets, this paper studies the impact of forward contracting in situations where firms have capacity constraints and heterogeneous production lead times. We consider a model with two types of firms - leaders and followers - that choose production at two different times. Followers choose productions in the second stage but can sell forward contracts in the first stage. Our main result is an explicit characterization of the equilibrium outcomes. Classic results on forward contracting suggest that it can mitigate market power in simple settings; however the results in this paper show that the impact of forward markets in this setting is delicate - forward contracting can enhance or mitigate market power. In particular, our results show that leader-follower interactions created by heterogeneous production lead times may cause forward markets to be inefficient, even when there are a large number of followers. In fact, symmetric equilibria do not necessarily exist due to differences in market power among the leaders and followers.
Submitted 28 June, 2016;
originally announced June 2016.