-
Medical Multimodal Classifiers Under Scarce Data Condition
Authors:
Faik Aydin,
Maggie Zhang,
Michelle Ananda-Rajah,
Gholamreza Haffari
Abstract:
Data is one of the essential ingredients powering deep learning research. Small datasets, especially those specific to individual medical institutes, pose challenges for the training stage of deep learning. This work aims to develop a practical deep multimodal model that can accurately classify patients into abnormal and normal categories, as well as assist radiologists in detecting visual and textual anomalies by locating areas of interest. The detection of the anomalies is achieved through a novel technique which extends the integrated gradients methodology with an unsupervised clustering algorithm. This technique also introduces a tuning parameter which trades off true positive signals to denoise false positive signals in the detection process. To overcome the challenges of the small training dataset, which has only 3K pairs of frontal X-ray images and medical reports, we have adopted transfer learning for the multimodal model, which concatenates the layers of the image and text submodels. The image submodel was trained on the vast ChestX-ray14 dataset, while the text submodel transferred a pretrained word embedding layer from a hospital-specific corpus. Experimental results show that our multimodal model improves classification accuracy by 4% and 7%, averaged over 50 epochs, compared to the individual text and image models, respectively.
Submitted 23 February, 2019;
originally announced February 2019.
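As an illustration of the integrated gradients attribution that the abstract above builds on, here is a minimal sketch in pure Python. It covers only plain integrated gradients; the clustering extension and the tuning parameter described in the abstract are not reproduced. The toy logistic model, its weights, the all-zero baseline, and the function names are all hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model (hypothetical stand-in for a real classifier)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of the gradient with a midpoint rule."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical weights, input, and all-zero baseline.
w, b = [2.0, -1.0, 0.5], 0.1
x, baseline = [1.0, 2.0, -1.0], [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x, w, b) - model(baseline, w, b)`.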
-
A new simple and effective measure for bag-of-word inter-document similarity measurement
Authors:
Sunil Aryal,
Kai Ming Ting,
Takashi Washio,
Gholamreza Haffari
Abstract:
To measure the similarity of two documents in the bag-of-words (BoW) vector representation, different term weighting schemes are used to improve the performance of cosine similarity, the most widely used inter-document similarity measure in text mining. In this paper, we identify the shortcomings of the underlying assumptions of term weighting in the inter-document similarity measurement task, and provide a more fit-for-purpose alternative. Based on this new assumption, we introduce a new simple but effective similarity measure which does not require explicit term weighting. The proposed measure employs a more nuanced probabilistic approach than those used in term weighting to measure the similarity of two documents with respect to each term occurring in the two documents. Our empirical comparison with the existing similarity measures using different term weighting schemes shows that the new measure produces (i) better results in the binary BoW representation; and (ii) competitive and more consistent results in the term-frequency-based BoW representation.
Submitted 9 February, 2019;
originally announced February 2019.
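For readers unfamiliar with the baseline that the abstract above compares against, the following sketch computes cosine similarity over binary and term-frequency BoW vectors. It shows only the standard cosine baseline, not the measure proposed in the paper; the whitespace tokeniser, helper names, and example sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text, binary=False):
    """Build a sparse term -> weight mapping from whitespace-split text."""
    counts = Counter(text.lower().split())
    if binary:
        return {term: 1.0 for term in counts}
    return {term: float(c) for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a = "the cat sat on the mat"
doc_b = "the cat lay on the rug"

sim_tf = cosine(bow(doc_a), bow(doc_b))                             # term-frequency BoW
sim_bin = cosine(bow(doc_a, binary=True), bow(doc_b, binary=True))  # binary BoW
```

On these two toy sentences the term-frequency representation gives 0.75 and the binary representation gives 0.6, because the repeated word "the" dominates the term-frequency vectors.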
-
Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation
Authors:
Xuanli He,
Quan Hung Tran,
William Havard,
Laurent Besacier,
Ingrid Zukerman,
Gholamreza Haffari
Abstract:
In spite of the recent success of Dialogue Act (DA) classification, the majority of prior works focus on text-based classification with oracle transcriptions, i.e. human transcriptions, instead of Automatic Speech Recognition (ASR) transcriptions. In spoken dialog systems, however, the agent would only have access to noisy ASR transcriptions, which may suffer further performance degradation due to domain shift. In this paper, we explore the effectiveness of using both acoustic and textual signals, with either oracle or ASR transcriptions, and investigate speaker domain adaptation for DA classification. Our multimodal model proves to be superior to the unimodal models, particularly when the oracle transcriptions are not available. We also propose an effective method for speaker domain adaptation, which achieves competitive results.
Submitted 17 October, 2018;
originally announced October 2018.
-
Sequence to Sequence Mixture Model for Diverse Machine Translation
Authors:
Xuanli He,
Gholamreza Haffari,
Mohammad Norouzi
Abstract:
Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated translations. This can be attributed to the limitation of SEQ2SEQ models in capturing lexical and syntactic variations in a parallel corpus resulting from different styles, genres, topics, or ambiguity of the translation process. In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model. Each mixture component selects its own training dataset via optimization of the marginal log-likelihood, which leads to a soft clustering of the parallel corpus. Experiments on four language pairs demonstrate the superiority of our mixture model compared to a SEQ2SEQ baseline with standard or diversity-boosted beam search. Our mixture model uses negligible additional parameters and incurs no extra computation cost during decoding.
Submitted 17 October, 2018;
originally announced October 2018.
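The link between the marginal log-likelihood and soft clustering mentioned in the abstract above can be sketched generically. The example below is a plain mixture-model calculation, not the S2SMIX implementation: the per-component log-likelihood values are made up, and the uniform prior over components is an assumption, since the abstract does not specify the parameterisation.

```python
import math


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_and_responsibilities(component_logliks, log_prior=None):
    """Return log p(y|x) = log sum_z p(z) p(y|x,z) and the posterior over z."""
    k = len(component_logliks)
    if log_prior is None:
        # Assumption for this sketch: uniform prior over mixture components.
        log_prior = [-math.log(k)] * k
    joint = [lp + ll for lp, ll in zip(log_prior, component_logliks)]
    marginal = logsumexp(joint)
    # Responsibilities: the soft assignment of this sentence pair to components.
    responsibilities = [math.exp(j - marginal) for j in joint]
    return marginal, responsibilities


# Hypothetical log-likelihoods of one sentence pair under three components.
logliks = [-12.0, -15.0, -12.0]
marginal, resp = marginal_and_responsibilities(logliks)
```

Components that explain a sentence pair better receive larger responsibilities, which is the sense in which maximising the marginal log-likelihood softly partitions the training data among components.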
-
Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
Authors:
Sameen Maruf,
André F. T. Martins,
Gholamreza Haffari
Abstract:
Recent works in neural machine translation have begun to explore document translation. However, translating online multi-speaker conversations is still an open problem. In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task. To initiate an evaluation for this task, we introduce datasets extracted from Europarl v7 and OpenSubtitles2016. Our experiments on four language-pairs confirm the significance of leveraging conversation history, both in terms of BLEU and manual evaluation.
Submitted 2 September, 2018;
originally announced September 2018.
-
Graph-to-Sequence Learning using Gated Graph Neural Networks
Authors:
Daniel Beck,
Gholamreza Haffari,
Trevor Cohn
Abstract:
Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures for this setting obtained promising results compared to grammar-based approaches, but still relies on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation.
Submitted 26 June, 2018;
originally announced June 2018.
-
Neural Machine Translation for Bilingually Scarce Scenarios: A Deep Multi-task Learning Approach
Authors:
Poorya Zaremoodi,
Gholamreza Haffari
Abstract:
Neural machine translation requires large amounts of parallel training text to learn a reasonable-quality translation model. This is particularly problematic for language pairs for which not enough parallel text is available. In this paper, we use monolingual linguistic resources on the source side to address this challenging problem based on a multi-task learning approach. More specifically, we scaffold the machine translation task on auxiliary tasks including semantic parsing, syntactic parsing, and named-entity recognition. This effectively injects semantic and/or syntactic knowledge into the translation model, which would otherwise require a large amount of training bitext. We empirically evaluate and show the effectiveness of our multi-task learning approach on three translation tasks: English-to-French, English-to-Farsi, and English-to-Vietnamese.
Submitted 10 May, 2018;
originally announced May 2018.
-
Incorporating Syntactic Uncertainty in Neural Machine Translation with Forest-to-Sequence Model
Authors:
Poorya Zaremoodi,
Gholamreza Haffari
Abstract:
Incorporating syntactic information in Neural Machine Translation models is a method to compensate for their requirement for a large amount of parallel training text, especially for low-resource language pairs. Previous work on using syntactic information provided by (inevitably error-prone) parsers has been promising. In this paper, we propose a forest-to-sequence Attentional Neural Machine Translation model to make use of exponentially many parse trees of the source sentence to compensate for the parser errors. Our method represents the collection of parse trees as a packed forest, and learns a neural attentional transduction model from the forest to the target sentence. Experiments on translation from English to German, Chinese and Persian show the superiority of our method over the tree-to-sequence and vanilla sequence-to-sequence neural translation models.
Submitted 23 November, 2017; v1 submitted 19 November, 2017;
originally announced November 2017.
-
Document Context Neural Machine Translation with Memory Networks
Authors:
Sameen Maruf,
Gholamreza Haffari
Abstract:
We present a document-level neural machine translation model which takes both source and target document context into account using memory networks. We model the problem as a structured prediction problem with interdependencies among the observed and hidden variables, i.e., the source sentences and their unobserved target translations in the document. The resulting structured prediction problem is tackled with a neural translation model equipped with two memory components, one each for the source and target side, to capture the document-level interdependencies. We train the model end-to-end, and propose an iterative decoding algorithm based on block coordinate descent. Experimental results on translating French, German, and Estonian documents into English show that our model is effective in exploiting both source and target document context, and statistically significantly outperforms the previous work in terms of BLEU and METEOR.
Submitted 16 May, 2018; v1 submitted 9 November, 2017;
originally announced November 2017.
-
Towards Decoding as Continuous Optimization in Neural Machine Translation
Authors:
Cong Duy Vu Hoang,
Gholamreza Haffari,
Trevor Cohn
Abstract:
We propose a novel decoding approach for neural machine translation (NMT) based on continuous optimisation. We convert decoding, which is essentially a discrete optimisation problem, into a continuous optimisation problem. The resulting constrained continuous optimisation problem is then tackled using gradient-based methods. Our powerful decoding framework enables decoding intractable models such as the intersection of left-to-right and right-to-left (bidirectional) as well as source-to-target and target-to-source (bilingual) NMT models. Our empirical results show that our decoding framework is effective, and leads to substantial improvements in translations generated from the intersected models where the typical greedy or beam search is not feasible. We also compare our framework against reranking, and analyse its advantages and disadvantages.
Submitted 22 July, 2017; v1 submitted 11 January, 2017;
originally announced January 2017.
-
Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees
Authors:
Ehsan Shareghi,
Matthias Petri,
Gholamreza Haffari,
Trevor Cohn
Abstract:
Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting the queries needed to compute language model probabilities on-the-fly. We present several optimisations which improve query runtimes by up to 2500x, while incurring only a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).
Submitted 15 August, 2016;
originally announced August 2016.
-
Word Representation Models for Morphologically Rich Languages in Neural Machine Translation
Authors:
Ekaterina Vylomova,
Trevor Cohn,
Xuanli He,
Gholamreza Haffari
Abstract:
Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation. In contrast to most modern neural translation systems, which discard the identity of rare words, in this paper we propose several architectures for learning word representations from character- and morpheme-level word decompositions. We incorporate these representations in a novel machine translation model which jointly learns word alignments and translations via a hard attention mechanism. Evaluating on translation from several morphologically rich languages into English, we show consistent improvements over strong baseline methods, of between 1 and 1.5 BLEU points.
Submitted 14 June, 2016;
originally announced June 2016.
-
Prepositional Attachment Disambiguation Using Bilingual Parsing and Alignments
Authors:
Geetanjali Rakshit,
Sagar Sontakke,
Pushpak Bhattacharyya,
Gholamreza Haffari
Abstract:
In this paper, we attempt to solve the problem of Prepositional Phrase (PP) attachment in English. The motivation for the work comes from NLP applications like Machine Translation, for which getting the correct attachment of prepositions is crucial. The idea is to correct the PP-attachments for a sentence with the help of alignments from parallel data in another language. The novelty of our work lies in formulating the problem as a dual-decomposition-based algorithm that enforces agreement between the parse trees from two languages as a constraint. Experiments were performed on the English-Hindi language pair and the performance improved by 10% over the baseline, where the baseline is the attachment predicted by the MSTParser model trained for English.
Submitted 28 March, 2016;
originally announced March 2016.
-
A Latent Variable Recurrent Neural Network for Discourse Relation Language Models
Authors:
Yangfeng Ji,
Gholamreza Haffari,
Jacob Eisenstein
Abstract:
This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective that includes not only discourse relation classification, but also word prediction. As a result, it outperforms state-of-the-art alternatives for two tasks: implicit discourse relation classification in the Penn Discourse Treebank, and dialog act classification in the Switchboard corpus. Furthermore, by marginalizing over latent discourse relations at test time, we obtain a discourse informed language model, which improves over a strong LSTM baseline.
Submitted 5 April, 2016; v1 submitted 6 March, 2016;
originally announced March 2016.
-
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Authors:
Trevor Cohn,
Cong Duy Vu Hoang,
Ekaterina Vymolova,
Kaisheng Yao,
Chris Dyer,
Gholamreza Haffari
Abstract:
Neural encoder-decoder models of machine translation have achieved impressive results, rivalling traditional translation models. However, their modelling formulation is overly simplistic, and omits several key inductive biases built into traditional models. In this paper we extend the attentional neural translation model to include structural biases from word-based alignment models, including positional bias, Markov conditioning, fertility and agreement over translation directions. We show improvements over a baseline attentional model and a standard phrase-based model on several language pairs, evaluating on difficult languages in a low-resource setting.
Submitted 6 January, 2016;
originally announced January 2016.
-
Novel Bernstein-like Concentration Inequalities for the Missing Mass
Authors:
Bahman Yari Saeed Khanloo,
Gholamreza Haffari
Abstract:
We are concerned with obtaining novel concentration inequalities for the missing mass, i.e. the total probability mass of the outcomes not observed in the sample. We not only derive, for the first time, distribution-free Bernstein-like deviation bounds with sublinear exponents in deviation size for the missing mass, but also improve the results of McAllester and Ortiz (2003) and Berend and Kontorovich (2013, 2012) for small deviations, which is the most interesting case in learning theory. It is known that the majority of standard inequalities cannot be directly used to analyze heterogeneous sums, i.e. sums whose terms differ greatly in magnitude. Our generic and intuitive approach shows that the heterogeneity issue introduced in McAllester and Ortiz (2003) is resolvable, at least in the case of the missing mass, by regulating the terms using our novel thresholding technique.
Submitted 19 June, 2015; v1 submitted 10 March, 2015;
originally announced March 2015.
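To make the definition of the missing mass in the abstract above concrete, the sketch below computes it for a small known distribution and a sample. The distribution, sample, and function names are hypothetical. The Good-Turing estimator (the fraction of the sample made up of outcomes seen exactly once) is added here as a standard point of comparison and is not taken from the abstract.

```python
from collections import Counter


def missing_mass(probs, sample):
    """Total probability of the outcomes that never appear in the sample."""
    seen = set(sample)
    return sum(p for outcome, p in probs.items() if outcome not in seen)


def good_turing_estimate(sample):
    """Good-Turing estimate: number of singleton outcomes over the sample size."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Hypothetical distribution and a sample of size 8 drawn from it.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
sample = ["a", "a", "b", "a", "c", "a", "b", "a"]

true_mm = missing_mass(probs, sample)      # only "d" is unseen
estimated_mm = good_turing_estimate(sample)  # only "c" occurs exactly once
```

In practice the true distribution is unknown, so only the estimate is observable; concentration inequalities of the kind studied in the paper bound how far the missing mass can deviate from its expectation.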
-
Structured Prediction of Sequences and Trees using Infinite Contexts
Authors:
Ehsan Shareghi,
Gholamreza Haffari,
Trevor Cohn,
Ann Nicholson
Abstract:
Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to adequately describe these phenomena due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees which exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but without imposing a fixed bound, in order to better represent global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior which provides a recursive form of smoothing. We propose prediction algorithms based on A* and Markov Chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
Submitted 9 March, 2015;
originally announced March 2015.
-
HetFHMM: A novel approach to infer tumor heterogeneity using factorial Hidden Markov model
Authors:
Gholamreza Haffari,
Zhaoxiang Cai,
Mohammad S. Rahman,
Ann E. Nicholson
Abstract:
Cancer arises from successive rounds of mutations which generate tumor cells with different genomic variations, i.e. clones. For drug responsiveness and therapeutics, it is necessary to identify the clones in a tumor sample accurately. Many methods have been developed to infer tumor heterogeneity by either computing cellular prevalence and tumor phylogeny or predicting the genotype of mutations. All of these methods suffer from some problems, e.g. inaccurate computation of clonal frequencies or discarding clone-specific genotypes. In this paper, we propose a method called HetFHMM to infer tumor heterogeneity by predicting clone-specific genotypes and cellular prevalence. To infer clone-specific genotypes, we consider the presence of multiple mutations at any genomic location. We also tested our model on different simulated data. The results show that HetFHMM outperforms recent methods which infer tumor heterogeneity. HetFHMM is therefore a novel approach in the area of tumor heterogeneity research.
Submitted 2 March, 2015;
originally announced March 2015.
-
An Efficient Algorithm for Upper Bound on the Partition Function of Nucleic Acids
Authors:
Hamidreza Chitsaz,
Elmirasadat Forouzmand,
Gholamreza Haffari
Abstract:
It has been shown that minimum free energy structure for RNAs and RNA-RNA interaction is often incorrect due to inaccuracies in the energy parameters and inherent limitations of the energy model. In contrast, ensemble based quantities such as melting temperature and equilibrium concentrations can be more reliably predicted. Even structure prediction by sampling from the ensemble and clustering those structures by Sfold [7] has proven to be more reliable than minimum free energy structure prediction. The main obstacle for ensemble based approaches is the computational complexity of the partition function and base pairing probabilities. For instance, the space complexity of the partition function for RNA-RNA interaction is $O(n^4)$ and the time complexity is $O(n^6)$ which are prohibitively large [4,12]. Our goal in this paper is to give a fast algorithm, based on sparse folding, to calculate an upper bound on the partition function. Our work is based on the recent algorithm of Hazan and Jaakkola [10]. The space complexity of our algorithm is the same as that of sparse folding algorithms, and the time complexity of our algorithm is $O(MFE(n)\ell)$ for single RNA and $O(MFE(m, n)\ell)$ for RNA-RNA interaction in practice, in which $MFE$ is the running time of sparse folding and $\ell \leq n$ ($\ell \leq n + m$) is a sequence dependent parameter.
Submitted 8 January, 2013;
originally announced January 2013.
-
Analysis of Semi-Supervised Learning with the Yarowsky Algorithm
Authors:
Gholam Reza Haffari,
Anoop Sarkar
Abstract:
The Yarowsky algorithm is a rule-based semi-supervised learning algorithm that has been successfully applied to some problems in computational linguistics. The algorithm was not mathematically well understood until Abney (2004), who analyzed some specific variants of the algorithm and also proposed some new algorithms for bootstrapping. In this paper, we extend Abney's work and show that some of his proposed algorithms actually optimize (an upper bound on) an objective function based on a new definition of cross-entropy, which in turn is based on a particular instantiation of the Bregman distance between probability distributions. Moreover, we suggest some new algorithms for rule-based semi-supervised learning and show connections with harmonic functions and minimum multi-way cuts in graph-based semi-supervised learning.
Submitted 20 June, 2012;
originally announced June 2012.