
WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer

Authors: Kumar Shubham, Aishwarya Jayagopal, Syed Mohammed Danish, Prathosh AP, Vaibhav Rajan

Abstract: Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers ('cell lines') is experimentally determined in laboratory settings. Nevertheless, variations in the distribution of genomic data and drug responses between cell lines and humans arise due to biological and environmental differences. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives on predicting personalized drug response.

Submitted 7 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2402.10551 [pdf, other]

Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore and is being evaluated in a clinical trial.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10229 [pdf, other]

Mixture-Models: a one-stop Python Library for Model-based Clustering using various Mixture Models

Authors: Siva Rajesh Kasa, Hu Yijie, Santhosh Kumar Kasa, Vaibhav Rajan

Abstract: Mixture-Models is an open-source Python library for fitting Gaussian Mixture Models (GMM) and their variants, such as Parsimonious GMMs, Mixture of Factor Analyzers, MClust models, Mixture of Student's t distributions, etc. It streamlines the implementation and analysis of these models using various first/second order optimization routines such as Gradient Descent and Newton-CG through automatic differentiation (AD) tools. This helps in extending these models to high-dimensional data, which is the first of its kind among Python libraries. The library provides user-friendly model evaluation tools, such as BIC, AIC, and log-likelihood estimation. The source code is licensed under the MIT license and can be accessed at https://github.com/kasakh/Mixture-Models. The package is highly extensible, allowing users to incorporate new distributions and optimization techniques with ease. We conduct a large scale simulation to compare the performance of various gradient based approaches against Expectation Maximization on a wide range of settings and identify the corresponding best suited approach.
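
The evaluation criteria named in this abstract (AIC and BIC) have standard textbook definitions, which the minimal sketch below computes from a fitted log-likelihood. It does not use or reproduce the Mixture-Models API; the function names and the parameter count for a full-covariance GMM are illustrative assumptions.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def gmm_full_cov_param_count(n_components, dim):
    """Free parameters of a GMM with full covariance matrices.

    (K - 1) mixing weights + K * d means + K * d(d + 1)/2 covariance entries.
    """
    weights = n_components - 1
    means = n_components * dim
    covariances = n_components * dim * (dim + 1) // 2
    return weights + means + covariances


if __name__ == "__main__":
    # Hypothetical fit: 2 components, 2 dimensions, 100 samples, ln L = -100.
    k = gmm_full_cov_param_count(2, 2)
    print("parameters:", k)
    print("AIC:", aic(-100.0, k))
    print("BIC:", bic(-100.0, k, 100))
```

Both criteria penalize the parameter count, so comparing them across candidate component numbers is one common way to select a model.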

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.12085 [pdf, other]

Consistency Based Unsupervised Self-training For ASR Personalisation

Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, which differ in the user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model robustness. The majority of ASR personalisation methods assume labelled user data for supervision. Personalisation without any labelled data is challenging due to limited data size and poor quality of recorded audio samples. This work addresses unsupervised personalisation by developing a novel consistency based training method via pseudo-labelling. Our method achieves a relative Word Error Rate Reduction (WERR) of 17.3% on unlabelled training data and 8.1% on held-out data compared to a pre-trained model, and outperforms the current state-of-the-art methods.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted for IEEE ASRU 2023

arXiv:2401.03181 [pdf]

A Joint-Reasoning based Disease Q&A System

Authors: Prakash Chandra Sukhwal, Vaibhav Rajan, Atreyi Kankanhalli

Abstract: Medical question answer (QA) assistants respond to lay users' health-related queries by synthesizing information from multiple sources using natural language processing and related techniques. They can serve as vital tools to alleviate issues of misinformation, information overload, and complexity of medical language, thus addressing lay users' information needs while reducing the burden on healthcare professionals. QA systems, the engines of such assistants, have typically used either language models (LMs) or knowledge graphs (KG), though the approaches could be complementary. LM-based QA systems excel at understanding complex questions and providing well-formed answers, but are prone to factual mistakes. KG-based QA systems, which represent facts well, are mostly limited to answering short-answer questions with pre-created templates. While a few studies have jointly used LM and KG approaches for text-based QA, this was done to answer multiple-choice questions. Extant QA systems also have limitations in terms of automation and performance. We address these challenges by designing a novel, automated disease QA system which effectively utilizes both LM and KG techniques through a joint-reasoning approach to answer disease-related questions appropriate for lay users. Our evaluation of the system using a range of quality metrics demonstrates its efficacy over benchmark systems, including the popular ChatGPT.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 36 pages, 6 figures, submitted to TMIS on 14 July 2023 (status: under review)

arXiv:2210.12158 [pdf, other]

Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction

Authors: Hansheng Xue, Vaibhav Rajan, Yu Lin

Abstract: Understanding genetic variation, e.g., through mutations, in organisms is crucial to unravel their effects on the environment and human health. A fundamental characterization can be obtained by solving the haplotype assembly problem, which yields the variation across multiple copies of chromosomes. Variations among fast evolving viruses that lead to different strains (called quasispecies) are also deciphered with similar approaches. In both these cases, high-throughput sequencing technologies that provide oversampled mixtures of large noisy fragments (reads) of genomes are used to infer constituent components (haplotypes or quasispecies). The problem is harder for polyploid species where there are more than two copies of chromosomes. State-of-the-art neural approaches to solve this NP-hard problem do not adequately model relations among the reads that are important for deconvolving the input signal. We address this problem by developing a new method, called NeurHap, that combines graph representation learning with combinatorial optimization. Our experiments demonstrate substantially better performance of NeurHap in real and synthetic datasets compared to competing approaches.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted by NeurIPS 2022

arXiv:2202.09263 [pdf, other]

Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?

Authors: Vandana Rajan, Alessio Brutti, Andrea Cavallaro

Abstract: Humans express their emotions via facial expressions, voice intonation and word choices. To infer the nature of the underlying emotion, recognition models may use a single modality, such as vision, audio, and text, or a combination of modalities. Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts. However, a successful model that fuses modalities requires components that can effectively aggregate task-relevant information from each modality. As cross-modal attention is seen as an effective mechanism for multi-modal fusion, in this paper we quantify the gain that such a mechanism brings compared to the corresponding self-attention mechanism. To this end, we implement and compare a cross-attention and a self-attention model. In addition to attention, each model uses convolutional layers for local feature extraction and recurrent layers for global sequential modelling. We compare the models using different modality combinations for a 7-class emotion classification task using the IEMOCAP dataset. Experimental results indicate that although both models improve upon the state-of-the-art in terms of weighted and unweighted accuracy for tri- and bi-modal configurations, their performance is generally statistically comparable. The code to replicate the experiments is available at https://github.com/smartcameras/SelfCrossAttn

Submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted at ICASSP 2022

arXiv:2201.06344 [pdf, other]

ExpertNet: A Symbiosis of Classification and Clustering

Authors: Shivin Srivastava, Kenji Kawaguchi, Vaibhav Rajan

Abstract: A widely used paradigm to improve the generalization performance of high-capacity neural models is through the addition of auxiliary unsupervised tasks during supervised training. Tasks such as similarity matching and input reconstruction have been shown to provide a beneficial regularizing effect by guiding representation learning. Real data often has complex underlying structures and may be composed of heterogeneous subpopulations that are not learned well with current approaches. In this work, we design ExpertNet, which uses novel training strategies to learn clustered latent representations and leverage them by effectively combining cluster-specific classifiers. We theoretically analyze the effect of clustering on its generalization gap, and empirically show that clustered latent representations from ExpertNet lead to disentangling the intrinsic structure and improvement in classification performance. ExpertNet also meets an important real-world need where classifiers need to be tailored for distinct subpopulations, such as in clinical risk models. We demonstrate the superiority of ExpertNet over state-of-the-art methods on 6 large clinical datasets, where our approach leads to valuable insights on group-specific risks.

Submitted 17 January, 2022; originally announced January 2022.

Comments: 16 pages, 3 figures

arXiv:2112.11696 [pdf, other]

RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning

Authors: Hansheng Xue, Vijini Mallawaarachchi, Yujia Zhang, Vaibhav Rajan, Yu Lin

Abstract: Mixed communities of organisms are found in many environments (from the human gut to marine ecosystems) and can have profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing that yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters of genomic subsequences associated with the unknown constituent organisms. Inherent noise in the subsequences, various biological constraints that need to be imposed on them and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph where the nodes are subsequences and edges represent homophily information. In addition, we model biological constraints providing heterophilous signal about nodes that cannot be clustered together. We solve the binning problem by developing new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints, and (ii) a constraint-based graph clustering method that addresses the problems of skewed cluster size distribution. Extensive experiments, on real and synthetic datasets, demonstrate that our approach, called RepBin, outperforms a wide variety of competing methods. Our constraint-based graph representation learning and clustering methods, which may be useful in other domains as well, advance the state-of-the-art in both metagenomics binning and graph representation learning.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI-2022

arXiv:2109.13164 [pdf, other]

Multi-way Clustering and Discordance Analysis through Deep Collective Matrix Tri-Factorization

Authors: Ragunathan Mariappan, Vaibhav Rajan

Abstract: Heterogeneous multi-typed, multimodal relational data is increasingly available in many domains and its exploratory analysis poses several challenges. We advance the state-of-the-art in neural unsupervised learning to analyze such data. We design the first neural method for collective matrix tri-factorization of arbitrary collections of matrices to perform spectral clustering of all constituent entities and learn cluster associations. Experiments on benchmark datasets demonstrate its efficacy over previous non-neural approaches. Leveraging signals from multi-way clustering and collective matrix completion we design a unique technique, called Discordance Analysis, to reveal information discrepancies across subsets of matrices in a collection with respect to two entities. We illustrate its utility in quality assessment of knowledge bases and in improving representation learning.

Submitted 27 September, 2021; originally announced September 2021.

arXiv:2108.00597 [pdf, other]

Exact Pareto Optimal Search for Multi-Task Learning and Multi-Criteria Decision-Making

Authors: Debabrata Mahapatra, Vaibhav Rajan

Abstract: Given multiple non-convex objective functions and objective-specific weights, Chebyshev scalarization (CS) is a well-known approach to obtain an Exact Pareto Optimal (EPO), i.e., a solution on the Pareto front (PF) that intersects the ray defined by the inverse of the weights. First-order optimizers that use the CS formulation to find EPO solutions encounter practical problems of oscillations and stagnation that affect convergence. Moreover, when initialized with a PO solution, they do not guarantee a controlled trajectory that lies completely on the PF. These shortcomings lead to modeling limitations and computational inefficiency in multi-task learning (MTL) and multi-criteria decision-making (MCDM) methods that utilize CS for their underlying non-convex multi-objective optimization (MOO). To address these shortcomings, we design a new MOO method, EPO Search. We prove that EPO Search converges to an EPO solution and empirically illustrate its computational efficiency and robustness to initialization. When initialized on the PF, EPO Search can trace the PF and converge to the required EPO solution at a linear rate of convergence. Using EPO Search we develop new algorithms: PESA-EPO for approximating the PF in a posteriori MCDM, and GP-EPO for preference elicitation in interactive MCDM; experiments on benchmark datasets confirm their advantages over competing alternatives. EPO Search scales linearly with the number of decision variables which enables its use for training deep networks. Empirical results on real data from personalized medicine, e-commerce and hydrometeorology demonstrate the efficacy of EPO Search for deep MTL.
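
To make the Chebyshev scalarization mentioned in this abstract concrete, the sketch below minimizes max_i r_i * f_i(x) on a toy two-objective problem by brute-force grid search. It is not EPO Search and not the authors' code; the toy objectives, the grid solver, and all function names are assumptions chosen only for illustration.

```python
def chebyshev_scalarization(objective_values, weights):
    """Weighted Chebyshev scalarization: the largest weighted objective."""
    return max(r * f for r, f in zip(weights, objective_values))


def toy_objectives(x):
    """Two conflicting toy objectives on the interval [0, 2] (assumed example)."""
    return (x ** 2, (x - 2.0) ** 2)


def solve_on_grid(weights, lo=0.0, hi=2.0, steps=20000):
    """Minimize the scalarized objective by exhaustive search over a grid."""
    best_x, best_score = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        score = chebyshev_scalarization(toy_objectives(x), weights)
        if score < best_score:
            best_x, best_score = x, score
    return best_x


if __name__ == "__main__":
    weights = (1.0, 3.0)
    x_star = solve_on_grid(weights)
    f1, f2 = toy_objectives(x_star)
    # At the minimizer the weighted objectives (approximately) coincide,
    # so (f1, f2) is proportional to (1/r1, 1/r2).
    print(x_star, weights[0] * f1, weights[1] * f2)
```

In this toy problem one objective increases and the other decreases over the interval, so the scalarized minimum sits where the weighted objectives are equal, which places the objective vector on the ray defined by the inverse of the weights.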

Submitted 17 September, 2023; v1 submitted 1 August, 2021; originally announced August 2021.

arXiv:2102.11872 [pdf, other]

Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

Authors: Shivin Srivastava, Siddharth Bhatia, Lingxiao Huang, Lim Jun Heng, Kenji Kawaguchi, Vaibhav Rajan

Abstract: In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification either 1) are classifier-specific and not generic, or 2) independently perform clustering and classifier training, which may not form clusters that can potentially benefit classifier performance. The question of how to perform clustering to improve the performance of classifiers trained on the clusters has received scant attention in previous literature, despite its importance in several real-world applications. In this paper, first, we theoretically analyze the generalization performance of classifiers trained on clustered data and find conditions under which clustering can potentially aid classification. This motivates the design of a simple k-means-based classification algorithm called Clustering Aware Classification (CAC) and its neural variant DeepCAC. DeepCAC effectively leverages deep representation learning to learn latent embeddings and finds clusters in a manner that makes the clustered data suitable for training classifiers for each underlying subpopulation. Our experiments on synthetic and real benchmark datasets demonstrate the efficacy of DeepCAC over previous methods for combined clustering and classification.

Submitted 3 January, 2023; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 19 Pages, 5 figures

arXiv:2102.06371 [pdf, other]

Multiplex Bipartite Network Embedding using Dual Hypergraph Convolutional Networks

Authors: Hansheng Xue, Luwei Yang, Vaibhav Rajan, Wen Jiang, Yi Wei, Yu Lin

Abstract: A bipartite network is a graph structure where nodes are from two distinct domains and only inter-domain interactions exist as edges. A large number of network embedding methods exist to learn vectorial node representations from general graphs with both homogeneous and heterogeneous node and edge types, including some that can specifically model the distinct properties of bipartite networks. However, these methods are inadequate to model multiplex bipartite networks (e.g., in e-commerce) that have multiple types of interactions (e.g., click, inquiry, and buy) and node attributes. Most real-world multiplex bipartite networks are also sparse and have imbalanced node distributions that are challenging to model. In this paper, we develop an unsupervised Dual HyperGraph Convolutional Network (DualHGCN) model that scalably transforms the multiplex bipartite network into two sets of homogeneous hypergraphs and uses spectral hypergraph convolutional operators, along with intra- and inter-message passing strategies to promote information exchange within and across domains, to learn effective node embeddings. We benchmark DualHGCN using four real-world datasets on link prediction and node classification tasks. Our extensive experiments demonstrate that DualHGCN significantly outperforms state-of-the-art methods, and is robust to varying sparsity levels and imbalanced node distributions.

Submitted 12 February, 2021; originally announced February 2021.

Comments: The Web Conference (formerly WWW) 2021

arXiv:2011.01631 [pdf, other]

Robust Latent Representations via Cross-Modal Translation and Alignment

Authors: Vandana Rajan, Alessio Brutti, Andrea Cavallaro

Abstract: Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. This is a limitation when the signals from some modalities are unavailable or are severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality representations in a shared latent space. We validate the proposed approach on the AVEC 2016 dataset for continuous emotion recognition and show the effectiveness of the approach that achieves state-of-the-art (uni-modal) performance for weaker modalities.

Submitted 8 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Journal ref: ICASSP 2021

arXiv:2009.05805 [pdf, other]

Multi-way Spectral Clustering of Augmented Multi-view Data through Deep Collective Matrix Tri-factorization

Authors: Ragunathan Mariappan, Siva Rajesh Kasa, Vaibhav Rajan

Abstract: We present the first deep learning based architecture for collective matrix tri-factorization (DCMTF) of arbitrary collections of matrices, also known as augmented multi-view data. DCMTF can be used for multi-way spectral clustering of heterogeneous collections of relational data matrices to discover latent clusters in each input matrix, across both dimensions, as well as the strengths of association across clusters. The source code for DCMTF is available on our public repository: https://bitbucket.org/cdal/dcmtf_generic

Submitted 24 January, 2022; v1 submitted 12 September, 2020; originally announced September 2020.

arXiv:2007.12786 [pdf, other]

Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

Authors: Siva Rajesh Kasa, Vaibhav Rajan

Abstract: We study two practically important cases of model based clustering using Gaussian Mixture Models: (1) when there is misspecification and (2) on high dimensional data, in the light of recent advances in Gradient Descent (GD) based optimization using Automatic Differentiation (AD). Our simulation studies show that EM has better clustering performance, measured by Adjusted Rand Index, compared to GD in cases of misspecification, whereas on high dimensional data GD outperforms EM. We observe that both with EM and GD there are many solutions with high likelihood but poor cluster interpretation. To address this problem we design a new penalty term for the likelihood based on the Kullback Leibler divergence between pairs of fitted components. Closed form expressions for the gradients of this penalized likelihood are difficult to derive but AD can be done effortlessly, illustrating the advantage of AD-based optimization. Extensions of this penalty for high dimensional data and for model selection are discussed. Numerical experiments on synthetic and real datasets demonstrate the efficacy of clustering using the proposed penalized likelihood approach.
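
The abstract does not state the exact form of the penalty, so the sketch below shows only its stated ingredient: the closed-form Kullback-Leibler divergence between two Gaussian components. The restriction to diagonal covariances, the symmetric sum, and the function names are simplifying assumptions for illustration, not the authors' formulation.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariances.

    Per dimension: 0.5 * (var_p/var_q + (mean_q - mean_p)^2/var_q - 1 + ln(var_q/var_p)).
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Sum of both KL directions (an assumed way to compare a component pair)."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))


if __name__ == "__main__":
    # Two unit-variance components whose means differ by 1 in one dimension.
    print(kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]))
    print(symmetric_kl([0.0], [1.0], [1.0], [1.0]))
```

A small divergence between two fitted components indicates that they overlap heavily, which is the kind of pairwise signal a likelihood penalty can be built from.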

Submitted 8 July, 2020; originally announced July 2020.

arXiv:1901.02209 [pdf, other]

Subset Feedback Vertex Set in Chordal and Split Graphs

Authors: Geevarghese Philip, Varun Rajan, Saket Saurabh, Prafullkumar Tale

Abstract: In the Subset Feedback Vertex Set (Subset-FVS) problem the input is a graph $G$, a subset $T$ of vertices of $G$ called the 'terminal' vertices, and an integer $k$. The task is to determine whether there exists a subset of vertices of cardinality at most $k$ which together intersect all cycles which pass through the terminals. Subset-FVS generalizes several well studied problems including Feedback Vertex Set and Multiway Cut. This problem is known to be NP-Complete even in split graphs. Cygan et al. proved that Subset-FVS is fixed parameter tractable (FPT) in general graphs when parameterized by $k$ [SIAM J. Discrete Math (2013)]. In split graphs a simple observation reduces the problem to an equivalent instance of the $3$-Hitting Set problem with same solution size. This directly implies, for Subset-FVS restricted to split graphs, (i) an FPT algorithm which solves the problem in $\mathcal{O}^*(2.076^k)$ time [Wahlström, Ph.D. Thesis] (the $\mathcal{O}^*()$ notation hides polynomial factors), and (ii) a kernel of size $\mathcal{O}(k^3)$. We improve both these results for Subset-FVS on split graphs; we derive (i) a kernel of size $\mathcal{O}(k^2)$ which is the best possible unless $\mathsf{NP} \subseteq \mathsf{coNP}/\mathsf{poly}$, and (ii) an algorithm which solves the problem in time $\mathcal{O}^*(2^k)$. Our algorithm, in fact, solves Subset-FVS on the more general class of chordal graphs, also in $\mathcal{O}^*(2^k)$ time.

Submitted 8 January, 2019; originally announced January 2019.

arXiv:1811.12640 [pdf, other]

Inferring Concept Prerequisite Relations from Online Educational Resources

Authors: Sudeshna Roy, Meghana Madhyastha, Sheril Lawrence, Vaibhav Rajan

Abstract: The Internet has rich and rapidly increasing sources of high quality educational content. Inferring prerequisite relations between educational concepts is required for modern large-scale online educational technology applications such as personalized recommendations and automatic curriculum creation. We present PREREQ, a new supervised learning method for inferring concept prerequisite relations. PREREQ is designed using latent representations of concepts obtained from the Pairwise Latent Dirichlet Allocation model, and a neural network based on the Siamese network architecture. PREREQ can learn unknown concept prerequisites from course prerequisites and labeled concept prerequisite data. It outperforms state-of-the-art approaches on benchmark datasets and can effectively learn from very little training data. PREREQ can also use unlabeled video playlists, a steadily growing source of training data, to learn concept prerequisites, thus obviating the need for manual annotation of course prerequisites.

Submitted 22 January, 2019; v1 submitted 30 November, 2018; originally announced November 2018.

Comments: Accepted at the AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19)

arXiv:1811.11427 [pdf, other]

doi 10.1007/s10994-019-05801-6

Deep Collective Matrix Factorization for Augmented Multi-View Learning

Authors: Ragunathan Mariappan, Vaibhav Rajan

Abstract: Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique to learn shared latent representations from arbitrary collections of matrices. It can be used to simultaneously complete one or more matrices, for predicting the unknown entries. Classical CMF methods assume linearity in the interaction of latent factors, which can be restrictive, and so fail to capture complex non-linear interactions. In this paper, we develop the first deep-learning based method, called dCMF, for unsupervised learning of multiple shared representations, that can model such non-linear interactions, from an arbitrary collection of matrices. We address optimization challenges that arise due to dependencies between shared representations through Multi-Task Bayesian Optimization and design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks, recommendation and prediction of gene-disease association, dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information.

Submitted 15 April, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

arXiv:1602.07280 [pdf, other]

A Statistical Model for Stroke Outcome Prediction and Treatment Planning

Authors: Abhishek Sengupta, Vaibhav Rajan, Sakyajit Bhattacharya, G R K Sarma

Abstract: Stroke is a major cause of mortality and long-term disability in the world. Predictive outcome models in stroke are valuable for personalized treatment, rehabilitation planning and in controlled clinical trials. In this paper we design a new model to predict outcome in the short-term, the putative therapeutic window for several treatments. Our regression-based model has a parametric form that is designed to address many challenges common in medical datasets, such as highly correlated variables and class imbalance. Empirically, our model outperforms the best-known previous models in predicting short-term outcomes and in inferring the most effective treatments that improve outcome.

Submitted 22 February, 2016; originally announced February 2016.

arXiv:1501.01894 [pdf]

Quantifying Scripts: Defining metrics of characters for quantitative and descriptive analysis

Authors: Vinodh Rajan

Abstract: Analysis of scripts plays an important role in paleography and in quantitative linguistics. Especially in the field of digital paleography, quantitative features are much needed to differentiate glyphs. We describe an elaborate set of metrics that quantify qualitative information contained in characters and hence indirectly also quantify the scribal features. We broadly divide the metrics into several categories and describe each individual metric with its underlying qualitative significance. The metrics are largely derived from the related area of gesture design and recognition. We also propose several novel metrics. The proposed metrics are soundly grounded in the principles of handwriting production and handwriting analysis. These computed metrics could serve as descriptors for scripts and also be used for comparing and analyzing scripts. We illustrate some quantitative analysis based on the proposed metrics by applying them to the paleographic evolution of the medieval Tamil script from Brahmi. We also outline future work.

Submitted 8 January, 2015; originally announced January 2015.

Comments: Manuscript submitted to Literary and Linguistic Computing Journal

arXiv:1203.3519 [pdf]

Bayesian Inference in Monte-Carlo Tree Search

Authors: Gerald Tesauro, V T Rajan, Richard Segal

Abstract: Monte-Carlo Tree Search (MCTS) methods are drawing great interest after yielding breakthrough results in computer Go. This paper proposes a Bayesian approach to MCTS that is inspired by distribution-free approaches such as UCT [13], yet significantly differs in important respects. The Bayesian framework allows potentially much more accurate (Bayes-optimal) estimation of node values and node uncertainties from a limited number of simulation trials. We further propose propagating inference in the tree via fast analytic Gaussian approximation methods: this can make the overhead of Bayesian inference manageable in domains such as Go, while preserving high accuracy of expected-value estimates. We find substantial empirical outperformance of UCT in an idealized bandit-tree test environment, where we can obtain valuable insights by comparing with known ground truth. Additionally, we rigorously prove on-policy and off-policy convergence of the proposed methods.

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-580-588

Showing 1–22 of 22 results for author: Rajan, V