-
WellFactor: Patient Profiling using Integrative Embedding of Healthcare Data
Authors:
Dongjin Choi,
Andy Xiang,
Ozgur Ozturk,
Deep Shrestha,
Barry Drake,
Hamid Haidarian,
Faizan Javed,
Haesun Park
Abstract:
In the rapidly evolving healthcare industry, platforms now have access to not only traditional medical records, but also diverse data sets encompassing various patient interactions, such as those from healthcare web portals. To address this rich diversity of data, we introduce WellFactor: a method that derives patient profiles by integrating information from these sources. Central to our approach…
▽ More
In the rapidly evolving healthcare industry, platforms now have access to not only traditional medical records, but also diverse data sets encompassing various patient interactions, such as those from healthcare web portals. To address this rich diversity of data, we introduce WellFactor: a method that derives patient profiles by integrating information from these sources. Central to our approach is the utilization of constrained low-rank approximation. WellFactor is optimized to handle the sparsity that is often inherent in healthcare data. Moreover, by incorporating task-specific label information, our method refines the embedding results, offering a more informed perspective on patients. One important feature of WellFactor is its ability to compute embeddings for new, previously unobserved patient data instantaneously, eliminating the need to revisit the entire data set or recomputing the embedding. Comprehensive evaluations on real-world healthcare data demonstrate WellFactor's effectiveness. It produces better results compared to other existing methods in classification performance, yields meaningful clustering of patients, and delivers consistent results in patient similarity searches and predictions.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
Patient Clustering via Integrated Profiling of Clinical and Digital Data
Authors:
Dongjin Choi,
Andy Xiang,
Ozgur Ozturk,
Deep Shrestha,
Barry Drake,
Hamid Haidarian,
Faizan Javed,
Haesun Park
Abstract:
We introduce a novel profile-based patient clustering model designed for clinical data in healthcare. By utilizing a method grounded on constrained low-rank approximation, our model takes advantage of patients' clinical data and digital interaction data, including browsing and search, to construct patient profiles. As a result of the method, nonnegative embedding vectors are generated, serving as…
▽ More
We introduce a novel profile-based patient clustering model designed for clinical data in healthcare. By utilizing a method grounded on constrained low-rank approximation, our model takes advantage of patients' clinical data and digital interaction data, including browsing and search, to construct patient profiles. As a result of the method, nonnegative embedding vectors are generated, serving as a low-dimensional representation of the patients. Our model was assessed using real-world patient data from a healthcare web portal, with a comprehensive evaluation approach which considered clustering and recommendation capabilities. In comparison to other baselines, our approach demonstrated superior performance in terms of clustering coherence and recommendation accuracy.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
PSI Draft Specification
Authors:
Mark Reid,
James Montgomery,
Barry Drake,
Avraham Ruderman
Abstract:
This document presents the draft specification for delivering machine learning services over HTTP, developed as part of the Protocols and Structures for Inference project, which concluded in 2013. It presents the motivation for providing machine learning as a service, followed by a description of the essential and optional components of such a service.
This document presents the draft specification for delivering machine learning services over HTTP, developed as part of the Protocols and Structures for Inference project, which concluded in 2013. It presents the motivation for providing machine learning as a service, followed by a description of the essential and optional components of such a service.
△ Less
Submitted 1 May, 2022;
originally announced May 2022.
-
TopicSifter: Interactive Search Space Reduction Through Targeted Topic Modeling
Authors:
Hannah Kim,
Dongjin Choi,
Barry Drake,
Alex Endert,
Haesun Park
Abstract:
Topic modeling is commonly used to analyze and understand large document collections. However, in practice, users want to focus on specific aspects or "targets" rather than the entire corpus. For example, given a large collection of documents, users may want only a smaller subset which more closely aligns with their interests, tasks, and domains. In particular, our paper focuses on large-scale doc…
▽ More
Topic modeling is commonly used to analyze and understand large document collections. However, in practice, users want to focus on specific aspects or "targets" rather than the entire corpus. For example, given a large collection of documents, users may want only a smaller subset which more closely aligns with their interests, tasks, and domains. In particular, our paper focuses on large-scale document retrieval with high recall where any missed relevant documents can be critical. A simple keyword matching search is generally not effective nor efficient as 1) it is difficult to find a list of keyword queries that can cover the documents of interest before exploring the dataset, 2) some documents may not contain the exact keywords of interest but may still be highly relevant, and 3) some words have multiple meanings, which would result in irrelevant documents included in the retrieved subset. In this paper, we present TopicSifter, a visual analytics system for interactive search space reduction. Our system utilizes targeted topic modeling based on nonnegative matrix factorization and allows users to give relevance feedback in order to refine their target and guide the topic modeling to the most relevant results.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.
-
Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization
Authors:
Rundong Du,
Barry Drake,
Haesun Park
Abstract:
We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for…
▽ More
We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges.
Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that uses content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures.
△ Less
Submitted 28 March, 2017;
originally announced March 2017.
-
Fast Clustering and Topic Modeling Based on Rank-2 Nonnegative Matrix Factorization
Authors:
Da Kuang,
Barry Drake,
Haesun Park
Abstract:
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final l…
▽ More
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing methods. We implement highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets.
Substantial experimental tests are presented that illustrate significant improvements both in computational time as well as quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains.
△ Less
Submitted 2 October, 2015; v1 submitted 3 September, 2015;
originally announced September 2015.