-
Measuring Distributional Shifts in Text: The Advantage of Language Model-Based Embeddings
Authors:
Gyandev Gupta,
Bashir Rastegarpanah,
Amalendu Iyer,
Joshua Rubin,
Krishnaram Kenthapadi
Abstract:
An essential part of monitoring machine learning models in production is measuring input and output data drift. In this paper, we present a system for measuring distributional shifts in natural language data and highlight and investigate the potential advantage of using large language models (LLMs) for this problem. Recent advancements in LLMs and their successful adoption in different domains indicate their effectiveness in capturing semantic relationships for solving various natural language processing problems. The power of LLMs comes largely from the encodings (embeddings) generated in the hidden layers of the corresponding neural network. First we propose a clustering-based algorithm for measuring distributional shifts in text data by exploiting such embeddings. Then we study the effectiveness of our approach when applied to text embeddings generated by both LLMs and classical embedding algorithms. Our experiments show that general-purpose LLM-based embeddings provide a high sensitivity to data drift compared to other embedding methods. We propose drift sensitivity as an important evaluation metric to consider when comparing language models. Finally, we present insights and lessons learned from deploying our framework as part of the Fiddler ML Monitoring platform over a period of 18 months.
Submitted 4 December, 2023;
originally announced December 2023.
-
It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness
Authors:
Peiyu Xiong,
Michael Tegegn,
Jaskeerat Singh Sarin,
Shubhraneel Pal,
Julia Rubin
Abstract:
Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks and defenses against these attacks. This survey reviews a particular subset of this literature that focuses on investigating properties of training data in the context of model robustness under evasion attacks. It first summarizes the main properties of data leading to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees given particular data. Finally, it discusses gaps of knowledge and promising future research directions in this area.
Submitted 17 October, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
An Empirical Investigation into the Reproduction of Bug Reports for Android Apps
Authors:
Jack Johnson,
Junayed Mahmud,
Tyler Wendland,
Kevin Moran,
Julia Rubin,
Mattia Fazzini
Abstract:
One of the key tasks related to ensuring mobile app quality is the reporting, management, and resolution of bug reports. As such, researchers have committed considerable resources toward automating various tasks of the bug management process for mobile apps, such as reproduction and triaging. However, the success of these automated approaches is largely dictated by the characteristics and properties of the bug reports they operate upon. As such, understanding mobile app bug reports is imperative to drive the continued advancement of report management techniques. While prior studies have examined high-level statistics of large sets of reports, we currently lack an in-depth investigation of how the information typically reported in mobile app issue trackers relates to the specific details generally required to reproduce the underlying failures. In this paper, we perform an in-depth analysis of 180 reproducible bug reports systematically mined from Android apps on GitHub and investigate how the information contained in the reports relates to the task of reproducing the described bugs. In our analysis, we focus on three pieces of information: the environment needed to reproduce the bug report, the steps to reproduce (S2Rs), and the observed behavior. Focusing on this information, we characterize failure types, identify the modality used to report the information, and characterize the quality of the information within the reports. We find that bugs are reported in a multi-modal fashion, the environment is not always provided, and S2Rs often contain missing or non-specific enough information. These findings carry with them important implications on automated bug reproduction techniques as well as automated bug report management approaches more generally.
Submitted 3 January, 2023;
originally announced January 2023.
-
Visual Auditor: Interactive Visualization for Detection and Summarization of Model Biases
Authors:
David Munechika,
Zijie J. Wang,
Jack Reidy,
Josh Rubin,
Krishna Gade,
Krishnaram Kenthapadi,
Duen Horng Chau
Abstract:
As machine learning (ML) systems become increasingly widespread, it is necessary to audit these systems for biases prior to their deployment. Recent research has developed algorithms for effectively identifying intersectional bias in the form of interpretable, underperforming subsets (or slices) of the data. However, these solutions and their insights are limited without a tool for visually understanding and interacting with the results of these algorithms. We propose Visual Auditor, an interactive visualization tool for auditing and summarizing model biases. Visual Auditor assists model validation by providing an interpretable overview of intersectional bias (bias that is present when examining populations defined by multiple features), details about relationships between problematic data slices, and a comparison between underperforming and overperforming data slices in a model. Our open-source tool runs directly in both computational notebooks and web browsers, making model auditing accessible and easily integrated into current ML development workflows. An observational user study in collaboration with domain experts at Fiddler AI highlights that our tool can help ML practitioners identify and understand model biases.
Submitted 24 June, 2022;
originally announced June 2022.
-
Convolution-Free Waveform Transformers for Multi-Lead ECG Classification
Authors:
Annamalai Natarajan,
Gregory Boverman,
Yale Chang,
Corneliu Antonescu,
Jonathan Rubin
Abstract:
We present our entry to the 2021 PhysioNet/CinC challenge - a waveform transformer model to detect cardiac abnormalities from ECG recordings. We compare the performance of the waveform transformer model on different ECG-lead subsets using approximately 88,000 ECG recordings from six datasets. In the official rankings, team prna ranked between 9 and 15 on 12, 6, 4, 3 and 2-lead sets respectively. Our waveform transformer model achieved an average challenge metric of 0.47 on the held-out test set across all ECG-lead subsets. Our combined performance across all leads placed us at rank 11 out of 39 officially ranking teams.
Submitted 29 September, 2021;
originally announced September 2021.
-
Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series
Authors:
Asif Rahman,
Yale Chang,
Jonathan Rubin
Abstract:
Time series models with recurrent neural networks (RNNs) can have high accuracy but are unfortunately difficult to interpret as a result of feature-interactions, temporal-interactions, and non-linear transformations. Interpretability is important in domains like healthcare where constructing models that provide insight into the relationships they have learned are required to validate and trust model predictions. We want accurate time series models where users can understand the contribution of individual input features. We present the Interpretable-RNN (I-RNN) that balances model complexity and accuracy by forcing the relationship between variables in the model to be additive. Interactions are restricted between hidden states of the RNN and additively combined at the final step. I-RNN specifically captures the unique characteristics of clinical time series, which are unevenly sampled in time, asynchronously acquired, and have missing data. Importantly, the hidden state activations represent feature coefficients that correlate with the prediction target and can be visualized as risk curves that capture the global relationship between individual input features and the outcome. We evaluate the I-RNN model on the Physionet 2012 Challenge dataset to predict in-hospital mortality, and on a real-world clinical decision support task: predicting hemodynamic interventions in the intensive care unit. I-RNN provides explanations in the form of global and local feature importances comparable to highly intelligible models like decision trees trained on hand-engineered features while significantly outperforming them. I-RNN remains intelligible while providing accuracy comparable to state-of-the-art decay-based and interpolation-based recurrent time series models. The experimental results on real-world clinical datasets refute the myth that there is a tradeoff between accuracy and interpretability.
Submitted 15 September, 2021;
originally announced September 2021.
-
AndroR2: A Dataset of Manually Reproduced Bug Reports for Android Applications
Authors:
Tyler Wendland,
Jingyang Sun,
Junayed Mahmud,
S. M. Hasan Mansur,
Steven Huang,
Kevin Moran,
Julia Rubin,
Mattia Fazzini
Abstract:
Software maintenance constitutes a large portion of the software development lifecycle. To carry out maintenance tasks, developers often need to understand and reproduce bug reports. As such, there has been increasing research activity coalescing around the notion of automating various activities related to bug reporting. A sizable portion of this research interest has focused on the domain of mobile apps. However, as research around mobile app bug reporting progresses, there is a clear need for a manually vetted and reproducible set of real-world bug reports that can serve as a benchmark for future work. This paper presents ANDROR2: a dataset of 90 manually reproduced bug reports for Android apps listed on Google Play and hosted on GitHub, systematically collected via an in-depth analysis of 459 reports extracted from the GitHub issue tracker. For each reproduced report, ANDROR2 includes the original bug report, an apk file for the buggy version of the app, an executable reproduction script, and metadata regarding the quality of the reproduction steps associated with the original report. We believe that the ANDROR2 dataset can be used to facilitate research in automatically analyzing, understanding, reproducing, localizing, and fixing bugs for mobile applications as well as other software maintenance activities more broadly.
Submitted 15 June, 2021;
originally announced June 2021.
-
Unified Shapley Framework to Explain Prediction Drift
Authors:
Aalok Shanbhag,
Avijit Ghosh,
Josh Rubin
Abstract:
Predictions are the currency of a machine learning model, and to understand the model's behavior over segments of a dataset, or over time, is an important problem in machine learning research and practice. There currently is no systematic framework to understand this drift in prediction distributions over time or between two semantically meaningful slices of data, in terms of the input features and points. We propose GroupShapley and GroupIG (Integrated Gradients), as axiomatically justified methods to tackle this problem. In doing so, we re-frame all current feature/data importance measures based on the Shapley value as essentially problems of distributional comparisons, and unify them under a common umbrella. We axiomatize certain desirable properties of distributional difference, and study the implications of choosing them empirically.
Submitted 15 February, 2021;
originally announced February 2021.
-
Predicting Merge Conflicts in Collaborative Software Development
Authors:
Moein Owhadi-Kareshk,
Sarah Nadi,
Julia Rubin
Abstract:
Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations of branches in the background to notify developers as soon as a conflict occurs, which is a computationally expensive process. One potential way for reducing this cost is to use a machine-learning-based conflict predictor that filters out the merge scenarios that are not likely to have conflicts, i.e., safe merge scenarios. Aims. In this paper, we assess if conflict prediction is feasible. Method. We design a classifier for predicting merge conflicts, based on 9 light-weight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages. Results. Our results show that we achieve high f1-scores, varying from 0.95 to 0.97 for different programming languages, when predicting safe merge scenarios. The f1-score is between 0.57 and 0.68 for the conflicting merge scenarios. Conclusions. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging.
Submitted 14 July, 2019;
originally announced July 2019.
-
CT-To-MR Conditional Generative Adversarial Networks for Ischemic Stroke Lesion Segmentation
Authors:
Jonathan Rubin,
S. Mazdak Abulnaga
Abstract:
Infarcted brain tissue resulting from acute stroke readily shows up as hyperintense regions within diffusion-weighted magnetic resonance imaging (DWI). It has also been proposed that computed tomography perfusion (CTP) could alternatively be used to triage stroke patients, given improvements in speed and availability, as well as reduced cost. However, CTP has a lower signal to noise ratio compared to MR. In this work, we investigate whether a conditional mapping can be learned by a generative adversarial network to map CTP inputs to generated MR DWI that more clearly delineates hyperintense regions due to ischemic stroke. We detail the architectures of the generator and discriminator and describe the training process used to perform image-to-image translation from multi-modal CT perfusion maps to diffusion weighted MR outputs. We evaluate the results both qualitatively by visual comparison of generated MR to ground truth, as well as quantitatively by training fully convolutional neural networks that make use of generated MR data inputs to perform ischemic stroke lesion segmentation. Segmentation networks trained using generated CT-to-MR inputs result in at least some improvement on all metrics used for evaluation, compared with networks that only use CT perfusion input.
Submitted 30 April, 2019;
originally announced April 2019.
-
Semi-supervised Learning for Quantification of Pulmonary Edema in Chest X-Ray Images
Authors:
Ruizhi Liao,
Jonathan Rubin,
Grace Lam,
Seth Berkowitz,
Sandeep Dalal,
William Wells,
Steven Horng,
Polina Golland
Abstract:
We propose and demonstrate machine learning algorithms to assess the severity of pulmonary edema in chest x-ray images of congestive heart failure patients. Accurate assessment of pulmonary edema in heart failure is critical when making treatment and disposition decisions. Our work is grounded in a large-scale clinical dataset of over 300,000 x-ray images with associated radiology reports. While edema severity labels can be extracted unambiguously from a small fraction of the radiology reports, accurate annotation is challenging in most cases. To take advantage of the unlabeled images, we develop a Bayesian model that includes a variational auto-encoder for learning a latent representation from the entire image set trained jointly with a regressor that employs this representation for predicting pulmonary edema severity. Our experimental results suggest that modeling the distribution of images jointly with the limited labels improves the accuracy of pulmonary edema scoring compared to a strictly supervised approach. To the best of our knowledge, this is the first attempt to employ machine learning algorithms to automatically and quantitatively assess the severity of pulmonary edema in chest x-ray images.
Submitted 9 April, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Multivariate Time-series Similarity Assessment via Unsupervised Representation Learning and Stratified Locality Sensitive Hashing: Application to Early Acute Hypotensive Episode Detection
Authors:
Jwala Dhamala,
Emmanuel Azuh,
Abdullah Al-Dujaili,
Jonathan Rubin,
Una-May O'Reilly
Abstract:
Timely prediction of clinically critical events in Intensive Care Unit (ICU) is important for improving care and survival rate. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate time-series of physiologic signals by learning their representation with a sequence-to-sequence auto-encoder. We then propose to hash the learned representations to enable signal similarity assessment for the prediction of critical events. We apply this methodological framework to predict Acute Hypotensive Episodes (AHE) on a large and diverse dataset of vital signal recordings. Experiments demonstrate the ability of the presented framework in accurately predicting an upcoming AHE.
Submitted 4 December, 2018; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Ischemic Stroke Lesion Segmentation in CT Perfusion Scans using Pyramid Pooling and Focal Loss
Authors:
S. Mazdak Abulnaga,
Jonathan Rubin
Abstract:
We present a fully convolutional neural network for segmenting ischemic stroke lesions in CT perfusion images for the ISLES 2018 challenge. Treatment of stroke is time sensitive and current standards for lesion identification require manual segmentation, a time consuming and challenging process. Automatic segmentation methods present the possibility of accurately identifying lesions and improving treatment planning. Our model is based on the PSPNet, a network architecture that makes use of pyramid pooling to provide global and local contextual information. To learn the varying shapes of the lesions, we train our network using focal loss, a loss function designed for the network to focus on learning the more difficult samples. We compare our model to networks trained using the U-Net and V-Net architectures. Our approach demonstrates effective performance in lesion segmentation and ranked among the top performers at the challenge conclusion.
Submitted 2 November, 2018;
originally announced November 2018.
-
Automatic Detection of Arousals during Sleep using Multiple Physiological Signals
Authors:
Saman Parvaneh,
Jonathan Rubin,
Ali Samadani,
Gajendra Katuwal
Abstract:
The visual scoring of arousals during sleep routinely conducted by sleep experts is a challenging task warranting an automatic approach. This paper presents an algorithm for automatic detection of arousals during sleep. Using the Physionet/CinC Challenge dataset, an 80-20% subject-level split was performed to create in-house training and test sets, respectively. The data for each subject in the training set was split to 30-second epochs with no overlap. A total of 428 features from EEG, EMG, EOG, airflow, and SaO2 in each epoch were extracted and used for creating subject-specific models based on an ensemble of bagged classification trees, resulting in 943 models. For marking arousal and non-arousal regions in the test set, the data in the test set was split to 30-second epochs with 50% overlaps. The average of arousal probabilities from different patient-specific models was assigned to each 30-second epoch and then a sample-wise probability vector with the same length as test data was created for model evaluation. Using the PhysioNet/CinC Challenge 2018 scoring criteria, AUPRCs of 0.25 and 0.21 were achieved for the in-house test and blind test sets, respectively.
Submitted 5 October, 2018;
originally announced October 2018.
-
Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks
Authors:
Jonathan Rubin,
Deepan Sanghavi,
Claire Zhao,
Kathy Lee,
Ashequl Qadir,
Minnan Xu-Wilson
Abstract:
The MIMIC-CXR dataset is (to date) the largest released chest x-ray dataset consisting of 473,064 chest x-rays and 206,574 radiology reports collected from 63,478 patients. We present the results of training and evaluating a collection of deep convolutional neural networks on this dataset to recognize multiple common thorax diseases. To the best of our knowledge, this is the first work that trains CNNs for this task on such a large collection of chest x-ray images, which is over four times the size of the largest previously released chest x-ray corpus (ChestX-Ray14). We describe and evaluate individual CNN models trained on frontal and lateral CXR view types. In addition, we present a novel DualNet architecture that emulates routine clinical practice by simultaneously processing both frontal and lateral CXR images obtained from a radiological exam. Our DualNet architecture shows improved performance in recognizing findings in CXR images when compared to applying separate baseline frontal and lateral classifiers.
Submitted 24 April, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Densely Connected Convolutional Networks and Signal Quality Analysis to Detect Atrial Fibrillation Using Short Single-Lead ECG Recordings
Authors:
Jonathan Rubin,
Saman Parvaneh,
Asif Rahman,
Bryan Conroy,
Saeed Babaeizadeh
Abstract:
The development of new technology such as wearables that record high-quality single channel ECG, provides an opportunity for ECG screening in a larger population, especially for atrial fibrillation screening. The main goal of this study is to develop an automatic classification algorithm for normal sinus rhythm (NSR), atrial fibrillation (AF), other rhythms (O), and noise from a single channel short ECG segment (9-60 seconds). For this purpose, signal quality index (SQI) along with dense convolutional neural networks was used. Two convolutional neural network (CNN) models (main model that accepts 15 seconds ECG and secondary model that processes 9 seconds shorter ECG) were trained using the training data set. If the recording is determined to be of low quality by SQI, it is immediately classified as noisy. Otherwise, it is transformed to a time-frequency representation and classified with the CNN as NSR, AF, O, or noise. At the final step, a feature-based post-processing algorithm classifies the rhythm as either NSR or O in case the CNN model's discrimination between the two is indeterminate. The best result achieved at the official phase of the PhysioNet/CinC challenge on the blind test set was 0.80 (F1 for NSR, AF, and O were 0.90, 0.80, and 0.70, respectively).
Submitted 10 October, 2017;
originally announced October 2017.
-
An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit
Authors:
Jonathan Rubin,
Cristhian Potes,
Minnan Xu-Wilson,
Junzi Dong,
Asif Rahman,
Hiep Nguyen,
David Moromisato
Abstract:
Our work focuses on the problem of predicting the transfer of pediatric patients from the general ward of a hospital to the pediatric intensive care unit. Using data collected over 5.5 years from the electronic health records of two medical facilities, we develop classifiers based on adaptive boosting and gradient tree boosting. We further combine these learned classifiers into an ensemble model and compare its performance to a modified pediatric early warning score (PEWS) baseline that relies on expert defined guidelines. To gauge model generalizability, we perform an inter-facility evaluation where we train our algorithm on data from one facility and perform evaluation on a hidden test dataset from a separate facility. We show that improvements are witnessed over the PEWS baseline in accuracy (0.77 vs. 0.69), sensitivity (0.80 vs. 0.68), specificity (0.74 vs. 0.70) and AUROC (0.85 vs. 0.73).
Submitted 16 July, 2017;
originally announced July 2017.
-
Recognizing Abnormal Heart Sounds Using Deep Learning
Authors:
Jonathan Rubin,
Rui Abreu,
Anurag Ganguli,
Saigopal Nelaturi,
Ion Matei,
Kumar Sricharan
Abstract:
The work presented here applies deep learning to the task of automated cardiac auscultation, i.e. recognizing abnormalities in heart sounds. We describe an automated heart sound classification algorithm that combines the use of time-frequency heat map representations with a deep convolutional neural network (CNN). Given the cost-sensitive nature of misclassification, our CNN architecture is trained using a modified loss function that directly optimizes the trade-off between sensitivity and specificity. We evaluated our algorithm at the 2016 PhysioNet Computing in Cardiology challenge where the objective was to accurately classify normal and abnormal heart sounds from single, short, potentially noisy recordings. Our entry to the challenge achieved a final specificity of 0.95, sensitivity of 0.73 and overall score of 0.84. We achieved the greatest specificity score out of all challenge entries and, using just a single CNN, our algorithm differed in overall score by only 0.02 compared to the top place finisher, which used an ensemble approach.
Submitted 19 October, 2017; v1 submitted 14 July, 2017;
originally announced July 2017.
-
Proceedings 7th International Workshop on Formal Methods and Analysis in Software Product Line Engineering
Authors:
Julia Rubin,
Thomas Thüm
Abstract:
In Software Product Line Engineering (SPLE), a portfolio of similar systems is developed from a shared set of software assets. Claimed benefits of SPLE include reductions in the portfolio size, cost of software development and time to production, as well as improvements in the quality of the delivered systems. Yet, despite these benefits, SPLE is still in the early adoption stage. We believe that automated approaches, tools and techniques that provide better support for SPLE activities can further facilitate its adoption in practice and increase its benefits.
To promote work in this area, the FMSPLE'16 workshop focuses on automated analysis and formal methods, which can (1) lead to a further increase in development productivity and reduction in maintenance costs associated with management of the SPLE artifacts, and (2) provide proven guarantees for the correctness and quality of the delivered systems.
Submitted 28 March, 2016;
originally announced March 2016.
-
Degree switching and partitioning for enumerating graphs to arbitrary orders of accuracy
Authors:
David Burstein,
Jonathan Rubin
Abstract:
We provide a novel method for constructing asymptotics (to arbitrary accuracy) for the number of directed graphs that realize a fixed bidegree sequence $d = a \times b$ with maximum degree $d_{max}=O(S^{\frac{1}{2}-τ})$ for an arbitrarily small positive number $τ$, where $S$ is the number of edges specified by $d$. Our approach is based on two key steps: graph partitioning and degree-preserving switches. The former idea allows us to relate enumeration results for given sequences to those for sequences that are especially easy to handle, while the latter facilitates expansions based on numbers of shared neighbors of pairs of nodes. While we focus primarily on directed graphs allowing loops, our results can be extended to other cases, including bipartite graphs, as well as directed and undirected graphs without loops. In addition, we can relax the constraint that $d_{max} = O(S^{\frac{1}{2}-τ})$ and replace it with $a_{max} b_{max} = O(S^{1-τ})$, where $a_{max}$ and $b_{max}$ are the maximum values for $a$ and $b$ respectively. The previous best results, from Greenhill et al., only allow for $d_{max} = o(S^{\frac{1}{3}})$ or alternatively $a_{max} b_{max} = o(S^{\frac{2}{3}})$. Since in many real-world networks $d_{max}$ scales larger than $o(S^{\frac{1}{3}})$, we expect that this work will be helpful for various applications.
Submitted 21 October, 2016; v1 submitted 11 November, 2015;
originally announced November 2015.
-
Sufficient Conditions for Graphicality of Bidegree Sequences
Authors:
David Burstein,
Jonathan Rubin
Abstract:
There are a variety of existing conditions for a degree sequence to be graphic. When a degree sequence satisfies any of these conditions, there exists a graph that realizes the sequence. We formulate several novel sufficient graphicality criteria that depend on the number of elements in the sequence, corresponding to the number of nodes in an associated graph, and the mean degree of the sequence. These conditions, which are stated in terms of bidegree sequences for directed graphs, are easier to apply than classic necessary and sufficient graphicality conditions involving multiple inequalities. They are also more flexible than more recent graphicality conditions, in that they imply graphicality of some degree sequences not covered by those conditions. The form of our results will allow them to be easily used for the generation of graphs with particular degree sequences for applications.
Submitted 21 October, 2016; v1 submitted 7 November, 2015;
originally announced November 2015.