Showing 1–21 of 21 results for author: Rubin, J

  1. arXiv:2312.02337  [pdf, other]

    cs.CL

    Measuring Distributional Shifts in Text: The Advantage of Language Model-Based Embeddings

    Authors: Gyandev Gupta, Bashir Rastegarpanah, Amalendu Iyer, Joshua Rubin, Krishnaram Kenthapadi

    Abstract: An essential part of monitoring machine learning models in production is measuring input and output data drift. In this paper, we present a system for measuring distributional shifts in natural language data and highlight and investigate the potential advantage of using large language models (LLMs) for this problem. Recent advancements in LLMs and their successful adoption in different domains ind…

    Submitted 4 December, 2023; originally announced December 2023.

  2. arXiv:2303.09767  [pdf, other]

    cs.LG cs.CR

    It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness

    Authors: Peiyu Xiong, Michael Tegegn, Jaskeerat Singh Sarin, Shubhraneel Pal, Julia Rubin

    Abstract: Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks a…

    Submitted 17 October, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Accepted to ACM Computing Surveys, 40 pages, 24 figures

  3. An Empirical Investigation into the Reproduction of Bug Reports for Android Apps

    Authors: Jack Johnson, Junayed Mahmud, Tyler Wendland, Kevin Moran, Julia Rubin, Mattia Fazzini

    Abstract: One of the key tasks related to ensuring mobile app quality is the reporting, management, and resolution of bug reports. As such, researchers have committed considerable resources toward automating various tasks of the bug management process for mobile apps, such as reproduction and triaging. However, the success of these automated approaches is largely dictated by the characteristics and properti…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Published in the Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER'22), Honolulu, Hawaii, March 15-18, 2022, pp. 321-332

  4. arXiv:2206.12540  [pdf, other]

    cs.HC cs.LG

    Visual Auditor: Interactive Visualization for Detection and Summarization of Model Biases

    Authors: David Munechika, Zijie J. Wang, Jack Reidy, Josh Rubin, Krishna Gade, Krishnaram Kenthapadi, Duen Horng Chau

    Abstract: As machine learning (ML) systems become increasingly widespread, it is necessary to audit these systems for biases prior to their deployment. Recent research has developed algorithms for effectively identifying intersectional bias in the form of interpretable, underperforming subsets (or slices) of the data. However, these solutions and their insights are limited without a tool for visually unders…

    Submitted 24 June, 2022; originally announced June 2022.

  5. arXiv:2109.15129  [pdf, other]

    eess.SP cs.LG

    Convolution-Free Waveform Transformers for Multi-Lead ECG Classification

    Authors: Annamalai Natarajan, Gregory Boverman, Yale Chang, Corneliu Antonescu, Jonathan Rubin

    Abstract: We present our entry to the 2021 PhysioNet/CinC challenge - a waveform transformer model to detect cardiac abnormalities from ECG recordings. We compare the performance of the waveform transformer model on different ECG-lead subsets using approximately 88,000 ECG recordings from six datasets. In the official rankings, team prna ranked between 9 and 15 on 12, 6, 4, 3 and 2-lead sets respectively. O…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: Computing in Cardiology - 2021 PhysioNet Challenge

  6. arXiv:2109.07602  [pdf, other]

    cs.LG cs.AI

    Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series

    Authors: Asif Rahman, Yale Chang, Jonathan Rubin

    Abstract: Time series models with recurrent neural networks (RNNs) can have high accuracy but are unfortunately difficult to interpret as a result of feature-interactions, temporal-interactions, and non-linear transformations. Interpretability is important in domains like healthcare where constructing models that provide insight into the relationships they have learned is required to validate and trust mod…

    Submitted 15 September, 2021; originally announced September 2021.

  7. AndroR2: A Dataset of Manually Reproduced Bug Reports for Android Applications

    Authors: Tyler Wendland, Jingyang Sun, Junayed Mahmud, S. M. Hasan Mansur, Steven Huang, Kevin Moran, Julia Rubin, Mattia Fazzini

    Abstract: Software maintenance constitutes a large portion of the software development lifecycle. To carry out maintenance tasks, developers often need to understand and reproduce bug reports. As such, there has been increasing research activity coalescing around the notion of automating various activities related to bug reporting. A sizable portion of this research interest has focused on the domain of mob…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: 5 pages, Accepted to the 2021 International Conference on Mining Software Repositories, Data Showcase Track; Links to Datasets: https://doi.org/10.5281/zenodo.4646313; https://github.com/SageSELab/AndroR2

  8. arXiv:2102.07862  [pdf, other]

    cs.LG

    Unified Shapley Framework to Explain Prediction Drift

    Authors: Aalok Shanbhag, Avijit Ghosh, Josh Rubin

    Abstract: Predictions are the currency of a machine learning model, and to understand the model's behavior over segments of a dataset, or over time, is an important problem in machine learning research and practice. There currently is no systematic framework to understand this drift in prediction distributions over time or between two semantically meaningful slices of data, in terms of the input features an…

    Submitted 15 February, 2021; originally announced February 2021.

  9. arXiv:1907.06274  [pdf, other]

    cs.SE cs.LG

    Predicting Merge Conflicts in Collaborative Software Development

    Authors: Moein Owhadi-Kareshk, Sarah Nadi, Julia Rubin

    Abstract: Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about reso…

    Submitted 14 July, 2019; originally announced July 2019.

  10. arXiv:1904.13281  [pdf, other]

    eess.IV cs.CV cs.LG

    CT-To-MR Conditional Generative Adversarial Networks for Ischemic Stroke Lesion Segmentation

    Authors: Jonathan Rubin, S. Mazdak Abulnaga

    Abstract: Infarcted brain tissue resulting from acute stroke readily shows up as hyperintense regions within diffusion-weighted magnetic resonance imaging (DWI). It has also been proposed that computed tomography perfusion (CTP) could alternatively be used to triage stroke patients, given improvements in speed and availability, as well as reduced cost. However, CTP has a lower signal to noise ratio compared…

    Submitted 30 April, 2019; originally announced April 2019.

    Comments: Seventh IEEE International Conference on Healthcare Informatics (ICHI 2019)

  11. arXiv:1902.10785  [pdf, other]

    cs.CV

    Semi-supervised Learning for Quantification of Pulmonary Edema in Chest X-Ray Images

    Authors: Ruizhi Liao, Jonathan Rubin, Grace Lam, Seth Berkowitz, Sandeep Dalal, William Wells, Steven Horng, Polina Golland

    Abstract: We propose and demonstrate machine learning algorithms to assess the severity of pulmonary edema in chest x-ray images of congestive heart failure patients. Accurate assessment of pulmonary edema in heart failure is critical when making treatment and disposition decisions. Our work is grounded in a large-scale clinical dataset of over 300,000 x-ray images with associated radiology reports. While e…

    Submitted 9 April, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

  12. arXiv:1811.06106  [pdf, other]

    cs.CV cs.AI stat.ML

    Multivariate Time-series Similarity Assessment via Unsupervised Representation Learning and Stratified Locality Sensitive Hashing: Application to Early Acute Hypotensive Episode Detection

    Authors: Jwala Dhamala, Emmanuel Azuh, Abdullah Al-Dujaili, Jonathan Rubin, Una-May O'Reilly

    Abstract: Timely prediction of clinically critical events in the Intensive Care Unit (ICU) is important for improving care and survival rate. Most of the existing approaches are based on the application of various classification methods on explicitly extracted statistical features from vital signals. In this work, we propose to eliminate the high cost of engineering hand-crafted features from multivariate time-…

    Submitted 4 December, 2018; v1 submitted 14 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/66

  13. arXiv:1811.01085  [pdf, other]

    cs.CV cs.AI

    Ischemic Stroke Lesion Segmentation in CT Perfusion Scans using Pyramid Pooling and Focal Loss

    Authors: S. Mazdak Abulnaga, Jonathan Rubin

    Abstract: We present a fully convolutional neural network for segmenting ischemic stroke lesions in CT perfusion images for the ISLES 2018 challenge. Treatment of stroke is time sensitive and current standards for lesion identification require manual segmentation, a time consuming and challenging process. Automatic segmentation methods present the possibility of accurately identifying lesions and improving…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: BrainLes 2018 MICCAI workshop

  14. arXiv:1810.02726  [pdf]

    cs.CV eess.SP

    Automatic Detection of Arousals during Sleep using Multiple Physiological Signals

    Authors: Saman Parvaneh, Jonathan Rubin, Ali Samadani, Gajendra Katuwal

    Abstract: The visual scoring of arousals during sleep routinely conducted by sleep experts is a challenging task warranting an automatic approach. This paper presents an algorithm for automatic detection of arousals during sleep. Using the Physionet/CinC Challenge dataset, an 80-20% subject-level split was performed to create in-house training and test sets, respectively. The data for each subject in the tr…

    Submitted 5 October, 2018; originally announced October 2018.

    Comments: Computing in Cardiology 2018

  15. arXiv:1804.07839  [pdf, other]

    cs.CV stat.ML

    Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks

    Authors: Jonathan Rubin, Deepan Sanghavi, Claire Zhao, Kathy Lee, Ashequl Qadir, Minnan Xu-Wilson

    Abstract: The MIMIC-CXR dataset is (to date) the largest released chest x-ray dataset consisting of 473,064 chest x-rays and 206,574 radiology reports collected from 63,478 patients. We present the results of training and evaluating a collection of deep convolutional neural networks on this dataset to recognize multiple common thorax diseases. To the best of our knowledge, this is the first work that trains…

    Submitted 24 April, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: First draft, under review

  16. arXiv:1710.05817  [pdf]

    eess.SP cs.CV stat.ML

    Densely Connected Convolutional Networks and Signal Quality Analysis to Detect Atrial Fibrillation Using Short Single-Lead ECG Recordings

    Authors: Jonathan Rubin, Saman Parvaneh, Asif Rahman, Bryan Conroy, Saeed Babaeizadeh

    Abstract: The development of new technology such as wearables that record high-quality single channel ECG provides an opportunity for ECG screening in a larger population, especially for atrial fibrillation screening. The main goal of this study is to develop an automatic classification algorithm for normal sinus rhythm (NSR), atrial fibrillation (AF), other rhythms (O), and noise from a single channel sho…

    Submitted 10 October, 2017; originally announced October 2017.

    Comments: Computing in Cardiology 2017

  17. arXiv:1707.04958  [pdf, other]

    cs.LG stat.AP stat.ML

    An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit

    Authors: Jonathan Rubin, Cristhian Potes, Minnan Xu-Wilson, Junzi Dong, Asif Rahman, Hiep Nguyen, David Moromisato

    Abstract: Our work focuses on the problem of predicting the transfer of pediatric patients from the general ward of a hospital to the pediatric intensive care unit. Using data collected over 5.5 years from the electronic health records of two medical facilities, we develop classifiers based on adaptive boosting and gradient tree boosting. We further combine these learned classifiers into an ensemble model a…

    Submitted 16 July, 2017; originally announced July 2017.

  18. arXiv:1707.04642  [pdf, other]

    cs.SD cs.CV

    Recognizing Abnormal Heart Sounds Using Deep Learning

    Authors: Jonathan Rubin, Rui Abreu, Anurag Ganguli, Saigopal Nelaturi, Ion Matei, Kumar Sricharan

    Abstract: The work presented here applies deep learning to the task of automated cardiac auscultation, i.e. recognizing abnormalities in heart sounds. We describe an automated heart sound classification algorithm that combines the use of time-frequency heat map representations with a deep convolutional neural network (CNN). Given the cost-sensitive nature of misclassification, our CNN architecture is traine…

    Submitted 19 October, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

    Comments: IJCAI 2017 Knowledge Discovery in Healthcare Workshop

  19. arXiv:1603.08577   

    cs.SE cs.FL

    Proceedings 7th International Workshop on Formal Methods and Analysis in Software Product Line Engineering

    Authors: Julia Rubin, Thomas Thüm

    Abstract: In Software Product Line Engineering (SPLE), a portfolio of similar systems is developed from a shared set of software assets. Claimed benefits of SPLE include reductions in the portfolio size, cost of software development and time to production, as well as improvements in the quality of the delivered systems. Yet, despite these benefits, SPLE is still in the early adoption stage. We believe that…

    Submitted 28 March, 2016; originally announced March 2016.

    Journal ref: EPTCS 206, 2016

  20. arXiv:1511.03738  [pdf, other]

    math.CO cs.DM math.ST

    Degree switching and partitioning for enumerating graphs to arbitrary orders of accuracy

    Authors: David Burstein, Jonathan Rubin

    Abstract: We provide a novel method for constructing asymptotics (to arbitrary accuracy) for the number of directed graphs that realize a fixed bidegree sequence $d = a \times b$ with maximum degree $d_{max}=O(S^{\frac{1}{2}-τ})$ for an arbitrarily small positive number $τ$, where $S$ is the number of edges specified by $d$. Our approach is based on two key steps, graph partitioning and degree preserving switc…

    Submitted 21 October, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

    Comments: 24 pages, 1 figure

    MSC Class: 05C30; 05A16; 62Q05; 05C07; 05C20

  21. arXiv:1511.02411  [pdf, ps, other]

    math.CO cs.DM

    Sufficient Conditions for Graphicality of Bidegree Sequences

    Authors: David Burstein, Jonathan Rubin

    Abstract: There are a variety of existing conditions for a degree sequence to be graphic. When a degree sequence satisfies any of these conditions, there exists a graph that realizes the sequence. We formulate several novel sufficient graphicality criteria that depend on the number of elements in the sequence, corresponding to the number of nodes in an associated graph, and the mean degree of the sequence.…

    Submitted 21 October, 2016; v1 submitted 7 November, 2015; originally announced November 2015.

    Comments: 18 pages

    MSC Class: 05C20; 05C80; 05C82