Exploring Best Practices for ECG Signal Processing in Machine Learning

Authors: Amir Salimi, Sunil Vasu Kalmady, Abram Hindle, Osmar Zaiane, Padma Kaul

Abstract: In this work we search for best practices in pre-processing of Electrocardiogram (ECG) signals in order to train better classifiers for the diagnosis of heart conditions. State of the art machine learning algorithms have achieved remarkable results in classification of some heart conditions using ECG data, yet there appears to be no consensus on pre-processing best practices. Is this lack of consensus due to different conditions and architectures requiring different processing steps for optimal performance? Is it possible that state of the art deep-learning models have rendered pre-processing unnecessary? In this work we apply down-sampling, normalization, and filtering functions to 3 different multi-label ECG datasets and measure their effects on 3 different high-performing time-series classifiers. We find that sampling rates as low as 50Hz can yield comparable results to the commonly used 500Hz. This is significant as smaller sampling rates will result in smaller datasets and models, which require less time and resources to train. Additionally, despite their common usage, we found min-max normalization to be slightly detrimental overall, and band-passing to make no measurable difference. We found the blind approach to pre-processing of ECGs for multi-label classification to be ineffective, with the exception of sample rate reduction which reliably reduces computational resources, but does not increase accuracy.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2211.10431 [pdf, other]

Improving ECG-based COVID-19 diagnosis and mortality predictions using pre-pandemic medical records at population-scale

Authors: Weijie Sun, Sunil Vasu Kalmady, Nariman Sepehrvand, Luan Manh Chu, Zihan Wang, Amir Salimi, Abram Hindle, Russell Greiner, Padma Kaul

Abstract: Pandemic outbreaks such as COVID-19 occur unexpectedly, and need immediate action due to their potential devastating consequences on global health. Point-of-care routine assessments such as electrocardiogram (ECG), can be used to develop prediction models for identifying individuals at risk. However, there is often too little clinically-annotated medical data, especially in early phases of a pandemic, to develop accurate prediction models. In such situations, historical pre-pandemic health records can be utilized to estimate a preliminary model, which can then be fine-tuned based on limited available pandemic data. This study shows this approach -- pre-train deep learning models with pre-pandemic data -- can work effectively, by demonstrating substantial performance improvement over three different COVID-19 related diagnostic and prognostic prediction tasks. Similar transfer learning strategies can be useful for developing timely artificial intelligence solutions in future pandemic outbreaks.

Submitted 11 January, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Accepted for NeurIPS 2022 TS4H workshop

arXiv:2210.06291 [pdf, other]

ECG for high-throughput screening of multiple diseases: Proof-of-concept using multi-diagnosis deep learning from population-based datasets

Authors: Weijie Sun, Sunil Vasu Kalmady, Amir Salimi, Nariman Sepehrvand, Eric Ly, Abram Hindle, Russell Greiner, Padma Kaul

Abstract: Electrocardiogram (ECG) abnormalities are linked to cardiovascular diseases, but may also occur in other non-cardiovascular conditions such as mental, neurological, metabolic and infectious conditions. However, most of the recent success of deep learning (DL) based diagnostic predictions in selected patient cohorts have been limited to a small set of cardiac diseases. In this study, we use a population-based dataset of >250,000 patients with >1000 medical conditions and >2 million ECGs to identify a wide range of diseases that could be accurately diagnosed from the patient's first in-hospital ECG. Our DL models uncovered 128 diseases and 68 disease categories with strong discriminative performance.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted in Medical Imaging meets NeurIPS 2021 https://www.cse.cuhk.edu.hk/~qdou/public/medneurips2021/88_ECG_for_high-throughput_screening_of_multiple_diseases_final_version.pdf

arXiv:2201.11808 [pdf, other]

LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks

Authors: Rassa Ghavami Modegh, Ahmad Salimi, Alireza Dizaji, Hamid R. Rabiee

Abstract: Despite the state-of-the-art performance of deep convolutional neural networks, they are susceptible to bias and malfunction in unseen situations. Moreover, the complex computation behind their reasoning is not human-understandable to develop trust. External explainer methods have tried to interpret network decisions in a human-understandable way, but they are accused of fallacies due to their assumptions and simplifications. On the other side, the inherent self-interpretability of models, while being more robust to the mentioned fallacies, cannot be applied to the already trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability and the possibility for knowledge injection without performance loss. The module is easily pluggable into any convolutional neural network, even the already trained ones. We have defined a weakly supervised training scheme to learn the distinguishing features in decision-making without depending on experts' annotations. We verified our claims by evaluating several LAP-extended models on two datasets, including ImageNet. The proposed framework offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.

Submitted 24 October, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

MSC Class: 68T07; 68T99 (Primary) 68T45 (Secondary)

arXiv:1907.07803 [pdf, other]

Syntax and Stack Overflow: A methodology for extracting a corpus of syntax errors and fixes

Authors: Alexander William Wong, Amir Salimi, Shaiful Chowdhury, Abram Hindle

Abstract: One problem when studying how to find and fix syntax errors is how to get natural and representative examples of syntax errors. Most syntax error datasets are not free, open, and public, or they are extracted from novice programmers and do not represent syntax errors that the general population of developers would make. Programmers of all skill levels post questions and answers to Stack Overflow which may contain snippets of source code along with corresponding text and tags. Many snippets do not parse, thus they are ripe for forming a corpus of syntax errors and corrections. Our primary contribution is an approach for extracting natural syntax errors and their corresponding human made fixes to help syntax error research. A Python abstract syntax tree parser is used to determine preliminary errors and corrections on code blocks extracted from the SOTorrent data set. We further analyzed our code by executing the corrections in a Python interpreter. We applied our methodology to produce a public data set of 62,965 Python Stack Overflow code snippets with corresponding tags, errors, and stack traces. We found that errors made by Stack Overflow users do not match errors made by student developers or random mutations, implying there is a serious representativeness risk within the field. Finally we share our dataset openly so that future researchers can re-use and extend our syntax errors and fixes.

Submitted 17 July, 2019; originally announced July 2019.

Comments: 5 pages, ICSME 2019

arXiv:1301.5334 [pdf, ps, other]

Generalized Cut-Set Bounds for Broadcast Networks

Authors: Amir Salimi, Tie Liu, Shuguang Cui

Abstract: A broadcast network is a classical network with all source messages collocated at a single source node. For broadcast networks, the standard cut-set bounds, which are known to be loose in general, are closely related to union as a specific set operation to combine the basic cuts of the network. This paper provides a new set of network coding bounds for general broadcast networks. These bounds combine the basic cuts of the network via a variety of set operations (not just the union) and are established via only the submodularity of Shannon entropy. The tightness of these bounds are demonstrated via applications to combination networks.

Submitted 22 January, 2013; originally announced January 2013.

Comments: 30 pages, 4 figures, submitted to the IEEE Transaction on Information Theory

Showing 1–6 of 6 results for author: Salimi, A