-
Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese
Authors:
Marcelo Matheus Gauy,
Larissa Cristina Berti,
Arnaldo Cândido Jr,
Augusto Camargo Neto,
Alfredo Goldman,
Anna Sara Shafferman Levin,
Marcus Martins,
Beatriz Raposo de Medeiros,
Marcelo Queiroz,
Ester Cerdeira Sabino,
Flaviane Romani Fernandes Svartman,
Marcelo Finger
Abstract:
This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audio, thus treating speech as an RI biomarker. Previous works collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved $96.5\%$ accuracy, showing the feasibility of RI detection via AI. Here, we collect data from RI patients (P2) whose condition has several causes besides COVID-19, aiming to extend AI-based RI detection. We also collected control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.
Submitted 27 May, 2024;
originally announced May 2024.
-
Interpretability Analysis of Deep Models for COVID-19 Detection
Authors:
Daniel Peixoto Pinto da Silva,
Edresson Casanova,
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Marcelo Finger,
Flaviane Svartman,
Beatriz Raposo,
Marcus Vinícius Moreira Martins,
Sandra Maria Aluísio,
Larissa Cristina Berti,
João Paulo Teixeira
Abstract:
During the outbreak of the COVID-19 pandemic, several research areas joined efforts to mitigate the damage caused by SARS-CoV-2. In this paper we present an interpretability analysis of a convolutional neural network based model for COVID-19 detection in audio recordings. We investigate which features are important to the model's decision process, examining spectrograms, F0, F0 standard deviation, sex and age. Next, we analyse model decisions by generating heat maps for the trained models to capture their attention during the decision process. Focusing on an explainable Artificial Intelligence approach, we show that the studied models can make unbiased decisions even in the presence of spurious data in the training set, given adequate preprocessing steps. Our best model reaches 94.44% detection accuracy, with results indicating that the models favor spectrograms for the decision process, particularly high-energy areas in the spectrogram related to prosodic domains, while F0 also leads to efficient COVID-19 detection.
Submitted 25 November, 2022;
originally announced November 2022.
-
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Authors:
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Vinícius G. dos Santos,
Bruno A. Papa Dias,
Marli Quadros Leite,
Flaviane Romani Fernandes Svartman,
Sandra Aluísio
Abstract:
The NURC Project, which started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings made in the city of São Paulo. Although 47 inquiries have transcripts, the audio and the transcriptions were not aligned, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation, which used WER and CER metrics on a manually aligned sample of NURC/SP, allowed us to choose the best model to automatically transcribe 284 hours.
Submitted 14 October, 2022;
originally announced October 2022.
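
The NURC/SP abstract above selects a model by WER and CER. As a rough illustration of what those metrics measure, the sketch below computes word error rate and character error rate from Levenshtein edit distance. It is not the authors' evaluation code: the function names and the sample sentences are hypothetical, and a real evaluation would normally normalize case and punctuation first, which this sketch does not do.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (token missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output, for illustration only.
    reference = "o gato subiu no telhado"
    hypothesis = "o gato subiu telhado"
    print(f"WER = {wer(reference, hypothesis):.2f}")
    print(f"CER = {cer(reference, hypothesis):.2f}")
```

Both metrics divide by the length of the reference, so they can exceed 1.0 when the hypothesis contains many insertions. Comparing models then amounts to averaging these scores over the aligned sample and preferring the lowest values.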