
Speech Robust Bench: A Robustness Benchmark For Speech Recognition

Authors: Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkila, Nicolas Kourtellis

Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices such as discrete representations, and self-training appear to be conducive to robustness. We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observed noticeable disparities in the model's robustness across subgroups. We believe that SRB will facilitate future research towards robust ASR models, by making it easier to conduct comprehensive and comparable robustness evaluations.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2208.01637 [pdf, other]

Comparative Analysis of State-of-the-Art Deep Learning Models for Detecting COVID-19 Lung Infection from Chest X-Ray Images

Authors: Zeba Ghaffar, Pir Masoom Shah, Hikmat Khan, Syed Farhan Alam Zaidi, Abdullah Gani, Izaz Ahmad Khan, Munam Ali Shah, Saif ul Islam

Abstract: The ongoing COVID-19 pandemic has already taken millions of lives and damaged economies across the globe. Most COVID-19 deaths and economic losses are reported from densely crowded cities. It is evident that the effective control and prevention of epidemic/pandemic infectious diseases is vital. According to WHO, testing and diagnosis is the best strategy to control pandemics. Scientists worldwide are attempting to develop various innovative and cost-efficient methods to speed up the testing process. This paper comprehensively evaluates the applicability of the recent top ten state-of-the-art Deep Convolutional Neural Networks (CNNs) for automatically detecting COVID-19 infection using chest X-ray images. Moreover, it provides a comparative analysis of these models in terms of accuracy. This study identifies the effective methodologies to control and prevent infectious respiratory diseases. Our trained models have demonstrated outstanding results in classifying the COVID-19 infected chest X-rays. In particular, our trained models MobileNet, EfficientNet, and InceptionV3 achieved average test-set classification accuracies of 95%, 95%, and 94%, respectively, for the COVID-19 class. Thus, it can be beneficial for clinical practitioners and radiologists to speed up the testing, detection, and follow-up of COVID-19 cases.

Submitted 30 June, 2022; originally announced August 2022.

arXiv:1710.08684 [pdf, other]

doi 10.1109/MLSP.2017.8168153

Inferring Room Semantics Using Acoustic Monitoring

Authors: Muhammad A. Shah, Bhiksha Raj, Khaled A. Harras

Abstract: Having knowledge of the environmental context of the user, i.e., knowledge of the users' indoor location and the semantics of their environment, can facilitate the development of many location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time, using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with fewer than 30 samples.

Submitted 24 October, 2017; originally announced October 2017.

Comments: 2017 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 25–28, 2017, Tokyo, Japan

Journal ref: IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 27 (2017) 1-6

Showing 1–3 of 3 results for author: Shah, M A