
Showing 1–50 of 69 results for author: Shah, A

  1. arXiv:2403.07937  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speech Robust Bench: A Robustness Benchmark For Speech Recognition

    Authors: Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkila, Nicolas Kourtellis

    Abstract: As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate…

    Submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2403.06659  [pdf, other]

    eess.SP cs.AI cs.LG

    Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

    Authors: Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge that can be found in reports. This oversight and the requirement for annotated samples for downstream tasks…

    Submitted 2 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by ICML2024

  3. arXiv:2402.17050  [pdf, other]

    eess.SY cs.RO

    Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

    Authors: Kathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. Bayen

    Abstract: In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with developing RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their app…

    Submitted 14 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  4. arXiv:2402.17043  [pdf, other]

    eess.SY

    Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

    Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, et al. (39 additional authors not shown)

    Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves," are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experim…

    Submitted 26 February, 2024; originally announced February 2024.

  5. arXiv:2401.12974  [pdf, other]

    eess.IV cs.CV q-bio.QM

    SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

    Authors: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski

    Abstract: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment pla…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 15 figures

  6. arXiv:2401.09666  [pdf, other]

    eess.SY cs.AI cs.MA

    Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory Data

    Authors: Nathan Lichtlé, Kathy Jang, Adit Shah, Eugene Vinitsky, Jonathan W. Lee, Alexandre M. Bayen

    Abstract: Designing traffic-smoothing cruise controllers that can be deployed onto autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a o…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted to be published as part of the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC) 2023, Bilbao, Spain, September 24-28, 2023

  7. arXiv:2312.09369  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio-visual fine-tuning of audio-only ASR models

    Authors: Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

    Abstract: Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data. Recently, audio-visual self-supervised learning (SSL) approaches have been developed to reduce this dependence on transcribed AV data, but these methods are quite complex and computationally expensive. In this work, we…

    Submitted 14 December, 2023; originally announced December 2023.

  8. arXiv:2312.03020  [pdf]

    eess.IV cs.CV cs.LG

    Enhanced Breast Cancer Tumor Classification using MobileNetV2: A Detailed Exploration on Image Intensity, Error Mitigation, and Streamlit-driven Real-time Deployment

    Authors: Aaditya Surya, Aditya Shah, Jarnell Kabore, Subash Sasikumar

    Abstract: This research introduces a sophisticated transfer learning model based on Google's MobileNetV2 for breast cancer tumor classification into normal, benign, and malignant categories, utilizing a dataset of 1576 ultrasound images (265 normal, 891 benign, 420 malignant). The model achieves an accuracy of 0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and MCC of 0.74. It exam…

    Submitted 6 January, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  9. arXiv:2312.01529  [pdf, other]

    cs.CV cs.CL cs.LG eess.IV

    T3D: Towards 3D Medical Image Understanding through Vision-Language Pre-training

    Authors: Che Liu, Cheng Ouyang, Yinda Chen, César Quilodrán-Casas, Lei Ma, Jie Fu, Yike Guo, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: Expert annotation of 3D medical images for downstream analysis is resource-intensive, posing challenges in clinical applications. Visual self-supervised learning (vSSL), though effective for learning visual invariance, neglects the incorporation of domain knowledge from medicine. To incorporate medical knowledge into visual representation learning, vision-language pre-training (VLP) has shown promi…

    Submitted 5 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  10. arXiv:2311.18539  [pdf, other]

    cs.CR eess.SY

    Bridging Both Worlds in Semantics and Time: Domain Knowledge Based Analysis and Correlation of Industrial Process Attacks

    Authors: Moses Ike, Kandy Phan, Anwesh Badapanda, Matthew Landen, Keaton Sadoski, Wanda Guo, Asfahan Shah, Saman Zonouz, Wenke Lee

    Abstract: Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process beha…

    Submitted 3 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  11. arXiv:2310.07161  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

    Authors: Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, Bhiksha Raj

    Abstract: Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research, rooted in the exploration of proprietary sender-side denoising effects, meticulously evaluates platforms such as Google Meet and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured ex…

    Submitted 21 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  12. arXiv:2310.05932  [pdf, other]

    cs.MA cs.AI eess.SY

    A Multi-Agent Systems Approach for Peer-to-Peer Energy Trading in Dairy Farming

    Authors: Mian Ibad Ali Shah, Abdul Wahid, Enda Barrett, Karl Mason

    Abstract: To achieve desired carbon emission reductions, integrating renewable generation and accelerating the adoption of peer-to-peer energy trading is crucial. This is especially important for energy-intensive farming, like dairy farming. However, integrating renewables and peer-to-peer trading presents challenges. To address this, we propose the Multi-Agent Peer-to-Peer Dairy Farm Energy Simulator (MAPD…

    Submitted 21 August, 2023; originally announced October 2023.

    Comments: Proc. of the Artificial Intelligence for Sustainability, ECAI 2023, Eunika et al. (eds.), Sep 30 - Oct 1, 2023, https://sites.google.com/view/ai4s. 2023

  13. arXiv:2309.14460  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD eess.SP

    Online Active Learning For Sound Event Detection

    Authors: Mark Lindsey, Ankit Shah, Francis Kubala, Richard M. Stern

    Abstract: Data collection and annotation is a laborious, time-consuming prerequisite for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the duration of the data collection process. Prior work has indicated that fluctuating clas…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024. Publication will belong to IEEE

  14. arXiv:2309.13227  [pdf, other]

    cs.LG cs.SD eess.AS

    Importance of negative sampling in weak label learning

    Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

    Abstract: Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open prob…

    Submitted 22 September, 2023; originally announced September 2023.

  15. arXiv:2309.04641  [pdf, other]

    cs.SD eess.AS

    Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer

    Authors: Ashwin Pillay, Sage Betko, Ari Liloia, Hao Chen, Ankit Shah

    Abstract: Foley sound synthesis refers to the creation of authentic, diegetic sound effects for media, such as film or radio. In this study, we construct a neural Foley synthesizer capable of generating mono-audio clips across seven predefined categories. Our approach introduces multiple enhancements to existing models in the text-to-audio domain, with the goal of enriching the diversity and acoustic charac…

    Submitted 8 September, 2023; originally announced September 2023.

  16. arXiv:2304.09756  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio

    Authors: Muhammad Zakir Khan, Jawad Ahmad, Wadii Boulila, Matthew Broadbent, Syed Aziz Shah, Anis Koubaa, Qammer H. Abbasi

    Abstract: Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sens…

    Submitted 18 April, 2023; originally announced April 2023.

  17. arXiv:2303.09048  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms

    Authors: Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj

    Abstract: In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which includes distortion and artifacts caused by compression, transmission, and plat…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Under review at European Association for Signal Processing. 5 pages

  18. arXiv:2303.03591  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms

    Authors: Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, Bhiksha Raj

    Abstract: General-purpose embedding is highly desirable for few-shot or even zero-shot learning in many application scenarios, including audio tasks. In order to understand representations better, we conducted a thorough error analysis and visualization of HEAR 2021 submission results. Inspired by the analysis, this work experiments with different front-end audio preprocessing methods, including Constant-Q Tra…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Technical report, 10 pages

  19. arXiv:2302.10915  [pdf, other]

    cs.LG cs.CL cs.CV cs.SD eess.AS

    Conformers are All You Need for Visual Speech Recognition

    Authors: Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, Olivier Siohan

    Abstract: Visual speech recognition models extract visual features in a hierarchical manner. At the lower level, there is a visual front-end with a limited temporal receptive field that processes the raw pixels depicting the lips or faces. At the higher level, there is an encoder that attends to the embeddings produced by the front-end over a large temporal receptive field. Previous work has focused on impr…

    Submitted 12 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  20. arXiv:2302.09516  [pdf, other]

    eess.IV cs.CV

    A Bibliography of Multiple Sclerosis Lesions Detection Methods using Brain MRIs

    Authors: Atif Shah, Maged S. Al-Shaibani, Moataz Ahmad, Reem Bunyan

    Abstract: Introduction: Multiple Sclerosis (MS) is a chronic disease that affects millions of people across the globe. MS can critically affect different organs of the central nervous system such as the eyes, the spinal cord, and the brain. Background: To help physicians in diagnosing MS lesions, computer-aided methods are widely used. In this regard, considerable research has been carried out in the ar…

    Submitted 19 February, 2023; originally announced February 2023.

  21. arXiv:2211.03333  [pdf]

    eess.SP

    Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection using Eight Million Samples Labeled with Imprecise Arrhythmia Alarms

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Amit Shah, Duc H. Do, Randall J Lee, Gari Clifford, Fadi B Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, further advancement of this paradigm of continuous AF detection in ambulatory sett…

    Submitted 12 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  22. arXiv:2210.14446  [pdf, other]

    cs.CL cs.SD eess.AS

    Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

    Authors: Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak

    Abstract: Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine tr…

    Submitted 27 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

  23. arXiv:2208.01637  [pdf, other]

    eess.IV cs.CV

    Comparative Analysis of State-of-the-Art Deep Learning Models for Detecting COVID-19 Lung Infection from Chest X-Ray Images

    Authors: Zeba Ghaffar, Pir Masoom Shah, Hikmat Khan, Syed Farhan Alam Zaidi, Abdullah Gani, Izaz Ahmad Khan, Munam Ali Shah, Saif ul Islam

    Abstract: The ongoing COVID-19 pandemic has already taken millions of lives and damaged economies across the globe. Most COVID-19 deaths and economic losses are reported from densely crowded cities. It is comprehensible that the effective control and prevention of epidemic/pandemic infectious diseases is vital. According to WHO, testing and diagnosis is the best strategy to control pandemics. Scientists wor…

    Submitted 30 June, 2022; originally announced August 2022.

  24. arXiv:2207.04156  [pdf, other]

    cs.SD cs.CL cs.IR eess.AS

    Automated Audio Captioning and Language-Based Audio Retrieval

    Authors: Clive Gomes, Hyejin Park, Patrick Kollman, Yi Song, Iffanice Houndayi, Ankit Shah

    Abstract: This project involved participation in the DCASE 2022 Competition (Task 6) which had two subtasks: (1) Automated Audio Captioning and (2) Language-Based Audio Retrieval. The first subtask involved the generation of a textual description for audio samples, while the goal of the second was to find audio samples within a fixed dataset that match a given description. For both subtasks, the Clotho data…

    Submitted 15 May, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: DCASE 2022 Competition (Task 6)

  25. arXiv:2206.04632  [pdf, other]

    cs.RO cs.AI cs.FL cs.LG eess.SY

    Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations

    Authors: Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah

    Abstract: Learning from demonstration (LfD) has succeeded in tasks featuring a long time horizon. However, when the problem complexity also includes human-in-the-loop perturbations, state-of-the-art approaches do not guarantee the successful reproduction of a task. In this work, we identify the roots of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the…

    Submitted 14 December, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: CoRL 2022 Oral Talk

  26. arXiv:2206.02358  [pdf, other]

    eess.SP cs.AI cs.CV eess.SY

    Implementation of a Modified U-Net for Medical Image Segmentation on Edge Devices

    Authors: Owais Ali, Hazrat Ali, Syed Ayaz Ali Shah, Aamir Shahzad

    Abstract: Deep learning techniques, particularly convolutional neural networks, have shown great potential in computer vision and medical imaging applications. However, deep learning models are computationally demanding as they require enormous computational power and specialized processing hardware for model training. To make these models portable and compatible for prototyping, their implementation on low…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Preprint of paper accepted in IEEE Transactions on Circuits and Systems II: Express Brief

  27. arXiv:2204.09909  [pdf, other]

    eess.IV cs.CV

    An Efficient End-to-End Deep Neural Network for Interstitial Lung Disease Recognition and Classification

    Authors: Masum Shah Junayed, Afsana Ahsan Jeny, Md Baharul Islam, Ikhtiar Ahmed, A F M Shahen Shah

    Abstract: The automated Interstitial Lung Diseases (ILDs) classification technique is essential for assisting clinicians during the diagnosis process. Detecting and classifying ILDs patterns is a challenging problem. This paper introduces an end-to-end deep convolutional neural network (CNN) for classifying ILDs patterns. The proposed model comprises four convolutional layers with different kernel sizes and R…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: Turkish Journal of Electrical Engineering and Computer Sciences

  28. arXiv:2204.06322  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str…

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  29. arXiv:2204.04802  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

    Authors: Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

    Abstract: Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degrees of success in automated voice-based COVID-19 detection apps. In this paper, we show that detecting C…

    Submitted 25 October, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2022

  30. arXiv:2203.10363  [pdf, other]

    cs.CV cs.GR eess.IV

    Towards Device Efficient Conditional Image Generation

    Authors: Nisarg A. Shah, Gaurav Bharaj

    Abstract: We present a novel algorithm to reduce tensor compute required by a conditional image generation autoencoder without sacrificing quality of photo-realistic image generation. Our method is device agnostic, and can optimize an autoencoder for a given CPU-only or GPU compute device(s) in about the normal time it takes to train an autoencoder on a generic workstation. We achieve this via a two-stage novel s…

    Submitted 13 October, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: British Machine Vision Conference 2022

  31. arXiv:2203.02483  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Ontological Learning from Weak Labels

    Authors: Larry Tang, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, Bhiksha Raj

    Abstract: Ontologies encompass a formal representation of knowledge through the definition of concepts or properties of a domain, and the relationships between those concepts. In this work, we seek to investigate whether using this ontological information will improve learning from weakly labeled data, which are easier to collect since they require only the presence or absence of an event to be known. We use…

    Submitted 4 March, 2022; originally announced March 2022.

  32. arXiv:2203.00845  [pdf, other]

    eess.IV cs.AI cs.CV

    Can No-reference features help in Full-reference image quality estimation?

    Authors: Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah

    Abstract: Development of perceptual image quality assessment (IQA) metrics has been of significant interest to the computer vision community. The aim of these metrics is to model the quality of an image as perceived by humans. Recent works in Full-reference IQA research perform pixelwise comparison between deep features corresponding to query and reference images for quality prediction. However, pixelwise feature c…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Code to be updated on: https://github.com/saikatdutta/nr-in-friqa

  33. ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

    Authors: Huihui Fang, Fei Li, Huazhu Fu, Xu Sun, Xingxing Cao, Fengbin Lin, Jaemin Son, Sunho Kim, Gwenole Quellec, Sarah Matta, Sharath M Shankaranarayana, Yi-Ting Chen, Chuen-heng Wang, Nisarg A. Shah, Chia-Yen Lee, Chih-Chung Hsu, Hai Xie, Baiying Lei, Ujjwal Baid, Shubham Innani, Kang Dang, Wenxiu Shi, Ravi Kamble, Nitin Singhal, Ching-Wei Wang, et al. (6 additional authors not shown)

    Abstract: Age-related macular degeneration (AMD) is the leading cause of visual impairment among the elderly in the world. Early detection of AMD is of great importance, as the vision loss caused by this disease is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. Cutting edge deep learning based algorithms have been recently develo…

    Submitted 6 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: 31 pages, 17 figures

  34. arXiv:2202.05662  [pdf, other]

    cs.CR cs.SD eess.AS

    A Novel Chaos-based Light-weight Image Encryption Scheme for Multi-modal Hearing Aids

    Authors: Awais Aziz Shah, Ahsan Adeel, Jawad Ahmad, Ahmed Al-Dubai, Mandar Gogate, Abhijeet Bishnu, Muhammad Diyan, Tassadaq Hussain, Kia Dashtipour, Tharm Ratnarajah, Amir Hussain

    Abstract: Multimodal hearing aids (HAs) aim to deliver more intelligible audio in noisy environments by contextually sensing and processing data in the form of not only audio but also visual information (e.g. lip reading). Machine learning techniques can play a pivotal role in the contextual processing of multimodal data. However, since the computational power of HA devices is low, this data mu…

    Submitted 11 February, 2022; originally announced February 2022.

  35. arXiv:2201.02449  [pdf, ps, other]

    cs.RO eess.SY

    Online 3-Axis Magnetometer Hard-Iron and Soft-Iron Bias and Angular Velocity Sensor Bias Estimation Using Angular Velocity Sensors for Improved Dynamic Heading Accuracy

    Authors: Andrew R. Spielvogel, Abhimanyu S. Shah, Louis L. Whitcomb

    Abstract: This article addresses the problem of dynamic on-line estimation and compensation of hard-iron and soft-iron biases of 3-axis magnetometers under dynamic motion in field robotics, utilizing only biased measurements from a 3-axis magnetometer and a 3-axis angular rate sensor. The proposed magnetometer and angular velocity bias estimator (MAVBE) utilizes a 15-state process model encoding the nonline…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: Preprint of an article accepted for publication in Field Robotics, https://FieldRobotics.net, Special Issue in Unmanned Marine Systems. Submitted January 16, 2021; Revised May 28, 2021; Accepted August 2, 2021

  36. arXiv:2112.10074  [pdf, other]

    eess.IV cs.CV cs.LG

    QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

    Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini, et al. (67 additional authors not shown)

    Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying…

    Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

    Journal ref: Machine Learning for Biomedical Imaging 1 (2022)

  37. arXiv:2111.08606  [pdf]

    eess.IV cs.CV

    Advancement of Deep Learning in Pneumonia and Covid-19 Classification and Localization: A Qualitative and Quantitative Analysis

    Authors: Aakash Shah, Manan Shah

    Abstract: Around 450 million people are affected by pneumonia every year which results in 2.5 million deaths. Covid-19 has also affected 181 million people which has led to 3.92 million casualties. The chances of death in both of these diseases can be significantly reduced if they are diagnosed early. However, the current methods of diagnosing pneumonia (complaints + chest X-ray) and covid-19 (RT-PCR) requ…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: 20 pages, 5 figures, 5 tables

    Report number: CDTM-D-21-00047R2

  38. arXiv:2110.09584  [pdf, other]

    eess.SY cs.RO

    Set-based State Estimation with Probabilistic Consistency Guarantee under Epistemic Uncertainty

    Authors: Shen Li, Theodoros Stouraitis, Michael Gienger, Sethu Vijayakumar, Julie A. Shah

    Abstract: Consistent state estimation is challenging, especially under the epistemic uncertainties arising from learned (nonlinear) dynamic and observation models. In this work, we propose a set-based estimation algorithm, named Gaussian Process-Zonotopic Kalman Filter (GP-ZKF), that produces zonotopic state estimates while respecting both the epistemic uncertainties in the learned models and aleatoric unce…

    Submitted 25 February, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Published at IEEE Robotics and Automation Letters, 2022. Video: https://www.youtube.com/watch?v=CvIPJlALaFU Copyright: 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any media, including reprinting/republishing for any purposes, creating new works, for resale or redistribution, or reuse of any copyrighted component of this work

  39. arXiv:2110.04678  [pdf, other]

    cs.SD cs.AI eess.AS

    An Overview of Techniques for Biomarker Discovery in Voice Signal

    Authors: Rita Singh, Ankit Shah, Hira Dhamyal

    Abstract: This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Last two authors contributed equally to the paper

  40. arXiv:2106.12864  [pdf, other]

    eess.IV cs.CV cs.LG

    A Systematic Collection of Medical Image Datasets for Deep Learning

    Authors: Johann Li, Guangming Zhu, Cong Hua, Mingtao Feng, Basheer Bennamoun, Ping Li, Xiaoyuan Lu, Juan Song, Peiyi Shen, Xu Xu, Lin Mei, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analy…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: This paper has been submitted to one journal

  41. arXiv:2105.08819  [pdf, other]

    eess.IV cs.CV cs.LG

    Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Sheng Chen, Xin Xia, Zhaoyan Liu, Yuwei Zhang, Feng Zhu, Jiashi Li, Xuefeng Xiao, Yuan Tian, Xinglong Wu, Christos Kyrkou, Yixin Chen, Zexin Zhang, Yunbo Peng, Yue Lin, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Himanshu Kumar, Chao Ge, Pei-Lin Wu, Jin-Hua Du, Andrew Batutin, et al. (6 additional authors not shown)

    Abstract: Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions th…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.08630; text overlap with arXiv:2105.07825, arXiv:2105.07809, arXiv:2105.08629

  42. arXiv:2104.05778  [pdf, other]

    eess.IV cs.CV

    Efficient Space-time Video Super Resolution using Low-Resolution Flow and Mask Upsampling

    Authors: Saikat Dutta, Nisarg A. Shah, Anurag Mittal

    Abstract: This paper explores an efficient solution for Space-time Super-Resolution, aiming to generate High-resolution Slow-motion videos from Low Resolution and Low Frame rate videos. A simplistic solution is the sequential running of Video Super Resolution and Video Frame interpolation models. However, this type of solution is memory inefficient, has high inference time, and could not make the proper…

    Submitted 8 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted at NTIRE Workshop, CVPR 2021. Code and models: https://github.com/saikatdutta/FMU_STSR

  43. arXiv:2104.01511  [pdf, other]

    eess.SP cs.LG

    Late fusion of machine learning models using passively captured interpersonal social interactions and motion from smartphones predicts decompensation in heart failure

    Authors: Ayse S. Cakmak, Samuel Densen, Gabriel Najarro, Pratik Rout, Christopher J. Rozell, Omer T. Inan, Amit J. Shah, Gari D. Clifford

    Abstract: Objective: Worldwide, heart failure (HF) is a major cause of morbidity and mortality and one of the leading causes of hospitalization. Early detection of HF symptoms and pro-active management may reduce adverse events. Approach: Twenty-eight participants were monitored using a smartphone app after discharge from hospitals, and each clinical event during the enrollment (N=110 clinical events) was r…

    Submitted 3 April, 2021; originally announced April 2021.

  44. arXiv:2103.09289  [pdf, other]

    eess.IV cs.CV cs.LG

    Colorectal Cancer Segmentation using Atrous Convolution and Residual Enhanced UNet

    Authors: Nisarg A. Shah, Divij Gupta, Romil Lodaya, Ujjwal Baid, Sanjay Talbar

    Abstract: Colorectal cancer is a leading cause of death worldwide. However, early diagnosis dramatically increases the chances of survival, for which it is crucial to identify the tumor in the body. Since its imaging uses high-resolution techniques, annotating the tumor is time-consuming and requires particular expertise. Lately, methods built upon Convolutional Neural Networks (CNNs) have proven to be at pa…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: 5th IAPR International Conference on Computer Vision and Image Processing, 12 pages

  45. arXiv:2011.04988  [pdf, other]

    eess.IV cs.CV

    AIM 2020 Challenge on Rendering Realistic Bokeh

    Authors: Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, Xianrui Luo, Ke Xian, Zijin Wu, Zhiguo Cao, Densen Puthussery, Jiji C V, Hrishikesh P S, Melvin Kuriakose, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Kuldeep Purohit, Praveen Kandula, Maitreya Suin, A. N. Rajagopalan, et al. (10 additional authors not shown)

    Abstract: This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using th…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Published in ECCV 2020 Workshop (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

  46. arXiv:2010.06659  [pdf, other]

    eess.AS cs.LG cs.SD

    Towards Data-efficient Modeling for Wake Word Spotting

    Authors: Yixin Gao, Yuriy Mishchenko, Anish Shah, Spyros Matsoukas, Shiv Vitaladevuni

    Abstract: Wake word (WW) spotting is challenging in far-field not only because of the interference in signal transmission but also the complexity in acoustic environments. Traditional WW model training requires a large amount of in-domain WW-specific data with substantial human annotations; therefore, it is hard to build WW models without such data. In this paper we present data-efficient solutions to address t…

    Submitted 13 October, 2020; originally announced October 2020.

    Journal ref: Proc. ICASSP 2020

  47. arXiv:2009.12798  [pdf, other]

    cs.CV eess.IV

    AIM 2020: Scene Relighting and Illumination Estimation Challenge

    Authors: Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Mahmoud Afifi, Michael S. Brown, Kele Xu, Hengxing Cai, Yuzhong Liu, Li-Wen Wang, Zhi-Song Liu, Chu-Tak Li, Sourya Dipta Das, Nisarg A. Shah, Akashdeep Jassal, Tongtong Zhao, Shanshan Zhao, Sabari Nathan, M. Parisa Beham, R. Suganya, Qing Wang, Zhongyun Hu, Xin Huang, Yaning Li, Maitreya Suin, et al. (12 additional authors not shown)

    Abstract: We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illum…

    Submitted 27 September, 2020; originally announced September 2020.

    Comments: ECCVW 2020. Data and more information on https://github.com/majedelhelou/VIDIT

  48. arXiv:2008.07742  [pdf, other]

    eess.IV cs.CV

    UDC 2020 Challenge on Image Restoration of Under-Display Camera: Methods and Results

    Authors: Yuqian Zhou, Michael Kwan, Kyle Tolentino, Neil Emerton, Sehoon Lim, Tim Large, Lijiang Fu, Zhihong Pan, Baopu Li, Qirui Yang, Yihao Liu, Jigang Tang, Tao Ku, Shibin Ma, Bingnan Hu, Jiarong Wang, Densen Puthussery, Hrishikesh P S, Melvin Kuriakose, Jiji C V, Varun Sundar, Sumanth Hegde, Divya Kothandaraman, Kaushik Mitra, Akashdeep Jassal, et al. (20 additional authors not shown)

    Abstract: This paper is the report of the first Under-Display Camera (UDC) image restoration challenge in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly-collected database of Under-Display Camera. The challenge tracks correspond to two types of display: a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED). Along with about 150 teams registered for the challenge, ei…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 15 pages

  49. Accurate Detection of Wake Word Start and End Using a CNN

    Authors: Christin Jose, Yuriy Mishchenko, Thibaud Senechal, Anish Shah, Alex Escott, Shiv Vitaladevuni

    Abstract: Small footprint embedded devices require keyword spotters (KWS) with small model size and detection latency for enabling voice assistants. Such a keyword is often referred to as "wake word" as it is used to wake up voice assistant enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we prop…

    Submitted 9 August, 2020; originally announced August 2020.

    Comments: Proceedings of INTERSPEECH

    Journal ref: Interspeech 2020

  50. arXiv:2008.02567  [pdf, other]

    eess.SP cs.CY

    An Intelligent Non-Invasive Real Time Human Activity Recognition System for Next-Generation Healthcare

    Authors: William Taylor, Syed Aziz Shah, Kia Dashtipour, Adnan Zahid, Qammer H. Abbasi, Muhammad Ali Imran

    Abstract: Human motion detection is getting considerable attention in the field of Artificial Intelligence (AI) driven healthcare systems. Human motion can be used to provide remote healthcare solutions for vulnerable people by identifying particular movements such as falls, gait and breathing disorders. This can allow people to live more independent lifestyles and still have the safety of being monitored i…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 20 pages, 18 figures, journal

    Journal ref: Sensors 2020, 20(9), 2653