
Showing 1–43 of 43 results for author: Tewfik, A

  1. arXiv:2406.09617

    cs.CL cs.HC eess.AS

    Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

    Authors: Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik

    Abstract: Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024
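    The FLoRA technique above builds on low-rank adaptation (LoRA). Its fusion details are cut off in this listing, so the following is only a minimal sketch of plain LoRA, not of FLoRA itself: a frozen pre-trained weight W is augmented by a trainable rank-r product B A, scaled by alpha / r, with B initialized to zero so the adapted layer initially reproduces the frozen one. The function names and toy sizes are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Generic LoRA forward pass: h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pre-trained weight; A (r x d_in) and
    B (d_out x r) are the trainable low-rank factors (hypothetical example).
    """
    base = matvec(w, x)               # frozen path
    update = matvec(b, matvec(a, x))  # low-rank path of rank r
    scale = alpha / r
    return [h0 + scale * u for h0, u in zip(base, update)]


# Toy example: d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]  # r x d_in
B0 = [[0.0], [0.0]]    # zero-initialized B: the adapter starts as a no-op
B = [[1.0], [0.0]]     # B after some (hypothetical) training
x = [2.0, 4.0, 6.0]

print(lora_forward(W, A, B0, x, alpha=2.0, r=1))  # [2.0, 4.0]
print(lora_forward(W, A, B, x, alpha=2.0, r=1))   # [14.0, 4.0]
```

Only A and B would be updated during fine-tuning, so the number of trainable parameters scales with r rather than with the full d_out x d_in matrix.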

  2. arXiv:2310.15261

    cs.SD cs.HC cs.LG eess.AS

    Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

    Authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

    Abstract: Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 5 pages
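    The title refers to modality dropout, but the abstract is truncated before the method is described. As a generic illustration only (not the authors' formulation), modality dropout can be sketched as randomly zeroing out an entire modality's feature vector during training before feature-level fusion, so the classifier learns to cope when a cue is missing. The modality names (`acoustic`, `asr`, `text`), the feature values, and the drop probability are hypothetical.

```python
import random
from typing import Dict, List


def modality_dropout(features: Dict[str, List[float]], p: float,
                     rng: random.Random, training: bool = True) -> List[float]:
    """Concatenate per-modality feature vectors (sorted by name), replacing
    each modality with zeros with probability p while training."""
    fused: List[float] = []
    for name in sorted(features):
        vec = features[name]
        if training and rng.random() < p:
            fused.extend([0.0] * len(vec))  # simulate this modality being absent
        else:
            fused.extend(vec)
    return fused


rng = random.Random(0)
feats = {"acoustic": [0.2, 0.7], "asr": [0.9], "text": [0.1, 0.4]}
print(modality_dropout(feats, p=0.3, rng=rng))                  # random mask
print(modality_dropout(feats, p=0.3, rng=rng, training=False))  # no dropout
```

At inference time the features pass through unchanged. A variant might guarantee that at least one modality survives each draw; this sketch does not.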

  3. Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

    Authors: Utkarsh Oggy Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

    Abstract: Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding more parameters) to improve the predictive power may not be viable for real-world tasks. In this work, we propose a new loss, Streaming Anchor Loss (SAL), to better…

    Submitted 18 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Published at IEEE ICASSP 2024, please see https://ieeexplore.ieee.org/abstract/document/10447222

    ACM Class: I.2.6; I.5.1; I.5.4; I.6.5

    Journal ref: In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6110-6114). IEEE

  4. arXiv:2309.04842

    cs.CL cs.HC cs.SD eess.AS

    Leveraging Large Language Models for Exploiting ASR Uncertainty

    Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

    Abstract: While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the…

    Submitted 12 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Added references

  5. arXiv:2302.10450

    cs.CV eess.SP

    Automotive RADAR sub-sampling via object detection networks: Leveraging prior signal information

    Authors: Madhumitha Sakthi, Ahmed Tewfik, Marius Arvinte, Haris Vikalo

    Abstract: Automotive radar has increasingly attracted attention due to growing interest in autonomous driving technologies. Acquiring situational awareness using multimodal data collected at high sampling rates by various sensing devices including cameras, LiDAR, and radar requires considerable power, memory and compute resources which are often limited at an edge device. In this paper, we present a novel a…

    Submitted 21 February, 2023; originally announced February 2023.

  6. arXiv:2210.12134

    cs.CL cs.HC cs.SD eess.AS

    Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

    Authors: Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

    Abstract: Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g., on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are…

    Submitted 21 October, 2022; originally announced October 2022.

  7. arXiv:2207.04394

    cs.CV

    Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays

    Authors: Yan Han, Gregory Holste, Ying Ding, Ahmed Tewfik, Yifan Peng, Zhangyang Wang

    Abstract: Before the recent success of deep learning methods for automated medical image analysis, practitioners used handcrafted radiomic features to quantitatively describe local patches of medical images. However, extracting discriminative radiomic features relies on accurate pathology localization, which is difficult to acquire in real-world settings. Despite advances in disease classification and local…

    Submitted 19 October, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

  8. arXiv:2204.02455

    cs.SD cs.LG eess.AS

    Improving Voice Trigger Detection with Metric Learning

    Authors: Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

    Abstract: Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker-independent voice trigger detector typically suffers from performance degradation on speech from underrepresented…

    Submitted 13 September, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted at InterSpeech 2022

  9. arXiv:2203.15975

    eess.AS cs.HC cs.LG cs.SD

    Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

    Authors: Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

    Abstract: We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistant (VA) activation due to accidental button presses is critical for user experience. While the majority of approaches to false trigger mitigation (FTM) are designed to detect the presence of a t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022
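    The distillation regularizer in this paper is not detailed in the truncated abstract. For orientation only, a common form of knowledge distillation (Hinton-style, swapped in here as an assumption rather than taken from the paper) penalizes the KL divergence between temperature-softened teacher and student output distributions, scaled by T². The temperature value and toy logits below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(p_teacher || p_student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


# Binary logits: [device-directed, not device-directed] (toy values).
teacher = [2.0, -1.0]
print(distillation_loss(teacher, teacher))         # 0.0
print(distillation_loss([0.5, 0.5], teacher) > 0)  # True
```

In typical use this term is added to an ordinary supervised loss on the available labels; the T² factor keeps its gradient scale comparable across temperatures.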

  10. arXiv:2203.03905

    cs.CV

    End-to-end system for object detection from sub-sampled radar data

    Authors: Madhumitha Sakthi, Ahmed Tewfik, Marius Arvinte, Haris Vikalo

    Abstract: Robust and accurate sensing is of critical importance for advancing autonomous automotive systems. The need to acquire situational awareness in complex urban conditions using sensors such as radar has motivated research on power and latency-efficient signal acquisition methods. In this paper, we present an end-to-end signal processing pipeline, capable of operating in extreme weather conditions, t…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: Submitted to EUSIPCO 2022

  11. arXiv:2104.04968

    cs.CV

    Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop

    Authors: Yan Han, Chongyan Chen, Ahmed Tewfik, Benjamin Glicksberg, Ying Ding, Yifan Peng, Zhangyang Wang

    Abstract: Building a highly accurate predictive model for classification and localization of abnormalities in chest X-rays usually requires a large number of manually annotated labels and pixel regions (bounding boxes) of abnormalities. However, it is expensive to acquire such annotations, especially the bounding boxes. Recently, contrastive learning has shown strong promise in leveraging unlabeled natural…

    Submitted 4 May, 2022; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: Accepted by WACV 2022

  12. arXiv:2103.02087

    eess.SP cs.LG

    Deep J-Sense: Accelerated MRI Reconstruction via Unrolled Alternating Optimization

    Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik, Jonathan I. Tamir

    Abstract: Accelerated multi-coil magnetic resonance imaging reconstruction has recently seen substantial improvement from combining compressed sensing with deep learning. However, most of these methods rely on estimates of the coil sensitivity profiles, or on calibration data for estimating model parameters. Prior work has shown that these methods degrade in performance when the quality of these estimators is p…

    Submitted 11 April, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  13. arXiv:2103.00383

    cs.SD cs.LG eess.AS q-bio.QM

    Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition

    Authors: Gautam Krishna, Mason Carnahan, Shilpa Shamapant, Yashitha Surendranath, Saumya Jain, Arundhati Ghosh, Co Tran, Jose del R Millan, Ahmed H Tewfik

    Abstract: In this paper, we propose a deep learning-based algorithm to improve the performance of automatic speech recognition (ASR) systems for aphasia, apraxia, and dysarthria speech by utilizing electroencephalography (EEG) features recorded synchronously with aphasia, apraxia, and dysarthria speech. We demonstrate a significant decoding performance improvement by more than 50% during test time for isol…

    Submitted 17 July, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: Accepted to IEEE EMBC 2021

  14. arXiv:2101.04269

    cs.CV eess.IV

    Pneumonia Detection on Chest X-ray using Radiomic Features and Contrastive Learning

    Authors: Yan Han, Chongyan Chen, Ahmed H Tewfik, Ying Ding, Yifan Peng

    Abstract: Chest X-ray has become one of the most common medical diagnoses due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but chest X-rays are still read manually by radiologists, which causes severe burnout and delays. Traditionally, radiomics, as a subfield of radiology that can extract a large number of quantitative features from medical images, demonstrates…

    Submitted 4 May, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted for ISBI 2021

  15. arXiv:2012.12843

    cs.LG eess.SP stat.ML

    EQ-Net: A Unified Deep Learning Framework for Log-Likelihood Ratio Estimation and Quantization

    Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

    Abstract: In this work, we introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method. We motivate our approach with theoretical insights on two practical estimation algorithms at the ends of the complexity spectrum and reveal a connection between the complexity of an algorithm and the information bottleneck…

    Submitted 3 May, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  16. arXiv:2011.12506

    cs.CV

    Using Radiomics as Prior Knowledge for Thorax Disease Classification and Localization in Chest X-rays

    Authors: Yan Han, Chongyan Chen, Liyan Tang, Mingquan Lin, Ajay Jaiswal, Song Wang, Ahmed Tewfik, George Shih, Ying Ding, Yifan Peng

    Abstract: Chest X-ray has become one of the most common medical diagnoses due to its noninvasiveness. The number of chest X-ray images has skyrocketed, but chest X-rays are still read manually by radiologists, which causes severe burnout and delays. Traditionally, radiomics, as a subfield of radiology that can extract a large number of quantitative features from medical images, demonstrates…

    Submitted 9 July, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Accepted by AMIA 2021

  17. arXiv:2010.02367

    cs.CV cs.LG

    Automotive Radar Data Acquisition using Object Detection

    Authors: Madhumitha Sakthi, Ahmed Tewfik

    Abstract: Growing urban complexity demands efficient algorithms to acquire and process the various sensor data collected by autonomous vehicles. In this paper, we introduce an algorithm that uses object detection results from the image to adaptively sample and acquire radar data using Compressed Sensing (CS). This novel algorithm is motivated by the hypothesis that with a limited sampling budget, allocati…

    Submitted 1 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Submitted to EUSIPCO 2021

  18. arXiv:2008.07621

    eess.AS cs.LG cs.SD eess.SP

    Speech Recognition using EEG signals recorded using dry electrodes

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Morgan M Hagood, Ahmed H Tewfik

    Abstract: In this paper, we demonstrate speech recognition using electroencephalography (EEG) signals obtained using dry electrodes on a limited English vocabulary consisting of three vowels and one word using a deep learning model. We demonstrate a test accuracy of 79.07 percent on a subset vocabulary consisting of two English vowels. Our results demonstrate the feasibility of using EEG signals recorded us…

    Submitted 13 August, 2020; originally announced August 2020.

  19. arXiv:2006.03638

    cs.CV cs.LG

    Robust Face Verification via Disentangled Representations

    Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

    Abstract: We introduce a robust algorithm for face verification, i.e., deciding whether two images are of the same person or not. Our approach is a novel take on the idea of using deep generative networks for adversarial robustness. We use the generative model during training as an online augmentation method instead of a test-time purifier that removes adversarial noise. Our architecture uses a contrastive loss…

    Submitted 23 June, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: Preprint

  20. arXiv:2006.02902

    eess.AS cs.LG cs.SD eess.SP

    Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper we introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function that can generate more meaningful electroencephalography (EEG) features from raw EEG features to improve the performance of EEG based speech recognition systems. We demonstrate that both continuous and isolated speech recognition systems trained and tested usin…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: Under Review. arXiv admin note: substantial text overlap with arXiv:2006.01260

  21. arXiv:2006.01262

    eess.AS cs.LG cs.SD eess.SP

    Predicting Different Acoustic Features from EEG and towards direct synthesis of Audio Waveform from EEG

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In [1,2], the authors provided preliminary results for synthesizing speech from electroencephalography (EEG) features, where they first predict acoustic features from EEG features and then reconstruct speech from the predicted acoustic features using the Griffin-Lim reconstruction algorithm. In this paper, we first introduce a deep learning model that takes raw EEG waveform signals as input and dire…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Under Review

  22. arXiv:2006.01261

    eess.AS cs.LG cs.SD eess.SP

    Understanding effect of speech perception in EEG based speech recognition systems

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition. While speaking, a person also hears their own speech, and this speech perception is also reflected in the recorded EEG signals. In this paper, we investigate whether it is possible to separate out this speech perception component from EEG signals in order…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Under Review

  23. arXiv:2006.01260

    eess.AS cs.LG cs.SD

    Improving EEG based continuous speech recognition using GAN

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper, we demonstrate that it is possible to generate more meaningful electroencephalography (EEG) features from raw EEG features using generative adversarial networks (GAN) to improve the performance of EEG based continuous speech recognition systems. We improve the results demonstrated by the authors in [1] using their data sets for some of the test time experiments and for other cases ou…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Under Review

  24. arXiv:2005.11235

    cs.CV cs.LG eess.SP

    Predicting Video features from EEG and Vice versa

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper we explore predicting facial or lip video features from electroencephalography (EEG) features and predicting EEG features from recorded facial or lip video frames using deep learning models. The subjects were asked to read out loud English sentences shown to them on a computer screen and their simultaneous EEG signals and facial video frames were recorded. Our model was able to gener…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: under review

  25. arXiv:2004.04731

    eess.AS cs.LG cs.SD

    Advancing Speech Synthesis using EEG

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper, we introduce an attention-regression model to demonstrate predicting acoustic features from electroencephalography (EEG) features recorded in parallel with spoken sentences. First, we demonstrate predicting acoustic features directly from EEG features using our attention model, and then we demonstrate predicting acoustic features from EEG features using a two-step approach where in the fi…

    Submitted 3 May, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: Under review

  26. arXiv:2003.04733

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Speaker Identification using EEG

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper, we explore speaker identification using electroencephalography (EEG) signals. The performance of speaker identification systems degrades in the presence of background noise; this paper demonstrates that EEG features can be used to enhance the performance of speaker identification systems operating in the presence and absence of background noise. The paper further demonstrates that in presenc…

    Submitted 6 March, 2020; originally announced March 2020.

  27. arXiv:2003.00007

    eess.AS cs.LG cs.SD stat.ML

    Generating EEG features from Acoustic features

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

    Abstract: In this paper, we demonstrate predicting electroencephalography (EEG) features from acoustic features using a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN). We predict various types of EEG features from acoustic features. We compare our results with the previously studied problem of speech synthesis using EEG, and our results demonstrate that EEG featur…

    Submitted 18 March, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

  28. arXiv:2002.12504

    cs.CV eess.IV

    Detecting Patch Adversarial Attacks with Image Residuals

    Authors: Marius Arvinte, Ahmed Tewfik, Sriram Vishwanath

    Abstract: We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images a…

    Submitted 2 March, 2020; v1 submitted 27 February, 2020; originally announced February 2020.
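    The residual in this abstract is defined as the input image minus a denoised copy of it. The paper denoises in the wavelet domain; the sketch below swaps in a 3x3 mean filter purely for illustration and reports the mean squared residual as a hypothetical summary statistic. In the described method the residual is passed to a trained discriminator, which is not reproduced here.

```python
from typing import List

Image = List[List[float]]


def mean_filter3(img: Image) -> Image:
    """3x3 mean filter with edge clamping (stand-in for a wavelet denoiser)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp row index
                    jj = min(max(j + dj, 0), w - 1)  # clamp column index
                    acc += img[ii][jj]
            out[i][j] = acc / 9.0
    return out


def image_residual(img: Image) -> Image:
    """Residual = input image minus its denoised version."""
    den = mean_filter3(img)
    return [[x - d for x, d in zip(r_in, r_den)] for r_in, r_den in zip(img, den)]


def residual_energy(res: Image) -> float:
    """Mean squared residual value (hypothetical scalar feature)."""
    values = [v for row in res for v in row]
    return sum(v * v for v in values) / len(values)


flat = [[5.0] * 3 for _ in range(3)]
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
print(residual_energy(image_residual(flat)))   # 0.0
print(residual_energy(image_residual(spike)))  # 8.0
```

Smooth content is largely removed by the denoiser, while localized high-frequency structure (such as a small adversarial patch) survives in the residual, which is why the residual is a useful input for a detector.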

  29. arXiv:2002.03851

    eess.AS cs.LG cs.SD stat.ML

    Continuous Silent Speech Recognition using EEG

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

    Abstract: In this paper, we explore continuous silent speech recognition using electroencephalography (EEG) signals. We implemented a connectionist temporal classification (CTC) automatic speech recognition (ASR) model to translate into text the EEG signals recorded while subjects read English sentences in their minds without producing any voice. Our results demonstrate the feasibility of using…

    Submitted 4 May, 2020; v1 submitted 6 February, 2020; originally announced February 2020.
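    This abstract describes a CTC-based ASR model. How the authors decoded its outputs is not stated here; as a generic illustration, the simplest CTC decoder (greedy, or best-path, decoding) takes the most likely label per frame, collapses consecutive repeats, and then drops the blank symbol. The blank index 0 and the toy label map are assumptions.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    decoded: List[int] = []
    prev: Optional[int] = None
    for k in best:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank; a blank between two equal labels keeps both.
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded


labels = {1: "a", 2: "b"}  # hypothetical label map; index 0 is the blank
path = [1, 1, 0, 1, 2, 2, 0]
probs = [[0.9 if k == t else 0.05 for k in range(3)] for t in path]
ids = ctc_greedy_decode(probs)
print("".join(labels[i] for i in ids))  # aab
```

Greedy decoding ignores alternative alignments; beam search, optionally with a language model, usually gives better transcripts at higher cost.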

  30. arXiv:2001.00501

    eess.AS cs.LG cs.SD stat.ML

    EEG based Continuous Speech Recognition using Transformers

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper, we investigate continuous speech recognition from electroencephalography (EEG) features using a recently introduced end-to-end transformer-based automatic speech recognition (ASR) model. Our results demonstrate that the transformer-based model trains faster compared to recurrent neural network (RNN) based sequence-to-sequence EEG models and shows better performance during inferenc…

    Submitted 5 May, 2020; v1 submitted 31 December, 2019; originally announced January 2020.

  31. arXiv:1912.07730

    cs.LG eess.AS eess.IV stat.ML

    Continuous Speech Recognition using EEG and Video

    Authors: Gautam Krishna, Mason Carnahan, Co Tran, Ahmed H Tewfik

    Abstract: In this paper we investigate whether electroencephalography (EEG) features can be used to improve the performance of continuous visual speech recognition systems. We implemented a connectionist temporal classification (CTC) based end-to-end automatic speech recognition (ASR) model for performing recognition. Our results demonstrate that EEG features are helpful in enhancing the performance of cont…

    Submitted 27 December, 2019; v1 submitted 16 December, 2019; originally announced December 2019.

    Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.11610, arXiv:1911.04261

  32. arXiv:1911.11610

    eess.AS cs.LG cs.SD stat.ML

    Improving EEG based Continuous Speech Recognition

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

    Abstract: In this paper we introduce various techniques to improve the performance of electroencephalography (EEG) features based continuous speech recognition (CSR) systems. A connectionist temporal classification (CTC) based automatic speech recognition (ASR) system was implemented for performing recognition. We introduce techniques to initialize the weights of the recurrent layers in the encoder of the C…

    Submitted 23 December, 2019; v1 submitted 24 November, 2019; originally announced November 2019.

    Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.04261, arXiv:1906.08871

  33. arXiv:1911.04261

    cs.SD eess.AS eess.SP

    Voice Activity Detection in presence of background noise using EEG

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

    Abstract: In this paper, we demonstrate that the performance of a voice activity detection (VAD) system operating in the presence of background noise can be improved by concatenating acoustic input features with electroencephalography (EEG) features. We also demonstrate that VAD using only EEG features shows better performance than VAD using only acoustic features in the presence of background noise. We implemented a recu…

    Submitted 14 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1906.08871, arXiv:1909.09132

  34. arXiv:1909.09132

    eess.AS cs.LG cs.SD stat.ML

    Spoken Speech Enhancement using EEG

    Authors: Gautam Krishna, Co Tran, Yan Han, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper we demonstrate spoken speech enhancement using electroencephalography (EEG) signals using a generative adversarial network (GAN) based model, gated recurrent unit (GRU) regression based model, temporal convolutional network (TCN) regression model and finally using a mixed TCN GRU regression model. We compare our EEG based speech enhancement results with traditional log minimum mean…

    Submitted 19 April, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

  35. arXiv:1908.05743

    eess.AS cs.SD

    State-of-the-art Speech Recognition using EEG and Towards Decoding of Speech Spectrum From EEG

    Authors: Gautam Krishna, Yan Han, Co Tran, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper, we first demonstrate continuous noisy speech recognition using electroencephalography (EEG) signals on English vocabulary using different types of state-of-the-art end-to-end automatic speech recognition (ASR) models. We further provide results obtained using EEG data recorded under different experimental conditions. We finally demonstrate decoding of speech spectrum from EEG signals…

    Submitted 4 March, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

  36. arXiv:1906.08871

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Advancing Speech Recognition With No Speech Or With Noisy Speech

    Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper we demonstrate end-to-end continuous speech recognition (CSR) using electroencephalography (EEG) signals with no speech signal as input. An attention model based automatic speech recognition (ASR) and connectionist temporal classification (CTC) based ASR systems were implemented for performing recognition. We further demonstrate CSR for noisy speech by fusing with EEG features.

    Submitted 14 March, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: Extended version of our accepted IEEE EUSIPCO 2019 paper with additional results for CTC model based recognition. arXiv admin note: substantial text overlap with arXiv:1906.08045, arXiv:1906.08044

  37. arXiv:1906.08045

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Speech Recognition With No Speech Or With Noisy Speech Beyond English

    Authors: Gautam Krishna, Co Tran, Yan Han, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper, we demonstrate continuous noisy speech recognition using a connectionist temporal classification (CTC) model on limited Chinese vocabulary using electroencephalography (EEG) features with no speech signal as input, and we further demonstrate single CTC model based continuous noisy speech recognition on limited joint English and Chinese vocabulary using EEG features with no speech signal…

    Submitted 26 February, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: arXiv admin note: text overlap with arXiv:1906.08871

  38. arXiv:1906.08044

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Robust End-to-End Speaker Verification Using EEG

    Authors: Yan Han, Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

    Abstract: In this paper, we demonstrate that the performance of a speaker verification system can be improved by concatenating electroencephalography (EEG) signal features with speech signal features or by using only EEG signal features. We use a state-of-the-art end-to-end deep learning model for performing speaker verification, and we demonstrate our results for noisy speech. Our results indicate that EEG signals ca…

    Submitted 9 June, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

    Comments: Accepted for EUSIPCO 2020

  39. arXiv:1906.07849

    cs.LG eess.SP stat.ML

    Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation

    Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik

    Abstract: In this work, a deep learning-based quantization scheme for log-likelihood ratio (L-value) storage is introduced. We analyze the dependency between the average magnitude of different L-values from the same quadrature amplitude modulation (QAM) symbol and show they follow a consistent ordering. Based on this we design a deep autoencoder that jointly compresses and separately reconstructs each L-val…

    Submitted 9 May, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Submitted to IEEE Globecom 2019

  40. arXiv:1903.04656

    cs.LG eess.SP stat.ML

    Deep Log-Likelihood Ratio Quantization

    Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

    Abstract: In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a l…

    Submitted 9 May, 2021; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted for publication at EUSIPCO 2019. Camera-ready version
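    The abstract concerns compressing bit log-likelihood ratios (LLRs). To make the quantity concrete, here is a sketch under simplifying assumptions not taken from the paper: BPSK signalling (bit 0 maps to +1, bit 1 to -1), a known real-valued fading gain h, Gaussian noise of variance sigma², and equal bit priors. Under these assumptions the bit LLR has the closed form 2hy/sigma², since (y + h)² - (y - h)² = 4hy. The uniform scalar quantizer is a conventional baseline included for comparison only; it is not the deep autoencoder the paper proposes, and its parameters are hypothetical.

```python
import math


def bpsk_llr(y: float, h: float, sigma2: float) -> float:
    """Closed-form LLR ln P(b=0|y) / P(b=1|y) for y = h*s + n, s = +/-1,
    n ~ N(0, sigma2), equal priors."""
    return 2.0 * h * y / sigma2


def llr_from_likelihoods(y: float, h: float, sigma2: float) -> float:
    """Same LLR computed directly as the log-ratio of the two Gaussian
    likelihoods (may underflow for large |y|; for checking only)."""
    l0 = math.exp(-(y - h) ** 2 / (2.0 * sigma2))  # likelihood of s = +1
    l1 = math.exp(-(y + h) ** 2 / (2.0 * sigma2))  # likelihood of s = -1
    return math.log(l0 / l1)


def uniform_quantize(llr: float, bits: int, clip: float) -> float:
    """Baseline: clip to [-clip, clip], then snap to one of 2**bits evenly
    spaced reconstruction levels (round half up)."""
    levels = 2 ** bits
    step = 2.0 * clip / (levels - 1)
    c = min(max(llr, -clip), clip)
    idx = min(levels - 1, int((c + clip) / step + 0.5))
    return -clip + idx * step


print(bpsk_llr(0.5, 1.0, 0.5))              # 2.0
print(llr_from_likelihoods(0.5, 1.0, 0.5))  # approximately 2.0
print(uniform_quantize(0.4, bits=2, clip=3.0))   # 1.0
print(uniform_quantize(10.0, bits=2, clip=3.0))  # 3.0
```

A positive LLR favours bit 0 and its magnitude expresses confidence; a fixed uniform grid spends levels evenly regardless of how LLR values are actually distributed, which is the kind of inefficiency a learned quantizer can target.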

  41. arXiv:1903.00739

    cs.LG stat.ML

    Speech Recognition with no speech or with noisy speech

    Authors: Gautam Krishna, Co Tran, Jianguo Yu, Ahmed H Tewfik

    Abstract: The performance of automatic speech recognition (ASR) systems degrades in the presence of noisy speech. This paper demonstrates that using electroencephalography (EEG) can help ASR systems overcome performance loss in the presence of noise. The paper also shows that distillation training of ASR systems using EEG features will increase their performa…

    Submitted 2 March, 2019; originally announced March 2019.

    Comments: Accepted for ICASSP 2019

  42. arXiv:1703.00134

    cs.NI

    Collision Resolution and Interference Elimination in Multiaccess Communication Networks

    Authors: Naeem Akl, Ahmed Tewfik

    Abstract: We define a multiaccess communication scheme that effectively eliminates interference and resolves collisions in many-to-one and many-to-many communication scenarios. Each transmitter is uniquely identified by a steering vector. All signals issued from a specific transmitter will be steered into the same single-dimensional or double-dimensional subspace at all receivers hearing this transmission.…

    Submitted 28 February, 2017; originally announced March 2017.

  43. Primary Traffic Characterization and Secondary Transmissions

    Authors: Yingxi Liu, Ahmed Tewfik

    Abstract: Channel idle time distribution based secondary transmission strategies have been studied intensively in the literature. Under various performance metrics, the ultimate performance of secondary devices is dictated by the presumed channel idle time distribution. Such distributions can take any arbitrary form in practice. In this work, we study idle time distributions in wireless local ar…

    Submitted 30 April, 2016; originally announced May 2016.

    Journal ref: IEEE Transactions on Wireless Communications, vol. 13, no. 6, pp. 3003-3016, June 2014