Showing 1–8 of 8 results for author: Ganapathiraju, A

  1. arXiv:2407.04444  [pdf, other]

    cs.CL cs.SD eess.AS

    TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column

  2. arXiv:2306.15685  [pdf, other]

    eess.AS cs.CL

    Implementing contextual biasing in GPU decoder for online ASR

    Authors: Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

    Abstract: GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  3. arXiv:2305.12540  [pdf, other]

    eess.AS cs.AI cs.SD

    On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

    Authors: Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju

    Abstract: New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper, we investigate a joint ASR-SER multitask learning approach in a low-resource setting and show that improvements are observed not only in SER, but also in ASR. We…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  4. arXiv:2212.08489  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

    Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

    Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable perfo…

    Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted in ICASSP 2023

    ACM Class: I.2.7

    Journal ref: ICASSP 2023

  5. arXiv:1702.02289  [pdf, other]

    cs.SD

    Neural Network Based Speaker Classification and Verification Systems with Enhanced Features

    Authors: Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju

    Abstract: This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition. With optimized features and model training, it achieves 100% classification rate in classification and less than 6% Equal Error Rate (EER), using merely about 1 second and 5 seconds of data respectively. Features with st…

    Submitted 7 February, 2017; originally announced February 2017.

    Comments: Intelligent Systems Conference 2017, Sep. 7-8 2017, London, UK. arXiv admin note: text overlap with arXiv:1702.02285

  6. arXiv:1702.02285  [pdf, other]

    cs.SD

    Speaker Change Detection Using Features through A Neural Network Speaker Classifier

    Authors: Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Aravind Ganapathiraju

    Abstract: The mechanism proposed here is for real-time speaker change detection in conversations, which firstly trains a neural network text-independent speaker classifier using in-domain speaker data. Through the network, features of conversational speech from out-of-domain speakers are then converted into likelihood vectors, i.e. similarity scores comparing to the in-domain speakers. These transformed fea…

    Submitted 7 February, 2017; originally announced February 2017.

    Comments: Intelligent Systems Conference 2017, Sep. 7-8, 2017, London, UK. arXiv admin note: text overlap with arXiv:1702.02289

  7. arXiv:1606.08821  [pdf, other]

    cs.CL

    Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy

    Authors: Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss

    Abstract: Speech recognition, especially name recognition, is widely used in phone services such as company directory dialers, stock quote providers or location finders. It is usually challenging due to pronunciation variations. This paper proposes an efficient and robust data-driven technique which automatically learns acceptable word pronunciations and updates the pronunciation dictionary to build a bette…

    Submitted 28 June, 2016; originally announced June 2016.

    Comments: Interspeech 2016

  8. arXiv:1604.08095  [pdf, other]

    cs.SD cs.CL

    Accent Classification with Phonetic Vowel Representation

    Authors: Zhenhao Ge, Yingyi Tan, Aravind Ganapathiraju

    Abstract: Previous accent classification research focused mainly on detecting accents with pure acoustic information without recognizing accented speech. This work combines phonetic knowledge such as vowels with acoustic information to build a Gaussian Mixture Model (GMM) classifier with Perceptual Linear Predictive (PLP) features, optimized by Heteroscedastic Linear Discriminant Analysis (HLDA). With input ab…

    Submitted 23 February, 2016; originally announced April 2016.

    Comments: Asian Conference on Pattern Recognition (ACPR) 2015