Showing 1–7 of 7 results for author: Narisetty, C

  1. arXiv:2206.07430  [pdf, ps, other]

    eess.AS cs.SD

    Residual Language Model for End-to-end Speech Recognition

    Authors: Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe

    Abstract: End-to-end automatic speech recognition suffers from adaptation to unknown target domain speech despite being trained with a large amount of paired audio--text data. Recent studies estimate a linguistic bias of the model as the internal language model (LM). To effectively adapt to the target domain, the internal LM is subtracted from the posterior during inference and fused with an external target…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: Accepted for Interspeech 2022

  2. arXiv:2202.01405  [pdf, other]

    eess.AS cs.CL cs.SD

    Joint Speech Recognition and Audio Captioning

    Authors: Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe

    Abstract: Speech samples recorded in both indoor and outdoor environments are often contaminated with secondary audio sources. Most end-to-end monaural speech recognition systems either remove these background sounds using speech enhancement or train noise-robust models. For better model interpretability and holistic understanding, we aim to bring together the growing field of automated audio captioning (AA…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures. Accepted for ICASSP 2022

  3. arXiv:2201.10190  [pdf, ps, other]

    eess.AS cs.SD

    Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR

    Authors: Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe

    Abstract: A streaming style inference of encoder-decoder automatic speech recognition (ASR) system is important for reducing latency, which is essential for interactive use cases. To this end, we propose a novel blockwise synchronous decoding algorithm with a hybrid approach that combines endpoint prediction and endpoint post-determination. In the endpoint prediction, we compute the expectation of the numbe…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted for ICASSP 2022

  4. arXiv:2108.06643  [pdf, other]

    cs.CL cs.AI cs.LG

    SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation

    Authors: Steven Y. Feng, Jessica Huynh, Chaitanya Narisetty, Eduard Hovy, Varun Gangal

    Abstract: We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPP…

    Submitted 1 December, 2021; v1 submitted 14 August, 2021; originally announced August 2021.

    Comments: INLG 2021 [Best Long Paper]. Code available at https://github.com/styfeng/SAPPHIRE

  5. arXiv:2106.03419  [pdf, ps, other]

    eess.AS cs.SD

    Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

    Authors: Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe

    Abstract: Although end-to-end automatic speech recognition (E2E ASR) has achieved great performance in tasks that have numerous paired data, it is still challenging to make E2E ASR robust against noisy and low-resource conditions. In this study, we investigated data augmentation methods for E2E ASR in distant-talk scenarios. E2E ASR models are trained on the series of CHiME challenge datasets, which are sui…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted for Interspeech 2021

  6. arXiv:1904.03787  [pdf, other]

    cs.SD cs.LG eess.AS

    Bayesian Non-Parametric Multi-Source Modelling Based Determined Blind Source Separation

    Authors: Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo

    Abstract: This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources. Conventionally source signals are separated from a given set of mixture signals by modelling them using non-negative matrix factorization (NMF). However in NMF, a latent variable signifying model complexity must be appropriately specified to avoid over-fitting or under-fitting. As re…

    Submitted 7 April, 2019; originally announced April 2019.

    Comments: 5 pages, 2 figures. Accepted at ICASSP 2019

  7. arXiv:1904.02852  [pdf, other]

    eess.AS cs.SD

    Modelling of Sound Events with Hidden Imbalances Based on Clustering and Separate Sub-Dictionary Learning

    Authors: Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo

    Abstract: This paper proposes an effective modelling of sound event spectra with a hidden data-size-imbalance, for improved Acoustic Event Detection (AED). The proposed method models each event as an aggregated representation of a few latent factors, while conventional approaches try to find acoustic elements directly from the event spectra. In the method, all the latent factors across all events are assign…

    Submitted 4 April, 2019; originally announced April 2019.