Showing 1–13 of 13 results for author: Bourlard, H

  1. arXiv:2104.02558  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we investigate if the wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues with connectionist temporal classification (CTC) training to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on thr…

    Submitted 6 April, 2021; originally announced April 2021.

  2. arXiv:2012.14252  [pdf, ps, other]

    cs.LG cs.SD eess.AS

    Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models

    Authors: Apoorv Vyas, Srikanth Madikeri, Hervé Bourlard

    Abstract: In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic model. We pretrain a Transformer model on thousand hours of untranscribed Librispeech data followed by supervised adaptation with LFMMI on three different datasets. Our results show that fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the c…

    Submitted 6 April, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

  3. arXiv:2011.07542  [pdf, other]

    cs.SD eess.AS

    Automatic and perceptual discrimination between dysarthria, apraxia of speech, and neurotypical speech

    Authors: I. Kodrasi, M. Pernon, M. Laganaro, H. Bourlard

    Abstract: Automatic techniques in the context of motor speech disorders (MSDs) are typically two-class techniques aiming to discriminate between dysarthria and neurotypical speech or between dysarthria and apraxia of speech (AoS). Further, although such techniques are proposed to support the perceptual assessment of clinicians, the automatic and perceptual classification accuracy has never been compared. In…

    Submitted 2 June, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

    Comments: ICASSP 2021

  4. arXiv:2010.03466  [pdf, ps, other]

    eess.AS cs.SD

    Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

    Authors: Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

    Abstract: We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to Py…

    Submitted 7 October, 2020; originally announced October 2020.

  5. arXiv:1911.08332  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Neural Network based End-to-End Query by Example Spoken Term Detection

    Authors: Dhananjay Ram, Lesly Miculicich, Hervé Bourlard

    Abstract: This paper focuses on the problem of query by example spoken term detection (QbE-STD) in zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual f…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: Submitted to IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING

  6. arXiv:1907.00443  [pdf, other]

    cs.CL cs.HC cs.LG cs.SD eess.AS

    Multilingual Bottleneck Features for Query by Example Spoken Term Detection

    Authors: Dhananjay Ram, Lesly Miculicich, Hervé Bourlard

    Abstract: State of the art solutions to query by example spoken term detection (QbE-STD) usually rely on bottleneck feature representation of the query and audio document to perform dynamic time warping (DTW) based template matching. Here, we present a study on QbE-STD performance using several monolingual as well as multilingual bottleneck features extracted from feed forward networks. Then, we propose to…

    Submitted 30 June, 2019; originally announced July 2019.

  7. arXiv:1711.10025  [pdf, other]

    eess.AS cs.SD

    Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

    Authors: Sibo Tong, Philip N. Garner, Hervé Bourlard

    Abstract: Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as…

    Submitted 23 January, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

  8. arXiv:1709.01144   

    cs.SD cs.CL cs.LG

    Information Theoretic Analysis of DNN-HMM Acoustic Modeling

    Authors: Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

    Abstract: We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal window of speech acoustic features. We cast ASR as a communication channel where the input sub-word probabilities convey the information about the…

    Submitted 8 November, 2017; v1 submitted 29 August, 2017; originally announced September 2017.

    Comments: Theoretical flaw, needs major revision

  9. arXiv:1610.05688  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

    Authors: Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

    Abstract: Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypo…

    Submitted 18 October, 2016; originally announced October 2016.

  10. arXiv:1601.05936  [pdf, other]

    cs.CL cs.LG stat.ML

    Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

    Authors: Pranay Dighe, Gil Luyet, Afsaneh Asaei, Hervé Bourlard

    Abstract: We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection to the space of training data. Relying on the fact that the intrinsic dime…

    Submitted 22 January, 2016; originally announced January 2016.

  11. On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

    Authors: Milos Cernak, Afsaneh Asaei, Hervé Bourlard

    Abstract: The speech signal conveys information on different time scales from short time scale or segmental, associated to phonological and phonetic information to long time scale or supra segmental, associated to syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at segmental level as the essential and invariant representations used in speech tempora…

    Submitted 30 August, 2016; v1 submitted 21 January, 2016; originally announced January 2016.

    Report number: Idiap-RR-07-2016

    Journal ref: Speech Communication, Volume 84, November 2016, Pages 36-45

  12. arXiv:1409.0203  [pdf, other]

    cs.SD cs.LG

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    Authors: Mohammad J. Taghizadeh, Reza Parhizkar, Philip N. Garner, Hervé Bourlard, Afsaneh Asaei

    Abstract: This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries based on a novel Euclidean distance matrix completion algorithm by alternative low-rank matrix completion and projection onto the Euclidean…

    Submitted 31 August, 2014; originally announced September 2014.

    Comments: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 2014

  13. arXiv:1210.6766  [pdf, other]

    cs.LG cs.SD

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    Authors: Afsaneh Asaei, Mohammad Golbabaee, Hervé Bourlard, Volkan Cevher

    Abstract: We tackle the multi-party speech recovery problem through modeling the acoustic of the reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustic from the unknown competing speech sources relying on localization of the early images of the speakers by sparse approximation of the spatia…

    Submitted 25 October, 2012; originally announced October 2012.

    Comments: 31 pages