Showing 1–16 of 16 results for author: Klejch, O

  1. arXiv:2309.15674  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Speech collage: code-switched audio generation by collaging monolingual corpora

    Authors: Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness quality of audio generation using an overlap-add approach. We…

    Submitted 27 September, 2023; originally announced September 2023.

  2. arXiv:2306.02153  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

    Authors: Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater

    Abstract: Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a pre-trained self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competit…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  3. arXiv:2305.16065  [pdf, other]

    eess.AS cs.CL cs.SD

    ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

    Authors: Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai

    Abstract: In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human annotated text in most research hinders the development of practical SER systems. To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech by analyzing the ASR performance on emotion corpo…

    Submitted 28 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  4. arXiv:2303.18110  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

    Authors: Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

    Abstract: English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Despite the great many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of English…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE ICASSP 2023

  5. arXiv:2211.16049  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Evaluating and reducing the distance between synthetic and real speech distributions

    Authors: Christoph Minixhofer, Ondřej Klejch, Peter Bell

    Abstract: While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data. We consider the distribution of all possible real speech samples that could be generated by these speakers alongside the distribution of all synthetic samples that could be generated for the same set of speakers, using a particular TTS syst…

    Submitted 25 May, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: To be presented at INTERSPEECH 2023

  6. arXiv:2211.01458  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Zero-Shot Code-Switched Speech Recognition

    Authors: Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe

    Abstract: In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, th…

    Submitted 9 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages

  7. arXiv:2112.08098  [pdf, other]

    cs.CL cs.LG

    Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

    Authors: Christoph Minixhofer, Ondřej Klejch, Peter Bell

    Abstract: In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements can be achieved by optimising these strategies after training a model, only leading to a potential increase in inference time, with no requirement for r…

    Submitted 17 December, 2021; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 4 pages, 3 figures, submitted to ICASSP2022

  8. arXiv:2111.06799  [pdf, other]

    cs.CL eess.AS

    Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

    Authors: Ondrej Klejch, Electra Wallington, Peter Bell

    Abstract: We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by…

    Submitted 6 June, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: Submitted to Interspeech 2022

  9. arXiv:2008.06580  [pdf, other]

    eess.AS cs.CL cs.SD

    Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

    Authors: Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

    Abstract: We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data au…

    Submitted 28 February, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: Total of 31 pages, 27 figures. Associated repository: https://github.com/pswietojanski/ojsp_adaptation_review_2020

    Journal ref: IEEE Open Journal of Signal Processing, vol. 2, pp. 33-66, 2021

  10. arXiv:2003.13551  [pdf]

    cs.CL

    European Language Grid: An Overview

    Authors: Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, et al. (11 additional authors not shown)

    Abstract: With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, lang…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  11. arXiv:1910.10605  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Speaker Adaptive Training using Model Agnostic Meta-Learning

    Authors: Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

    Abstract: Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions. Conventionally, model-based speaker adaptive training is performed by having a set of speaker dependent parameters that are jointly optimised with speaker independent parameters in order to remove speaker variation. However, this does not scale w…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: Accepted to IEEE ASRU 2019

  12. arXiv:1909.13759  [pdf, other]

    eess.AS cs.CL cs.SD

    Acoustic Model Adaptation from Raw Waveforms with SincNet

    Authors: Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed to reduce the number of parameters required in raw-waveform modelling, by restricting the filter functions, rather than having to learn every tap of e…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to IEEE ASRU 2019

  13. arXiv:1906.11521  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

    Authors: Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

    Abstract: Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions. Most adaptation schemes for neural network models require the use of an initial one-best transcription for the test data, generated by an unadapted model, in order to estimate the adaptation transform. It has been found that adaptation methods using discriminative objective func…

    Submitted 27 June, 2019; originally announced June 2019.

  14. arXiv:1905.13150  [pdf, other]

    cs.CL cs.SD eess.AS

    Lattice-based lightly-supervised acoustic model training

    Authors: Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

    Abstract: In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcrip…

    Submitted 13 July, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Proc. INTERSPEECH 2019

  15. arXiv:1901.01342  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

    Authors: Joseph Roth, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, Arkadiusz Stopczynski, Cordelia Schmid, Zhonghua Xi, Caroline Pantofaru

    Abstract: Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made com…

    Submitted 24 May, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

  16. arXiv:1808.10239  [pdf, other]

    cs.CL

    Learning to adapt: a meta-learning approach for speaker adaptation

    Authors: Ondřej Klejch, Joachim Fainberg, Peter Bell

    Abstract: The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers. The success of speaker adaptation methods relies on selecting weights that are suitable for adaptation and using good adaptation schedules to update these weights in order not to ove…

    Submitted 30 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018