Showing 1–30 of 30 results for author: Motlicek, P

  1. arXiv:2407.04444  [pdf, other]

    cs.CL cs.SD eess.AS

    TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column

  2. arXiv:2404.14463  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews

    Authors: Sergio Burdisso, Ernesto Reyes-Ramírez, Esaú Villatoro-Tello, Fernando Sánchez-Vega, Pastor López-Monroy, Petr Motlicek

    Abstract: Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating interviewer's prompts into the model. In this work, we hypothesize that this improvement might be mainly due t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to Clinical NLP workshop at NAACL 2024

  3. arXiv:2404.09565  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Reliability Estimation of News Media Sources: Birds of a Feather Flock Together

    Authors: Sergio Burdisso, Dairazalia Sánchez-Cortés, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: Evaluating the reliability of news sources is a routine task for journalists and organizations committed to acquiring and disseminating accurate information. Recent research has shown that predicting sources' reliability represents an important first-prior step in addressing additional challenges such as fake news detection and fact-checking. In this paper, we introduce a novel approach for source…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Main Conference

  4. Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek

    Abstract: We propose a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and show its impact on depression detection from transcribed clinical interviews. To this end, we use a GCN for modeling non-consecutive and long-distance semantics to classify the transcriptions into depressed or control subjects. The proposed method aims to mitigate the limiting assumptions of…

    Submitted 11 March, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Paper Accepted to Interspeech 2023

    Journal ref: Interspeech 2023

  5. arXiv:2306.15685  [pdf, other]

    eess.AS cs.CL

    Implementing contextual biasing in GPU decoder for online ASR

    Authors: Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

    Abstract: GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  6. arXiv:2305.18281  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

    Authors: Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek

    Abstract: State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Confo…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Florian Mai and Juan Zuluaga-Gomez contributed equally. To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

  7. arXiv:2305.01155  [pdf, other]

    eess.AS cs.CL cs.HC cs.SD

    Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

    Authors: Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault, Khalid Choukri

    Abstract: Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: Manuscript under review

  8. arXiv:2304.07842  [pdf, other]

    eess.AS cs.AI cs.HC

    A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers

    Authors: Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Matthias Kleinert

    Abstract: In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the co…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: Under review

  9. arXiv:2212.08489  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

    Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

    Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable perfo…

    Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted in ICASSP 2023

    ACM Class: I.2.7

    Journal ref: ICASSP 2023

  10. arXiv:2212.07164  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator

    Authors: Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Karel Vesely

    Abstract: This paper describes a simple yet efficient repetition-based modular system for speeding up air-traffic controller (ATCo) training. For example, a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be substituted by an automatic system that could act as a pilot. In this paper, we aim to dev…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Presented at Sesar Innovation Days 2022. https://www.sesarju.eu/sesarinnovationdays

  11. arXiv:2211.04054  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Alexander Blatt, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Claudia Cevenini, Pavel Kolčárek, Allan Tart, Jan Černocký, Dietrich Klakow

    Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-h…

    Submitted 15 June, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: Manuscript under review; The code is available at: https://github.com/idiap/atco2-corpus

  12. IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

    Authors: Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek

    Abstract: In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction appr…

    Submitted 14 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: To be published in CASE@EMNLP 2022 (5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)

    Journal ref: CASE @ EMNLP 2022

  13. arXiv:2209.03891  [pdf, other]

    cs.CL cs.AI

    IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

    Authors: Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

    Abstract: In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in the sentence from news-media. We detect cause-effect-signal spans in a sentence using T5 -- a pre-trained autoregressive language model. We iteratively identify all cau…

    Submitted 20 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Camera-ready for CASE@EMNLP

  14. arXiv:2207.14116  [pdf, other]

    cs.CL cs.AI

    Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

    Authors: Martin Fajcik, Petr Motlicek, Pavel Smrz

    Abstract: We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which given a claim and a set of retrieved evidences jointly learns to identify: (i) the relevant evidences to the given claim, (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way -- the…

    Submitted 7 August, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: updated acknowledgement

  15. arXiv:2203.16822  [pdf, other]

    eess.AS cs.CL cs.LG

    How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Saeed Sarfjoo, Petr Motlicek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan

    Abstract: Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works investigated the impact on performance when the data properties substantially differ between the pre-training and fine-tuning phases, termed d…

    Submitted 17 October, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: To be published in the 2022 IEEE Spoken Language Technology Workshop (SLT) (SLT 2022)

  16. arXiv:2202.03725  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    A two-step approach to leverage contextual data: speech recognition in air-traffic communications

    Authors: Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlicek

    Abstract: Automatic Speech Recognition (ASR), as the assistance of speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information. ASR application can lead to a lower number of incidents caused by misunderstanding and improve air traffic management (ATM) efficiency. Evidently, high accuracy predicti…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:2108.12156

    Journal ref: ICASSP 2022

  17. arXiv:2110.05781  [pdf, other]

    eess.AS cs.CL cs.LG

    BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke

    Abstract: Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizin…

    Submitted 14 October, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: To be published in the 2022 IEEE Spoken Language Technology Workshop (SLT) (SLT 2022)

  18. arXiv:2108.12175  [pdf, other]

    cs.CL cs.LG eess.AS

    Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

    Authors: Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Oliver Ohneiser, Hartmut Helmke

    Abstract: Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data into one set. This is motivated by the fact that pilot's voice communications are more scarce than ATCOs. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), the speech from ATCOs is usually recognized more accurately than from pilots…

    Submitted 14 December, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: Presented at Sesar Innovation Days - 2022. See https://www.sesarju.eu/sesarinnovationdays

  19. arXiv:2108.12156  [pdf, other]

    cs.CL cs.LG eess.AS

    Improving callsign recognition with air-surveillance data in air-traffic communication

    Authors: Iuliia Nigmatulina, Rudolf Braun, Juan Zuluaga-Gomez, Petr Motlicek

    Abstract: Automatic Speech Recognition (ASR) can be used as the assistance of speech communication between pilots and air-traffic controllers. Its application can significantly reduce the complexity of the task and increase the reliability of transmitted information. Evidently, high accuracy predictions are needed to minimize the risk of errors. Especially, high accuracy is required in recognition of key in…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: Submitted to Interspeech 2021

  20. A Comparison of Methods for OOV-word Recognition on a New Public Dataset

    Authors: Rudolf A. Braun, Srikanth Madikeri, Petr Motlicek

    Abstract: A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a n…

    Submitted 16 July, 2021; originally announced July 2021.

  21. arXiv:2104.03643  [pdf, other]

    cs.CL cs.CV cs.LG eess.AS

    Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems

    Authors: Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Karel Veselý, Martin Kocour, Igor Szöke

    Abstract: Air traffic management and specifically air-traffic control (ATC) rely mostly on voice communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that could be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We…

    Submitted 27 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Presented at: Interspeech conference 2021 (Brno, Czechia, August 30 - September 3)

  22. arXiv:2011.02198  [pdf, other]

    cs.SD eess.AS

    IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

    Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

    Abstract: The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant improvements in deep learning based KWS and SSL on open source datasets in recent years. For deep learning model training, it is necessary to expand the data cover…

    Submitted 14 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted at IEEE SLT 2021

  23. arXiv:2010.12277  [pdf, other]

    cs.SD eess.AS

    Speech Activity Detection Based on Multilingual Speech Recognition System

    Authors: Seyyed Saeed Sarfjoo, Srikanth Madikeri, Petr Motlicek

    Abstract: To better model the contextual information and increase the generalization ability of a Speech Activity Detection (SAD) system, this paper leverages a multi-lingual Automatic Speech Recognition (ASR) system to perform SAD. Sequence discriminative training of the Acoustic Model (AM) using the Lattice-Free Maximum Mutual Information (LF-MMI) loss function effectively extracts the contextual information of th…

    Submitted 11 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Submitted to Interspeech 2021

  24. arXiv:2010.03466  [pdf, ps, other]

    eess.AS cs.SD

    Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models

    Authors: Srikanth Madikeri, Sibo Tong, Juan Zuluaga-Gomez, Apoorv Vyas, Petr Motlicek, Hervé Bourlard

    Abstract: We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch in designing model architectures. It exposes the LF-MMI cost function as an autograd function. Other capabilities of Kaldi have also been ported to Py…

    Submitted 7 October, 2020; originally announced October 2020.

  25. arXiv:2006.10304  [pdf, ps, other]

    cs.CL cs.CV cs.LG cs.SD eess.AS

    Automatic Speech Recognition Benchmark for Air-Traffic Communications

    Authors: Juan Zuluaga-Gomez, Petr Motlicek, Qingran Zhan, Karel Vesely, Rudolf Braun

    Abstract: Advances in Automatic Speech Recognition (ASR) over the last decade opened new areas of speech-based automation such as in Air-Traffic Control (ATC) environment. Currently, voice communication and data links communications are the only way of contact between pilots and Air-Traffic Controllers (ATCo), where the former is the most widely used and the latter is a non-spoken method mandatory for ocean…

    Submitted 13 August, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted to: 21st INTERSPEECH conference (Shanghai, October 25-29)

  26. arXiv:2006.09054  [pdf, other]

    eess.AS cs.SD

    Quantization of Acoustic Model Parameters in Automatic Speech Recognition Framework

    Authors: Amrutha Prasad, Petr Motlicek, Srikanth Madikeri

    Abstract: State-of-the-art hybrid automatic speech recognition (ASR) systems exploit deep neural network (DNN) based acoustic models (AM) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion and n-gram language models. The AMs typically have millions of parameters and require significant parameter reduction to operate on embedded devices. The impact of parameter quantization on the overal…

    Submitted 20 November, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Submitted to ICASSP21

  27. arXiv:2006.02093  [pdf, other]

    cs.SI cs.SD eess.AS

    Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

    Authors: Mael Fabien, Seyyed Saeed Sarfjoo, Petr Motlicek, Srikanth Madikeri

    Abstract: Criminal investigations mostly rely on the collection of speech conversational data in order to identify speakers and build or enrich an existing criminal network. Social network analysis tools are then applied to identify the most central characters and the different communities within the network. We introduce two candidate datasets for criminal conversational data, Crime Scene Investigation (CS…

    Submitted 21 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

  28. arXiv:1909.06749  [pdf, other]

    cs.RO cs.AI

    MuMMER: Socially Intelligent Human-Robot Interaction in Public Spaces

    Authors: Mary Ellen Foster, Bart Craenen, Amol Deshmukh, Oliver Lemon, Emanuele Bastianelli, Christian Dondrup, Ioannis Papaioannou, Andrea Vanzo, Jean-Marc Odobez, Olivier Canévet, Yuanzhouhan Cao, Weipeng He, Angel Martínez-González, Petr Motlicek, Rémy Siegfried, Rachid Alami, Kathleen Belhassein, Guilhem Buisan, Aurélie Clodic, Amandine Mayima, Yoan Sallami, Guillaume Sarthou, Phani-Teja Singamaneni, Jules Waldhart, Alexandre Mazel, et al. (5 additional authors not shown)

    Abstract: In the EU-funded MuMMER project, we have developed a social robot designed to interact naturally and flexibly with users in public spaces such as a shopping mall. We present the latest version of the robot system developed during the project. This system encompasses audio-visual sensing, social signal processing, conversational interaction, perspective taking, geometric reasoning, and motion plann…

    Submitted 15 September, 2019; originally announced September 2019.

    Report number: AI-HRI/2019/14

  29. arXiv:1908.05227  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

    Authors: Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt

    Abstract: In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data selection mechanism to obtain the best hypothesized output, further used to retrain the seed model.…

    Submitted 8 August, 2019; originally announced August 2019.

    Comments: Interspeech 2019

    MSC Class: 62H30

  30. arXiv:1711.11565  [pdf, other]

    cs.SD cs.AI cs.MM cs.RO eess.AS

    Deep Neural Networks for Multiple Speaker Detection and Localization

    Authors: Weipeng He, Petr Motlicek, Jean-Marc Odobez

    Abstract: We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have been focusing on localizing a single sound source, which…

    Submitted 26 February, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: Accepted for ICRA 2018

    Journal ref: 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018, pp. 74-79