
Showing 1–8 of 8 results for author: Perez-Toro, P A

  1. arXiv:2407.03132

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

    Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task le…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: to be published in Interspeech 2024 proceedings

  2. arXiv:2404.08064

    eess.AS cs.AI cs.CR cs.LG

    The Impact of Speech Anonymization on Pathology and Its Limits

    Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where priva…

    Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  3. arXiv:2402.15294

    cs.SD cs.AI cs.LG eess.AS

    A Survey of Music Generation in the Context of Interaction

    Authors: Ismael Agchar, Ilja Baumann, Franziska Braun, Paula Andrea Perez-Toro, Korbinian Riedhammer, Sebastian Trump, Martin Ullrich

    Abstract: In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g. generating a Bach-style chorale) or style transfer (e.g. classical to jazz) based on large amo…

    Submitted 23 February, 2024; originally announced February 2024.

  4. arXiv:2308.08306

    eess.AS cs.SD

    Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

    Authors: Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression has shared symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single d…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted at INTERSPEECH 2023

  5. arXiv:2204.01677

    cs.CL

    Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  6. arXiv:2204.01670

    cs.CL cs.SD eess.AS

    Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, the performance on impaired speech still remains an issue. The current study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of sp…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  7. arXiv:2201.05912

    eess.AS cs.LG cs.SD

    Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

    Authors: Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave

    Abstract: Current state-of-the-art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets to maintain a decent generalization of the final decision function. An ideal dataset is not necessarily large in size, but large with respect to the amount of unique speakers, utilized hardware and varying recording conditions. This enables a machi…

    Submitted 31 January, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: Pre-print submitted to LREC 2022. Link to Common Phone: https://zenodo.org/record/5846137

  8. arXiv:2112.11514

    eess.AS cs.AI cs.LG

    The Phonetic Footprint of Parkinson's Disease

    Authors: Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Anton Batliner, Elmar Nöth

    Abstract: As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciati…

    Submitted 21 December, 2021; originally announced December 2021.

    Comments: https://www.sciencedirect.com/science/article/abs/pii/S0885230821001169

    Journal ref: Elsevier Computer Speech and Language, Volume 72, March 2022