Showing 1–29 of 29 results for author: Ferrer, L

  1. Fusion approaches for emotion recognition from speech using acoustic and text-based features

    Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer, Agustin Gravano

    Abstract: In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using Glove embeddings. We also propose and compare different strategies to combine the audio and…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 5 pages. Accepted in ICASSP 2020

  2. arXiv:2401.03051  [pdf, other]

    cs.LG math.DS

    On the Stability of a non-hyperbolic nonlinear map with non-bounded set of non-isolated fixed points with applications to Machine Learning

    Authors: Roberta Hansen, Matias Vera, Lautaro Estienne, Luciana Ferrer, Pablo Piantanida

    Abstract: This paper deals with the convergence analysis of the SUCPA (Semi Unsupervised Calibration through Prior Adaptation) algorithm, defined from first-order non-linear difference equations, first developed to correct the scores output by a supervised machine learning classifier. The convergence analysis is addressed as a dynamical system problem, by studying the local and global stability of the non…

    Submitted 25 April, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  3. arXiv:2309.07391  [pdf, other]

    cs.SD cs.LG eess.AS

    EnCodecMAE: Leveraging neural codecs for universal audio representation learning

    Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

    Abstract: The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music and environmental sounds. To approach this problem, methods inspired by works on self-supervised learning for NLP, like BERT, or computer vision, like masked autoencoders (MAE), are often adapted to the audio domain. In this work, we propos…

    Submitted 20 May, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

  4. arXiv:2307.16324  [pdf, other]

    cs.CL cs.SD eess.AS

    Mispronunciation detection using self-supervised speech representations

    Authors: Jazmin Vidal, Pablo Riera, Luciana Ferrer

    Abstract: In recent years, self-supervised learning (SSL) models have produced promising results in a variety of speech-processing tasks, especially in contexts of data scarcity. In this paper, we study the use of SSL models for the task of mispronunciation detection for second language learners. We compare two downstream approaches: 1) training the model for phone recognition (PR) using native English data…

    Submitted 30 July, 2023; originally announced July 2023.

  5. Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models

    Authors: Lautaro Estienne, Luciana Ferrer, Matías Vera, Pablo Piantanida

    Abstract: A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained with a very large amount of unsupervised text data and adapted to perform a downstream natural language task using methods like fine-tuning, calibration or in-context learning. In this work, we propose an approach to adapt the prior class distribution to…

    Submitted 9 August, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Journal ref: In Proceedings of the RANLP 2023 Student Research Workshop

  6. arXiv:2303.12540  [pdf, other]

    cs.CV cs.LG eess.IV

    Deployment of Image Analysis Algorithms under Prevalence Shifts

    Authors: Patrick Godau, Piotr Kalinowski, Evangelia Christodoulou, Annika Reinke, Minu Tizabi, Luciana Ferrer, Paul Jäger, Lena Maier-Hein

    Abstract: Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis. While current research focuses on new training paradigms and network architectures, little attention is given to the specific effect of prevalence shifts on an algorithm deployed in practice. Such discrepancies between class frequencies in the data use…

    Submitted 24 July, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  7. Phone and speaker spatial organization in self-supervised speech representations

    Authors: Pablo Riera, Manuela Cerdeiro, Leonardo Pepino, Luciana Ferrer

    Abstract: Self-supervised representations of speech are currently being widely used for a large number of applications. Recently, some efforts have been made in trying to analyze the type of information present in each of these representations. Most such work uses downstream models to test whether the representations can be successfully used for a specific task. The downstream models, though, typically perf…

    Submitted 24 February, 2023; originally announced February 2023.

  8. Understanding metric-related pitfalls in image analysis validation

    Authors: Annika Reinke, Minu D. Tizabi, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, A. Emre Kavur, Tim Rädsch, Carole H. Sudre, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew Blaschko, Florian Buettner, M. Jorge Cardoso, Veronika Cheplygina, Jianxu Chen, Evangelia Christodoulou, Beth A. Cimini, Gary S. Collins, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, et al. (53 additional authors not shown)

    Abstract: Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibilit…

    Submitted 23 February, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Shared first authors: Annika Reinke and Minu D. Tizabi; shared senior authors: Lena Maier-Hein and Paul F. Jäger. Published in Nature Methods. arXiv admin note: text overlap with arXiv:2206.01653

    Journal ref: Nature Methods, 1-13 (2024)

  9. arXiv:2209.05355  [pdf, other]

    cs.LG cs.AI

    Analysis and Comparison of Classification Metrics

    Authors: Luciana Ferrer

    Abstract: A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other…

    Submitted 20 September, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

  10. Metrics reloaded: Recommendations for image analysis validation

    Authors: Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D. Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, Tim Rädsch, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew Blaschko, et al. (49 additional authors not shown)

    Abstract: Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international ex…

    Submitted 23 February, 2024; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Shared first authors: Lena Maier-Hein, Annika Reinke. arXiv admin note: substantial text overlap with arXiv:2104.05642. Published in Nature Methods

    Journal ref: Nature Methods, 1-18 (2024)

  11. arXiv:2204.12649  [pdf, other]

    eess.AS cs.SD

    Study on the Fairness of Speaker Verification Systems on Underrepresented Accents in English

    Authors: Mariel Estevez, Luciana Ferrer

    Abstract: Speaker verification (SV) systems are currently being used to make sensitive decisions like giving access to bank accounts or deciding whether the voice of a suspect coincides with that of the perpetrator of a crime. Ensuring that these systems are fair and do not disfavor any particular group is crucial. In this work, we analyze the performance of several state-of-the-art SV systems across groups…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 5 pages, 2 figures, submitted to INTERSPEECH

  12. arXiv:2201.01364  [pdf, other]

    cs.CL cs.SD eess.AS

    A Discriminative Hierarchical PLDA-based Model for Spoken Language Recognition

    Authors: Luciana Ferrer, Diego Castan, Mitchell McLaren, Aaron Lawson

    Abstract: Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample. SLR is an important task in its own right, for example, as a tool to analyze or categorize large amounts of multi-lingual data. Further, it is also an essential tool for selecting downstream applications in a workflow, for example, to choose appropriate speech recognition or…

    Submitted 11 August, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2396-2410, 2022

  13. arXiv:2112.12843  [pdf, other]

    cs.CV

    Impact of class imbalance on chest x-ray classifiers: towards better evaluation practices for discrimination and calibration performance

    Authors: Candelaria Mosquera, Luciana Ferrer, Diego Milone, Daniel Luna, Enzo Ferrante

    Abstract: This work aims to analyze standard evaluation practices adopted by the research community when assessing chest x-ray classifiers, particularly focusing on the impact of class imbalance in such appraisals. Our analysis considers a comprehensive definition of model performance, covering not only discriminative performance but also model calibration, a topic of research that has received increasing a…

    Submitted 14 March, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: Conference on Health, Inference, and Learning (CHIL) 2022 - Invited non-archival presentation

  14. Deformable image registration with deep network priors: a study on longitudinal PET images

    Authors: Constance Fourcade, Ludovic Ferrer, Noemie Moreau, Gianmarco Santini, Aishlinn Brennan, Caroline Rousseau, Marie Lacombe, Vincent Fleury, Mathilde Colombié, Pascal Jézéquel, Mario Campone, Mathieu Rubeaux, Diana Mateus

    Abstract: Longitudinal image registration is challenging and has not yet benefited from major performance improvements thanks to deep learning. Inspired by Deep Image Prior, this paper introduces a different use of deep architectures as regularizers to tackle the image registration question. We propose a subject-specific deformable registration method called MIRRBA, relying on a deep pyramidal architecture…

    Submitted 30 March, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: 11 pages, 3 figures in the main article, 2 tables in the main article, 2 figures in supplementary material

  15. arXiv:2111.00976  [pdf, other]

    cs.CL cs.SD eess.AS

    A transfer learning based approach for pronunciation scoring

    Authors: Marcelo Sancinetti, Jazmin Vidal, Cyntia Bonomi, Luciana Ferrer

    Abstract: Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the c…

    Submitted 9 May, 2023; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: ICASSP 2022

  16. Study of positional encoding approaches for Audio Spectrogram Transformers

    Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

    Abstract: Transformers have revolutionized the world of deep learning, especially in the field of natural language processing. Recently, the Audio Spectrogram Transformer (AST) was proposed for audio classification, leading to state-of-the-art results in several datasets. However, in order for ASTs to outperform CNNs, pretraining with ImageNet is needed. In this paper, we study one component of the AST, the…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022. 5 pages, 3 figures

  17. arXiv:2104.05642  [pdf, other]

    eess.IV cs.CV

    Common Limitations of Image Processing Metrics: A Picture Story

    Authors: Annika Reinke, Minu D. Tizabi, Carole H. Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Peter Bankhead, Arriel Benis, Matthew Blaschko, Florian Buettner, M. Jorge Cardoso, Jianxu Chen, Veronika Cheplygina, Evangelia Christodoulou, Beth Cimini, Gary S. Collins, Sandy Engelhardt, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, et al. (68 additional authors not shown)

    Abstract: While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using spe…

    Submitted 6 December, 2023; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Shared first authors: Annika Reinke and Minu D. Tizabi. This is a dynamic paper on limitations of commonly used metrics. It discusses metrics for image-level classification, semantic and instance segmentation, and object detection. For missing use cases, comments or questions, please contact a.reinke@dkfz.de. Substantial contributions to this document will be acknowledged with a co-authorship

  18. arXiv:2104.03502  [pdf, other]

    cs.SD cs.LG eess.AS

    Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

    Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

    Abstract: Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging. In this work, we propose a transfer learning method for speech emotion recognition where features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks. We propose to combine the output of several layers from the pre-trained model using…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures. Submitted to Interspeech 2021

  19. arXiv:2104.00732  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Out of a hundred trials, how many errors does your speaker verifier make?

    Authors: Niko Brümmer, Luciana Ferrer, Albert Swart

    Abstract: Out of a hundred trials, how many errors does your speaker verifier make? For the user this is an important, practical question, but researchers and vendors typically sidestep it and supply instead the conditional error-rates that are given by the ROC/DET curve. We posit that the user's question is answered by the Bayes error-rate. We present a tutorial to show how to compute the error-rate that r…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021

  20. arXiv:2102.09370  [pdf, other]

    cs.HC cs.AI cs.LG

    A Study on the Manifestation of Trust in Speech

    Authors: Lara Gauder, Leonardo Pepino, Pablo Riera, Silvina Brussino, Jazmín Vidal, Agustín Gravano, Luciana Ferrer

    Abstract: Research has shown that trust is an essential aspect of human-computer interaction, directly determining the degree to which the person is willing to use a system. An automatic prediction of the level of trust that a user has in a certain system could be used to attempt to correct potential distrust by having the system take relevant actions like, for example, apologizing or explaining its decision…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2007.15711, arXiv:2006.05977

  21. A Speaker Verification Backend with Robust Performance across Conditions

    Authors: Luciana Ferrer, Mitchell McLaren, Niko Brummer

    Abstract: In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing them through a backend composed of probabilistic linear discriminant analysis (PLDA) and global logistic regression score calibration. This method is known to…

    Submitted 17 August, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Journal ref: Computer Speech and Language, Volume 71, 2021

  22. arXiv:2007.15711  [pdf, other]

    eess.AS cs.LG cs.SD

    Detecting Distrust Towards the Skills of a Virtual Assistant Using Speech

    Authors: Leonardo Pepino, Pablo Riera, Lara Gauder, Agustín Gravano, Luciana Ferrer

    Abstract: Research has shown that trust is an essential aspect of human-computer interaction, directly determining the degree to which the person is willing to use the system. An automatic prediction of the level of trust that a user has in a certain system could be used to attempt to correct potential distrust by having the system take relevant actions like, for example, explaining its actions more thorough…

    Submitted 30 July, 2020; originally announced July 2020.

  23. arXiv:2006.05977  [pdf, other]

    cs.HC

    Trust-UBA: A Corpus for the Study of the Manifestation of Trust in Speech

    Authors: Lara Gauder, Pablo Riera, Leonardo Pepino, Silvina Brussino, Jazmín Vidal, Luciana Ferrer, Agustín Gravano

    Abstract: This paper describes a novel protocol for collecting speech data from subjects induced to have different degrees of trust in the skills of a conversational agent. The protocol consists of an interactive session where the subject is asked to respond to a series of factual questions with the help of a virtual assistant. In order to induce subjects to either trust or distrust the agent's skills, they…

    Submitted 30 July, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

  24. arXiv:2002.03802  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    A Speaker Verification Backend for Improved Calibration Performance across Varying Conditions

    Authors: Luciana Ferrer, Mitchell McLaren

    Abstract: In a recent work, we presented a discriminative backend for speaker verification that achieved good out-of-the-box calibration performance on most tested conditions containing varying levels of mismatch to the training conditions. This backend mimics the standard PLDA-based backend process used in most current speaker verification systems, including the calibration stage. All parameters of the bac…

    Submitted 5 February, 2020; originally announced February 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1911.11622

  25. arXiv:1911.11622  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    A discriminative condition-aware backend for speaker verification

    Authors: Luciana Ferrer, Mitchell McLaren

    Abstract: We present a scoring approach for speaker verification that mimics the standard PLDA-based backend process used in most current speaker verification systems. However, unlike the standard backends, all parameters of the model are jointly trained to optimize the binary cross-entropy for the speaker verification task. We further integrate the calibration stage inside the model, making the parameters…

    Submitted 26 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of ICASSP 2020

  26. arXiv:1803.10554  [pdf, other]

    cs.LG stat.ML

    Joint PLDA for Simultaneous Modeling of Two Factors

    Authors: Luciana Ferrer, Mitchell McLaren

    Abstract: Probabilistic linear discriminant analysis (PLDA) is a method used for biometric problems like speaker or face recognition that models the variability of the samples using two latent variables, one that depends on the class of the sample and another one that is assumed independent across samples and models the within-class variability. In this work, we propose a generalization of PLDA that enables…

    Submitted 28 March, 2018; originally announced March 2018.

    Comments: Submitted to Journal of Machine Learning Research

    Journal ref: Journal of Machine Learning Research, January, 2019

  27. arXiv:1803.03684  [pdf, ps, other]

    cs.LG stat.ML

    Scoring Formulation for Multi-Condition Joint PLDA

    Authors: Luciana Ferrer

    Abstract: The joint PLDA model is a generalization of PLDA where the nuisance variable is no longer considered independent across samples, but potentially shared (tied) across samples that correspond to the same nuisance condition. The original work considered a single nuisance condition, deriving the EM and scoring formulas for this scenario. In this document, we show how to obtain likelihood ratios for s…

    Submitted 9 March, 2018; originally announced March 2018.

  28. arXiv:1704.02346  [pdf, ps, other]

    cs.LG stat.ML

    Joint Probabilistic Linear Discriminant Analysis

    Authors: Luciana Ferrer

    Abstract: Standard probabilistic linear discriminant analysis (PLDA) for speaker recognition assumes that the sample's features (usually, i-vectors) are given by a sum of three terms: a term that depends on the speaker identity, a term that models the within-speaker variability and is assumed independent across samples, and a final term that models any remaining variability and is also independent across sa…

    Submitted 16 January, 2018; v1 submitted 7 April, 2017; originally announced April 2017.

    Comments: Technical report

  29. arXiv:1611.08947  [pdf, other]

    cs.GR

    Navigable videos for presenting scientific data on head-mounted displays

    Authors: Jacqueline Chu, Leonardo Ferrer, Min Shih, Kwan-Liu Ma

    Abstract: Immersive, stereoscopic viewing enables scientists to better analyze the spatial structures of visualized physical phenomena. However, their findings cannot be properly presented in traditional media, which lack these core attributes. Creating a presentation tool that captures this environment poses unique challenges, namely related to poor viewing accessibility. Immersive scientific renderings of…

    Submitted 27 November, 2016; originally announced November 2016.