Showing 1–22 of 22 results for author: Rouvier, M

  1. arXiv:2407.05746  [pdf, other]

    cs.AI cs.SD eess.AS

    MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition

    Authors: Jarod Duret, Mickael Rouvier, Yannick Estève

    Abstract: In this work, we detail our submission to the 2024 edition of the MSP-Podcast Speech Emotion Recognition (SER) Challenge. This challenge is divided into two distinct tasks: Categorical Emotion Recognition and Emotional Attribute Prediction. We concentrated our efforts on Task 1, which involves the categorical classification of eight emotional states using data from the MSP-Podcast dataset. Our app…

    Submitted 8 July, 2024; originally announced July 2024.

    Journal ref: Odyssey 2024, Jun 2024, Quebec, Canada

  2. arXiv:2406.05876  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Zero-Shot End-To-End Spoken Question Answering In Medical Domain

    Authors: Yanis Labrak, Adel Moumen, Richard Dufour, Mickael Rouvier

    Abstract: In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often entail the use of separate models for question audio transcription and answer selection, resulting in significant resource utilization and error accumulation. To tackle these challenges, we explore the effec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

    Journal ref: InterSpeech 2024

  3. arXiv:2403.19634  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2

    Authors: Pierre-Michel Bousquet, Mickael Rouvier

    Abstract: The SdSv challenge Task 2 provided an opportunity to assess efficiency and robustness of modern text-independent speaker verification systems. But it also made it possible to test new approaches, capable of taking into account the main issues of this challenge (duration, language, ...). This paper describes the contributions of our laboratory to the speaker recognition field. These contributions h…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: LIA system description for the Short Duration Speaker Verification (SdSv) challenge 2020 Task 2

  4. arXiv:2402.19443  [pdf, other]

    cs.SD cs.AI eess.AS

    Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems

    Authors: Quentin Raymondaud, Mickael Rouvier, Richard Dufour

    Abstract: Deep learning architectures have made significant progress in terms of performance in many research areas. The automatic speech recognition (ASR) field has thus benefited from these scientific and technological advances, particularly for acoustic modeling, now integrating deep neural network architectures. However, these performance gains have translated into increased complexity regarding the inf…

    Submitted 29 February, 2024; originally announced February 2024.

  5. arXiv:2402.15010  [pdf, other]

    cs.CL cs.AI cs.LG

    How Important Is Tokenization in French Medical Masked Language Models?

    Authors: Yanis Labrak, Adrien Bazoge, Beatrice Daille, Mickael Rouvier, Richard Dufour

    Abstract: Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread utilization of pre-trained language models. This shift began with Byte-Pair Encoding (BPE) and was later followed by the adoption of SentencePiece and WordPiece. While subword tokenization consistently outperforms character and word-level toke…

    Submitted 9 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  6. arXiv:2402.13432  [pdf, other]

    cs.CL cs.AI cs.LG

    DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain

    Authors: Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Beatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud, Richard Dufour

    Abstract: The biomedical domain has sparked significant interest in the field of Natural Language Processing (NLP), which has seen substantial advancements with pre-trained language models (PLMs). However, comparing these models has proven challenging due to variations in evaluation protocols across different models. A fair solution is to aggregate diverse downstream tasks into a benchmark, allowing for t…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-Coling 2024

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  7. arXiv:2402.10373  [pdf, other]

    cs.CL cs.AI cs.LG

    BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

    Authors: Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, Richard Dufour

    Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-sourc…

    Submitted 9 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 - Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

    Journal ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics - Volume 1: Long Papers (ACL 2024)

  8. arXiv:2312.16885  [pdf, other]

    cs.SD eess.AS

    Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition

    Authors: Pierre-Michel Bousquet, Mickael Rouvier

    Abstract: A new loss function for speaker recognition with deep neural networks is proposed, based on Jeffreys Divergence. Adding this divergence to the cross-entropy loss function makes it possible to maximize the target value of the output distribution while smoothing the non-target values. This objective function provides highly discriminative features. Beyond this effect, we propose a theoretical justification of i…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted in ICASSP 2023

  9. arXiv:2309.06141  [pdf, other]

    cs.SD eess.AS

    SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier

    Abstract: The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely-used VoxCeleb2 dataset for speaker recogniti…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: conference

  10. arXiv:2309.05472  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech & Language. Preprint allowed

  11. arXiv:2307.12114  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

    Authors: Yanis Labrak, Mickael Rouvier, Richard Dufour

    Abstract: We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to appr…

    Submitted 9 June, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: LREC-COLING 2024 - Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  12. arXiv:2304.04280  [pdf, other]

    cs.CL cs.AI

    FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

    Authors: Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

    Abstract: This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual…

    Submitted 9 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022)

  13. arXiv:2304.00958  [pdf, other]

    cs.CL

    DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains

    Authors: Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

    Abstract: In recent years, pre-trained language models (PLMs) have achieved the best performance on a wide range of natural language processing (NLP) tasks. While the first models were trained on general domain data, specialized ones have emerged to more effectively treat specific domains. In this paper, we propose an original study of PLMs in the medical domain for the French language. We compare, for the first time,…

    Submitted 4 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at ACL 2023

  14. arXiv:2211.01091  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    I4U System Description for NIST SRE'20 CTS Challenge

    Authors: Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, et al. (1 additional author not shown)

    Abstract: This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eight research teams - I$^2$R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (C…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: SRE 2021, NIST Speaker Recognition Evaluation Workshop, CTS Speaker Recognition Challenge, 14-12 December 2021

  15. arXiv:2210.05291  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding

    Authors: Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève

    Abstract: In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently-introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level, semantically aligned across different languages. This model combines the acoustic frame-level speech representation…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted in IEEE SLT 2022. This work was performed using HPC resources from GENCI/IDRIS (grant 2022 AD011012565) and received funding from the EU H2020 research and innovation programme under the Marie Sklodowska-Curie ESPERANTO project (grant agreement No 101007666), through the SELMA project (grant No 957017) and from the French ANR through the AISSPER project (ANR-19-CE23-0004)

  16. arXiv:2201.05051  [pdf, ps, other]

    cs.CL

    Speech Resources in the Tamasheq Language

    Authors: Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

    Abstract: In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours)…

    Submitted 11 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Accepted to LREC 2022

  17. arXiv:2109.05977  [pdf, other]

    eess.AS cs.SD

    Studying squeeze-and-excitation used in CNN for speaker verification

    Authors: Mickael Rouvier, Pierre-Michel Bousquet

    Abstract: In speaker verification, the extraction of voice representations is mainly based on the Residual Neural Network (ResNet) architecture. ResNet is built upon convolution layers which learn filters to capture local spatial patterns along all the input, then generate feature maps that jointly encode the spatial and channel information. Unfortunately, all feature maps in a convolution layer are learnt…

    Submitted 13 September, 2021; originally announced September 2021.

  18. arXiv:2105.04310  [pdf, other]

    eess.AS cs.SD

    Study on the temporal pooling used in deep neural networks for speaker verification

    Authors: Mickael Rouvier, Pierre-Michel Bousquet, Jarod Duret

    Abstract: The x-vector architecture has recently achieved state-of-the-art results on the speaker verification task. This architecture incorporates a central layer, referred to as temporal pooling, which stacks statistical parameters of the acoustic frame distribution. This work proposes to highlight the significant effect of the temporal pooling content on the training dynamics and task performance. An eva…

    Submitted 10 May, 2021; originally announced May 2021.

  19. arXiv:1910.13689  [pdf, other]

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  20. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such a joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  21. arXiv:1612.05202  [pdf, other]

    cs.CL

    Building a robust sentiment lexicon with (almost) no resource

    Authors: Mickael Rouvier, Benoit Favre

    Abstract: Creating sentiment polarity lexicons is labor intensive. Automatically translating them from resourceful languages requires in-domain machine translation systems, which rely on large quantities of bi-texts. In this paper, we propose to replace machine translation by transferring words from the lexicon through word embeddings aligned across languages with a simple linear transform. The approach lea…

    Submitted 15 December, 2016; originally announced December 2016.

  22. arXiv:1612.05168  [pdf, ps, other]

    cs.SD

    LIA system description for NIST SRE 2016

    Authors: Mickael Rouvier, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Jean-François Bonastre

    Abstract: This paper describes the LIA speaker recognition system developed for the Speaker Recognition Evaluation (SRE) campaign. Eight sub-systems are developed, all based on a state-of-the-art approach: i-vector/PLDA which represents the mainstream technique in text-independent speaker recognition. These sub-systems differ: on the acoustic feature extraction front-end (MFCC, PLP), at the i-vector extract…

    Submitted 15 December, 2016; originally announced December 2016.