Showing 1–10 of 10 results for author: Shulby, C

  1. arXiv:2204.00618  [pdf, other]

    eess.AS cs.CL cs.SD

    ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion

    Authors: Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Aluísio, Moacir Antonelli Ponti

    Abstract: We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model tr…

    Submitted 20 May, 2023; v1 submitted 29 March, 2022; originally announced April 2022.

    Comments: This paper was accepted at INTERSPEECH 2023

  2. arXiv:2112.02418  [pdf, other]

    cs.SD cs.CL eess.AS

    YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

    Authors: Edresson Casanova, Julian Weber, Christopher Shulby, Arnaldo Candido Junior, Eren Gölge, Moacir Antonelli Ponti

    Abstract: YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our…

    Submitted 30 April, 2023; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: An Erratum was added on the last page of this paper

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2709-2720, 2022

  3. arXiv:2104.05557  [pdf, other]

    eess.AS cs.SD

    SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

    Authors: Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transform…

    Submitted 15 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Accepted on Interspeech 2021

  4. arXiv:2005.05144  [pdf, other]

    eess.AS cs.CL cs.LG

    TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese

    Authors: Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, João Paulo Teixeira, Moacir Antonelli Ponti, Sandra Maria Aluisio

    Abstract: Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources fo…

    Submitted 29 January, 2022; v1 submitted 11 May, 2020; originally announced May 2020.

  5. arXiv:2002.11213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models

    Authors: Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, Lucas Rafael Stefanel Gris, Hamilton Pereira da Silva, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice…

    Submitted 18 June, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Submitted to BRACIS

  6. arXiv:1810.09431  [pdf, ps, other]

    cs.CL cs.AI

    Proactive Security: Embedded AI Solution for Violent and Abusive Speech Recognition

    Authors: Christopher Dane Shulby, Leonardo Pombal, Vitor Jordão, Guilherme Ziolle, Bruno Martho, Antônio Postal, Thiago Prochnow

    Abstract: Violence is an epidemic in Brazil and a problem on the rise world-wide. Mobile devices provide communication technologies which can be used to monitor and alert about violent situations. However, current solutions, like panic buttons or safe words, might increase the loss of life in violent situations. We propose an embedded artificial intelligence solution, using natural language and speech proce…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: 6 Pages, Bracis 2018 Preprint

  7. arXiv:1708.06025  [pdf, ps, other]

    cs.CL

    Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

    Authors: Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, Sandra Aluisio

    Abstract: Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants. We trained 31 word embedding models using FastText, GloVe, Wang2Vec and Word…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: 7 pages, STIL 2017 Full paper

  8. arXiv:1708.04704  [pdf, other]

    cs.CL

    Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

    Authors: Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio

    Abstract: This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario the task of sentence boundary detection in speech transcripts is important as discourse analysis involves the application of Natural Language Processing tools, such as taggers and parsers, which depend o…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: Accepted on STIL 2017

  9. arXiv:1706.09055  [pdf, other]

    cs.SD cs.CL

    Acoustic Modeling Using a Shallow CNN-HTSVM Architecture

    Authors: Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello, Sandra Maria Aluisio

    Abstract: High-accuracy speech recognition is especially challenging when large datasets are not available. It is possible to bridge this gap with careful and knowledge-driven parsing combined with the biologically inspired CNN and the learning guarantees of the Vapnik Chervonenkis (VC) theory. This work presents a Shallow-CNN-HTSVM (Hierarchical Tree Support Vector Machine classifier) architecture which us…

    Submitted 27 June, 2017; originally announced June 2017.

    Comments: Pre-review version of Bracis 2017

  10. arXiv:1610.00211  [pdf, other]

    cs.CL

    Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks

    Authors: Marcos Vinícius Treviso, Christopher Shulby, Sandra Maria Aluísio

    Abstract: Automated discourse analysis tools based on Natural Language Processing (NLP) aiming at the diagnosis of language-impairing dementias generally extract several textual metrics of narrative transcripts. However, the absence of sentence boundary segmentation in the transcripts prevents the direct application of NLP methods which rely on these marks to function properly, such as taggers and parsers.…

    Submitted 15 August, 2017; v1 submitted 1 October, 2016; originally announced October 2016.

    Comments: EACL 2017

    MSC Class: 68T50