-
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Port…
▽ More
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish. Additionally, we provide the YourTTS model, a multi-lingual TTS model, trained using 3,176.13 hours from CML-TTS and also with 245.07 hours from LibriTTS, in English. Our purpose in creating this dataset is to open up new research possibilities in the TTS area for multi-lingual models. The dataset is publicly available under the CC-BY 4.0 license1.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Evaluation of Speech Representations for MOS prediction
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Lucas R. S. Gris,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the metric MOS. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was creat…
▽ More
In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the metric MOS. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was created for this work. The results show that the Whisper model is appropriate in all scenarios: with both the VCC2018 and BRSpeech- MOS datasets. Among the supervised and self-supervised learning models using BRSpeechMOS, Whisper-Small achieved the best linear correlation of 0.6980, and the speaker verification model, SpeakerNet, had linear correlation of 0.6963. Using VCC2018, the best supervised and self-supervised learning model, Whisper-Large, achieved linear correlation of 0.7274, and the best model speaker verification, TitaNet, achieved a linear correlation of 0.6933. Although the results of the speaker verification models are slightly lower, the SpeakerNet model has only 5M parameters, making it suitable for real-time applications, and the TitaNet model produces an embedding of size 192, the smallest among all the evaluated models. The experiment results are reproducible with publicly available source-code1 .
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Fluorescence angiography classification in colorectal surgery -- A preliminary report
Authors:
Antonio S Soares,
Sophia Bano,
Neil T Clancy,
Laurence B Lovat,
Danail Stoyanov,
Manish Chand
Abstract:
Background: Fluorescence angiography has shown very promising results in reducing anastomotic leaks by allowing the surgeon to select optimally perfused tissue. However, subjective interpretation of the fluorescent signal still hinders broad application of the technique, as significant variation between different surgeons exists. Our aim is to develop an artificial intelligence algorithm to classi…
▽ More
Background: Fluorescence angiography has shown very promising results in reducing anastomotic leaks by allowing the surgeon to select optimally perfused tissue. However, subjective interpretation of the fluorescent signal still hinders broad application of the technique, as significant variation between different surgeons exists. Our aim is to develop an artificial intelligence algorithm to classify colonic tissue as 'perfused' or 'not perfused' based on intraoperative fluorescence angiography data.
Methods: A classification model with a Resnet architecture was trained on a dataset of fluorescence angiography videos of colorectal resections at a tertiary referral centre. Frames corresponding to fluorescent and non-fluorescent segments of colon were used to train a classification algorithm. Validation using frames from patients not used in the training set was performed, including both data collected using the same equipment and data collected using a different camera. Performance metrics were calculated, and saliency maps used to further analyse the output. A decision boundary was identified based on the tissue classification.
Results: A convolutional neural network was successfully trained on 1790 frames from 7 patients and validated in 24 frames from 14 patients. The accuracy on the training set was 100%, on the validation set was 80%. Recall and precision were respectively 100% and 100% on the training set and 68.8% and 91.7% on the validation set.
Conclusion: Automated classification of intraoperative fluorescence angiography with a high degree of accuracy is possible and allows automated decision boundary identification. This will enable surgeons to standardise the technique of fluorescence angiography. A web based app was made available to deploy the algorithm.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
Predição da Idade Cerebral a partir de Imagens de Ressonância Magnética utilizando Redes Neurais Convolucionais
Authors:
Victor H. R. Oliveira,
Augusto Antunes,
Alexandre S. Soares,
Arthur D. Reys,
Robson Z. Júnior,
Saulo D. S. Pedro,
Danilo Silva
Abstract:
In this work, deep learning techniques for brain age prediction from magnetic resonance images are investigated, aiming to assist in the identification of biomarkers of the natural aging process. The identification of biomarkers is useful for detecting an early-stage neurodegenerative process, as well as for predicting age-related or non-age-related cognitive decline. Two techniques are implemente…
▽ More
In this work, deep learning techniques for brain age prediction from magnetic resonance images are investigated, aiming to assist in the identification of biomarkers of the natural aging process. The identification of biomarkers is useful for detecting an early-stage neurodegenerative process, as well as for predicting age-related or non-age-related cognitive decline. Two techniques are implemented and compared in this work: a 3D Convolutional Neural Network applied to the volumetric image and a 2D Convolutional Neural Network applied to slices from the axial plane, with subsequent fusion of individual predictions. The best result was obtained by the 2D model, which achieved a mean absolute error of 3.83 years.
--
Neste trabalho são investigadas técnicas de aprendizado profundo para a predição da idade cerebral a partir de imagens de ressonância magnética, visando auxiliar na identificação de biomarcadores do processo natural de envelhecimento. A identificação de biomarcadores é útil para a detecção de um processo neurodegenerativo em estágio inicial, além de possibilitar prever um declínio cognitivo relacionado ou não à idade. Duas técnicas são implementadas e comparadas neste trabalho: uma Rede Neural Convolucional 3D aplicada na imagem volumétrica e uma Rede Neural Convolucional 2D aplicada a fatias do plano axial, com posterior fusão das predições individuais. O melhor resultado foi obtido pelo modelo 2D, que alcançou um erro médio absoluto de 3.83 anos.
△ Less
Submitted 23 December, 2021;
originally announced December 2021.
-
Improving Irregularly Sampled Time Series Learning with Dense Descriptors of Time
Authors:
Rafael T. Sousa,
Lucas A. Pereira,
Anderson S. Soares
Abstract:
Supervised learning with irregularly sampled time series have been a challenge to Machine Learning methods due to the obstacle of dealing with irregular time intervals. Some papers introduced recently recurrent neural network models that deals with irregularity, but most of them rely on complex mechanisms to achieve a better performance. This work propose a novel method to represent timestamps (ho…
▽ More
Supervised learning with irregularly sampled time series have been a challenge to Machine Learning methods due to the obstacle of dealing with irregular time intervals. Some papers introduced recently recurrent neural network models that deals with irregularity, but most of them rely on complex mechanisms to achieve a better performance. This work propose a novel method to represent timestamps (hours or dates) as dense vectors using sinusoidal functions, called Time Embeddings. As a data input method it and can be applied to most machine learning models. The method was evaluated with two predictive tasks from MIMIC III, a dataset of irregularly sampled time series of electronic health records. Our tests showed an improvement to LSTM-based and classical machine learning models, specially with very irregular data.
△ Less
Submitted 20 March, 2020;
originally announced March 2020.
-
Predicting Diabetes Disease Evolution Using Financial Records and Recurrent Neural Networks
Authors:
Rafael T. Sousa,
Lucas A. Pereira,
Anderson S. Soares
Abstract:
Managing patients with chronic diseases is a major and growing healthcare challenge in several countries. A chronic condition, such as diabetes, is an illness that lasts a long time and does not go away, and often leads to the patient's health gradually getting worse. While recent works involve raw electronic health record (EHR) from hospitals, this work uses only financial records from health pla…
▽ More
Managing patients with chronic diseases is a major and growing healthcare challenge in several countries. A chronic condition, such as diabetes, is an illness that lasts a long time and does not go away, and often leads to the patient's health gradually getting worse. While recent works involve raw electronic health record (EHR) from hospitals, this work uses only financial records from health plan providers (medical claims) to predict diabetes disease evolution with a self-attentive recurrent neural network. The use of financial data is due to the possibility of being an interface to international standards, as the records standard encodes medical procedures. The main goal was to assess high risk diabetics, so we predict records related to diabetes acute complications such as amputations and debridements, revascularization and hemodialysis. Our work succeeds to anticipate complications between 60 to 240 days with an area under ROC curve ranging from 0.81 to 0.94. In this paper we describe the first half of a work-in-progress developed within a health plan provider with ROC curve ranging from 0.81 to 0.83. This assessment will give healthcare providers the chance to intervene earlier and head off hospitalizations. We are aiming to deliver personalized predictions and personalized recommendations to individual patients, with the goal of improving outcomes and reducing costs
△ Less
Submitted 20 March, 2020; v1 submitted 22 November, 2018;
originally announced November 2018.