-
Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Authors:
Louis Blankemeier,
Joseph Paul Cohen,
Ashwin Kumar,
Dave Van Veen,
Syed Jamal Safdar Gardezi,
Magdalini Paschali,
Zhihong Chen,
Jean-Benoit Delbrouck,
Eduardo Reis,
Cesar Truyts,
Christian Bluethgen,
Malte Engmann Kjeldskov Jensen,
Sophie Ostmeier,
Maya Varma,
Jeya Maria Jose Valanarasu,
Zhongnan Fang,
Zepeng Huo,
Zaid Nabulsi,
Diego Ardila,
Wei-Hung Weng,
Edson Amaro Junior,
Neera Ahuja,
Jason Fries,
Nigam H. Shah,
Andrew Johnston
, et al. (6 additional authors not shown)
Abstract:
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision la…
▽ More
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
IT Strategic alignment in the decentralized finance (DeFi): CBDC and digital currencies
Authors:
Carlos Alberto Durigan Junior,
Fernando Jose Barbin Laurindo
Abstract:
Cryptocurrency can be understood as a digital asset transacted among participants in the crypto economy. Every cryptocurrency must have an associated Blockchain. Blockchain is a Distributed Ledger Technology (DLT) which supports cryptocurrencies, this may be considered as the most promising disruptive technology in the industry 4.0 context. Decentralized finance (DeFi) is a Blockchain-based financ…
▽ More
Cryptocurrency can be understood as a digital asset transacted among participants in the crypto economy. Every cryptocurrency must have an associated Blockchain. Blockchain is a Distributed Ledger Technology (DLT) which supports cryptocurrencies, this may be considered as the most promising disruptive technology in the industry 4.0 context. Decentralized finance (DeFi) is a Blockchain-based financial infrastructure, the term generally refers to an open, permissionless, and highly interoperable protocol stack built on public smart contract platforms, such as the Ethereum Blockchain. It replicates existing financial services in a more open and transparent way. DeFi does not rely on intermediaries and centralized institutions. Instead, it is based on open protocols and decentralized applications (Dapps). Considering that there are many digital coins, stablecoins and central bank digital currencies (CBDCs), these currencies should interact among each other sometime. For this interaction the Information Technology elements play an important whole as enablers and IT strategic alignment. This paper considers the strategic alignment model proposed by Henderson and Venkatraman (1993) and Luftman (1996). This paper seeks to answer two main questions 1) What are the common IT elements in the DeFi? And 2) How the elements connect to the IT strategic alignment in DeFi? Through a Systematic Literature Review (SLR). Results point out that there are many IT elements already mentioned by literature, however there is a lack in the literature about the connection between IT elements and IT strategic alignment in a Decentralized Finance (DeFi) architectural network. After final considerations, limitations and future research agenda are presented. Keywords: IT Strategic alignment, Decentralized Finance (DeFi), Cryptocurrency, Digital Economy.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Web Intelligence Journal in perspective: an analysis of its two decades trajectory
Authors:
Diogenes Ademir Domingos,
Victor Emanuel Santos Moura,
Antonio Fernando Lavareda Jacob Junior,
Fabio Manoel Franca Lobato
Abstract:
The evolution of a thematic area undergoes various changes of perspective and adopts new theoretical approaches that arise from the interactions of the community and a wide range of social needs. The advent of digital technologies, such as social networks, underlines this factor by spreading knowledge and forging links between different communities. Web intelligence is now on the verge of raising…
▽ More
The evolution of a thematic area undergoes various changes of perspective and adopts new theoretical approaches that arise from the interactions of the community and a wide range of social needs. The advent of digital technologies, such as social networks, underlines this factor by spreading knowledge and forging links between different communities. Web intelligence is now on the verge of raising questions that broaden the understanding of how artificial intelligence impacts the Web of People, Data, and Things, among other factors. To the best of our knowledge, there is no study that has conducted a longitudinal analysis of the evolution of this community. Thus, we investigate in this paper how Web intelligence has evolved in the last twenty years by carrying out a literature review and bibliometric analysis. Concerning the impact of this research study, increasing attention is devoted to determining which are the most influential papers in the community by referring to citation networks and discovering the most popular and pressing topics through a co-citation analysis and the keywords co-occurrence. The results obtained can guide the direction of new research projects in the area and update the scope and places of interest found in current trends and the relevant journals.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
The use of the open innovation paradigm in the public sector: a systematic review of published studies
Authors:
Joel Alves de Lima Júnior,
Kiev Gama,
Jorge da Silva Correia Neto
Abstract:
The use of the open innovation paradigm has been, over the past years, getting special attention in the public sector. Motivated by an urban environment that is increasingly more complex and challenging, several government agencies have been allocating financial resources and efforts to promote open and participative government initiatives. As a way to try and understand this scenario, a systemati…
▽ More
The use of the open innovation paradigm has been, over the past years, getting special attention in the public sector. Motivated by an urban environment that is increasingly more complex and challenging, several government agencies have been allocating financial resources and efforts to promote open and participative government initiatives. As a way to try and understand this scenario, a systematic review of the literature was conducted, to provide a comprehensive analysis of the scientific papers that were published, seeking to capture, classify, evaluate and synthesize how the use of this paradigm has been put into practice in the public sector. In total, 4,741 preliminary studies were analyzed. From this number, only 37 articles were classified as potentially relevant and moved forward, going through the process of data extraction and analysis. From the data obtained, it was possible to verify that the use of this paradigm started to be reported with a higher frequency in the literature since 2013 and, among the main findings, we highlight the reports of experiences, approach propositions, of understanding how the phenomenon occurs and theoretical reflections. It was also possible to verify that the use of open innovation through social media was one of the pioneer techniques of engagement between the public sector and citizens. In conclusion, the reports confirm that the main challenges of this paradigm applied to the public sector are associated with their respective bureaucratic aspects, therefore lacking a bigger reflection on the procedures and methods to be used in the public sphere.
△ Less
Submitted 8 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Authors:
Taishi Nakamura,
Mayank Mishra,
Simone Tedeschi,
Yekun Chai,
Jason T Stillerman,
Felix Friedrich,
Prateek Yadav,
Tanmay Laud,
Vu Minh Chien,
Terry Yue Zhuo,
Diganta Misra,
Ben Bogin,
Xuan-Son Vu,
Marzena Karpinska,
Arnav Varma Dantuluri,
Wojciech Kusa,
Tommaso Furlanello,
Rio Yokota,
Niklas Muennighoff,
Suhas Pai,
Tosin Adewumi,
Veronika Laippala,
Xiaozhe Yao,
Adalberto Junior,
Alpay Ariyak
, et al. (20 additional authors not shown)
Abstract:
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, where…
▽ More
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407 .
△ Less
Submitted 23 April, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
An Incremental MaxSAT-based Model to Learn Interpretable and Balanced Classification Rules
Authors:
Antônio Carlos Souza Ferreira Júnior,
Thiago Alves Rocha
Abstract:
The increasing advancements in the field of machine learning have led to the development of numerous applications that effectively address a wide range of problems with accurate predictions. However, in certain cases, accuracy alone may not be sufficient. Many real-world problems also demand explanations and interpretability behind the predictions. One of the most popular interpretable models that…
▽ More
The increasing advancements in the field of machine learning have led to the development of numerous applications that effectively address a wide range of problems with accurate predictions. However, in certain cases, accuracy alone may not be sufficient. Many real-world problems also demand explanations and interpretability behind the predictions. One of the most popular interpretable models that are classification rules. This work aims to propose an incremental model for learning interpretable and balanced rules based on MaxSAT, called IMLIB. This new model was based on two other approaches, one based on SAT and the other on MaxSAT. The one based on SAT limits the size of each generated rule, making it possible to balance them. We suggest that such a set of rules seem more natural to be understood compared to a mixture of large and small rules. The approach based on MaxSAT, called IMLI, presents a technique to increase performance that involves learning a set of rules by incrementally applying the model in a dataset. Finally, IMLIB and IMLI are compared using diverse databases. IMLIB obtained results comparable to IMLI in terms of accuracy, generating more balanced rules with smaller sizes.
△ Less
Submitted 29 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task
Authors:
Gabriel Lino Garcia,
Pedro Henrique Paiola,
Luis Henrique Morelli,
Giovani Candido,
Arnaldo Cândido Júnior,
Danilo Samuel Jodas,
Luis C. S. Afonso,
Ivan Rizzo Guilherme,
Bruno Elias Penteado,
João Paulo Papa
Abstract:
Large Language Models (LLMs) are increasingly bringing advances to Natural Language Processing. However, low-resource languages, those lacking extensive prominence in datasets for various NLP tasks, or where existing datasets are not as substantial, such as Portuguese, already obtain several benefits from LLMs, but not to the same extent. LLMs trained on multilingual datasets normally struggle to…
▽ More
Large Language Models (LLMs) are increasingly bringing advances to Natural Language Processing. However, low-resource languages, those lacking extensive prominence in datasets for various NLP tasks, or where existing datasets are not as substantial, such as Portuguese, already obtain several benefits from LLMs, but not to the same extent. LLMs trained on multilingual datasets normally struggle to respond to prompts in Portuguese satisfactorily, presenting, for example, code switching in their responses. This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode in two versions: 7B and 13B. We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning, and compare it with other LLMs. Our main contribution is to bring an LLM with satisfactory results in the Portuguese language, as well as to provide a model that is free for research or commercial purposes.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Offshore Wind Plant Instance Segmentation Using Sentinel-1 Time Series, GIS, and Semantic Segmentation Models
Authors:
Osmar Luiz Ferreira de Carvalho,
Osmar Abilio de Carvalho Junior,
Anesmar Olino de Albuquerque,
Daniel Guerreiro e Silva
Abstract:
Offshore wind farms represent a renewable energy source with a significant global growth trend, and their monitoring is strategic for territorial and environmental planning. This study's primary objective is to detect offshore wind plants at an instance level using semantic segmentation models and Sentinel-1 time series. The secondary objectives are: (a) to develop a database consisting of labeled…
▽ More
Offshore wind farms represent a renewable energy source with a significant global growth trend, and their monitoring is strategic for territorial and environmental planning. This study's primary objective is to detect offshore wind plants at an instance level using semantic segmentation models and Sentinel-1 time series. The secondary objectives are: (a) to develop a database consisting of labeled data and S-1 time series; (b) to compare the performance of five deep semantic segmentation architectures (U-Net, U-Net++, Feature Pyramid Network - FPN, DeepLabv3+, and LinkNet); (c) develop a novel augmentation strategy that shuffles the positions of the images within the time series; (d) investigate different dimensions of time series intervals (1, 5, 10, and 15 images); and (e) evaluate the semantic-to-instance conversion procedure. LinkNet was the top-performing model, followed by U-Net++ and U-Net, while FPN and DeepLabv3+ presented the worst results. The evaluation of semantic segmentation models reveals enhanced Intersection over Union (IoU) (25%) and F-score metrics (18%) with the augmentation of time series images. The study showcases the augmentation strategy's capability to mitigate biases and precisely detect invariant targets. Furthermore, the conversion from semantic to instance segmentation demonstrates its efficacy in accurately isolating individual instances within classified regions - simplifying training data and reducing annotation effort and complexity.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
Deep Learning Brasil at ABSAPT 2022: Portuguese Transformer Ensemble Approaches
Authors:
Juliana Resplande Santanna Gomes,
Eduardo Augusto Santos Garcia,
Adalberto Ferreira Barbosa Junior,
Ruan Chaves Rodrigues,
Diogo Fernandes Costa Silva,
Dyonnatan Ferreira Maia,
Nádia Félix Felipe da Silva,
Arlindo Rodrigues Galvão Filho,
Anderson da Silva Soares
Abstract:
Aspect-based Sentiment Analysis (ABSA) is a task whose objective is to classify the individual sentiment polarity of all entities, called aspects, in a sentence. The task is composed of two subtasks: Aspect Term Extraction (ATE), identify all aspect terms in a sentence; and Sentiment Orientation Extraction (SOE), given a sentence and its aspect terms, the task is to determine the sentiment polarit…
▽ More
Aspect-based Sentiment Analysis (ABSA) is a task whose objective is to classify the individual sentiment polarity of all entities, called aspects, in a sentence. The task is composed of two subtasks: Aspect Term Extraction (ATE), identify all aspect terms in a sentence; and Sentiment Orientation Extraction (SOE), given a sentence and its aspect terms, the task is to determine the sentiment polarity of each aspect term (positive, negative or neutral). This article presents we present our participation in Aspect-Based Sentiment Analysis in Portuguese (ABSAPT) 2022 at IberLEF 2022. We submitted the best performing systems, achieving new state-of-the-art results on both subtasks.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
DeepLearningBrasil@LT-EDI-2023: Exploring Deep Learning Techniques for Detecting Depression in Social Media Text
Authors:
Eduardo Garcia,
Juliana Gomes,
Adalberto Barbosa Júnior,
Cardeque Borges,
Nádia da Silva
Abstract:
In this paper, we delineate the strategy employed by our team, DeepLearningBrasil, which secured us the first place in the shared task DepSign-LT-EDI@RANLP-2023, achieving a 47.0% Macro F1-Score and a notable 2.4% advantage. The task was to classify social media texts into three distinct levels of depression - "not depressed," "moderately depressed," and "severely depressed." Leveraging the power…
▽ More
In this paper, we delineate the strategy employed by our team, DeepLearningBrasil, which secured us the first place in the shared task DepSign-LT-EDI@RANLP-2023, achieving a 47.0% Macro F1-Score and a notable 2.4% advantage. The task was to classify social media texts into three distinct levels of depression - "not depressed," "moderately depressed," and "severely depressed." Leveraging the power of the RoBERTa and DeBERTa models, we further pre-trained them on a collected Reddit dataset, specifically curated from mental health-related Reddit's communities (Subreddits), leading to an enhanced understanding of nuanced mental health discourse. To address lengthy textual data, we used truncation techniques that retained the essence of the content by focusing on its beginnings and endings. Our model was robust against unbalanced data by incorporating sample weights into the loss. Cross-validation and ensemble techniques were then employed to combine our k-fold trained models, delivering an optimal solution. The accompanying code is made available for transparency and further development.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites
Authors:
Augusto Seben da Rosa,
Frederico Santos de Oliveira,
Anderson da Silva Soares,
Arnaldo Candido Junior
Abstract:
Computer vision in general presented several advances such as training optimizations, new architectures (pure attention, efficient block, vision language models, generative models, among others). This have improved performance in several tasks such as classification, and others. However, the majority of these models focus on modifications that are taking distance from realistic neuroscientific app…
▽ More
Computer vision in general presented several advances such as training optimizations, new architectures (pure attention, efficient block, vision language models, generative models, among others). This have improved performance in several tasks such as classification, and others. However, the majority of these models focus on modifications that are taking distance from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts visual manifold, its blocks are intended to separate analysis of colors and forms at its initial layers, simulating occipital lobe's operations. Our results shows that our architecture provides State-of-the-Art efficiency among low parameter architectures in the dataset CIFAR-10. Our first model reached 93.32\% test accuracy, 0.8\% more than the older SOTA in this category, while having 150k less parameters (726k in total). Our second model uses 52k parameters, losing only 3.86\% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49\% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
Authors:
Luiz Giordani,
Gilsiley Darú,
Rhenan Queiroz,
Vitor Buzinaro,
Davi Keglevich Neiva,
Daniel Camilo Fuentes Guzmán,
Marcos Jardel Henriques,
Oilson Alberto Gonzatto Junior,
Francisco Louzada
Abstract:
The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF…
▽ More
The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
△ Less
Submitted 20 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
A Machine Learning Pressure Emulator for Hydrogen Embrittlement
Authors:
Minh Triet Chau,
João Lucas de Sousa Almeida,
Elie Alhajjar,
Alberto Costa Nogueira Junior
Abstract:
A recent alternative for hydrogen transportation as a mixture with natural gas is blending it into natural gas pipelines. However, hydrogen embrittlement of material is a major concern for scientists and gas installation designers to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite its high-fid…
▽ More
A recent alternative for hydrogen transportation as a mixture with natural gas is blending it into natural gas pipelines. However, hydrogen embrittlement of material is a major concern for scientists and gas installation designers to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite its high-fidelity results, the current PDE-based simulators are time- and computationally-demanding. Using simulation data, we train an ML model to predict the pressure on the pipelines' inner walls, which is a first step for pipeline system surveillance. We found that the physics-based method outperformed the purely data-driven method and satisfy the physical constraints of the gas flow system.
△ Less
Submitted 22 June, 2023;
originally announced June 2023.
-
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Port…
▽ More
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish. Additionally, we provide the YourTTS model, a multi-lingual TTS model, trained using 3,176.13 hours from CML-TTS and also with 245.07 hours from LibriTTS, in English. Our purpose in creating this dataset is to open up new research possibilities in the TTS area for multi-lingual models. The dataset is publicly available under the CC-BY 4.0 license1.
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Evaluation of Speech Representations for MOS prediction
Authors:
Frederico S. Oliveira,
Edresson Casanova,
Arnaldo Cândido Júnior,
Lucas R. S. Gris,
Anderson S. Soares,
Arlindo R. Galvão Filho
Abstract:
In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the metric MOS. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was creat…
▽ More
In this paper, we evaluate feature extraction models for predicting speech quality. We also propose a model architecture to compare embeddings of supervised learning and self-supervised learning models with embeddings of speaker verification models to predict the metric MOS. Our experiments were performed on the VCC2018 dataset and a Brazilian-Portuguese dataset called BRSpeechMOS, which was created for this work. The results show that the Whisper model is appropriate in all scenarios: with both the VCC2018 and BRSpeech- MOS datasets. Among the supervised and self-supervised learning models using BRSpeechMOS, Whisper-Small achieved the best linear correlation of 0.6980, and the speaker verification model, SpeakerNet, had linear correlation of 0.6963. Using VCC2018, the best supervised and self-supervised learning model, Whisper-Large, achieved linear correlation of 0.7274, and the best model speaker verification, TitaNet, achieved a linear correlation of 0.6933. Although the results of the speaker verification models are slightly lower, the SpeakerNet model has only 5M parameters, making it suitable for real-time applications, and the TitaNet model produces an embedding of size 192, the smallest among all the evaluated models. The experiment results are reproducible with publicly available source-code1 .
△ Less
Submitted 16 June, 2023;
originally announced June 2023.
-
Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation
Authors:
Valdivino Alexandre de Santiago Júnior
Abstract:
Despite several solutions and experiments have been conducted recently addressing image super-resolution (SR), boosted by deep learning (DL) techniques, they do not usually design evaluations with high scaling factors, capping it at 2x or 4x. Moreover, the datasets are generally benchmarks which do not truly encompass significant diversity of domains to proper evaluate the techniques. It is also i…
▽ More
Despite several solutions and experiments have been conducted recently addressing image super-resolution (SR), boosted by deep learning (DL) techniques, they do not usually design evaluations with high scaling factors, capping it at 2x or 4x. Moreover, the datasets are generally benchmarks which do not truly encompass significant diversity of domains to proper evaluate the techniques. It is also interesting to remark that blind SR is attractive for real-world scenarios since it is based on the idea that the degradation process is unknown, and hence techniques in this context rely basically on low-resolution (LR) images. In this article, we present a high-scale (8x) controlled experiment which evaluates five recent DL techniques tailored for blind image SR: Adaptive Pseudo Augmentation (APA), Blind Image SR with Spatially Variant Degradations (BlindSR), Deep Alternating Network (DAN), FastGAN, and Mixture of Experts Super-Resolution (MoESR). We consider 14 small datasets from five different broader domains which are: aerial, fauna, flora, medical, and satellite. Another distinctive characteristic of our evaluation is that some of the DL approaches were designed for single-image SR but others not. Two no-reference metrics were selected, being the classical natural image quality evaluator (NIQE) and the recent transformer-based multi-dimension attention network for no-reference image quality assessment (MANIQA) score, to assess the techniques. Overall, MoESR can be regarded as the best solution although the perceptual quality of the created HR images of all the techniques still needs to improve. Supporting code: https://github.com/vsantjr/DL_BlindSR. Datasets: https://www.kaggle.com/datasets/valdivinosantiago/dl-blindsr-datasets.
△ Less
Submitted 15 June, 2023;
originally announced June 2023.
-
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
Authors:
Lucas Rafael Stefanel Gris,
Ricardo Marcacini,
Arnaldo Candido Junior,
Edresson Casanova,
Anderson Soares,
Sandra Maria Aluísio
Abstract:
Automatic speech recognition (ASR) systems play a key role in applications involving human-machine interactions. Despite their importance, ASR models for the Portuguese language proposed in the last decade have limitations in relation to the correct identification of punctuation marks in automatic transcriptions, which hinder the use of transcriptions by other systems, models, and even by humans.…
▽ More
Automatic speech recognition (ASR) systems play a key role in applications involving human-machine interactions. Despite their importance, ASR models for the Portuguese language proposed in the last decade have limitations in relation to the correct identification of punctuation marks in automatic transcriptions, which hinder the use of transcriptions by other systems, models, and even by humans. However, recently Whisper ASR was proposed by OpenAI, a general-purpose speech recognition model that has generated great expectations in dealing with such limitations. This chapter presents the first study on the performance of Whisper for punctuation prediction in the Portuguese language. We present an experimental evaluation considering both theoretical aspects involving pausing points (comma) and complete ideas (exclamation, question, and fullstop), as well as practical aspects involving transcript-based topic modeling - an application dependent on punctuation marks for promising performance. We analyzed experimental results from videos of Museum of the Person, a virtual museum that aims to tell and preserve people's life histories, thus discussing the pros and cons of Whisper in a real-world scenario. Although our experiments indicate that Whisper achieves state-of-the-art results, we conclude that some punctuation marks require improvements, such as exclamation, semicolon and colon.
△ Less
Submitted 26 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production
Authors:
Francisco Eron,
Muhammad Noman,
Raphael Ricon de Oliveira,
Deigo de Souza Marques,
Rafael Serapilha Durelli,
Andre Pimenta Freire,
Antonio Chalfun Junior
Abstract:
Coffee which is prepared from the grinded roasted seeds of harvested coffee cherries, is one of the most consumed beverage and traded commodity, globally. To manually monitor the coffee field regularly, and inform about plant and soil health, as well as estimate yield and harvesting time, is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating…
▽ More
Coffee which is prepared from the grinded roasted seeds of harvested coffee cherries, is one of the most consumed beverage and traded commodity, globally. To manually monitor the coffee field regularly, and inform about plant and soil health, as well as estimate yield and harvesting time, is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest, however a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at pre-harvest stage, was missing. Following precision agriculture approach, we employed machine learning algorithm YOLO, for image processing of coffee plant. In this study, the latest version of the state-of-the-art algorithm YOLOv7 was trained with 324 annotated images followed by its evaluation with 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models which led to machine-generated color classes of coffee fruit and could thus characterize the informed objects in the image. Finally, we attempted to develop an AI-based handy mobile application which would not only efficiently predict harvest time, estimate coffee yield and quality, but also inform about plant health. Resultantly, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method with an mean average precision of 0.77 for multi-class mode surpassed the supervised method with mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code, was named CoffeApp, which possesses multiple features of analyzing fruit from the image taken by phone camera with in field and can thus track fruit ripening in real time.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital
Authors:
Marcio G B de Avellar,
Alexandre A S Junior,
André H G Lopes,
André L S Carneiro,
João A Pereira,
Davi C B D da Cunha
Abstract:
We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a pro…
▽ More
We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a promising solution to the technological challenges faced by CBDCs. Furthermore, by implementing confidential computing in DLTs such as Hyperledger Besu and utilizing frameworks such as CCF, we increase transaction confidentiality and privacy while maintaining the scalability and interoperability required for a global digital financial system. In conclusion, confidential computing can significantly bolster CBDC development, fostering a secure, private, and efficient financial future.
--
Exploramos o uso da computação confidencial no contexto das CBDCs utilizando o framework CCF da Microsoft como exemplo. Via desenvolvimento de experimentos e comparação de diferentes abordagens e métricas de desempenho e segurança, buscamos avaliar a eficácia da computação confidencial para melhorar a privacidade, segurança e desempenho das CBDCs. Resultados preliminares sugerem que a computação confidencial pode ser uma solução promissora para os desafios tecnológicos enfrentados pelas CBDCs. Ao implementar a computação confidencial em DLTs, como o Hyperledger Besu, e utilizar frameworks como o CCF, aumentamos a confidencialidade e a privacidade das transações, mantendo a escalabilidade e a interoperabilidade necessárias para um sistema financeiro global e digital. Em conclusão, a computação confidencial pode reforçar significativamente o desenvolvimento do CBDC, promovendo um futuro financeiro seguro, privado e eficiente.
△ Less
Submitted 10 April, 2023;
originally announced April 2023.
-
Interpretability Analysis of Deep Models for COVID-19 Detection
Authors:
Daniel Peixoto Pinto da Silva,
Edresson Casanova,
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Marcelo Finger,
Flaviane Svartman,
Beatriz Raposo,
Marcus Vinícius Moreira Martins,
Sandra Maria Aluísio,
Larissa Cristina Berti,
João Paulo Teixeira
Abstract:
During the outbreak of COVID-19 pandemic, several research areas joined efforts to mitigate the damages caused by SARS-CoV-2. In this paper we present an interpretability analysis of a convolutional neural network based model for COVID-19 detection in audios. We investigate which features are important for model decision process, investigating spectrograms, F0, F0 standard deviation, sex and age.…
▽ More
During the outbreak of COVID-19 pandemic, several research areas joined efforts to mitigate the damages caused by SARS-CoV-2. In this paper we present an interpretability analysis of a convolutional neural network based model for COVID-19 detection in audios. We investigate which features are important for model decision process, investigating spectrograms, F0, F0 standard deviation, sex and age. Following, we analyse model decisions by generating heat maps for the trained models to capture their attention during the decision process. Focusing on a explainable Inteligence Artificial approach, we show that studied models can taken unbiased decisions even in the presence of spurious data in the training set, given the adequate preprocessing steps. Our best model has 94.44% of accuracy in detection, with results indicating that models favors spectrograms for the decision process, particularly, high energy areas in the spectrogram related to prosodic domains, while F0 also leads to efficient COVID-19 detection.
△ Less
Submitted 25 November, 2022;
originally announced November 2022.
-
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Authors:
Lucas Rafael Stefanel Gris,
Arnaldo Candido Junior,
Vinícius G. dos Santos,
Bruno A. Papa Dias,
Marli Quadros Leite,
Flaviane Romani Fernandes Svartman,
Sandra Aluísio
Abstract:
The NURC Project that started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings taken in São Paulo capital. Although 47 inquiries have transcripts, there was no alignment between the audio-transcription, and 328 inquiries were…
▽ More
The NURC Project that started in 1969 to study the cultured linguistic urban norm spoken in five Brazilian capitals, was responsible for compiling a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries in 334 hours of recordings taken in São Paulo capital. Although 47 inquiries have transcripts, there was no alignment between the audio-transcription, and 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained with spontaneous speech in Portuguese and one model trained with prepared speech. The evaluation allowed us to choose the best model, using WER and CER metrics, in a manually aligned sample of NURC/SP, to automatically transcribe 284 hours.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
Generalizing intrusion detection for heterogeneous networks: A stacked-unsupervised federated learning approach
Authors:
Gustavo de Carvalho Bertoli,
Lourenço Alves Pereira Junior,
Aldri Luiz dos Santos,
Osamu Saotome
Abstract:
The constantly evolving digital transformation imposes new requirements on our society. Aspects relating to reliance on the networking domain and the difficulty of achieving security by design pose a challenge today. As a result, data-centric and machine-learning approaches arose as feasible solutions for securing large networks. Although, in the network security domain, ML-based solutions face a…
▽ More
The constantly evolving digital transformation imposes new requirements on our society. Aspects relating to reliance on the networking domain and the difficulty of achieving security by design pose a challenge today. As a result, data-centric and machine-learning approaches arose as feasible solutions for securing large networks. Although, in the network security domain, ML-based solutions face a challenge regarding the capability to generalize between different contexts. In other words, solutions based on specific network data usually do not perform satisfactorily on other networks. This paper describes the stacked-unsupervised federated learning (FL) approach to generalize on a cross-silo configuration for a flow-based network intrusion detection system (NIDS). The proposed approach we have examined comprises a deep autoencoder in conjunction with an energy flow classifier in an ensemble learning task. Our approach performs better than traditional local learning and naive cross-evaluation (training in one context and testing on another network data). Remarkably, the proposed approach demonstrates a sound performance in the case of non-iid data silos. In conjunction with an informative feature in an ensemble architecture for unsupervised learning, we advise that the proposed FL-based NIDS results in a feasible approach for generalization between heterogeneous networks. To the best of our knowledge, our proposal is the first successful approach to applying unsupervised FL on the problem of network intrusion detection generalization using flow-based data.
△ Less
Submitted 28 November, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Large-Margin Representation Learning for Texture Classification
Authors:
Jonathan de Matos,
Luiz Eduardo Soares de Oliveira,
Alceu de Souza Britto Junior,
Alessandro Lameiras Koerich
Abstract:
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification. The core of such an approach is a loss function that computes the distances between instances of interest and support vectors. The objective is to update the weights of CLs iteratively to learn a representation with…
▽ More
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification. The core of such an approach is a loss function that computes the distances between instances of interest and support vectors. The objective is to update the weights of CLs iteratively to learn a representation with a large margin between classes. Each iteration results in a large-margin discriminant model represented by support vectors based on such a representation. The advantage of the proposed approach w.r.t. convolutional neural networks (CNNs) is two-fold. First, it allows representation learning with a small amount of data due to the reduced number of parameters compared to an equivalent CNN. Second, it has a low training cost since the backpropagation considers only support vectors. The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Non-Intrusive Reduced Models based on Operator Inference for Chaotic Systems
Authors:
João Lucas de Sousa Almeida,
Arthur Cancellieri Pires,
Klaus Feine Vaz Cid,
Alberto Costa Nogueira Junior
Abstract:
This work explores the physics-driven machine learning technique Operator Inference (OpInf) for predicting the state of chaotic dynamical systems. OpInf provides a non-intrusive approach to infer approximations of polynomial operators in reduced space without having access to the full order operators appearing in discretized models. Datasets for the physics systems are generated using conventional…
▽ More
This work explores the physics-driven machine learning technique Operator Inference (OpInf) for predicting the state of chaotic dynamical systems. OpInf provides a non-intrusive approach to infer approximations of polynomial operators in reduced space without having access to the full order operators appearing in discretized models. Datasets for the physics systems are generated using conventional numerical solvers and then projected to a low-dimensional space via Principal Component Analysis (PCA). In latent space, a least-squares problem is set to fit a quadratic polynomial operator, which is subsequently employed in a time-integration scheme in order to produce extrapolations in the same space. Once solved, the inverse PCA operation is applied to reconstruct the extrapolations in the original space. The quality of the OpInf predictions is assessed via the Normalized Root Mean Squared Error (NRMSE) metric from which the Valid Prediction Time (VPT) is computed. Numerical experiments considering the chaotic systems Lorenz 96 and the Kuramoto-Sivashinsky equation show promising forecasting capabilities of the OpInf reduced order models with VPT ranges that outperform state-of-the-art machine learning methods such as backpropagation and reservoir computing recurrent neural networks [1], as well as Markov neural operators [2].
△ Less
Submitted 21 September, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Machine learning method for return direction forecasting of Exchange Traded Funds using classification and regression models
Authors:
Raphael P. B. Piovezan,
Pedro Paulo de Andrade Junior
Abstract:
This article aims to propose and apply a machine learning method to analyze the direction of returns from Exchange Traded Funds (ETFs) using the historical return data of its components, helping to make investment strategy decisions through a trading algorithm. In methodological terms, regression and classification models were applied, using standard datasets from Brazilian and American markets, i…
▽ More
This article aims to propose and apply a machine learning method to analyze the direction of returns from Exchange Traded Funds (ETFs) using the historical return data of its components, helping to make investment strategy decisions through a trading algorithm. In methodological terms, regression and classification models were applied, using standard datasets from Brazilian and American markets, in addition to algorithmic error metrics. In terms of research results, they were analyzed and compared to those of the Naïve forecast and the returns obtained by the buy & hold technique in the same period of time. In terms of risk and return, the models mostly performed better than the control metrics, with emphasis on the linear regression model and the classification models by logistic regression, support vector machine (using the LinearSVC model), Gaussian Naive Bayes and K-Nearest Neighbors, where in certain datasets the returns exceeded by two times and the Sharpe ratio by up to four times those of the buy & hold control model.
△ Less
Submitted 13 June, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
Authors:
Edresson Casanova,
Christopher Shulby,
Alexander Korolev,
Arnaldo Candido Junior,
Anderson da Silva Soares,
Sandra Aluísio,
Moacir Antonelli Ponti
Abstract:
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model tr…
▽ More
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
△ Less
Submitted 20 May, 2023; v1 submitted 29 March, 2022;
originally announced April 2022.
-
A Review of Deep Learning-based Approaches for Deepfake Content Detection
Authors:
Leandro A. Passos,
Danilo Jodas,
Kelton A. P. da Costa,
Luis A. Souza Júnior,
Douglas Rodrigues,
Javier Del Ser,
David Camacho,
João Paulo Papa
Abstract:
Recent advancements in deep learning generative models have raised concerns as they can create highly convincing counterfeit images and videos. This poses a threat to people's integrity and can lead to social instability. To address this issue, there is a pressing need to develop new computational models that can efficiently detect forged content and alert users to potential image and video manipu…
▽ More
Recent advancements in deep learning generative models have raised concerns as they can create highly convincing counterfeit images and videos. This poses a threat to people's integrity and can lead to social instability. To address this issue, there is a pressing need to develop new computational models that can efficiently detect forged content and alert users to potential image and video manipulations. This paper presents a comprehensive review of recent studies for deepfake content detection using deep learning-based approaches. We aim to broaden the state-of-the-art research by systematically reviewing the different categories of fake content detection. Furthermore, we report the advantages and drawbacks of the examined works, and prescribe several future directions towards the issues and shortcomings still unsolved on deepfake detection.
△ Less
Submitted 15 February, 2024; v1 submitted 12 February, 2022;
originally announced February 2022.
-
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Authors:
Edresson Casanova,
Julian Weber,
Christopher Shulby,
Arnaldo Candido Junior,
Eren Gölge,
Moacir Antonelli Ponti
Abstract:
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our…
▽ More
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.
△ Less
Submitted 30 April, 2023; v1 submitted 4 December, 2021;
originally announced December 2021.
-
Panoptic Segmentation Meets Remote Sensing
Authors:
Osmar Luiz Ferreira de Carvalho,
Osmar Abílio de Carvalho Júnior,
Cristiano Rosa e Silva,
Anesmar Olino de Albuquerque,
Nickolas Castro Santana,
Dibio Leandro Borges,
Roberto Arnaldo Trancoso Gomes,
Renato Fontes Guimarães
Abstract:
Panoptic segmentation combines instance and semantic predictions, allowing the detection of "things" and "stuff" simultaneously. Effectively approaching panoptic segmentation in remotely sensed data can be auspicious in many challenging problems since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most a…
▽ More
Panoptic segmentation combines instance and semantic predictions, allowing the detection of "things" and "stuff" simultaneously. Effectively approaching panoptic segmentation in remotely sensed data can be auspicious in many challenging problems since it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labelling must encompass "things" and "stuff" classes, and (c) the annotation format is complex. Thus, aiming to solve and increase the operability of panoptic segmentation in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for panoptic segmentation, (2) propose an annotation conversion software to generate panoptic annotations; (3) propose a novel dataset on urban areas, (4) modify the Detectron2 for the task, and (5) evaluate difficulties of this task in the urban setting. We used an aerial image with a 0,24-meter spatial resolution considering 14 classes. Our pipeline considers three image inputs, and the proposed software uses point shapefiles for creating samples in the COCO format. Our study generated 3,400 samples with 512x512 pixel dimensions. We used the Panoptic-FPN with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered semantic instance and panoptic metrics. We obtained 93.9, 47.7, and 64.9 for the mean IoU, box AP, and PQ. Our study presents the first effective pipeline for panoptic segmentation and an extensive database for other researchers to use and deal with other data or related problems requiring a thorough scene understanding.
△ Less
Submitted 30 November, 2021; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning for Generating a City-Scale Vehicle Dataset
Authors:
Osmar Luiz Ferreira de Carvalho,
Osmar Abílio de Carvalho Júnior,
Anesmar Olino de Albuquerque,
Nickolas Castro Santana,
Dibio Leandro Borges,
Roberto Arnaldo Trancoso Gomes,
Renato Fontes Guimarães
Abstract:
Vehicle classification is a hot computer vision topic, with studies ranging from ground-view up to top-view imagery. In remote sensing, the usage of top-view images allows for understanding city patterns, vehicle concentration, traffic management, and others. However, there are some difficulties when aiming for pixel-wise classification: (a) most vehicle classification studies use object detection…
▽ More
Vehicle classification is a hot computer vision topic, with studies ranging from ground-view up to top-view imagery. In remote sensing, the usage of top-view images allows for understanding city patterns, vehicle concentration, traffic management, and others. However, there are some difficulties when aiming for pixel-wise classification: (a) most vehicle classification studies use object detection methods, and most publicly available datasets are designed for this task, (b) creating instance segmentation datasets is laborious, and (c) traditional instance segmentation methods underperform on this task since the objects are small. Thus, the present research objectives are: (1) propose a novel semi-supervised iterative learning approach using GIS software, (2) propose a box-free instance segmentation approach, and (3) provide a city-scale vehicle dataset. The iterative learning procedure considered: (1) label a small number of vehicles, (2) train on those samples, (3) use the model to classify the entire image, (4) convert the image prediction into a polygon shapefile, (5) correct some areas with errors and include them in the training data, and (6) repeat until results are satisfactory. To separate instances, we considered vehicle interior and vehicle borders, and the DL model was the U-net with the Efficient-net-B7 backbone. When removing the borders, the vehicle interior becomes isolated, allowing for unique object identification. To recover the deleted 1-pixel borders, we proposed a simple method to expand each prediction. The results show better pixel-wise metrics when compared to the Mask-RCNN (82% against 67% in IoU). On per-object analysis, the overall accuracy, precision, and recall were greater than 90%. This pipeline applies to any remote sensing target, being very efficient for segmentation and generating datasets.
△ Less
Submitted 23 November, 2021;
originally announced November 2021.
-
Understanding mobility in networks: A node embedding approach
Authors:
Matheus F. C. Barros,
Carlos H. G. Ferreira,
Bruno Pereira dos Santos,
Lourenço A. P. Júnior,
Marco Mellia,
Jussara M. Almeida
Abstract:
Motivated by the growing number of mobile devices capable of connecting and exchanging messages, we propose a methodology aiming to model and analyze node mobility in networks. We note that many existing solutions in the literature rely on topological measurements calculated directly on the graph of node contacts, aiming to capture the notion of the node's importance in terms of connectivity and m…
▽ More
Motivated by the growing number of mobile devices capable of connecting and exchanging messages, we propose a methodology aiming to model and analyze node mobility in networks. We note that many existing solutions in the literature rely on topological measurements calculated directly on the graph of node contacts, aiming to capture the notion of the node's importance in terms of connectivity and mobility patterns beneficial for prototyping, design, and deployment of mobile networks. However, each measure has its specificity and fails to generalize the node importance notions that ultimately change over time. Unlike previous approaches, our methodology is based on a node embedding method that models and unveils the nodes' importance in mobility and connectivity patterns while preserving their spatial and temporal characteristics. We focus on a case study based on a trace of group meetings. The results show that our methodology provides a rich representation for extracting different mobility and connectivity patterns, which can be helpful for various applications and services in mobile networks.
△ Less
Submitted 11 November, 2021;
originally announced November 2021.
-
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
Authors:
Arnaldo Candido Junior,
Edresson Casanova,
Anderson Soares,
Frederico Santos de Oliveira,
Lucas Oliveira,
Ricardo Corso Fernandes Junior,
Daniel Peixoto Pinto da Silva,
Fernando Gorgulho Fayet,
Bruno Baldissera Carlotto,
Lucas Rafael Stefanel Gris,
Sandra Maria Aluísio
Abstract:
Automatic Speech recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were about 376 hours public available for ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 hours. The existing resources, however,…
▽ More
Automatic Speech recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were about 376 hours public available for ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 hours. The existing resources, however, are composed of audios containing only read and prepared speech. There is a lack of datasets including spontaneous speech, which are essential in different ASR applications. This paper presents CORAA (Corpus of Annotated Audios) v1. with 290.77 hours, a publicly available dataset for ASR in BP containing validated pairs (audio-transcription). CORAA also contains European Portuguese audios (4.69 hours). We also present a public ASR model based on Wav2Vec 2.0 XLSR-53 and fine-tuned over CORAA. Our model achieved a Word Error Rate of 24.18% on CORAA test set and 20.08% on Common Voice test set. When measuring the Character Error Rate, we obtained 11.02% and 6.34% for CORAA and Common Voice, respectively. CORAA corpora were assembled to both improve ASR models in BP with phenomena from spontaneous speech and motivate young researchers to start their studies on ASR for Portuguese. All the corpora are publicly available at https://github.com/nilc-nlp/CORAA under the CC BY-NC-ND 4.0 license.
△ Less
Submitted 18 November, 2021; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Bridging the gap to real-world for network intrusion detection systems with data-centric approach
Authors:
Gustavo de Carvalho Bertoli,
Lourenço Alves Pereira Junior,
Filipe Alves Neto Verri,
Aldri Luiz dos Santos,
Osamu Saotome
Abstract:
Most research using machine learning (ML) for network intrusion detection systems (NIDS) uses well-established datasets such as KDD-CUP99, NSL-KDD, UNSW-NB15, and CICIDS-2017. In this context, the possibilities of machine learning techniques are explored, aiming for metrics improvements compared to the published baselines (model-centric approach). However, those datasets present some limitations a…
▽ More
Most research using machine learning (ML) for network intrusion detection systems (NIDS) uses well-established datasets such as KDD-CUP99, NSL-KDD, UNSW-NB15, and CICIDS-2017. In this context, the possibilities of machine learning techniques are explored, aiming for metrics improvements compared to the published baselines (model-centric approach). However, those datasets present some limitations as aging that make it unfeasible to transpose those ML-based solutions to real-world applications. This paper presents a systematic data-centric approach to address the current limitations of NIDS research, specifically the datasets. This approach generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
△ Less
Submitted 8 January, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
EEG multipurpose eye blink detector using convolutional neural network
Authors:
Amanda Ferrari Iaquinta,
Ana Carolina de Sousa Silva,
Aldrumont Ferraz Júnior,
Jessica Monique de Toledo,
Gustavo Voltani von Atzingen
Abstract:
The electrical signal emitted by the eyes movement produces a very strong artifact on EEG signaldue to its close proximity to the sensors and abundance of occurrence. In the context of detectingeye blink artifacts in EEG waveforms for further removal and signal purification, multiple strategieswhere proposed in the literature. Most commonly applied methods require the use of a large numberof elect…
▽ More
The electrical signal emitted by the eyes movement produces a very strong artifact on EEG signaldue to its close proximity to the sensors and abundance of occurrence. In the context of detectingeye blink artifacts in EEG waveforms for further removal and signal purification, multiple strategieswhere proposed in the literature. Most commonly applied methods require the use of a large numberof electrodes, complex equipment for sampling and processing data. The goal of this work is to createa reliable and user independent algorithm for detecting and removing eye blink in EEG signals usingCNN (convolutional neural network). For training and validation, three sets of public EEG data wereused. All three sets contain samples obtained while the recruited subjects performed assigned tasksthat included blink voluntarily in specific moments, watch a video and read an article. The modelused in this study was able to have an embracing understanding of all the features that distinguish atrivial EEG signal from a signal contaminated with eye blink artifacts without being overfitted byspecific features that only occurred in the situations when the signals were registered.
△ Less
Submitted 28 July, 2021;
originally announced July 2021.
-
Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
Authors:
Lucas Rafael Stefanel Gris,
Edresson Casanova,
Frederico Santos de Oliveira,
Anderson da Silva Soares,
Arnaldo Candido Junior
Abstract:
Deep learning techniques have been shown to be efficient in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence in a sequence of written words. Despite the progress in the area, speech recognition can still be considered difficult, especially for languages lacking available data, such as Brazilian Portuguese (BP). In…
▽ More
Deep learning techniques have been shown to be efficient in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence in a sequence of written words. Despite the progress in the area, speech recognition can still be considered difficult, especially for languages lacking available data, such as Brazilian Portuguese (BP). In this sense, this work presents the development of an public Automatic Speech Recognition (ASR) system using only open available audio data, from the fine-tuning of the Wav2vec 2.0 XLSR-53 model pre-trained in many languages, over BP data. The final model presents an average word error rate of 12.4% over 7 different datasets (10.5% when applying a language model). According to our knowledge, the obtained error is the lowest among open end-to-end (E2E) ASR models for BP.
△ Less
Submitted 22 December, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
ICDAR 2021 Competition on Components Segmentation Task of Document Photos
Authors:
Celso A. M. Lopes Junior,
Ricardo B. das Neves Junior,
Byron L. D. Bezerra,
Alejandro H. Toselli,
Donato Impedovo
Abstract:
This paper describes the short-term competition on the Components Segmentation Task of Document Photos that was prepared in the context of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). This competition aims to bring together researchers working in the field of identification document image processing and provides them a suitable benchmark to compare their tec…
▽ More
This paper describes the short-term competition on the Components Segmentation Task of Document Photos that was prepared in the context of the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). This competition aims to bring together researchers working in the field of identification document image processing and provides them a suitable benchmark to compare their techniques on the component segmentation task of document images. Three challenge tasks were proposed entailing different segmentation assignments to be performed on a provided dataset. The collected data are from several types of Brazilian ID documents, whose personal information was conveniently replaced. There were 16 participants whose results obtained for some or all the three tasks show different rates for the adopted metrics, like Dice Similarity Coefficient ranging from 0.06 to 0.99. Different Deep Learning models were applied by the entrants with diverse strategies to achieve the best results in each of the tasks. Obtained results show that the currently applied methods for solving one of the proposed tasks (document boundary detection) are already well established. However, for the other two challenge tasks (text zone and handwritten sign detection) research and development of more robust approaches are still required to achieve acceptable results.
△ Less
Submitted 8 July, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Authors:
Edresson Casanova,
Christopher Shulby,
Eren Gölge,
Nicolas Michael Müller,
Frederico Santos de Oliveira,
Arnaldo Candido Junior,
Anderson da Silva Soares,
Sandra Maria Aluisio,
Moacir Antonelli Ponti
Abstract:
In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transform…
▽ More
In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transformer-based encoder. Additionally, we have shown that adjusting a GAN-based vocoder for the spectrograms predicted by the TTS model on the training dataset can significantly improve the similarity and speech quality for new speakers. Our model converges using only 11 speakers, reaching state-of-the-art results for similarity with new speakers, as well as high speech quality.
△ Less
Submitted 15 June, 2021; v1 submitted 2 April, 2021;
originally announced April 2021.
-
Analysis of Social Robotic Navigation approaches: CNN Encoder and Incremental Learning as an alternative to Deep Reinforcement Learning
Authors:
Janderson Ferreira,
Agostinho A. F. Júnior,
Letícia Castro,
Yves M. Galvão,
Pablo Barros,
Bruno J. T. Fernandes
Abstract:
Dealing with social tasks in robotic scenarios is difficult, as having humans in the learning loop is incompatible with most of the state-of-the-art machine learning algorithms. This is the case when exploring Incremental learning models, in particular the ones involving reinforcement learning. In this work, we discuss this problem and possible solutions by analysing a previous study on adaptive c…
▽ More
Dealing with social tasks in robotic scenarios is difficult, as having humans in the learning loop is incompatible with most of the state-of-the-art machine learning algorithms. This is the case when exploring Incremental learning models, in particular the ones involving reinforcement learning. In this work, we discuss this problem and possible solutions by analysing a previous study on adaptive convolutional encoders for a social navigation task.
△ Less
Submitted 5 September, 2020; v1 submitted 18 August, 2020;
originally announced August 2020.
-
Performance Improvement of Path Planning algorithms with Deep Learning Encoder Model
Authors:
Janderson Ferreira,
Agostinho A. F. Júnior,
Yves M. Galvão,
Pablo Barros,
Sergio Murilo Maciel Fernandes,
Bruno J. T. Fernandes
Abstract:
Currently, path planning algorithms are used in many daily tasks. They are relevant to find the best route in traffic and make autonomous robots able to navigate. The use of path planning presents some issues in large and dynamic environments. Large environments make these algorithms spend much time finding the shortest path. On the other hand, dynamic environments request a new execution of the a…
▽ More
Currently, path planning algorithms are used in many daily tasks. They are relevant to find the best route in traffic and make autonomous robots able to navigate. The use of path planning presents some issues in large and dynamic environments. Large environments make these algorithms spend much time finding the shortest path. On the other hand, dynamic environments request a new execution of the algorithm each time a change occurs in the environment, and it increases the execution time. The dimensionality reduction appears as a solution to this problem, which in this context means removing useless paths present in those environments. Most of the algorithms that reduce dimensionality are limited to the linear correlation of the input data. Recently, a Convolutional Neural Network (CNN) Encoder was used to overcome this situation since it can use both linear and non-linear information to data reduction. This paper analyzes in-depth the performance to eliminate the useless paths using this CNN Encoder model. To measure the mentioned model efficiency, we combined it with different path planning algorithms. Next, the final algorithms (combined and not combined) are checked in a database that is composed of five scenarios. Each scenario contains fixed and dynamic obstacles. Their proposed model, the CNN Encoder, associated to other existent path planning algorithms in the literature, was able to obtain a time decrease to find the shortest path in comparison to all path planning algorithms analyzed. the average decreased time was 54.43 %.
△ Less
Submitted 5 August, 2020;
originally announced August 2020.
-
FCN+RL: A Fully Convolutional Network followed by Refinement Layers to Offline Handwritten Signature Segmentation
Authors:
Celso A. M. Lopes Junior,
Matheus Henrique M. da Silva,
Byron Leite Dantas Bezerra,
Bruno Jose Torres Fernandes,
Donato Impedovo
Abstract:
Although secular, handwritten signature is one of the most reliable biometric methods used by most countries. In the last ten years, the application of technology for verification of handwritten signatures has evolved strongly, including forensic aspects. Some factors, such as the complexity of the background and the small size of the region of interest - signature pixels - increase the difficulty…
▽ More
Although secular, handwritten signature is one of the most reliable biometric methods used by most countries. In the last ten years, the application of technology for verification of handwritten signatures has evolved strongly, including forensic aspects. Some factors, such as the complexity of the background and the small size of the region of interest - signature pixels - increase the difficulty of the targeting task. Other factors that make it challenging are the various variations present in handwritten signatures such as location, type of ink, color and type of pen, and the type of stroke. In this work, we propose an approach to locate and extract the pixels of handwritten signatures on identification documents, without any prior information on the location of the signatures. The technique used is based on a fully convolutional encoder-decoder network combined with a block of refinement layers for the alpha channel of the predicted image. The experimental results demonstrate that the technique outputs a clean signature with higher fidelity in the lines than the traditional approaches and preservation of the pertinent characteristics to the signer's spelling. To evaluate the quality of our proposal, we use the following image similarity metrics: SSIM, SIFT, and Dice Coefficient. The qualitative and quantitative results show a significant improvement in comparison with the baseline system.
△ Less
Submitted 28 May, 2020;
originally announced May 2020.
-
Social Interaction Layers in Complex Networks for the Dynamical Epidemic Modeling of COVID-19 in Brazil
Authors:
Leonardo F. S. Scabini,
Lucas C. Ribas,
Mariane B. Neiva,
Altamir G. B. Junior,
Alex J. F. Farfán,
Odemir M. Bruno
Abstract:
We are currently living in a state of uncertainty due to the pandemic caused by the Sars-CoV-2 virus. There are several factors involved in the epidemic spreading such as the individual characteristics of each city/country. The true shape of the epidemic dynamics is a large, complex system such as most of the social systems. In this context, Complex networks are a great candidate to analyze these…
▽ More
We are currently living in a state of uncertainty due to the pandemic caused by the Sars-CoV-2 virus. There are several factors involved in the epidemic spreading such as the individual characteristics of each city/country. The true shape of the epidemic dynamics is a large, complex system such as most of the social systems. In this context, Complex networks are a great candidate to analyze these systems due to their ability to tackle structural and dynamical properties. Therefore this study presents a new approach to model the COVID-19 epidemic using a multi-layer complex network, where nodes represent people, edges are social contacts, and layers represent different social activities. The model improves the traditional SIR and it is applied to study the Brazilian epidemic by analyzing possible future actions and their consequences. The network is characterized using statistics of infection, death, and hospitalization time. To simulate isolation, social distancing, or precautionary measures we remove layers and/or reduce the intensity of social contacts. Results show that even taking various optimistic assumptions, the current isolation levels in Brazil still may lead to a critical scenario for the healthcare system and a considerable death toll (average of 149,000). If all activities return to normal, the epidemic growth may suffer a steep increase, and the demand for ICU beds may surpass 3 times the country's capacity. This would surely lead to a catastrophic scenario, as our estimation reaches an average of 212,000 deaths even considering that all cases are effectively treated. The increase of isolation (up to a lockdown) shows to be the best option to keep the situation under the healthcare system capacity, aside from ensuring a faster decrease of new case occurrences (months of difference), and a significantly smaller death toll (average of 87,000).
△ Less
Submitted 20 May, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
Authors:
Edresson Casanova,
Arnaldo Candido Junior,
Christopher Shulby,
Frederico Santos de Oliveira,
João Paulo Teixeira,
Moacir Antonelli Ponti,
Sandra Maria Aluisio
Abstract:
Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level when in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources fo…
▽ More
Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level when in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. Such dataset has 10.5 hours from a single speaker, from which a Tacotron 2 model with the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. The obtained results are comparable to related works covering English language and the state-of-the-art in Portuguese.
△ Less
Submitted 29 January, 2022; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Forecasting in Non-stationary Environments with Fuzzy Time Series
Authors:
Petrônio Cândido de Lima e Silva,
Carlos Alberto Severiano Junior,
Marcos Antonio Alves,
Rodrigo Silva,
Miri Weiss Cohen,
Frederico Gadelha Guimarães
Abstract:
In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of…
▽ More
In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of dynamically adapting its fuzzy sets to reflect the changes in the stochastic process based on the residual errors, without the need to retraining the model. This method can handle non-stationary and heteroskedastic data as well as scenarios with concept-drift. The proposed approach allows the model to be trained only once and remain useful long after while keeping reasonable accuracy. The flexibility of the method by means of computational experiments was tested with eight synthetic non-stationary time series data with several kinds of concept drifts, four real market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three real FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), and two real cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD). As competitor models the Time Variant fuzzy time series and the Incremental Ensemble were used, these are two of the major approaches for handling non-stationary data sets. Non-parametric tests are employed to check the significance of the results. The proposed method shows resilience to concept drift, by adapting parameters of the model, while preserving the symbolic structure of the knowledge base.
△ Less
Submitted 26 April, 2020;
originally announced April 2020.
-
CNN Encoder to Reduce the Dimensionality of Data Image for Motion Planning
Authors:
Janderson Ferreira,
Agostinho A. F. Júnior,
Yves M. Galvão,
Bruno J. T. Fernandes,
Pablo Barros
Abstract:
Many real-world applications need path planning algorithms to solve tasks in different areas, such as social applications, autonomous cars, and tracking activities. And most importantly motion planning. Although the use of path planning is sufficient in most motion planning scenarios, they represent potential bottlenecks in large environments with dynamic changes. To tackle this problem, the numbe…
▽ More
Many real-world applications need path planning algorithms to solve tasks in different areas, such as social applications, autonomous cars, and tracking activities. And most importantly motion planning. Although the use of path planning is sufficient in most motion planning scenarios, they represent potential bottlenecks in large environments with dynamic changes. To tackle this problem, the number of possible routes could be reduced to make it easier for path planning algorithms to find the shortest path with less efforts. An traditional algorithm for path planning is the A*, it uses an heuristic to work faster than other solutions. In this work, we propose a CNN encoder capable of eliminating useless routes for motion planning problems, then we combine the proposed neural network output with A*. To measure the efficiency of our solution, we propose a database with different scenarios of motion planning problems. The evaluated metric is the number of the iterations to find the shortest path. The A* was compared with the CNN Encoder (proposal) with A*. In all evaluated scenarios, our solution reduced the number of iterations by more than 60\%.
△ Less
Submitted 10 April, 2020;
originally announced April 2020.
-
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
Authors:
Edresson Casanova,
Arnaldo Candido Junior,
Christopher Shulby,
Frederico Santos de Oliveira,
Lucas Rafael Stefanel Gris,
Hamilton Pereira da Silva,
Sandra Maria Aluisio,
Moacir Antonelli Ponti
Abstract:
In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice…
▽ More
In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other SOTA (State-Of-The-Art) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results to those methods. This is done using the knowledge of the reconstruction of a phoneme in the speaker's voice. For this purpose, a new dataset was built, composed of 40 male speakers, who read sentences in Portuguese, totaling approximately 3h. We compare the three best architectures trained using our method to select the best one, which is the one with a shallow architecture. Then, we compared this model with the SOTA method for the speaker recognition task: the Fast ResNet-34 trained with approximately 2,000 hours, using the loss functions Angular Prototypical and GE2E. Three experiments were carried out with datasets in different languages. Among these three experiments, our model achieved the second best result in two experiments and the best result in one of them. This highlights the importance of our method, which proved to be a great competitor to SOTA speaker recognition models, with 500x less data and a simpler approach.
△ Less
Submitted 18 June, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Using Mapping Languages for Building Legal Knowledge Graphs from XML Files
Authors:
Ademar Crotti Junior,
Fabrizio Orlandi,
Declan O'Sullivan,
Christian Dirschl,
Quentin Reul
Abstract:
This paper presents our experience on building RDF knowledge graphs for an industrial use case in the legal domain. The information contained in legal information systems are often accessed through simple keyword interfaces and presented as a simple list of hits. In order to improve search accuracy one may avail of knowledge graphs, where the semantics of the data can be made explicit. Significant…
▽ More
This paper presents our experience on building RDF knowledge graphs for an industrial use case in the legal domain. The information contained in legal information systems are often accessed through simple keyword interfaces and presented as a simple list of hits. In order to improve search accuracy one may avail of knowledge graphs, where the semantics of the data can be made explicit. Significant research effort has been invested in the area of building knowledge graphs from semi-structured text documents, such as XML, with the prevailing approach being the use of mapping languages. In this paper, we present a semantic model for representing legal documents together with an industrial use case. We also present a set of use case requirements based on the proposed semantic model, which are used to compare and discuss the use of state-of-the-art mapping languages for building knowledge graphs for legal data.
△ Less
Submitted 18 November, 2019;
originally announced November 2019.
-
GeoSES -- um Índice Socioeconômico para Estudos de Saúde no Brasil
Authors:
Ligia Vizeu Barrozo,
Michel Fornaciali,
Carmen Diva Saldiva de André,
Guilherme Augusto Zimeo Morais,
Giselle Mansur,
William Cabral-Miranda,
João Ricardo Sato,
Edson Amaro Júnior
Abstract:
Objective: to define an index that summarizes the main dimensions of the socioeconomic context for research purposes, evaluation and monitoring health inequalities. Methods: the index was created from the 2010 Brazilian Demographic Census, whose variables selection was guided by theoretical references for health studies, including seven socioeconomic dimensions: education, mobility, poverty, wealt…
▽ More
Objective: to define an index that summarizes the main dimensions of the socioeconomic context for research purposes, evaluation and monitoring health inequalities. Methods: the index was created from the 2010 Brazilian Demographic Census, whose variables selection was guided by theoretical references for health studies, including seven socioeconomic dimensions: education, mobility, poverty, wealth, income, segregation and deprivation of resources and services. The index was developed using principal component analysis, and was evaluated for its construct, content and applicability components. Results: GeoSES-BR dimensions showed good association with HDI-M (above 0.85). The model with the poverty dimension best explained the relative risk of avoidable cause mortality in Brazil. In the intraurban scale, the model with GeoSES-IM was the one that best explained the relative risk of mortality from circulatory system diseases. Conclusion: GeoSES showed significant explanatory potential in the studied scales.
△ Less
Submitted 9 October, 2019;
originally announced October 2019.
-
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
Authors:
Joonas Hämäläinen,
Alisson S. C. Alencar,
Tommi Kärkkäinen,
César L. C. Mattos,
Amauri H. Souza Júnior,
João P. P. Gomes
Abstract:
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions…
▽ More
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generalization capability. Several clustering-based methods for reference point selection in regression scenarios are then proposed and analyzed. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperformed the standard random selection of the original MLM formulation.
△ Less
Submitted 6 October, 2020; v1 submitted 22 September, 2019;
originally announced September 2019.
-
vSDNEmul: A Software-Defined Network Emulator Based on Container Virtualization
Authors:
Fernando N. N. Farias,
Antônio de O. Junior,
Leonardo B. da Costa,
Billy A. Pinheiro,
Antônio J. G. Abelém
Abstract:
The main issue related to Software-Defined Network emulators is how to replicate real behavior in experiments. Mininet and others SDN emulators have an architecture that limits both the scope of experiments and the fidelity of networking tests. Consequently, the serialization, contention, and load of background processes may produce delays that compromise the operation of events such as transmitti…
▽ More
The main issue related to Software-Defined Network emulators is how to replicate real behavior in experiments. Mininet and others SDN emulators have an architecture that limits both the scope of experiments and the fidelity of networking tests. Consequently, the serialization, contention, and load of background processes may produce delays that compromise the operation of events such as transmitting a packet or completing a computation, possibly invalidating the performance evaluation of a network emulation. To address these problems, this paper presents vSDNEmul, a network emulator based on Docker container virtualization. Different from Mininet, vSDNEmul isolates each node in a container and interconnects the nodes through virtual or tunnel links. By using containers, vSDNEmul allows autonomous and flexible creation of independent network elements, resulting in more realistic emulations. This paper reports performance evaluations comparing vSDNEmul and Mininet. The results obtained with the vSDNEmul emulator are more realistic and present higher accuracy.
△ Less
Submitted 28 August, 2019;
originally announced August 2019.
-
Trajectory-Based Urban Air Mobility (UAM) Operations Simulator (TUS)
Authors:
Euclides C. Pinto Neto,
Derick M. Baum,
Jorge Rady de Almeida Junior,
João Batista Camargo Junior,
Paulo Sérgio Cugnasca
Abstract:
Nowadays, the demand for optimized services in urban environments to provide better society wellness is increasing. In this sense, ground transportation in dense urban environments has been facing challenges for many years (e.g., congestion and resilience). One import outcome of the effort made toward the creation of new concepts for enhancing urban transportation is the Urban Air Mobility (UAM) c…
▽ More
Nowadays, the demand for optimized services in urban environments to provide better society wellness is increasing. In this sense, ground transportation in dense urban environments has been facing challenges for many years (e.g., congestion and resilience). One import outcome of the effort made toward the creation of new concepts for enhancing urban transportation is the Urban Air Mobility (UAM) concept. UAM aims at enhancing city transportation services using manned and unmanned vehicles. However, these operations bring many challenges to be faced, e.g., the interaction between the controller agent and autonomous vehicles. Furthermore, trajectory planning is not a simple task due to several factors. Firstly, the trajectories must consider a reduced minimum separation as eVTOL vehicle are expected to operate in complex urban environments. This leads the trajectory planning process to observe safety primitives more restrictively once the airspace is expected to comport many vehicles that follow small minimum separation standards. Thereupon, the main goal of the Trajectory-Based UAM Operations Simulator (TUS) is to simulate the Trajectory-Based UAM operations in urban environments considering the presence of both manned and unmanned eVTOL vehicles. For this, a Discrete Event Simulation (DES) approach is adopted, which considers an input (i.e., the eVTOL vehicles, their origin and destination, and their respective trajectories) and produces an output (which describes if the trajectories are safe and the elapsed operation time). The main contribution of this simulation tool is to provide a simulated environment for testing and measuring the effectiveness (e.g., flight duration) of trajectories planned for eVTOL vehicles.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.