-
Deep Learning Brasil at ABSAPT 2022: Portuguese Transformer Ensemble Approaches
Authors:
Juliana Resplande Santanna Gomes,
Eduardo Augusto Santos Garcia,
Adalberto Ferreira Barbosa Junior,
Ruan Chaves Rodrigues,
Diogo Fernandes Costa Silva,
Dyonnatan Ferreira Maia,
Nádia Félix Felipe da Silva,
Arlindo Rodrigues Galvão Filho,
Anderson da Silva Soares
Abstract:
Aspect-based Sentiment Analysis (ABSA) is a task whose objective is to classify the individual sentiment polarity of all entities, called aspects, in a sentence. The task is composed of two subtasks: Aspect Term Extraction (ATE), identify all aspect terms in a sentence; and Sentiment Orientation Extraction (SOE), given a sentence and its aspect terms, the task is to determine the sentiment polarit…
▽ More
Aspect-based Sentiment Analysis (ABSA) is a task whose objective is to classify the individual sentiment polarity of all entities, called aspects, in a sentence. The task is composed of two subtasks: Aspect Term Extraction (ATE), identify all aspect terms in a sentence; and Sentiment Orientation Extraction (SOE), given a sentence and its aspect terms, the task is to determine the sentiment polarity of each aspect term (positive, negative or neutral). This article presents we present our participation in Aspect-Based Sentiment Analysis in Portuguese (ABSAPT) 2022 at IberLEF 2022. We submitted the best performing systems, achieving new state-of-the-art results on both subtasks.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling
Authors:
Marília Costa Rosendo Silva,
Felipe Alves Siqueira,
João Pedro Mantovani Tarrega,
João Vitor Pataca Beinotti,
Augusto Sousa Nunes,
Miguel de Mattos Gardini,
Vinícius Adolfo Pereira da Silva,
Nádia Félix Felipe da Silva,
André Carlos Ponce de Leon Ferreira de Carvalho
Abstract:
Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variabi…
▽ More
Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Authors:
Manoel Veríssimo dos Santos Neto,
Ayrton Denner da Silva Amaral,
Nádia Félix Felipe da Silva,
Anderson da Silva Soares
Abstract:
In this paper, we describe a methodology to predict sentiment in code-mixed tweets (hindi-english). Our team called verissimo.manoel in CodaLab developed an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and XLNET). The final classification algorithm was an ensemble of some predictions of all softmax values from these four models. This architecture was used and evaluated in…
▽ More
In this paper, we describe a methodology to predict sentiment in code-mixed tweets (hindi-english). Our team called verissimo.manoel in CodaLab developed an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and XLNET). The final classification algorithm was an ensemble of some predictions of all softmax values from these four models. This architecture was used and evaluated in the context of the SemEval 2020 challenge (task 9), and our system got 72.7% on the F1 score.
△ Less
Submitted 28 July, 2020;
originally announced August 2020.