-
Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models
Authors:
Daniela Occhipinti,
Michele Marchi,
Irene Mondella,
Huiyuan Lai,
Felice Dell'Orletta,
Malvina Nissim,
Marco Guerini
Abstract:
Automatic methods for generating and gathering linguistic data have proven effective for fine-tuning Language Models (LMs) in languages less resourced than English. Still, while there has been emphasis on data quantity, less attention has been given to its quality. In this work, we investigate the impact of human intervention on machine-generated data when fine-tuning dialogical models. In particu…
▽ More
Automatic methods for generating and gathering linguistic data have proven effective for fine-tuning Language Models (LMs) in languages less resourced than English. Still, while there has been emphasis on data quantity, less attention has been given to its quality. In this work, we investigate the impact of human intervention on machine-generated data when fine-tuning dialogical models. In particular, we study (1) whether post-edited dialogues exhibit higher perceived quality compared to the originals that were automatically generated; (2) whether fine-tuning with post-edited dialogues results in noticeable differences in the generated outputs; and (3) whether post-edited dialogues influence the outcomes when considering the parameter size of the LMs. To this end we created HED-IT, a large-scale dataset where machine-generated dialogues are paired with the version post-edited by humans. Using both the edited and unedited portions of HED-IT, we fine-tuned three different sizes of an LM. Results from both human and automatic evaluation show that the different quality of training data is clearly perceived and it has an impact also on the models trained on such data. Additionally, our findings indicate that larger models are less sensitive to data quality, whereas this has a crucial impact on smaller models. These results enhance our comprehension of the impact of human intervention on training data in the development of high-quality LMs.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
NLP for Counterspeech against Hate: A Survey and How-To Guide
Authors:
Helena Bonaldi,
Yi-Ling Chung,
Gavin Abercrombie,
Marco Guerini
Abstract:
In recent years, counterspeech has emerged as one of the most promising strategies to fight online hate. These non-escalatory responses tackle online abuse while preserving the freedom of speech of the users, and can have a tangible impact in reducing online and offline violence. Recently, there has been growing interest from the Natural Language Processing (NLP) community in addressing the challe…
▽ More
In recent years, counterspeech has emerged as one of the most promising strategies to fight online hate. These non-escalatory responses tackle online abuse while preserving the freedom of speech of the users, and can have a tangible impact in reducing online and offline violence. Recently, there has been growing interest from the Natural Language Processing (NLP) community in addressing the challenges of analysing, collecting, classifying, and automatically generating counterspeech, to reduce the huge burden of manually producing it. In particular, researchers have taken different directions in addressing these challenges, thus providing a variety of related tasks and resources. In this paper, we provide a guide for doing research on counterspeech, by describing - with detailed examples - the steps to undertake, and providing best practices that can be learnt from the NLP studies on this topic. Finally, we discuss open challenges and future directions of counterspeech research in NLP.
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation
Authors:
Jaione Bengoetxea,
Yi-Ling Chung,
Marco Guerini,
Rodrigo Agerri
Abstract:
Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) aiming at defusing online hatred and mitigating its spreading across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generatio…
▽ More
Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) aiming at defusing online hatred and mitigating its spreading across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generation developed by means of Machine Translation (MT) and professional post-edition. Being a parallel corpus, also with respect to the original English CONAN, it allows to perform novel research on multilingual and crosslingual automatic generation of CNs. Our experiments on CN generation with mT5, a multilingual encoder-decoder model, show that generation greatly benefits from training on post-edited data, as opposed to relying on silver MT data only. These results are confirmed by their correlation with a qualitative manual evaluation, demonstrating that manually revised training data remains crucial for the quality of the generated CNs. Furthermore, multilingual data augmentation improves results over monolingual settings for structurally similar languages such as English and Spanish, while being detrimental for Basque, a language isolate. Similar findings occur in zero-shot crosslingual evaluations, where model transfer (fine-tuning in English and generating in a different target language) outperforms fine-tuning mT5 on machine translated data for Spanish but not for Basque. This provides an interesting insight into the asymmetry in the multilinguality of generative models, a challenging topic which is still open to research.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Putting Context in Context: the Impact of Discussion Structure on Text Classification
Authors:
Nicolò Penzo,
Antonio Longa,
Bruno Lepri,
Sara Tonelli,
Marco Guerini
Abstract:
Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments…
▽ More
Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments on a large dataset for stance detection in English, in which we evaluate the contribution of different types of contextual information, i.e. linguistic, structural and temporal, by feeding them as natural language input into a transformer-based model. We also experiment with different amounts of training data and analyse the topology of local discussion networks in a privacy-compliant way. Results show that structural information can be highly beneficial to text classification but only under certain circumstances (e.g. depending on the amount of training data and on discussion chain complexity). Indeed, we show that contextual information on smaller datasets from other classification tasks does not yield significant improvements. Our framework, based on local discussion networks, allows the integration of structural information, while minimising user profiling, thus preserving their privacy.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Countering Misinformation via Emotional Response Generation
Authors:
Daniel Russo,
Shane Peter Kaszefski-Yaschuk,
Jacopo Staiano,
Marco Guerini
Abstract:
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy. Previous research has shown how social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread -- often in good faith -- misleading messages. Although professional fact-ch…
▽ More
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy. Previous research has shown how social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread -- often in good faith -- misleading messages. Although professional fact-checkers are crucial to debunking viral claims, they usually do not engage in conversations on social media. Thereby, significant effort has been made to automate the use of fact-checker material in social correction; however, no previous work has tried to integrate it with the style and pragmatics that are commonly employed in social media communication. To fill this gap, we present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs (linked to debunking articles), accounting for both SMP-style and basic emotions, two factors which have a significant role in misinformation credibility and spreading. To collect this dataset we used a technique based on an author-reviewer pipeline, which efficiently combines LLMs and human annotators to obtain high-quality data. We also provide comprehensive experiments showing how models trained on our proposed dataset have significant improvements in terms of output quality and generalization capabilities.
△ Less
Submitted 17 November, 2023;
originally announced November 2023.
-
PRODIGy: a PROfile-based DIalogue Generation dataset
Authors:
Daniela Occhipinti,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Providing dialogue agents with a profile representation can improve their consistency and coherence, leading to better conversations. However, current profile-based dialogue datasets for training such agents contain either explicit profile representations that are simple and dialogue-specific, or implicit representations that are difficult to collect. In this work, we propose a unified framework i…
▽ More
Providing dialogue agents with a profile representation can improve their consistency and coherence, leading to better conversations. However, current profile-based dialogue datasets for training such agents contain either explicit profile representations that are simple and dialogue-specific, or implicit representations that are difficult to collect. In this work, we propose a unified framework in which we bring together both standard and more sophisticated profile representations by creating a new resource where each dialogue is aligned with all possible speaker representations such as communication style, biographies, and personality. This framework allows to test several baselines built using generative language models with several profile configurations. The automatic evaluation shows that profile-based models have better generalisation capabilities than models trained on dialogues only, both in-domain and cross-domain settings. These results are consistent for fine-tuned models and instruction-based LLMs. Additionally, human evaluation demonstrates a clear preference for generations consistent with both profile and context. Finally, to account for possible privacy concerns, all experiments are done under two configurations: inter-character and intra-character. In the former, the LM stores the information about the character in its internal representation, while in the latter, the LM does not retain any personal information but uses it only at inference time.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization
Authors:
Helena Bonaldi,
Giuseppe Attanasio,
Debora Nozza,
Marco Guerini
Abstract:
Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targe…
▽ More
Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narratives generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Benchmarking the Generation of Fact Checking Explanations
Authors:
Daniel Russo,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of Fake News produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification.…
▽ More
Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of Fake News produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification. In this paper, instead, we address the generation of justifications (textual explanation of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines. In particular, we focus on summarization approaches over unstructured knowledge (i.e. news articles) and we experiment with several extractive and abstractive strategies. We employed two datasets with different styles and structures, in order to assess the generalizability of our findings. Results show that in justification production summarization benefits from the claim information, and, in particular, that a claim-driven extractive step improves abstractive summarization performances. Finally, we show that although cross-dataset experiments suffer from performance degradation, a unique model trained on a combination of the two datasets is able to retain style information in an efficient manner.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering
Authors:
Helena Bonaldi,
Sara Dellantonio,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Fighting online hate speech is a challenge that is usually addressed using Natural Language Processing via automatic detection and removal of hate content. Besides this approach, counter narratives have emerged as an effective tool employed by NGOs to respond to online hate on social media platforms. For this reason, Natural Language Generation is currently being studied as a way to automatize cou…
▽ More
Fighting online hate speech is a challenge that is usually addressed using Natural Language Processing via automatic detection and removal of hate content. Besides this approach, counter narratives have emerged as an effective tool employed by NGOs to respond to online hate on social media platforms. For this reason, Natural Language Generation is currently being studied as a way to automatize counter narrative writing. However, the existing resources necessary to train NLG models are limited to 2-turn interactions (a hate speech and a counter narrative as response), while in real life, interactions can consist of multiple turns. In this paper, we present a hybrid approach for dialogical data collection, which combines the intervention of human expert annotators over machine generated dialogues obtained using 19 different configurations. The result of this work is DIALOCONAN, the first dataset comprising over 3000 fictitious multi-turn dialogues between a hater and an NGO operator, covering 6 targets of hate.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
Authors:
Serra Sinem Tekiroglu,
Helena Bonaldi,
Margherita Fanton,
Marco Guerini
Abstract:
In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that…
▽ More
In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find out that a key element for successful `out of target' experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e. a target that shares some commonalities with the test target that can be defined a-priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Multilingual Counter Narrative Type Classification
Authors:
Yi-Ling Chung,
Marco Guerini,
Rodrigo Agerri
Abstract:
The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies. In this scenario, learning to recognize counter narrative types from natural text is expected to be useful for applications such as hate speech countering, where operators from non-governmental organizations are supposed to answer to hate with several a…
▽ More
The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies. In this scenario, learning to recognize counter narrative types from natural text is expected to be useful for applications such as hate speech countering, where operators from non-governmental organizations are supposed to answer to hate with several and diverse arguments that can be mined from online sources. This paper presents the first multilingual work on counter narrative type classification, evaluating SoTA pre-trained language models in monolingual, multilingual and cross-lingual settings. When considering a fine-grained annotation of counter narrative classes, we report strong baseline classification results for the majority of the counter narrative types, especially if we translate every language to English before cross-lingual prediction. This suggests that knowledge about counter narratives can be successfully transferred across languages.
△ Less
Submitted 28 September, 2021;
originally announced September 2021.
-
Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement
Authors:
Elisa Leonardelli,
Stefano Menini,
Alessio Palmero Aprosio,
Marco Guerini,
Sara Tonelli
Abstract:
Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a tr…
▽ More
Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a trend that has emerged recently, we focus on the level of agreement among annotators while selecting data to create offensive language datasets, a task involving a high level of subjectivity. Our study comprises the creation of three novel datasets of English tweets covering different topics and having five crowd-sourced judgments each. We also present an extensive set of experiments showing that selecting training and test data according to different levels of annotators' agreement has a strong effect on classifiers performance and robustness. Our findings are further validated in cross-domain experiments and studied using a popular benchmark dataset. We show that such hard cases, where low agreement is present, are not necessarily due to poor-quality annotation and we advocate for a higher presence of ambiguous cases in future datasets, particularly in test sets, to better account for the different points of view expressed online.
△ Less
Submitted 28 September, 2021;
originally announced September 2021.
-
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech
Authors:
Margherita Fanton,
Helena Bonaldi,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation,…
▽ More
Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in reaching either high-quality and/or high-quantity. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.
△ Less
Submitted 19 July, 2021;
originally announced July 2021.
-
Empowering NGOs in Countering Online Hate Messages
Authors:
Yi-Ling Chung,
Serra Sinem Tekiroglu,
Sara Tonelli,
Marco Guerini
Abstract:
Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have…
▽ More
Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have the potential to provide a systematic approach to hatred management. In this paper, we introduce a novel ICT platform that NGO operators can use to monitor and analyze social media data, along with a counter-narrative suggestion tool. Our platform aims at increasing the efficiency and effectiveness of operators' activities against islamophobia. We test the platform with more than one hundred NGO operators in three countries through qualitative and quantitative evaluation. Results show that NGOs favor the platform solution with the suggestion tool, and that the time required to produce counter-narratives significantly decreases.
△ Less
Submitted 6 July, 2021;
originally announced July 2021.
-
Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech
Authors:
Yi-Ling Chung,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Tackling online hatred using informed textual responses - called counter narratives - has been brought under the spotlight recently. Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate the direct intervention in the hate discussion and to prevent hate content from further spreading. Still, current neural approaches tend to produce generic/re…
▽ More
Tackling online hatred using informed textual responses - called counter narratives - has been brought under the spotlight recently. Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate the direct intervention in the hate discussion and to prevent hate content from further spreading. Still, current neural approaches tend to produce generic/repetitive responses and lack grounded and up-to-date evidence such as facts, statistics, or examples. Moreover, these models can create plausible but not necessarily true arguments. In this paper we present the first complete knowledge-bound counter narrative generation pipeline, grounded in an external knowledge repository that can provide more informative content to fight online hatred. Together with our approach, we present a series of experiments that show its feasibility to produce suitable and informative counter narratives in in-domain and cross-domain settings.
△ Less
Submitted 22 June, 2021;
originally announced June 2021.
-
Toward Stance-based Personas for Opinionated Dialogues
Authors:
Thomas Scialom,
Serra Sinem Tekiroglu,
Jacopo Staiano,
Marco Guerini
Abstract:
In the context of chit-chat dialogues it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. "I have two cats."). We argue that these representations remain superficial w.r.t. the complexity of human personality. In th…
▽ More
In the context of chit-chat dialogues it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. "I have two cats."). We argue that these representations remain superficial w.r.t. the complexity of human personality. In this work, we propose to make a step forward and investigate stance-based persona, trying to grasp more profound characteristics, such as opinions, values, and beliefs to drive language generation. To this end, we introduce a novel dataset allowing to explore different stance-based persona representations and their impact on claim generation, showing that they are able to grasp abstract and profound aspects of the author persona.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
GePpeTto Carves Italian into a Language Model
Authors:
Lorenzo De Mattei,
Michele Cafagna,
Felice Dell'Orletta,
Malvina Nissim,
Marco Guerini
Abstract:
In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto's quality by means of both an automatic and a human-based evalu…
▽ More
In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto's quality by means of both an automatic and a human-based evaluation. The automatic assessment consists in (i) calculating perplexity across different genres and (ii) a profiling analysis over GePpeTto's writing characteristics. We find that GePpeTto's production is a sort of bonsai version of human production, with shorter but yet complex sentences. Human evaluation is performed over a sentence completion task, where GePpeTto's output is judged as natural more often than not, and much closer to the original human texts than to a simpler language model which we take as baseline.
△ Less
Submitted 29 April, 2020;
originally announced April 2020.
-
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Authors:
Serra Sinem Tekiroglu,
Yi-Ling Chung,
Marco Guerini
Abstract:
Recently research has started focusing on avoiding undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. Accordingly, automation strategies, such as natural language gen…
▽ More
Recently research has started focusing on avoiding undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. Accordingly, automation strategies, such as natural language generation, are beginning to be investigated. Still, they suffer from the lack of sufficient amount of quality data and tend to produce generic/repetitive responses. Being aware of the aforementioned limitations, we present a study on how to collect responses to hate effectively, employing large scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.
△ Less
Submitted 8 April, 2020;
originally announced April 2020.
-
Generating Challenge Datasets for Task-Oriented Conversational Agents through Self-Play
Authors:
Sourabh Majumdar,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
End-to-end neural approaches are becoming increasingly common in conversational scenarios due to their promising performances when provided with sufficient amount of data. In this paper, we present a novel methodology to address the interpretability of neural approaches in such scenarios by creating challenge datasets using dialogue self-play over multiple tasks/intents. Dialogue self-play allows…
▽ More
End-to-end neural approaches are becoming increasingly common in conversational scenarios due to their promising performances when provided with sufficient amount of data. In this paper, we present a novel methodology to address the interpretability of neural approaches in such scenarios by creating challenge datasets using dialogue self-play over multiple tasks/intents. Dialogue self-play allows generating large amount of synthetic data; by taking advantage of the complete control over the generation process, we show how neural approaches can be evaluated in terms of unseen dialogue patterns. We propose several out-of-pattern test cases each of which introduces a natural and unexpected user utterance phenomenon. As a proof of concept, we built a single and a multiple memory network, and show that these two architectures have diverse performances depending on the peculiar dialogue patterns.
△ Less
Submitted 16 October, 2019;
originally announced October 2019.
-
CONAN -- COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
Authors:
Y. L. Chung,
E. Kuzmenko,
S. S. Tekiroglu,
M. Guerini
Abstract:
Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternate strategy, that has received little attention so far by the…
▽ More
Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternate strategy, that has received little attention so far by the research community, is to actually oppose hate content with counter-narratives (i.e. informed textual responses). In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate speech/counter-narrative pairs. This dataset has been built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task. Together with the collected data we also provide additional annotations about expert demographics, hate and response type, and data augmentation through translation and paraphrasing. Finally, we provide initial experiments to assess the quality of our data.
△ Less
Submitted 8 October, 2019;
originally announced October 2019.
-
DepecheMood++: a Bilingual Emotion Lexicon Built Through Simple Yet Powerful Techniques
Authors:
Oscar Araque,
Lorenzo Gatti,
Jacopo Staiano,
Marco Guerini
Abstract:
Several lexica for sentiment analysis have been developed and made available in the NLP community. While most of these come with word polarity annotations (e.g. positive/negative), attempts at building lexica for finer-grained emotion analysis (e.g. happiness, sadness) have recently attracted significant attention. Such lexica are often exploited as a building block in the process of developing le…
▽ More
Several lexica for sentiment analysis have been developed and made available in the NLP community. While most of these come with word polarity annotations (e.g. positive/negative), attempts at building lexica for finer-grained emotion analysis (e.g. happiness, sadness) have recently attracted significant attention. Such lexica are often exploited as a building block in the process of developing learning models for which emotion recognition is needed, and/or used as baselines to which compare the performance of the models. In this work, we contribute two new resources to the community: a) an extension of an existing and widely used emotion lexicon for English; and b) a novel version of the lexicon targeting Italian. Furthermore, we show how simple techniques can be used, both in supervised and unsupervised experimental settings, to boost performances on datasets and tasks of varying degree of domain-specificity.
△ Less
Submitted 8 October, 2018;
originally announced October 2018.
-
Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines
Authors:
Youness Mansar,
Lorenzo Gatti,
Sira Ferradans,
Marco Guerini,
Jacopo Staiano
Abstract:
In this paper, we describe a methodology to infer Bullish or Bearish sentiment towards companies/brands. More specifically, our approach leverages affective lexica and word embeddings in combination with convolutional neural networks to infer the sentiment of financial news headlines towards a target company. Such architecture was used and evaluated in the context of the SemEval 2017 challenge (ta…
▽ More
In this paper, we describe a methodology to infer Bullish or Bearish sentiment towards companies/brands. More specifically, our approach leverages affective lexica and word embeddings in combination with convolutional neural networks to infer the sentiment of financial news headlines towards a target company. Such architecture was used and evaluated in the context of the SemEval 2017 challenge (task 5, subtask 2), in which it obtained the best performance.
△ Less
Submitted 4 April, 2017;
originally announced April 2017.
-
Why Do Urban Legends Go Viral?
Authors:
Marco Guerini,
Carlo Strapparava
Abstract:
Urban legends are a genre of modern folklore, consisting of stories about rare and exceptional events, just plausible enough to be believed, which tend to propagate inexorably across communities. In our view, while urban legends represent a form of "sticky" deceptive text, they are marked by a tension between the credible and incredible. They should be credible like a news article and incredible l…
▽ More
Urban legends are a genre of modern folklore, consisting of stories about rare and exceptional events, just plausible enough to be believed, which tend to propagate inexorably across communities. In our view, while urban legends represent a form of "sticky" deceptive text, they are marked by a tension between the credible and incredible. They should be credible like a news article and incredible like a fairy tale to go viral. In particular we will focus on the idea that urban legends should mimic the details of news (who, where, when) to be credible, while they should be emotional and readable like a fairy tale to be catchy and memorable. Using NLP tools we will provide a quantitative analysis of these prototypical characteristics. We also lay out some machine learning experiments showing that it is possible to recognize an urban legend using just these simple features.
△ Less
Submitted 22 January, 2016;
originally announced January 2016.
-
Community dynamics in connected time-dependent multilayer networks
Authors:
Marco Cristoforetti,
Marco Guerini,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
Different strategies have been considered to extract information from social media about how similarly people react to the same news or event. In this context, a powerful method is offered by the application of graph techniques to the contents produced by social network users. In particular, large events typically attract enough content traffic along time to enable an analysis that explicitly mode…
▽ More
Different strategies have been considered to extract information from social media about how similarly people react to the same news or event. In this context, a powerful method is offered by the application of graph techniques to the contents produced by social network users. In particular, large events typically attract enough content traffic along time to enable an analysis that explicitly models a dependence from the time dimension. Here we demonstrate how it is possible to extend the application of community detection strategies in complex networks to the case of time-dependent multilayer networks, whenever the connection between consecutive time layers is non-trivial. We apply the method to 400K Twitter post related to the Expo event held in Milan (Italy) between May and October 2015.
△ Less
Submitted 11 November, 2015;
originally announced November 2015.
-
SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis
Authors:
Lorenzo Gatti,
Marco Guerini,
Marco Turchi
Abstract:
Deriving prior polarity lexica for sentiment analysis - where positive or negative scores are associated with words out of context - is a challenging task. Usually, a trade-off between precision and coverage is hard to find, and it depends on the methodology used to build the lexicon. Manually annotated lexica provide a high precision but lack in coverage, whereas automatic derivation from pre-exi…
▽ More
Deriving prior polarity lexica for sentiment analysis - where positive or negative scores are associated with words out of context - is a challenging task. Usually, a trade-off between precision and coverage is hard to find, and it depends on the methodology used to build the lexicon. Manually annotated lexica provide a high precision but lack in coverage, whereas automatic derivation from pre-existing knowledge guarantees high coverage at the cost of a lower precision. Since the automatic derivation of prior polarities is less time consuming than manual annotation, there has been a great bloom of these approaches, in particular based on the SentiWordNet resource. In this paper, we compare the most frequently used techniques based on SentiWordNet with newer ones and blend them in a learning framework (a so called 'ensemble method'). By taking advantage of manually built prior polarity lexica, our ensemble method is better able to predict the prior value of unseen words and to outperform all the other SentiWordNet approaches. Using this technique we have built SentiWords, a prior polarity lexicon of approximately 155,000 words, that has both a high precision and a high coverage. We finally show that in sentiment analysis tasks, using our lexicon allows us to outperform both the single metrics derived from SentiWordNet and popular manually annotated sentiment lexica.
△ Less
Submitted 30 October, 2015;
originally announced October 2015.
-
Echoes of Persuasion: The Effect of Euphony in Persuasive Communication
Authors:
Marco Guerini,
Gözde Özbal,
Carlo Strapparava
Abstract:
While the effect of various lexical, syntactic, semantic and stylistic features have been addressed in persuasive language from a computational point of view, the persuasive effect of phonetics has received little attention. By modeling a notion of euphony and analyzing four datasets comprising persuasive and non-persuasive sentences in different domains (political speeches, movie quotes, slogans…
▽ More
While the effect of various lexical, syntactic, semantic and stylistic features have been addressed in persuasive language from a computational point of view, the persuasive effect of phonetics has received little attention. By modeling a notion of euphony and analyzing four datasets comprising persuasive and non-persuasive sentences in different domains (political speeches, movie quotes, slogans and tweets), we explore the impact of sounds on different forms of persuasiveness. We conduct a series of analyses and prediction experiments within and across datasets. Our results highlight the positive role of phonetic devices on persuasion.
△ Less
Submitted 24 August, 2015;
originally announced August 2015.
-
Deep Feelings: A Massive Cross-Lingual Study on the Relation between Emotions and Virality
Authors:
Marco Guerini,
Jacopo Staiano
Abstract:
This article provides a comprehensive investigation on the relations between virality of news articles and the emotions they are found to evoke. Virality, in our view, is a phenomenon with many facets, i.e. under this generic term several different effects of persuasive communication are comprised. By exploiting a high-coverage and bilingual corpus of documents containing metrics of their spread o…
▽ More
This article provides a comprehensive investigation on the relations between virality of news articles and the emotions they are found to evoke. Virality, in our view, is a phenomenon with many facets, i.e. under this generic term several different effects of persuasive communication are comprised. By exploiting a high-coverage and bilingual corpus of documents containing metrics of their spread on social networks as well as a massive affective annotation provided by readers, we present a thorough analysis of the interplay between evoked emotions and viral facets. We highlight and discuss our findings in light of a cross-lingual approach: while we discover differences in evoked emotions and corresponding viral effects, we provide preliminary evidence of a generalized explanatory model rooted in the deep structure of emotions: the Valence-Arousal-Dominance (VAD) circumplex. We find that viral facets appear to be consistently affected by particular VAD configurations, and these configurations indicate a clear connection with distinct phenomena underlying persuasive communication.
△ Less
Submitted 16 March, 2015;
originally announced March 2015.
-
DepecheMood: a Lexicon for Emotion Analysis from Crowd-Annotated News
Authors:
Jacopo Staiano,
Marco Guerini
Abstract:
While many lexica annotated with words polarity are available for sentiment analysis, very few tackle the harder task of emotion analysis and are usually quite limited in coverage. In this paper, we present a novel approach for extracting - in a totally automated way - a high-coverage and high-precision lexicon of roughly 37 thousand terms annotated with emotion scores, called DepecheMood. Our app…
▽ More
While many lexica annotated with words polarity are available for sentiment analysis, very few tackle the harder task of emotion analysis and are usually quite limited in coverage. In this paper, we present a novel approach for extracting - in a totally automated way - a high-coverage and high-precision lexicon of roughly 37 thousand terms annotated with emotion scores, called DepecheMood. Our approach exploits in an original way 'crowd-sourced' affective annotation implicitly provided by readers of news articles from rappler.com. By providing new state-of-the-art performances in unsupervised settings for regression and classification tasks, even using a naïve approach, our experiments show the beneficial impact of harvesting social media data for affective lexicon building.
△ Less
Submitted 7 May, 2014;
originally announced May 2014.
-
Is it morally acceptable for a system to lie to persuade me?
Authors:
Marco Guerini,
Fabio Pianesi,
Oliviero Stock
Abstract:
Given the fast rise of increasingly autonomous artificial agents and robots, a key acceptability criterion will be the possible moral implications of their actions. In particular, intelligent persuasive systems (systems designed to influence humans via communication) constitute a highly sensitive topic because of their intrinsically social nature. Still, ethical studies in this area are rare and t…
▽ More
Given the fast rise of increasingly autonomous artificial agents and robots, a key acceptability criterion will be the possible moral implications of their actions. In particular, intelligent persuasive systems (systems designed to influence humans via communication) constitute a highly sensitive topic because of their intrinsically social nature. Still, ethical studies in this area are rare and tend to focus on the output of the required action. Instead, this work focuses on the persuasive acts themselves (e.g. "is it morally acceptable that a machine lies or appeals to the emotions of a person to persuade her, even if for a good end?"). Exploiting a behavioral approach, based on human assessment of moral dilemmas -- i.e. without any prior assumption of underlying ethical theories -- this paper reports on a set of experiments. These experiments address the type of persuader (human or machine), the strategies adopted (purely argumentative, appeal to positive emotions, appeal to negative emotions, lie) and the circumstances. Findings display no differences due to the agent, mild acceptability for persuasion and reveal that truth-conditional reasoning (i.e. argument validity) is a significant dimension affecting subjects' judgment. Some implications for the design of intelligent persuasive systems are discussed.
△ Less
Submitted 15 April, 2014;
originally announced April 2014.
-
Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Authors:
Marco Guerini,
Lorenzo Gatti,
Marco Turchi
Abstract:
Assigning a positive or negative score to a word out of context (i.e. a word's prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can fur…
▽ More
Assigning a positive or negative score to a word out of context (i.e. a word's prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can further improve the estimation of prior polarity scores. Using two different versions of SentiWordNet and testing regression and classification models across tasks and datasets, our learning approach consistently outperforms the single metrics, providing a new state-of-the-art approach in computing words' prior polarity for sentiment analysis. We conclude our investigation showing interesting biases in calculated prior polarity scores when word Part of Speech and annotator gender are considered.
△ Less
Submitted 23 September, 2013;
originally announced September 2013.
-
Exploring Image Virality in Google Plus
Authors:
Marco Guerini,
Jacopo Staiano,
Davide Albanese
Abstract:
Reactions to posts in an online social network show different dynamics depending on several textual features of the corresponding content. Do similar dynamics exist when images are posted? Exploiting a novel dataset of posts, gathered from the most popular Google+ users, we try to give an answer to such a question. We describe several virality phenomena that emerge when taking into account visual…
▽ More
Reactions to posts in an online social network show different dynamics depending on several textual features of the corresponding content. Do similar dynamics exist when images are posted? Exploiting a novel dataset of posts, gathered from the most popular Google+ users, we try to give an answer to such a question. We describe several virality phenomena that emerge when taking into account visual characteristics of images (such as orientation, mean saturation, etc.). We also provide hypotheses and potential explanations for the dynamics behind them, and include cases for which common-sense expectations do not hold true in our experiments.
△ Less
Submitted 16 September, 2013;
originally announced September 2013.
-
Assessing Sentiment Strength in Words Prior Polarities
Authors:
Lorenzo Gatti,
Marco Guerini
Abstract:
Many approaches to sentiment analysis rely on lexica where words are tagged with their prior polarity - i.e. if a word out of context evokes something positive or something negative. In particular, broad-coverage resources like SentiWordNet provide polarities for (almost) every word. Since words can have multiple senses, we address the problem of how to compute the prior polarity of a word startin…
▽ More
Many approaches to sentiment analysis rely on lexica where words are tagged with their prior polarity - i.e. if a word out of context evokes something positive or something negative. In particular, broad-coverage resources like SentiWordNet provide polarities for (almost) every word. Since words can have multiple senses, we address the problem of how to compute the prior polarity of a word starting from the polarity of each sense and returning its polarity strength as an index between -1 and 1. We compare 14 such formulae that appear in the literature, and assess which one best approximates the human judgement of prior polarities, with both regression and classification models.
△ Less
Submitted 18 December, 2012;
originally announced December 2012.
-
Ecological Evaluation of Persuasive Messages Using Google AdWords
Authors:
Marco Guerini,
Carlo Strapparava,
Oliviero Stock
Abstract:
In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a…
▽ More
In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a low cost. The central idea is exploiting existing commercial tools for advertising on the web, such as Google AdWords, to measure message impact in an ecological setting. The paper includes a description of the approach, tips for how to use AdWords for scientific research, and results of pilot experiments on the impact of affective text variations which confirm the effectiveness of the approach.
△ Less
Submitted 24 April, 2012;
originally announced April 2012.
-
Exploring Text Virality in Social Networks
Authors:
Marco Guerini,
Carlo Strapparava,
Gozde Ozbal
Abstract:
This paper aims to shed some light on the concept of virality - especially in social networks - and to provide new insights on its structure. We argue that: (a) virality is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it, (b) virality is a phenomenon with many facets, i.e. under this generic term several different effects of p…
▽ More
This paper aims to shed some light on the concept of virality - especially in social networks - and to provide new insights on its structure. We argue that: (a) virality is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it, (b) virality is a phenomenon with many facets, i.e. under this generic term several different effects of persuasive communication are comprised and they only partially overlap. To give ground to our claims, we provide initial experiments in a machine learning framework to show how various aspects of virality can be independently predicted according to content features.
△ Less
Submitted 25 March, 2012;
originally announced March 2012.
-
Do Linguistic Style and Readability of Scientific Abstracts affect their Virality?
Authors:
Marco Guerini,
Alberto Pepe,
Bruno Lepri
Abstract:
Reactions to textual content posted in an online social network show different dynamics depending on the linguistic style and readability of the submitted content. Do similar dynamics exist for responses to scientific articles? Our intuition, supported by previous research, suggests that the success of a scientific article depends on its content, rather than on its linguistic style. In this articl…
▽ More
Reactions to textual content posted in an online social network show different dynamics depending on the linguistic style and readability of the submitted content. Do similar dynamics exist for responses to scientific articles? Our intuition, supported by previous research, suggests that the success of a scientific article depends on its content, rather than on its linguistic style. In this article, we examine a corpus of scientific abstracts and three forms of associated reactions: article downloads, citations, and bookmarks. Through a class-based psycholinguistic analysis and readability indices tests, we show that certain stylistic and readability features of abstracts clearly concur in determining the success and viral capability of a scientific article.
△ Less
Submitted 19 March, 2012;
originally announced March 2012.