Showing 1–35 of 35 results for author: Guerini, M

  1. arXiv:2406.07288  [pdf, other]

    cs.CL

    Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models

    Authors: Daniela Occhipinti, Michele Marchi, Irene Mondella, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Marco Guerini

    Abstract: Automatic methods for generating and gathering linguistic data have proven effective for fine-tuning Language Models (LMs) in languages less resourced than English. Still, while there has been emphasis on data quantity, less attention has been given to its quality. In this work, we investigate the impact of human intervention on machine-generated data when fine-tuning dialogical models. In particu…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2403.20103  [pdf, other]

    cs.CL

    NLP for Counterspeech against Hate: A Survey and How-To Guide

    Authors: Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini

    Abstract: In recent years, counterspeech has emerged as one of the most promising strategies to fight online hate. These non-escalatory responses tackle online abuse while preserving the freedom of speech of the users, and can have a tangible impact in reducing online and offline violence. Recently, there has been growing interest from the Natural Language Processing (NLP) community in addressing the challe…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: To appear in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (findings)

  3. arXiv:2403.09159  [pdf, ps, other]

    cs.CL

    Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

    Authors: Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

    Abstract: Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) aiming at defusing online hatred and mitigating its spreading across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and predominantly focused on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generatio…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted for the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) 2024

  4. arXiv:2402.02975  [pdf, other]

    cs.CL

    Putting Context in Context: the Impact of Discussion Structure on Text Classification

    Authors: Nicolò Penzo, Antonio Longa, Bruno Lepri, Sara Tonelli, Marco Guerini

    Abstract: Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024 main conference

  5. arXiv:2311.10587  [pdf, other]

    cs.CL

    Countering Misinformation via Emotional Response Generation

    Authors: Daniel Russo, Shane Peter Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini

    Abstract: The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy. Previous research has shown how social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread -- often in good faith -- misleading messages. Although professional fact-ch…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 main conference

  6. arXiv:2311.05195  [pdf, other]

    cs.CL

    PRODIGy: a PROfile-based DIalogue Generation dataset

    Authors: Daniela Occhipinti, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: Providing dialogue agents with a profile representation can improve their consistency and coherence, leading to better conversations. However, current profile-based dialogue datasets for training such agents contain either explicit profile representations that are simple and dialogue-specific, or implicit representations that are difficult to collect. In this work, we propose a unified framework i…

    Submitted 9 November, 2023; originally announced November 2023.

  7. arXiv:2309.02311  [pdf, other]

    cs.CL

    Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

    Authors: Helena Bonaldi, Giuseppe Attanasio, Debora Nozza, Marco Guerini

    Abstract: Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targe…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: To appear at CS4OA workshop (INLG-SIGDial)

  8. arXiv:2308.15202  [pdf, other]

    cs.CL

    Benchmarking the Generation of Fact Checking Explanations

    Authors: Daniel Russo, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of Fake News produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification.…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to TACL. This arXiv version is a pre-MIT Press publication version

  9. arXiv:2211.03433  [pdf, other]

    cs.CL cs.CY

    Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering

    Authors: Helena Bonaldi, Sara Dellantonio, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: Fighting online hate speech is a challenge that is usually addressed using Natural Language Processing via automatic detection and removal of hate content. Besides this approach, counter narratives have emerged as an effective tool employed by NGOs to respond to online hate on social media platforms. For this reason, Natural Language Generation is currently being studied as a way to automatize cou…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: To appear in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (long paper)

  10. arXiv:2204.01440  [pdf, other]

    cs.CL cs.CY

    Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

    Authors: Serra Sinem Tekiroglu, Helena Bonaldi, Margherita Fanton, Marco Guerini

    Abstract: In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: To appear in "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL): Findings"

  11. arXiv:2109.13664  [pdf, other]

    cs.CL cs.CY

    Multilingual Counter Narrative Type Classification

    Authors: Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

    Abstract: The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies. In this scenario, learning to recognize counter narrative types from natural text is expected to be useful for applications such as hate speech countering, where operators from non-governmental organizations are supposed to answer to hate with several a…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: To appear at the Workshop on Argument Mining 2021

  12. Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement

    Authors: Elisa Leonardelli, Stefano Menini, Alessio Palmero Aprosio, Marco Guerini, Sara Tonelli

    Abstract: Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a tr…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: To appear at EMNLP 2021 (long paper)

  13. Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

    Authors: Margherita Fanton, Helena Bonaldi, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation,…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: To appear at ACL 2021 (long paper)

  14. arXiv:2107.02472  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    Empowering NGOs in Countering Online Hate Messages

    Authors: Yi-Ling Chung, Serra Sinem Tekiroglu, Sara Tonelli, Marco Guerini

    Abstract: Studies on online hate speech have mostly focused on the automated detection of harmful messages. Little attention has been devoted so far to the development of effective strategies to fight hate speech, in particular through the creation of counter-messages. While existing manual scrutiny and intervention strategies are time-consuming and not scalable, advances in natural language processing have…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Preprint of the paper published in Online Social Networks and Media Journal (OSNEM)

  15. arXiv:2106.11783  [pdf, other]

    cs.CL cs.CY

    Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

    Authors: Yi-Ling Chung, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: Tackling online hatred using informed textual responses - called counter narratives - has been brought under the spotlight recently. Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate the direct intervention in the hate discussion and to prevent hate content from further spreading. Still, current neural approaches tend to produce generic/re…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: To appear in "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL): Findings"

  16. arXiv:2010.03369  [pdf, other]

    cs.CL cs.AI

    Toward Stance-based Personas for Opinionated Dialogues

    Authors: Thomas Scialom, Serra Sinem Tekiroglu, Jacopo Staiano, Marco Guerini

    Abstract: In the context of chit-chat dialogues it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. "I have two cats."). We argue that these representations remain superficial w.r.t. the complexity of human personality. In th…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted at Findings of EMNLP 2020

  17. arXiv:2004.14253  [pdf, other]

    cs.CL

    GePpeTto Carves Italian into a Language Model

    Authors: Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini

    Abstract: In the last few years, pre-trained neural architectures have provided impressive improvements across several NLP tasks. Still, generative language models are available mainly for English. We develop GePpeTto, the first generative language model for Italian, built using the GPT-2 architecture. We provide a thorough analysis of GePpeTto's quality by means of both an automatic and a human-based evalu…

    Submitted 29 April, 2020; originally announced April 2020.

  18. arXiv:2004.04216  [pdf, other]

    cs.CL cs.CY cs.SI

    Generating Counter Narratives against Online Hate Speech: Data and Strategies

    Authors: Serra Sinem Tekiroglu, Yi-Ling Chung, Marco Guerini

    Abstract: Recently research has started focusing on avoiding undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. Accordingly, automation strategies, such as natural language gen…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: To appear at ACL 2020 (long paper)

  19. arXiv:1910.07357  [pdf, ps, other]

    cs.CL

    Generating Challenge Datasets for Task-Oriented Conversational Agents through Self-Play

    Authors: Sourabh Majumdar, Serra Sinem Tekiroglu, Marco Guerini

    Abstract: End-to-end neural approaches are becoming increasingly common in conversational scenarios due to their promising performances when provided with sufficient amount of data. In this paper, we present a novel methodology to address the interpretability of neural approaches in such scenarios by creating challenge datasets using dialogue self-play over multiple tasks/intents. Dialogue self-play allows…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Proceedings of Recent Advances in Natural Language Processing (RANLP) Conference, 2019

  20. CONAN -- COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech

    Authors: Y. L. Chung, E. Kuzmenko, S. S. Tekiroglu, M. Guerini

    Abstract: Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternate strategy, that has received little attention so far by the…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: Published as a long paper at ACL 2019

    Journal ref: In Proceedings of ACL 2019 (pp. 2819-2829)

  21. arXiv:1810.03660  [pdf, other]

    cs.CL cs.CY

    DepecheMood++: a Bilingual Emotion Lexicon Built Through Simple Yet Powerful Techniques

    Authors: Oscar Araque, Lorenzo Gatti, Jacopo Staiano, Marco Guerini

    Abstract: Several lexica for sentiment analysis have been developed and made available in the NLP community. While most of these come with word polarity annotations (e.g. positive/negative), attempts at building lexica for finer-grained emotion analysis (e.g. happiness, sadness) have recently attracted significant attention. Such lexica are often exploited as a building block in the process of developing le…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: 12 pages, 2 figures

  22. arXiv:1704.00939  [pdf, other]

    cs.CL cs.CY

    Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines

    Authors: Youness Mansar, Lorenzo Gatti, Sira Ferradans, Marco Guerini, Jacopo Staiano

    Abstract: In this paper, we describe a methodology to infer Bullish or Bearish sentiment towards companies/brands. More specifically, our approach leverages affective lexica and word embeddings in combination with convolutional neural networks to infer the sentiment of financial news headlines towards a target company. Such architecture was used and evaluated in the context of the SemEval 2017 challenge (ta…

    Submitted 4 April, 2017; originally announced April 2017.

    Comments: 6 pages, 1 figure; accepted for publication at the International Workshop on Semantic Evaluation (SemEval-2017) to be held in conjunction with ACL 2017

  23. arXiv:1601.06081  [pdf, ps, other]

    cs.CL cs.CY cs.SI

    Why Do Urban Legends Go Viral?

    Authors: Marco Guerini, Carlo Strapparava

    Abstract: Urban legends are a genre of modern folklore, consisting of stories about rare and exceptional events, just plausible enough to be believed, which tend to propagate inexorably across communities. In our view, while urban legends represent a form of "sticky" deceptive text, they are marked by a tension between the credible and incredible. They should be credible like a news article and incredible l…

    Submitted 22 January, 2016; originally announced January 2016.

    Comments: Preprint of paper in Journal of Information Processing and Management Volume 52, Issue 1, January 2016, Pages 163-172

  24. arXiv:1511.03447  [pdf, other]

    cs.SI physics.soc-ph

    Community dynamics in connected time-dependent multilayer networks

    Authors: Marco Cristoforetti, Marco Guerini, Giuseppe Jurman, Cesare Furlanello

    Abstract: Different strategies have been considered to extract information from social media about how similarly people react to the same news or event. In this context, a powerful method is offered by the application of graph techniques to the contents produced by social network users. In particular, large events typically attract enough content traffic along time to enable an analysis that explicitly mode…

    Submitted 11 November, 2015; originally announced November 2015.

  25. SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis

    Authors: Lorenzo Gatti, Marco Guerini, Marco Turchi

    Abstract: Deriving prior polarity lexica for sentiment analysis - where positive or negative scores are associated with words out of context - is a challenging task. Usually, a trade-off between precision and coverage is hard to find, and it depends on the methodology used to build the lexicon. Manually annotated lexica provide a high precision but lack in coverage, whereas automatic derivation from pre-exi…

    Submitted 30 October, 2015; originally announced October 2015.

    Comments: in Affective Computing, IEEE Transactions on (2015)

  26. arXiv:1508.05817  [pdf, ps, other]

    cs.CL cs.CY cs.SI

    Echoes of Persuasion: The Effect of Euphony in Persuasive Communication

    Authors: Marco Guerini, Gözde Özbal, Carlo Strapparava

    Abstract: While the effect of various lexical, syntactic, semantic and stylistic features have been addressed in persuasive language from a computational point of view, the persuasive effect of phonetics has received little attention. By modeling a notion of euphony and analyzing four datasets comprising persuasive and non-persuasive sentences in different domains (political speeches, movie quotes, slogans…

    Submitted 24 August, 2015; originally announced August 2015.

  27. arXiv:1503.04723  [pdf, other]

    cs.SI cs.CL cs.CY

    Deep Feelings: A Massive Cross-Lingual Study on the Relation between Emotions and Virality

    Authors: Marco Guerini, Jacopo Staiano

    Abstract: This article provides a comprehensive investigation on the relations between virality of news articles and the emotions they are found to evoke. Virality, in our view, is a phenomenon with many facets, i.e. under this generic term several different effects of persuasive communication are comprised. By exploiting a high-coverage and bilingual corpus of documents containing metrics of their spread o…

    Submitted 16 March, 2015; originally announced March 2015.

    Comments: preprint version of WWW 2015 'Web Science Track' paper

  28. arXiv:1405.1605  [pdf, ps, other]

    cs.CL cs.CY

    DepecheMood: a Lexicon for Emotion Analysis from Crowd-Annotated News

    Authors: Jacopo Staiano, Marco Guerini

    Abstract: While many lexica annotated with words polarity are available for sentiment analysis, very few tackle the harder task of emotion analysis and are usually quite limited in coverage. In this paper, we present a novel approach for extracting - in a totally automated way - a high-coverage and high-precision lexicon of roughly 37 thousand terms annotated with emotion scores, called DepecheMood. Our app…

    Submitted 7 May, 2014; originally announced May 2014.

    Comments: To appear at ACL 2014. 7 pages

  29. arXiv:1404.3959  [pdf, other]

    cs.CY cs.CL

    Is it morally acceptable for a system to lie to persuade me?

    Authors: Marco Guerini, Fabio Pianesi, Oliviero Stock

    Abstract: Given the fast rise of increasingly autonomous artificial agents and robots, a key acceptability criterion will be the possible moral implications of their actions. In particular, intelligent persuasive systems (systems designed to influence humans via communication) constitute a highly sensitive topic because of their intrinsically social nature. Still, ethical studies in this area are rare and t…

    Submitted 15 April, 2014; originally announced April 2014.

  30. arXiv:1309.5843  [pdf, ps, other]

    cs.CL

    Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

    Authors: Marco Guerini, Lorenzo Gatti, Marco Turchi

    Abstract: Assigning a positive or negative score to a word out of context (i.e. a word's prior polarity) is a challenging task for sentiment analysis. In the literature, various approaches based on SentiWordNet have been proposed. In this paper, we compare the most often used techniques together with newly proposed ones and incorporate all of them in a learning framework to see whether blending them can fur…

    Submitted 23 September, 2013; originally announced September 2013.

    Comments: To appear in Proceedings of EMNLP 2013

  31. arXiv:1309.3908  [pdf, ps, other]

    cs.SI cs.CY cs.MM physics.soc-ph

    Exploring Image Virality in Google Plus

    Authors: Marco Guerini, Jacopo Staiano, Davide Albanese

    Abstract: Reactions to posts in an online social network show different dynamics depending on several textual features of the corresponding content. Do similar dynamics exist when images are posted? Exploiting a novel dataset of posts, gathered from the most popular Google+ users, we try to give an answer to such a question. We describe several virality phenomena that emerge when taking into account visual…

    Submitted 16 September, 2013; originally announced September 2013.

    Comments: 8 pages, 8 figures. IEEE/ASE SocialCom 2013

  32. arXiv:1212.4315  [pdf, ps, other]

    cs.CL

    Assessing Sentiment Strength in Words Prior Polarities

    Authors: Lorenzo Gatti, Marco Guerini

    Abstract: Many approaches to sentiment analysis rely on lexica where words are tagged with their prior polarity - i.e. if a word out of context evokes something positive or something negative. In particular, broad-coverage resources like SentiWordNet provide polarities for (almost) every word. Since words can have multiple senses, we address the problem of how to compute the prior polarity of a word startin…

    Submitted 18 December, 2012; originally announced December 2012.

    Comments: To appear at Coling 2012

  33. arXiv:1204.5369  [pdf, other]

    cs.CL cs.SI

    Ecological Evaluation of Persuasive Messages Using Google AdWords

    Authors: Marco Guerini, Carlo Strapparava, Oliviero Stock

    Abstract: In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a…

    Submitted 24 April, 2012; originally announced April 2012.

    Comments: To appear at ACL 2012. 9 pages, 2 figures

    ACM Class: I.2.7

  34. arXiv:1203.5502  [pdf, ps, other]

    cs.CL cs.SI physics.soc-ph

    Exploring Text Virality in Social Networks

    Authors: Marco Guerini, Carlo Strapparava, Gozde Ozbal

    Abstract: This paper aims to shed some light on the concept of virality - especially in social networks - and to provide new insights on its structure. We argue that: (a) virality is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it, (b) virality is a phenomenon with many facets, i.e. under this generic term several different effects of p…

    Submitted 25 March, 2012; originally announced March 2012.

    Journal ref: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM 2011), 17-21 July 2011, Barcelona, Spain

  35. arXiv:1203.4238  [pdf, ps, other]

    cs.SI cs.CL cs.DL

    Do Linguistic Style and Readability of Scientific Abstracts affect their Virality?

    Authors: Marco Guerini, Alberto Pepe, Bruno Lepri

    Abstract: Reactions to textual content posted in an online social network show different dynamics depending on the linguistic style and readability of the submitted content. Do similar dynamics exist for responses to scientific articles? Our intuition, supported by previous research, suggests that the success of a scientific article depends on its content, rather than on its linguistic style. In this articl…

    Submitted 19 March, 2012; originally announced March 2012.

    Comments: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012), 4-8 June 2012, Dublin, Ireland