Unmasking information manipulation: A quantitative approach to detecting Copy-pasta, Rewording, and Translation on Social Media
Authors:
Manon Richard,
Lisa Giordani,
Cristian Brokate,
Jean Liénard
Abstract:
This study proposes a comprehensive methodology for identifying three techniques utilized in foreign-operated information manipulation campaigns: Copy-Pasta, Rewording, and Translation. Our approach, dubbed the "3Δ-space duplicate methodology", quantifies the semantic, grapheme, and language aspects of messages. Computing pairwise distances within these dimensions enables detection of abnormally close messages that are likely part of a coordinated campaign. We validate our approach using a synthetic dataset generated with ChatGPT and DeepL, further applying it to a real-world dataset on Venezuelan actors from Twitter Transparency. Our method successfully identifies all three types of inauthentic duplicates in the synthetic dataset, and is able to uncover inauthentic duplicates across political, commercial, and entertainment contexts in the Twitter dataset. The distinct focus on clustered alterations to messages, rather than individual messages, makes our approach efficient and effective at detecting large-scale instances of textual manipulation, including AI-generated ones. Moreover, our method offers a robust tool for identifying translated content, overlooked in previous research. This research also represents the first comprehensive analysis of copy-pasta detection, providing a reliable technique for tracking duplicate textual content across social networks.
Submitted 28 December, 2023;
originally announced December 2023.
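The pairwise-distance idea in the abstract above can be illustrated with a minimal sketch covering only the grapheme dimension. This is not the authors' implementation: the abstract does not say how any of the three distances is computed, so `difflib.SequenceMatcher` is swapped in here as a stand-in character-level measure, and the function names, the 0.2 threshold, and the sample messages are all hypothetical. The semantic and language dimensions would need an embedding model and a language identifier and are left out.

```python
from difflib import SequenceMatcher
from itertools import combinations


def grapheme_distance(a: str, b: str) -> float:
    """Character-level distance in [0, 1]; 0 means identical strings.

    Assumption: SequenceMatcher's ratio stands in for whatever
    grapheme distance the paper actually uses.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def close_pairs(messages, threshold=0.2):
    """Return (i, j, distance) for every pair closer than the threshold.

    The threshold is an illustrative value, not one taken from the paper.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(messages), 2):
        d = grapheme_distance(a.lower(), b.lower())
        if d < threshold:
            pairs.append((i, j, d))
    return pairs


# Made-up messages for illustration only.
messages = [
    "Vote now to support the new policy!",
    "Vote now to support the new policy!!",
    "The weather is lovely in Caracas today.",
]
flagged = close_pairs(messages)
```

Only the first two messages are flagged, since they differ by a single character. Comparing every pair is quadratic in the number of messages, so a larger corpus would need some form of blocking or approximate nearest-neighbour search before this step.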
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
Authors:
Luiz Giordani,
Gilsiley Darú,
Rhenan Queiroz,
Vitor Buzinaro,
Davi Keglevich Neiva,
Daniel Camilo Fuentes Guzmán,
Marcos Jardel Henriques,
Oilson Alberto Gonzatto Junior,
Francisco Louzada
Abstract:
The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
Submitted 20 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
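The TF-IDF feature extraction mentioned in the abstract above can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the platform's code: the abstract does not specify the TF-IDF variant, tokenizer, or preprocessing, so the sketch uses whitespace tokens, relative term frequency, and an unsmoothed `log(N / df)` weight. The function name `tfidf` and the sample headlines are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    Assumption: this plain, unsmoothed variant is chosen for clarity only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up headlines for illustration only.
docs = [
    "governo anuncia novo programa",
    "governo nega novo boato",
    "cientistas descobrem cura milagrosa",
]
vectors = tfidf(docs)
```

Terms shared by several documents, such as "governo", receive a lower weight than terms unique to one document, such as "anuncia". The resulting sparse vectors are the kind of input that classifiers like the ones listed in the abstract (logistic regression, support vector machine, and so on) would then be trained on.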