-
"I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data
Authors:
Andrea Failla,
Giulio Rossetti
Abstract:
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions and time of bookmarking. This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.
Submitted 29 April, 2024;
originally announced April 2024.
-
From Perils to Possibilities: Understanding how Human (and AI) Biases affect Online Fora
Authors:
Virginia Morini,
Valentina Pansanella,
Katherine Abramski,
Erica Cau,
Andrea Failla,
Salvatore Citraro,
Giulio Rossetti
Abstract:
Social media platforms are online fora where users engage in discussions, share content, and build connections. This review explores the dynamics of social interactions, user-generated content, and biases within the context of social media analysis (analyzing works that use the tools offered by complex network analysis and natural language processing) through the lens of three key points of view: online debates, online support, and human-AI interactions. On the one hand, we delineate the phenomenon of online debates, where polarization, misinformation, and echo chamber formation often proliferate, driven by algorithmic biases and extreme mechanisms of homophily. On the other hand, we explore the emergence of online support groups through users' self-disclosure and social support mechanisms. Online debates and support mechanisms present a duality of both perils and possibilities within social media: perils of segregated communities and polarized debates, and possibilities of empathy narratives and self-help groups. This dichotomy also extends to a third perspective: users' reliance on AI-generated content, such as that produced by Large Language Models, which can manifest both human biases hidden in training sets and non-human biases that emerge from their artificial neural architectures. Analyzing interdisciplinary approaches, we aim to deepen the understanding of the complex interplay between social interactions, user-generated content, and biases within the realm of social media ecosystems.
Submitted 21 March, 2024;
originally announced March 2024.
-
Redefining Event Types and Group Evolution in Temporal Data
Authors:
Andrea Failla,
Rémy Cazabet,
Giulio Rossetti,
Salvatore Citraro
Abstract:
Groups -- such as clusters of points or communities of nodes -- are fundamental when addressing various data mining tasks. In temporal data, the predominant approach for characterizing group evolution has been through the identification of "events". However, the events usually described in the literature, e.g., shrinks/growths, splits/merges, are often arbitrarily defined, creating a gap between such theoretical/predefined types and real-data group observations. Moving beyond existing taxonomies, we think of events as "archetypes" characterized by a unique combination of quantitative dimensions that we call "facets". Group dynamics are defined by their position within the facet space, where archetypal events occupy extremities. Thus, rather than enforcing strict event types, our approach can allow for hybrid descriptions of dynamics involving group proximity to multiple archetypes. We apply our framework to evolving groups from several face-to-face interaction datasets, showing it enables richer, more reliable characterization of group dynamics with respect to state-of-the-art methods, especially when the groups are subject to complex relationships. Our approach also offers intuitive solutions to common tasks related to dynamic group analysis, such as choosing an appropriate aggregation scale, quantifying partition stability, and evaluating event quality.
Submitted 11 March, 2024;
originally announced March 2024.
-
Attributed Stream Hypergraphs: temporal modeling of node-attributed high-order interactions
Authors:
Andrea Failla,
Salvatore Citraro,
Giulio Rossetti
Abstract:
Recent advances in network science have resulted in two distinct research directions aimed at augmenting and enhancing representations for complex networks. The first direction, that of high-order modeling, aims to focus on connectivity between sets of nodes rather than pairs, whereas the second one, that of feature-rich augmentation, incorporates into a network all those elements that are driven by information which is external to the structure, like node properties or the flow of time. This paper proposes a novel toolbox, that of Attributed Stream Hypergraphs (ASHs), unifying both high-order and feature-rich elements for representing, mining, and analyzing complex networks. Applied to social network analysis, ASHs can characterize complex social phenomena along topological, dynamic and attributive elements. Experiments on real-world face-to-face and online social media interactions highlight that ASHs can easily allow for the analyses, among others, of high-order groups' homophily, nodes' homophily with respect to the hyperedges in which nodes participate, and time-respecting paths between hyperedges.
Submitted 31 March, 2023;
originally announced March 2023.