-
Complex contagions can outperform simple contagions for network reconstruction with dense networks or saturated dynamics
Authors:
Nicholas W. Landry,
William Thompson,
Laurent Hébert-Dufresne,
Jean-Gabriel Young
Abstract:
Network scientists often use complex dynamic processes to describe network contagions, but tools for fitting contagion models typically assume simple dynamics. Here, we address this gap by developing a nonparametric method to reconstruct a network and dynamics from a series of node states, using a model that breaks the dichotomy between simple pairwise and complex neighborhood-based contagions. We then show that a network is more easily reconstructed when observed through the lens of complex contagions if it is dense or the dynamic saturates, and that simple contagions are better otherwise.
Submitted 30 April, 2024;
originally announced May 2024.
-
Biomedical Open Source Software: Crucial Packages and Hidden Heroes
Authors:
Andrew Nesbitt,
Boris Veytsman,
Daniel Mietchen,
Eva Maxfield Brown,
James Howison,
João Felipe Pimentel,
Laurent Hébert-Dufresne,
Stephan Druskat
Abstract:
Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundation libraries, which are used by the software packages visible to the users, being ``hidden'' themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon.
In this work we use the CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor) and determine the packages with the highest centrality.
Submitted 9 April, 2024;
originally announced April 2024.
-
Measuring Centralization of Online Platforms Through Size and Interconnection of Communities
Authors:
Milo Z. Trujillo,
Laurent Hébert-Dufresne,
James Bagrow
Abstract:
Decentralized architecture offers a robust and flexible structure for online platforms, since centralized moderation and computation can be easy to disrupt with targeted attacks. However, a platform offering a decentralized architecture does not guarantee that users will use it in a decentralized way, and measuring the centralization of socio-technical networks is not an easy task. In this paper we introduce a method of characterizing community influence in terms of how many edges between communities would be disrupted by a community's removal. Our approach provides a careful definition of "centralization" appropriate in bipartite user-community socio-technical networks, and demonstrates the inadequacy of more trivial methods for interrogating centralization such as examining the distribution of community sizes. We use this method to compare the structure of multiple socio-technical platforms -- Mastodon, git code hosting servers, BitChute, Usenet, and Voat -- and find a range of structures, from interconnected but decentralized git servers to an effectively centralized use of Mastodon servers, as well as multiscale hybrid network structures of disconnected Voat subverses. As the ecosystem of socio-technical platforms diversifies, it becomes critical to not solely focus on the underlying technologies but also consider the structure of how users interact through the technical infrastructure.
Submitted 27 July, 2023;
originally announced July 2023.
-
Modeling critical connectivity constraints in random and empirical networks
Authors:
Laurent Hébert-Dufresne,
Márton Pósfai,
Antoine Allard
Abstract:
Random networks are a powerful tool in the analytical modeling of complex networks as they allow us to write approximate mathematical models for diverse properties and behaviors of networks. One notable shortcoming of these models is that they are often used to study processes in terms of how they affect the giant connected component of the network, yet they fail to properly account for that component. As an example, this approach is often used to answer questions such as how robust is the network to random damage but fails to capture the structure of the network before any inflicted damage. Here, we introduce a simple conceptual step to account for such connectivity constraints in existing models. We distinguish network neighbors into two types of connections that can lead or not to a component of interest, which we call critical and subcritical degrees. In doing so, we capture important structural features of the network in a system of only one or two equations. In particular cases where the component of interest is surprising under classic random network models, such as sparse connected networks, a single equation can approximate state-of-the-art models like message passing which require a number of equations linear in system size. We discuss potential applications of this simple framework for the study of infrastructure networks where connectivity constraints are critical to the function of the system.
Submitted 7 July, 2023;
originally announced July 2023.
-
TRACE-Omicron: Policy Counterfactuals to Inform Mitigation of COVID-19 Spread in the United States
Authors:
David O'Gara,
Samuel F. Rosenblatt,
Laurent Hébert-Dufresne,
Rob Purcell,
Matt Kasman,
Ross A. Hammond
Abstract:
The Omicron wave was the largest wave of the COVID-19 pandemic to date, more than doubling any other in terms of cases and hospitalizations in the United States. In this paper, we present a large-scale agent-based model of policy interventions that could have been implemented to mitigate the Omicron wave. Our model takes into account the behaviors of individuals and their interactions with one another within a nationally representative population, as well as the efficacy of various interventions such as social distancing, mask wearing, testing, tracing, and vaccination. We use the model to simulate the impact of different policy scenarios and evaluate their potential effectiveness in controlling the spread of the virus. Our results suggest the Omicron wave could have been substantially curtailed via a combination of interventions comparable in effectiveness to extreme and unpopular singular measures such as widespread closure of schools and workplaces, and highlight the importance of early and decisive action.
Submitted 19 January, 2023;
originally announced January 2023.
-
Diverse Misinformation: Impacts of Human Biases on Detection of Deepfakes on Networks
Authors:
Juniper Lovato,
Laurent Hébert-Dufresne,
Jonathan St-Onge,
Randall Harp,
Gabriela Salazar Lopez,
Sean P. Rogers,
Ijaz Ul Haq,
Jeremiah Onaolapo
Abstract:
Social media platforms often assume that users can self-correct against misinformation. However, social media users are not equally susceptible to all misinformation as their biases influence what types of misinformation might thrive and who might be at risk. We call "diverse misinformation" the complex relationships between human biases and demographics represented in misinformation. To investigate how users' biases impact their susceptibility and their ability to correct each other, we analyze classification of deepfakes as a type of diverse misinformation. We chose deepfakes as a case study for three reasons: 1) their classification as misinformation is more objective; 2) we can control the demographics of the personas presented; 3) deepfakes are a real-world concern with associated harms that must be better understood. Our paper presents an observational survey (N=2,016) where participants are exposed to videos and asked questions about their attributes, not knowing some might be deepfakes. Our analysis investigates the extent to which different users are duped and which perceived demographics of deepfake personas tend to mislead. We find that accuracy varies by demographics, and participants are generally better at classifying videos that match them. We extrapolate from these results to understand the potential population-level impacts of these biases using a mathematical model of the interplay between diverse misinformation and crowd correction. Our model suggests that diverse contacts might provide "herd correction" where friends can protect each other. Altogether, human biases and the attributes of misinformation matter greatly, but having a diverse social group may help reduce susceptibility to misinformation.
Submitted 13 January, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Multidisciplinary learning through collective performance favors decentralization
Authors:
John Meluso,
Laurent Hébert-Dufresne
Abstract:
Many models of learning in teams assume that team members can share solutions or learn concurrently. However, these assumptions break down in multidisciplinary teams where team members often complete distinct, interrelated pieces of larger tasks. Such contexts make it difficult for individuals to separate the performance effects of their own actions from the actions of interacting neighbors. In this work, we show that individuals can overcome this challenge by learning from network neighbors through mediating artifacts (like collective performance assessments). When neighbors' actions influence collective outcomes, teams with different networks perform relatively similarly to one another. However, varying a team's network can affect performance on tasks that weight individuals' contributions by network properties. Consequently, when individuals innovate (through ``exploring'' searches), dense networks hurt performance slightly by increasing uncertainty. In contrast, dense networks moderately help performance when individuals refine their work (through ``exploiting'' searches) by efficiently finding local optima. We also find that decentralization improves team performance across a battery of 34 tasks. Our results offer design principles for multidisciplinary teams within which other forms of learning prove more difficult.
Submitted 14 August, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Compressing the chronology of a temporal network with graph commutators
Authors:
Andrea J. Allen,
Cristopher Moore,
Laurent Hébert-Dufresne
Abstract:
Studies of dynamics on temporal networks often represent the network as a series of "snapshots," static networks active for short durations of time. We argue that successive snapshots can be aggregated if doing so has little effect on the overlying dynamics. We propose a method to compress network chronologies by progressively combining pairs of snapshots whose matrix commutators have the smallest dynamical effect. We apply this method to epidemic modeling on real contact tracing data and find that it allows for significant compression while remaining faithful to the epidemic dynamics.
Submitted 29 March, 2024; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Network Onion Divergence: Network representation and comparison using nested configuration models with fixed connectivity, correlation and centrality patterns
Authors:
Laurent Hébert-Dufresne,
Jean-Gabriel Young,
Alexander Daniels,
Antoine Allard
Abstract:
Random networks, constrained to reproduce specific features of networks, are often used to represent and analyze network data as well as their mathematical descriptions. Chief among them, the configuration model constrains random networks by their degree distribution and is foundational to many areas of network science. However, these representations are often selected based on intuition or mathematical and computational simplicity rather than on statistical evidence. To evaluate the quality of a network representation we need to consider both the amount of information required by a random network model as well as the probability of recovering the original data when using the model as a generative process. To this end, we calculate the approximate size of network ensembles generated by the popular configuration model and its generalizations that include degree-correlations and centrality layers based on the onion decomposition. We then apply minimum description length as a model selection criterion and also introduce the Network Onion Divergence: model selection and network comparison over a nested family of configuration models with differing levels of structural detail. Using over 100 empirical sets of network data, we find that a simple Layered Configuration Model offers the most compact representation of the majority of real networks. We hope that our results will continue to motivate the development of intricate random network models that help capture network structure beyond the simple degree distribution.
Submitted 18 April, 2022;
originally announced April 2022.
-
The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories
Authors:
Melanie Warrick,
Samuel F. Rosenblatt,
Jean-Gabriel Young,
Amanda Casari,
Laurent Hébert-Dufresne,
James Bagrow
Abstract:
Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as a sociotechnical system. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages) and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 33% of GitHub contributors in the mailing list data. We then explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.
Submitted 1 April, 2022;
originally announced April 2022.
-
Hierarchical team structure and multidimensional localization (or siloing) on networks
Authors:
Laurent Hébert-Dufresne,
Guillaume St-Onge,
John Meluso,
James Bagrow,
Antoine Allard
Abstract:
Knowledge silos emerge when structural properties of organizational interaction networks limit the diffusion of information. These structural barriers are known to take many forms at different scales - hubs in otherwise sparse organisations, large dense teams, or global core-periphery structure - but we lack an understanding of how these different structures interact. Here we bridge the gap between the mathematical literature on localization of spreading dynamics and the more applied literature on knowledge silos in organizational interaction networks. To do so, we introduce a new model that considers a layered structure of teams to unveil a new form of hierarchical localization (i.e., the localization of information at the top or center of an organization) and study its interplay with known phenomena of mesoscopic localization (i.e., the localization of information in large groups), $k$-core localization (i.e., around denser $k$-cores) and hub localization (i.e., around high degree stars). We also include a complex contagion mechanism by considering a general infection kernel which can depend on hierarchical level (influence), degree (popularity), infectious neighbors (social reinforcement) or team size (importance). This general model allows us to study the multifaceted phenomenon of information siloing in complex organizational interaction networks and opens the door to new optimization problems to promote or hinder the emergence of different localization regimes.
Submitted 1 March, 2022;
originally announced March 2022.
-
Source-sink cooperation dynamics constrain institutional evolution in a group-structured society
Authors:
Laurent Hébert-Dufresne,
Timothy M. Waring,
Guillaume St-Onge,
Meredith T. Niles,
Laura Kati Corlew,
Matthew P. Dube,
Stephanie J. Miller,
Nicholas Gotelli,
Brian J. McGill
Abstract:
Societies change through time, entailing changes in behaviors and institutions. We ask how social change occurs when behaviors and institutions are interdependent. We model a group-structured society in which the transmission of individual behavior occurs in parallel with the selection of group-level institutions. We consider a cooperative behavior that generates collective benefits for groups but does not spread between individuals on its own. Groups exhibit institutions that increase the diffusion of the behavior within the group, but also incur a group cost. Groups adopt institutions in proportion to their fitness. Finally, cooperative behavior may also spread globally. As expected, we find that cooperation and institutions are mutually reinforcing. But the model also generates behavioral source-sink dynamics when cooperation generated in institutional groups spreads to non-institutional groups, boosting their fitness. Consequently, the global diffusion of cooperation creates a pattern of institutional free-riding that limits the evolution of group-beneficial institutions. Our model suggests that, in a group-structured society, large-scale change in behavior and institutions (i.e. social change) can be best achieved when the two remain correlated, such as through the spread of successful pilot programs.
Submitted 16 September, 2021;
originally announced September 2021.
-
When the Echo Chamber Shatters: Examining the Use of Community-Specific Language Post-Subreddit Ban
Authors:
Milo Z. Trujillo,
Samuel F. Rosenblatt,
Guillermo de Anda Jáuregui,
Emily Moog,
Briane Paul V. Samson,
Laurent Hébert-Dufresne,
Allison M. Roth
Abstract:
Community-level bans are a common tool against groups that enable online harassment and harmful speech. Unfortunately, the efficacy of community bans has only been partially studied and with mixed results. Here, we provide a flexible unsupervised methodology to identify in-group language and track user activity on Reddit both before and after the ban of a community (subreddit). We use a simple word frequency divergence to identify uncommon words overrepresented in a given community, not as a proxy for harmful speech but as a linguistic signature of the community. We apply our method to 15 banned subreddits, and find that community response is heterogeneous between subreddits and between users of a subreddit. Top users were more likely to become less active overall, while random users often reduced use of in-group language without decreasing activity. Finally, we find some evidence that the effectiveness of bans aligns with the content of a community. Users of dark humor communities were largely unaffected by bans while users of communities organized around white supremacy and fascism were the most affected. Altogether, our results show that bans do not affect all groups or users equally, and pave the way to understanding the effect of bans across communities.
Submitted 30 June, 2021;
originally announced June 2021.
-
The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative
Authors:
Milo Z. Trujillo,
Laurent Hébert-Dufresne,
James Bagrow
Abstract:
GitHub has become the central online platform for much of open source, hosting most open source code repositories. With this popularity, the public digital traces of GitHub are now a valuable means to study teamwork and collaboration. In many ways, however, GitHub is a convenience sample, and may not be representative of open source development off the platform. Here we develop a novel, extensive sample of public open source project repositories outside of centralized platforms. We characterize these projects along a number of dimensions, and compare them to a time-matched sample of corresponding GitHub projects. Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.
Submitted 22 May, 2022; v1 submitted 29 June, 2021;
originally announced June 2021.
-
A Review & Framework for Modeling Complex Engineered System Development Processes
Authors:
John Meluso,
Jesse Austin-Breneman,
James P. Bagrow,
Laurent Hébert-Dufresne
Abstract:
Developing complex engineered systems (CES) poses significant challenges for engineers, managers, designers, and businesspeople alike due to the inherent complexity of the systems and contexts involved. Furthermore, experts have expressed great interest in filling the gap in theory about how CES develop. This article begins to address that gap in two ways. First, it reviews the numerous definitions of CES along with existing theory and methods on CES development processes. Then, it proposes the ComplEx System Integrated Utilities Model (CESIUM), a novel framework for exploring how numerous system and development process characteristics may affect the performance of CES. CESIUM creates simulated representations of a system architecture, the corresponding engineering organization, and the new product development process through which the organization designs the system. It does so by representing the system as a network of interdependent artifacts designed by agents. Agents iteratively design their artifacts through optimization and share information with other agents, thereby advancing the CES toward a solution. This paper describes the model, conducts a sensitivity analysis, provides validation, and suggests directions for future study.
Submitted 24 March, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Which contributions count? Analysis of attribution in open source
Authors:
Jean-Gabriel Young,
Amanda Casari,
Katie McLaughlin,
Milo Z. Trujillo,
Laurent Hébert-Dufresne,
James P. Bagrow
Abstract:
Open source software projects usually acknowledge contributions with text files, websites, and other idiosyncratic methods. These data sources are hard to mine, which is why contributorship is most frequently measured through changes to repositories, such as commits, pushes, or patches. Recently, some open source projects have taken to recording contributor actions with standardized systems; this opens up a unique opportunity to understand how community-generated notions of contributorship map onto codebases as the measure of contribution. Here, we characterize contributor acknowledgment models in open source by analyzing thousands of projects that use a model called All Contributors to acknowledge diverse contributions like outreach, finance, infrastructure, and community management. We analyze the life cycle of projects through this model's lens and contrast its representation of contributorship with the picture given by other methods of acknowledgment, including GitHub's top committers indicator and contributions derived from actions taken on the platform. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible, which generates a more extensive picture of collaboration. Further, we find that models requiring explicit attribution lead to more clearly defined boundaries around what is and what is not a contribution.
Submitted 19 March, 2021;
originally announced March 2021.
-
Containing Future Epidemics with Trustworthy Federated Systems for Ubiquitous Warning and Response
Authors:
Dick Carrillo,
Lam Duc Nguyen,
Pedro H. J. Nardelli,
Evangelos Pournaras,
Plinio Morita,
Demóstenes Z. Rodríguez,
Merim Dzaferagic,
Harun Siljak,
Alexander Jung,
Laurent Hébert-Dufresne,
Irene Macaluso,
Mehar Ullah,
Gustavo Fraidenraich,
Petar Popovski
Abstract:
In this paper, we propose a global digital platform to avoid and combat epidemics by providing relevant real-time information to support selective lockdowns. It leverages the pervasiveness of wireless connectivity while being trustworthy and secure. The proposed system is conceptualized to be decentralized yet federated, based on ubiquitous public systems and active citizen participation. Its foundations lie in the principle of informational self-determination. We argue that only in this way can it become a trustworthy and legitimate public good infrastructure for citizens by balancing the asymmetry of the different hierarchical levels within the federated organization while providing highly effective detection and guiding mitigation measures toward a graceful lockdown of society. To exemplify the proposed system, we choose remote patient monitoring as a use case, in which the integration of distributed ledger technologies with narrowband IoT technology is evaluated for different numbers of endorsed peers. An experimental proof-of-concept setup is used to evaluate the performance of this integration, in which the end-to-end latency is slightly increased when a new endorsed element is added. However, the system reliability, privacy, and interoperability are guaranteed. In this sense, we expect active participation of empowered citizens to supplement the more usual top-down management of epidemics.
Submitted 25 March, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Impact and dynamics of hate and counter speech online
Authors:
Joshua Garland,
Keyan Ghazi-Zahedi,
Jean-Gabriel Young,
Laurent Hébert-Dufresne,
Mirta Galesic
Abstract:
Citizen-generated counter speech is a promising way to fight hate speech and promote peaceful, non-polarized discourse. However, there is a lack of large-scale longitudinal studies of its effectiveness for reducing hate speech. To this end, we perform an exploratory analysis of the effectiveness of counter speech using several different macro- and micro-level measures to analyze 180,000 political conversations that took place on German Twitter over four years. We report on the dynamic interactions of hate and counter speech over time and provide insights into whether, as in `classic' bullying situations, organized efforts are more effective than independent individuals in steering online discourse. Taken together, our results build a multifaceted picture of the dynamics of hate and counter speech online. While we make no causal claims due to the complexity of discourse dynamics, our findings suggest that organized hate speech is associated with changes in public discourse and that counter speech -- especially when organized -- may help curb hateful rhetoric in online discourse.
Submitted 5 September, 2021; v1 submitted 15 September, 2020;
originally announced September 2020.
-
Network comparison and the within-ensemble graph distance
Authors:
Harrison Hartle,
Brennan Klein,
Stefan McCabe,
Alexander Daniels,
Guillaume St-Onge,
Charles Murphy,
Laurent Hébert-Dufresne
Abstract:
Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years a multitude of diverse, ad hoc solutions to this problem have been introduced. Here we propose that simple and well-understood ensembles of random networks (such as Erdős-Rényi graphs, random geometric graphs, Watts-Strogatz graphs, the configuration model, and preferential attachment networks) are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
Submitted 5 August, 2020;
originally announced August 2020.
-
Limits of Individual Consent and Models of Distributed Consent in Online Social Networks
Authors:
Juniper Lovato,
Antoine Allard,
Randall Harp,
Jeremiah Onaolapo,
Laurent Hébert-Dufresne
Abstract:
Personal data are not discrete in socially-networked digital environments. A user who consents to allow access to their profile can expose the personal data of their network connections to non-consented access. Therefore, the traditional consent model (informed and individual) is not appropriate in social networks where informed consent may not be possible for all users affected by data processing and where information is distributed across users. Here, we outline the adequacy of consent for data transactions. Informed by the shortcomings of individual consent, we introduce both a platform-specific model of "distributed consent" and a cross-platform model of a "consent passport." In both models, individuals and groups can coordinate by giving consent conditional on that of their network connections. We simulate the impact of these distributed consent models on the observability of social networks and find that low adoption would allow macroscopic subsets of networks to preserve their connectivity and privacy.
Submitted 11 April, 2022; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Countering hate on social media: Large scale classification of hate and counter speech
Authors:
Joshua Garland,
Keyan Ghazi-Zahedi,
Jean-Gabriel Young,
Laurent Hébert-Dufresne,
Mirta Galesic
Abstract:
Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech where citizens actively engage in hate-filled conversations to attempt to restore civil non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores on out-of-sample balanced test sets ranging from 0.76 to 0.97, accuracy in line with, and even exceeding, the state of the art. On thousands of tweets, we used crowdsourcing to verify that the judgments made by the classifier are in close alignment with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully-resolved Twitter conversations occurring from 2013 to 2018 and study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.
Submitted 5 June, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Efficient sampling of spreading processes on complex networks using a composition and rejection algorithm
Authors:
Guillaume St-Onge,
Jean-Gabriel Young,
Laurent Hébert-Dufresne,
Louis J. Dubé
Abstract:
Efficient stochastic simulation algorithms are of paramount importance to the study of spreading phenomena on complex networks. Using insights and analytical results from network science, we discuss how the structure of contacts affects the efficiency of current algorithms. We show that algorithms believed to require $\mathcal{O}(\log N)$ or even $\mathcal{O}(1)$ operations per update---where $N$ is the number of nodes---display instead a polynomial scaling for networks that are either dense or sparse and heterogeneous. This significantly affects the required computation time for simulations on large networks. To circumvent the issue, we propose a node-based method combined with a composition and rejection algorithm, a sampling scheme that has an average-case complexity of $\mathcal{O} [\log(\log N)]$ per update for general networks. This systematic approach is first set up for Markovian dynamics, but can also be adapted to a number of non-Markovian processes and can considerably enhance the study of a wide range of dynamics on networks.
Submitted 11 February, 2019; v1 submitted 15 August, 2018;
originally announced August 2018.
-
Finite size analysis of the detectability limit of the stochastic block model
Authors:
Jean-Gabriel Young,
Patrick Desrosiers,
Laurent Hébert-Dufresne,
Edward Laurence,
Louis J. Dubé
Abstract:
It has been shown in recent years that the stochastic block model (SBM) is sometimes undetectable in the sparse limit, i.e., that no algorithm can identify a partition correlated with the partition used to generate an instance, if the instance is sparse enough and infinitely large. In this contribution, we treat the finite case explicitly, using arguments drawn from information theory and statistics. We give a necessary condition for finite-size detectability in the general SBM. We then distinguish the concept of average detectability from the concept of instance-by-instance detectability and give explicit formulas for both definitions. Using these formulas, we prove that there exist large equivalence classes of parameters, where widely different network ensembles are equally detectable with respect to our definitions of detectability. In an extensive case study, we investigate the finite-size detectability of a simplified variant of the SBM, which encompasses a number of important models as special cases. These models include the symmetric SBM, the planted coloring model, and more exotic SBMs not previously studied. We conclude with three appendices, where we study the interplay of noise and detectability, establish a connection between our information-theoretic approach and random matrix theory, and provide proofs of some of the more technical results.
Submitted 27 June, 2017; v1 submitted 31 December, 2016;
originally announced January 2017.
-
Exotic phase transitions of k-cores in clustered networks
Authors:
Uttam Bhat,
Munik Shrestha,
Laurent Hébert-Dufresne
Abstract:
The giant $k$-core --- the maximal connected subgraph of a network where each node has at least $k$ neighbors --- is important in the study of phase transitions and in applications of network theory. Unlike Erdős-Rényi graphs and other random networks where $k$-cores emerge discontinuously for $k\ge 3$, we show that transitive linking (or triadic closure) leads to 3-cores emerging through single or double phase transitions of both discontinuous and continuous nature. We also develop a $k$-core calculation that includes clustering and provides insights into how high-level connectivity emerges.
Submitted 14 October, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.
-
Dynamics of beneficial epidemics
Authors:
Andrew Berdahl,
Christa Brelsford,
Caterina De Bacco,
Marion Dumas,
Vanessa Ferdinand,
Joshua A. Grochow,
Laurent Hébert-Dufresne,
Yoav Kallus,
Christopher P. Kempes,
Artemy Kolchinsky,
Daniel B. Larremore,
Eric Libby,
Eleanor A. Power,
Caitlin A. Stern,
Brendan Tracey
Abstract:
Pathogens can spread epidemically through populations. Beneficial contagions, such as viruses that enhance host survival or technological innovations that improve quality of life, also have the potential to spread epidemically. How do the dynamics of beneficial biological and social epidemics differ from those of detrimental epidemics? We investigate this question using three theoretical approaches. First, in the context of population genetics, we show that a horizontally-transmissible element that increases fitness, such as viral DNA, spreads superexponentially through a population, more quickly than a beneficial mutation. Second, in the context of behavioral epidemiology, we show that infections that cause increased connectivity lead to superexponential fixation in the population. Third, in the context of dynamic social networks, we find that preferences for increased global infection accelerate spread and produce superexponential fixation, but preferences for local assortativity halt epidemics by disconnecting the infected from the susceptible. We conclude that the dynamics of beneficial biological and social epidemics are characterized by the rapid spread of beneficial elements, which is facilitated in biological systems by horizontal transmission and in social systems by active spreading behavior of infected individuals.
Submitted 17 February, 2017; v1 submitted 7 April, 2016;
originally announced April 2016.
-
Growing networks of overlapping communities with internal structure
Authors:
Jean-Gabriel Young,
Laurent Hébert-Dufresne,
Antoine Allard,
Louis J. Dubé
Abstract:
We introduce an intuitive model that describes both the emergence of community structure and the evolution of the internal structure of communities in growing social networks. The model comprises two complementary mechanisms: One mechanism accounts for the evolution of the internal link structure of a single community, and the second mechanism coordinates the growth of multiple overlapping communities. The first mechanism is based on the assumption that each node establishes links with its neighbors and introduces new nodes to the community at different rates. We demonstrate that this simple mechanism gives rise to an effective maximal degree within communities. This observation is related to the anthropological theory known as Dunbar's number, i.e., the empirical observation of a maximal number of ties which an average individual can sustain within its social groups. The second mechanism is based on a recently proposed generalization of preferential attachment to community structure, appropriately called structural preferential attachment (SPA). The combination of these two mechanisms into a single model (SPA+) allows us to reproduce a number of the global statistics of real networks: The distribution of community sizes, of node memberships and of degrees. The SPA+ model also predicts (a) three qualitative regimes for the degree distribution within overlapping communities and (b) strong correlations between the number of communities to which a node belongs and its number of connections within each community. We present empirical evidence that supports our findings in real complex networks.
Submitted 25 August, 2016; v1 submitted 17 March, 2016;
originally announced March 2016.
-
Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition
Authors:
Laurent Hébert-Dufresne,
Joshua A. Grochow,
Antoine Allard
Abstract:
We introduce a new network statistic that measures diverse structural properties at the micro-, meso-, and macroscopic scales, while still being easy to compute and easy to interpret at a glance. Our statistic, the onion spectrum, is based on the onion decomposition, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. But the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness of the whole network as well as of each k-core. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks.
Submitted 26 February, 2016; v1 submitted 28 October, 2015;
originally announced October 2015.
-
Complex networks as an emerging property of hierarchical preferential attachment
Authors:
Laurent Hébert-Dufresne,
Edward Laurence,
Antoine Allard,
Jean-Gabriel Young,
Louis J. Dubé
Abstract:
Real complex systems are not rigidly structured; no clear rules or blueprints exist for their construction. Yet, amidst their apparent randomness, complex structural properties universally emerge. We propose that an important class of complex systems can be modeled as an organization of many embedded levels (potentially infinite in number), all of them following the same universal growth principle known as preferential attachment. We give examples of such hierarchy in real systems, for instance in the pyramid of production entities of the film industry. More importantly, we show how real complex networks can be interpreted as a projection of our model, from which their scale independence, their clustering, their hierarchy, their fractality and their navigability naturally emerge. Our results suggest that complex networks, viewed as growing systems, can be quite simple, and that the apparent complexity of their structure is largely a reflection of their unobserved hierarchical nature.
Submitted 10 December, 2015; v1 submitted 30 November, 2013;
originally announced December 2013.
-
Percolation on random networks with arbitrary k-core structure
Authors:
Laurent Hébert-Dufresne,
Antoine Allard,
Jean-Gabriel Young,
Louis J. Dubé
Abstract:
The k-core decomposition of a network has thus far mainly served as a powerful tool for the empirical study of complex networks. We now propose its explicit integration in a theoretical model. We introduce a Hard-core Random Network model that generates maximally random networks with arbitrary degree distribution and arbitrary k-core structure. We then solve exactly the bond percolation problem on the HRN model and produce fast and precise analytical estimates for the corresponding real networks. Extensive comparison with selected databases reveals that our approach performs better than existing models, while requiring less input information.
Submitted 30 September, 2013; v1 submitted 29 August, 2013;
originally announced August 2013.
-
A shadowing problem in the detection of overlapping communities: lifting the resolution limit through a cascading procedure
Authors:
Jean-Gabriel Young,
Antoine Allard,
Laurent Hébert-Dufresne,
Louis J. Dubé
Abstract:
Community detection is the process of assigning nodes and links to significant communities (e.g., clusters, function modules) and its development has led to a better understanding of complex networks. When applied to sizable networks, we argue that most detection algorithms correctly identify prominent communities, but fail to do so across multiple scales. As a result, a significant fraction of the network is left uncharted. We show that this problem stems from larger or denser communities overshadowing smaller or sparser ones, and that this effect accounts for most of the undetected communities and unassigned links. We propose a generic cascading approach to community detection that circumvents the problem. Using real and artificial network datasets with three widely used community detection algorithms, we show how a simple cascading procedure allows for the detection of the missing communities. This work highlights a new detection limit of community structure, and we hope that our approach can inspire better community detection algorithms.
Submitted 30 September, 2015; v1 submitted 6 November, 2012;
originally announced November 2012.
-
On the constrained growth of complex critical systems
Authors:
Laurent Hébert-Dufresne,
Antoine Allard,
Louis J. Dubé
Abstract:
Critical, or scale-independent, systems are so ubiquitous that gaining theoretical insights into their nature and properties has many direct repercussions in social and natural sciences. In this report, we start from the simplest possible growth model for critical systems and deduce constraints in their growth: the well-known preferential attachment principle, and, mainly, a new law of temporal scaling. We then support our scaling law with a number of calculations and simulations of more complex theoretical models: critical percolation, self-organized criticality and fractal growth. Perhaps more importantly, the scaling law is also observed in a number of empirical systems of quite different nature: prose samples, artistic and scientific productivity, citation networks, and the topology of the Internet. We believe that these observations pave the way towards a general and analytical framework for predicting the growth of complex systems.
Submitted 6 November, 2012;
originally announced November 2012.
-
Bond percolation on a class of correlated and clustered random graphs
Authors:
Antoine Allard,
Laurent Hébert-Dufresne,
Pierre-André Noël,
Vincent Marceau,
Louis J. Dubé
Abstract:
We introduce a formalism for computing bond percolation properties of a class of correlated and clustered random graphs. This class of graphs is a generalization of the Configuration Model where nodes of different types are connected via different types of hyperedges, edges that can link more than 2 nodes. We argue that the multitype approach coupled with the use of clustered hyperedges can reproduce a wide spectrum of complex patterns, and thus enhances our capability to model real complex networks. As an illustration of this claim, we use our formalism to highlight unusual behaviors of the size and composition of the components (small and giant) in a synthetic, albeit realistic, social network.
Submitted 3 September, 2012; v1 submitted 22 January, 2012;
originally announced January 2012.
-
Exact solution of bond percolation on small arbitrary graphs
Authors:
Antoine Allard,
Laurent Hébert-Dufresne,
Pierre-André Noël,
Vincent Marceau,
Louis J. Dubé
Abstract:
We introduce a set of iterative equations that exactly solves the size distribution of components on small arbitrary graphs after the random removal of edges. We also demonstrate how these equations can be used to predict the distribution of the node partitions (i.e., the constrained distribution of the size of each component) in undirected graphs. Besides opening the way to the theoretical prediction of percolation on arbitrary graphs of large but finite size, we show how our results find application in graph theory, epidemiology, percolation and fragmentation theory.
Submitted 27 April, 2012; v1 submitted 20 January, 2012;
originally announced January 2012.
-
Modeling the dynamical interaction between epidemics on overlay networks
Authors:
Vincent Marceau,
Pierre-André Noël,
Laurent Hébert-Dufresne,
Antoine Allard,
Louis J. Dubé
Abstract:
Epidemics seldom occur as isolated phenomena. Typically, two or more viral agents spread within the same host population and may interact dynamically with each other. We present a general model where two viral agents interact via an immunity mechanism as they propagate simultaneously on two networks connecting the same set of nodes. Exploiting a correspondence between the propagation dynamics and a dynamical process performing progressive network generation, we develop an analytic approach that accurately captures the dynamical interaction between epidemics on overlay networks. The formalism allows for overlay networks with arbitrary joint degree distribution and overlap. To illustrate the versatility of our approach, we consider a hypothetical delayed intervention scenario in which an immunizing agent is disseminated in a host population to hinder the propagation of an undesirable agent (e.g. the spread of preventive information in the context of an emerging infectious disease).
Submitted 24 June, 2011; v1 submitted 21 March, 2011;
originally announced March 2011.
-
Propagation on networks: an exact alternative perspective
Authors:
Pierre-André Noël,
Antoine Allard,
Laurent Hébert-Dufresne,
Vincent Marceau,
Louis J. Dubé
Abstract:
By generating the specifics of a network structure only when needed (on-the-fly), we derive a simple stochastic process that exactly models the time evolution of susceptible-infectious dynamics on finite-size networks. The small number of dynamical variables of this birth-death Markov process greatly simplifies analytical calculations. We show how a dual analytical description, treating large-scale epidemics with a Gaussian approximation and small outbreaks with a branching process, provides an accurate approximation of the distribution even for rather small networks. The approach also offers important computational advantages and generalizes to a vast class of systems.
Submitted 1 March, 2012; v1 submitted 4 February, 2011;
originally announced February 2011.