subscribe to arXiv mailings

When Lawvere meets Peirce: an equational presentation of boolean hyperdoctrines

Authors: Filippo Bonchi, Alessandro Di Giorgio, Davide Trotta

Abstract: Fo-bicategories are a categorification of Peirce's calculus of relations. Notably, their laws provide a proof system for first-order logic that is both purely equational and complete. This paper illustrates a correspondence between fo-bicategories and Lawvere's hyperdoctrines. To streamline our proof, we introduce peircean bicategories, which offer a more succinct characterization of fo-bicategori… ▽ More Fo-bicategories are a categorification of Peirce's calculus of relations. Notably, their laws provide a proof system for first-order logic that is both purely equational and complete. This paper illustrates a correspondence between fo-bicategories and Lawvere's hyperdoctrines. To streamline our proof, we introduce peircean bicategories, which offer a more succinct characterization of fo-bicategories. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16676 [pdf, ps, other]

Multilayer Correlation Clustering

Authors: Atsushi Miyauchi, Florian Adriaens, Francesco Bonchi, Nikolaj Tatti

Abstract: In this paper, we establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$. The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the disagreements vector, wh… ▽ More In this paper, we establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$. The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers, based on the well-known region growing technique. We then study an important special case of our problem, namely the problem with the probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. For instance, we can take $α=2.5$ in general (Ailon et al., JACM '08) and $α=1.73+ε$ for the unweighted case (Cohen-Addad et al., FOCS '23). Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets demonstrate the effectiveness of our proposed algorithms. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2402.07718 [pdf, other]

Local Centrality Minimization with Quality Guarantees

Authors: Atsushi Miyauchi, Lorenzo Severini, Francesco Bonchi

Abstract: Centrality measures, quantifying the importance of vertices or edges, play a fundamental role in network analysis. To date, triggered by some positive approximability results, a large body of work has been devoted to studying centrality maximization, where the goal is to maximize the centrality score of a target vertex by manipulating the structure of a given network. On the other hand, due to the… ▽ More Centrality measures, quantifying the importance of vertices or edges, play a fundamental role in network analysis. To date, triggered by some positive approximability results, a large body of work has been devoted to studying centrality maximization, where the goal is to maximize the centrality score of a target vertex by manipulating the structure of a given network. On the other hand, due to the lack of such results, only very little attention has been paid to centrality minimization, despite its practical usefulness. In this study, we introduce a novel optimization model for local centrality minimization, where the manipulation is allowed only around the target vertex. We prove the NP-hardness of our model and that the most intuitive greedy algorithm has a quite limited performance in terms of approximation ratio. Then we design two effective approximation algorithms: The first algorithm is a highly-scalable algorithm that has an approximation ratio unachievable by the greedy algorithm, while the second algorithm is a bicriteria approximation algorithm that solves a continuous relaxation based on the Lovász extension, using a projected subgradient method. To the best of our knowledge, ours are the first polynomial-time algorithms with provable approximation guarantees for centrality minimization. Experiments using a variety of real-world networks demonstrate the effectiveness of our proposed algorithms: Our first algorithm is applicable to million-scale graphs and obtains much better solutions than those of scalable baselines, while our second algorithm is rather strong against adversarial instances. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to The Web Conference 2024

arXiv:2402.01400 [pdf, other]

Query-Efficient Correlation Clustering with Noisy Oracle

Authors: Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi, Wei Chen

Abstract: We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We propose two novel formulations of online learning problem… ▽ More We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We propose two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.16088 [pdf, other]

Fairness in Algorithmic Recourse Through the Lens of Substantive Equality of Opportunity

Authors: Andrew Bell, Joao Fonseca, Carlo Abrate, Francesco Bonchi, Julia Stoyanovich

Abstract: Algorithmic recourse -- providing recommendations to those affected negatively by the outcome of an algorithmic system on how they can take action and change that outcome -- has gained attention as a means of giving persons agency in their interactions with artificial intelligence (AI) systems. Recent work has shown that even if an AI decision-making classifier is ``fair'' (according to some reaso… ▽ More Algorithmic recourse -- providing recommendations to those affected negatively by the outcome of an algorithmic system on how they can take action and change that outcome -- has gained attention as a means of giving persons agency in their interactions with artificial intelligence (AI) systems. Recent work has shown that even if an AI decision-making classifier is ``fair'' (according to some reasonable criteria), recourse itself may be unfair due to differences in the initial circumstances of individuals, compounding disparities for marginalized populations and requiring them to exert more effort than others. There is a need to define more methods and metrics for evaluating fairness in recourse that span a range of normative views of the world, and specifically those that take into account time. Time is a critical element in recourse because the longer it takes an individual to act, the more the setting may change due to model or data drift. This paper seeks to close this research gap by proposing two notions of fairness in recourse that are in normative alignment with substantive equality of opportunity, and that consider time. The first considers the (often repeated) effort individuals exert per successful recourse event, and the second considers time per successful recourse event. Building upon an agent-based framework for simulating recourse, this paper demonstrates how much effort is needed to overcome disparities in initial circumstances. We then proposes an intervention to improve the fairness of recourse by rewarding effort, and compare it to existing strategies. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.07055 [pdf, other]

Diagrammatic Algebra of First Order Logic

Authors: Filippo Bonchi, Alessandro Di Giorgio, Nathan Haydon, Pawel Sobocinski

Abstract: We introduce the calculus of neo-Peircean relations, a string diagrammatic extension of the calculus of binary relations that has the same expressivity as first order logic and comes with a complete axiomatisation. The axioms are obtained by combining two well known categorical structures: cartesian and linear bicategories. We introduce the calculus of neo-Peircean relations, a string diagrammatic extension of the calculus of binary relations that has the same expressivity as first order logic and comes with a complete axiomatisation. The axioms are obtained by combining two well known categorical structures: cartesian and linear bicategories. △ Less

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2311.15756 [pdf, other]

doi 10.1109/TSP.2024.3401072

Learning Multi-Frequency Partial Correlation Graphs

Authors: Gabriele D'Acunto, Paolo Di Lorenzo, Francesco Bonchi, Stefania Sardellitti, Sergio Barbarossa

Abstract: Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial… ▽ More Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain. △ Less

Submitted 12 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted at IEEE Transactions on Signal Processing

Journal ref: IEEE Transactions on Signal Processing, vol. 72, pp. 2953-2969, 2024

arXiv:2311.00118 [pdf, other]

Extracting the Multiscale Causal Backbone of Brain Dynamics

Authors: Gabriele D'Acunto, Francesco Bonchi, Gianmarco De Francisci Morales, Giovanni Petri

Abstract: The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach le… ▽ More The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective. △ Less

Submitted 19 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Accepted at the 3rd conference on Causal Learning and Reasoning (CLeaR 2024)

arXiv:2309.06969 [pdf, other]

doi 10.1145/3617694.3623251

Setting the Right Expectations: Algorithmic Recourse Over Time

Authors: Joao Fonseca, Andrew Bell, Carlo Abrate, Francesco Bonchi, Julia Stoyanovich

Abstract: Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to-date focuses primarily on how to provide recourse to a single in… ▽ More Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to-date focuses primarily on how to provide recourse to a single individual, overlooking a critical element: the effects of a continuously changing context. Disregarding these effects on recourse is a significant oversight, since, in almost all cases, recourse consists of an individual making a first, unfavorable attempt, and then being given an opportunity to make one or several attempts at a later date - when the context might have changed. This can create false expectations, as initial recourse recommendations may become less reliable over time due to model drift and competition for access to the favorable outcome between individuals. In this work we propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse. In particular, we identify two main effects that can alter the reliability of recourse for individuals represented by the agents: (1) competition with other agents acting upon recourse, and (2) competition with new agents entering the environment. Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time. Consequently, we argue that substantial additional work is needed to understand recourse reliability over time, and to develop recourse methods that reward agents' effort. △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.14486 [pdf, other]

doi 10.1145/3583780.3615025

Rebalancing Social Feed to Minimize Polarization and Disagreement

Authors: Federico Cinus, Aristides Gionis, Francesco Bonchi

Abstract: Social media have great potential for enabling public discourse on important societal issues. However, adverse effects, such as polarization and echo chambers, greatly impact the benefits of social media and call for algorithms that mitigate these effects. In this paper, we propose a novel problem formulation aimed at slightly nudging users' social feeds in order to strike a balance between releva… ▽ More Social media have great potential for enabling public discourse on important societal issues. However, adverse effects, such as polarization and echo chambers, greatly impact the benefits of social media and call for algorithms that mitigate these effects. In this paper, we propose a novel problem formulation aimed at slightly nudging users' social feeds in order to strike a balance between relevance and diversity, thus mitigating the emergence of polarization, without lowering the quality of the feed. Our approach is based on re-weighting the relative importance of the accounts that a user follows, so as to calibrate the frequency with which the content produced by various accounts is shown to the user. We analyze the convexity properties of the problem, demonstrating the non-matrix convexity of the objective function and the convexity of the feasible set. To efficiently address the problem, we develop a scalable algorithm based on projected gradient descent. We also prove that our problem statement is a proper generalization of the undirected-case problem so that our method can also be adopted for undirected social networks. As a baseline for comparison in the undirected case, we develop a semidefinite programming approach, which provides the optimal solution. Through extensive experiments on synthetic and real-world datasets, we validate the effectiveness of our approach, which outperforms non-trivial baselines, underscoring its ability to foster healthier and more cohesive online communities. △ Less

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted for publication at ACM CIKM 2023

arXiv:2307.14849 [pdf, other]

Counterfactual Explanations for Graph Classification Through the Lenses of Density

Authors: Carlo Abrate, Giulia Preti, Francesco Bonchi

Abstract: Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explana… ▽ More Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics. Results confirm that adopting a semantic-relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.02817 [pdf, ps, other]

Exploiting Adjoints in Property Directed Reachability Analysis

Authors: Mayuko Kori, Flavio Ascari, Filippo Bonchi, Roberto Bruni, Roberta Gori, Ichiro Hasuo

Abstract: We formulate, in lattice-theoretic terms, two novel algorithms inspired by Bradley's property directed reachability algorithm. For finding safe invariants or counterexamples, the first algorithm exploits over-approximations of both forward and backward transition relations, expressed abstractly by the notion of adjoints. In the absence of adjoints, one can use the second algorithm, which exploits… ▽ More We formulate, in lattice-theoretic terms, two novel algorithms inspired by Bradley's property directed reachability algorithm. For finding safe invariants or counterexamples, the first algorithm exploits over-approximations of both forward and backward transition relations, expressed abstractly by the notion of adjoints. In the absence of adjoints, one can use the second algorithm, which exploits lower sets and their principals. As a notable example of application, we consider quantitative reachability problems for Markov Decision Processes. △ Less

Submitted 6 July, 2023; originally announced July 2023.

Comments: 44 pages, 11 figures, the full version of the paper accepted by CAV 2023

arXiv:2306.04828 [pdf, other]

Fast and Effective GNN Training with Linearized Random Spanning Trees

Authors: Francesco Bonchi, Claudio Gentile, Francesco Paolo Nerini, André Panisson, Fabio Vitale

Abstract: We present a new effective and scalable framework for training GNNs in node classification tasks, based on the effective resistance, a powerful tool solidly rooted in graph theory. Our approach progressively refines the GNN weights on an extensive sequence of random spanning trees, suitably transformed into path graphs that retain essential topological and node information of the original graph. T… ▽ More We present a new effective and scalable framework for training GNNs in node classification tasks, based on the effective resistance, a powerful tool solidly rooted in graph theory. Our approach progressively refines the GNN weights on an extensive sequence of random spanning trees, suitably transformed into path graphs that retain essential topological and node information of the original graph. The sparse nature of these path graphs substantially lightens the computational burden of GNN training. This not only enhances scalability but also effectively addresses common issues like over-squashing, over-smoothing, and performance deterioration caused by overfitting in small training set regimes. We carry out an extensive experimental investigation on a number of real-world graph benchmarks, where we apply our framework to graph convolutional networks, showing simultaneous improvement of both training speed and test accuracy over a wide pool of representative baselines. △ Less

Submitted 14 February, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.02696 [pdf, other]

Hyper-distance Oracles in Hypergraphs

Authors: Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the main one is that the line graph is typically orders of… ▽ More We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the main one is that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding constructing the line graph. Our framework allows to approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation at the basis of our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks, by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and prove its versatility in answering s-distance queries for different values of s. Our framework allows answering such queries in fractions of a millisecond, while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we prove the usefulness of the s-distance oracle in two applications, namely, hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyper-edges in the context of protein-to-protein interactions. △ Less

Submitted 19 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: To appear in VLDBJ

arXiv:2303.14467 [pdf, other]

doi 10.1145/3653298

A Survey on the Densest Subgraph Problem and Its Variants

Authors: Tommaso Lanciano, Atsushi Miyauchi, Adriano Fazzone, Francesco Bonchi

Abstract: The Densest Subgraph Problem requires to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature since the early 1970s, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest in this… ▽ More The Densest Subgraph Problem requires to find, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature since the early 1970s, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest in this problem with several important contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides a deep overview of the fundamental results and an exhaustive coverage of the many variants proposed in the literature, with a special attention to the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic. △ Less

Submitted 18 April, 2024; v1 submitted 25 March, 2023; originally announced March 2023.

Comments: Accepted to ACM Computing Surveys

arXiv:2211.00980 [pdf, other]

doi 10.48786/edbt.2024.01

Balancing Utility and Fairness in Submodular Maximization (Technical Report)

Authors: Yanhao Wang, Yuchen Li, Francesco Bonchi, Ying Wang

Abstract: Submodular function maximization is a fundamental combinatorial optimization problem with plenty of applications -- including data summarization, influence maximization, and recommendation. In many of these problems, the goal is to find a solution that maximizes the average utility over all users, for each of whom the utility is defined by a monotone submodular function. However, when the populati… ▽ More Submodular function maximization is a fundamental combinatorial optimization problem with plenty of applications -- including data summarization, influence maximization, and recommendation. In many of these problems, the goal is to find a solution that maximizes the average utility over all users, for each of whom the utility is defined by a monotone submodular function. However, when the population of users is composed of several demographic groups, another critical problem is whether the utility is fairly distributed across different groups. Although the \emph{utility} and \emph{fairness} objectives are both desirable, they might contradict each other, and, to the best of our knowledge, little attention has been paid to optimizing them jointly. To fill this gap, we propose a new problem called \emph{Bicriteria Submodular Maximization} (BSM) to balance utility and fairness. Specifically, it requires finding a fixed-size solution to maximize the utility function, subject to the value of the fairness function not being below a threshold. Since BSM is inapproximable within any constant factor, we focus on designing efficient instance-dependent approximation schemes. Our algorithmic proposal comprises two methods, with different approximation factors, obtained by converting a BSM instance into other submodular optimization problem instances. Using real-world and synthetic datasets, we showcase applications of our proposed methods in three submodular maximization problems: maximum coverage, influence maximization, and facility location. △ Less

Submitted 19 June, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 14 pages, 11 figures; accepted to EDBT 2024

arXiv:2210.17234 [pdf, other]

doi 10.1038/s41598-022-21720-4

The language of opinion change on social media under the lens of communicative action

Authors: Corrado Monti, Luca Maria Aiello, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dim… ▽ More Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called "delta". Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language. △ Less

Submitted 31 October, 2022; originally announced October 2022.

Comments: Main paper: 13 pages, 1 figure, 3 tables. Supplementary material: 9 pages, 6 figures, 8 tables

ACM Class: H.4.0; K.4.0

Journal ref: Nature Scientific Reports 12, 17920 (2022)

arXiv:2210.09950 [pdf, other]

doi 10.1145/3571257

Deconstructing the Calculus of Relations with Tape Diagrams

Authors: Filippo Bonchi, Alessandro Di Giorgio, Alessio Santamaria

Abstract: Rig categories with finite biproducts are categories with two monoidal products, where one is a biproduct and the other distributes over it. In this work we present tape diagrams, a sound and complete diagrammatic language for these categories, that can be intuitively thought as string diagrams of string diagrams. We test the effectiveness of our approach against the positive fragment of Tarski's… ▽ More Rig categories with finite biproducts are categories with two monoidal products, where one is a biproduct and the other distributes over it. In this work we present tape diagrams, a sound and complete diagrammatic language for these categories, that can be intuitively thought as string diagrams of string diagrams. We test the effectiveness of our approach against the positive fragment of Tarski's calculus of relations. △ Less

Submitted 18 October, 2022; originally announced October 2022.

Journal ref: Proceedings of the ACM on Programming Languages, Vol. 7, POPL 2023

arXiv:2208.14989 [pdf, other]

Learning Multiscale Non-stationary Causal Structures

Authors: Gabriele D'Acunto, Gianmarco De Francisci Morales, Paolo Bajardi, Francesco Bonchi

Abstract: This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. Firstly, we expose a probabilistic generative mode… ▽ More This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. Firstly, we expose a probabilistic generative model by leveraging results from spectral and causality theories. Our model allows sampling an MN-DAG according to user-specified priors on the time-dependence and multiscale properties of the causal graph. Secondly, we devise a Bayesian method named Multiscale Non-stationary Causal Structure Learner (MN-CASTLE) that uses stochastic variational inference to estimate MN-DAGs. The method also exploits information from the local partial correlation between time series over different time resolutions. The data generated from an MN-DAG reproduces well-known features of time series in different domains, such as volatility clustering and serial correlation. Additionally, we show the superior performance of MN-CASTLE on synthetic data with different multiscale and non-stationary properties compared to baseline models. Finally, we apply MN-CASTLE to identify the drivers of the natural gas prices in the US market. Causal relationships have strengthened during the COVID-19 outbreak and the Russian invasion of Ukraine, a fact that baseline methods fail to capture. MN-CASTLE identifies the causal impact of critical economic drivers on natural gas prices, such as seasonal factors, economic uncertainty, oil prices, and gas storage deviations. △ Less

Submitted 17 November, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

Journal ref: Transactions on Machine Learning Research, 2023, ISSN 2835-8856

arXiv:2208.04620 [pdf, other]

Cascade-based Echo Chamber Detection

Authors: Marco Minici, Federico Cinus, Corrado Monti, Francesco Bonchi, Giuseppe Manco

Abstract: Despite echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints -- i.e., social network structure and propagations of information -- through a set of latent communities, characterized by a degree of echo-… ▽ More Despite echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints -- i.e., social network structure and propagations of information -- through a set of latent communities, characterized by a degree of echo-chamber behavior and by an opinion polarity. Specifically, echo chambers are modeled as communities that are permeable to pieces of information with similar ideological polarity, and impermeable to information of opposed leaning: this allows discriminating echo chambers from communities that lack a clear ideological alignment. To learn the model parameters we propose a scalable, stochastic adaptation of the Generalized Expectation Maximization algorithm, that optimizes the joint likelihood of observing social connections and information propagation. Experiments on synthetic data show that our algorithm is able to correctly reconstruct ground-truth latent communities with their degree of echo-chamber behavior and opinion polarity. Experiments on real-world data about polarized social and political debates, such as the Brexit referendum or the COVID-19 vaccine campaign, confirm the effectiveness of our proposal in detecting echo chambers. Finally, we show how our model can improve accuracy in auxiliary predictive tasks, such as stance detection and prediction of future propagations. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted for publication at ACM CIKM 2022

arXiv:2207.12196 [pdf, other]

On the Relation Between Opinion Change and Information Consumption on Reddit

Authors: Flavio Petruzzellis, Corrado Monti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study sheds a light on the relationship between one user's opinion change episode and subsequent behavioral change on an online social media, Reddit. In particular, we look at r/ChangeMyView, an online community dedicated to debating one's own opinions. Interestingly, this forum adopt… ▽ More While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study sheds a light on the relationship between one user's opinion change episode and subsequent behavioral change on an online social media, Reddit. In particular, we look at r/ChangeMyView, an online community dedicated to debating one's own opinions. Interestingly, this forum adopts a well-codified schema for explicitly self-reporting opinion change. Starting from this ground truth, we analyze changes in future online information consumption behavior that arise after a self-reported opinion change on sociopolitical topics; and in particular, operationalized in this work as the participation to sociopolitical subreddits. Such participation profile is important as it represents one's information diet, and is a reliable proxy for, e.g., political affiliation or health choices. We find that people who report an opinion change are significantly more likely to change their future participation in a specific subset of online communities. We characterize which communities are more likely to be abandoned after opinion change, and find a significant association (r=0.46) between propaganda-like language used in a community and the increase in chances of leaving it. We find comparable results (r=0.39) for the opposite direction, i.e., joining a community. This finding suggests how propagandistic communities act as a first gateway to internalize a shift in one's sociopolitical opinion. Finally, we show that the textual content of the discussion associated with opinion change is indicative of which communities are going to be subject to a participation change. In fact, a predictive model based only on the opinion change post is able to pinpoint these communities with an AP@5 of 0.20, similar to what can be reached by using all the past history of participation in communities. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: To appear in Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2023)

ACM Class: J.4; K.4

arXiv:2205.05052 [pdf, other]

On learning agent-based models from data

Authors: Corrado Monti, Marco Pangallo, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically can not estimate agent-specific (or "micro") variables: this is a major limitation which prevents ABMs from harnessing micro-level data availability and which greatly limits their predictive power. In this paper, we propose a protocol to learn the lat… ▽ More Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically can not estimate agent-specific (or "micro") variables: this is a major limitation which prevents ABMs from harnessing micro-level data availability and which greatly limits their predictive power. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. The first step of our protocol is to reduce an ABM to a probabilistic model, characterized by a computationally tractable likelihood. This reduction follows two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. Then, our protocol proceeds by maximizing the likelihood of the latent variables via a gradient-based expectation maximization algorithm. We demonstrate our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We demonstrate that the obtained model allows accurate estimates of the latent variables, while preserving the general behavior of the ABM. We also show that our estimates can be used for out-of-sample forecasting. Our protocol can be seen as an alternative to black-box data assimilation methods, that forces the modeler to lay bare the assumptions of the model, to think about the inferential process, and to spot potential identification problems. △ Less

Submitted 23 November, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

arXiv:2202.08815 [pdf, other]

GRAPHSHAP: Explaining Identity-Aware Graph Classifiers Through the Language of Motifs

Authors: Alan Perotti, Paolo Bajardi, Francesco Bonchi, André Panisson

Abstract: Most methods for explaining black-box classifiers (e.g. on tabular data, images, or time series) rely on measuring the impact that removing/perturbing features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond to the edges describing the graph structure, this matching… ▽ More Most methods for explaining black-box classifiers (e.g. on tabular data, images, or time series) rely on measuring the impact that removing/perturbing features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond to the edges describing the graph structure, this matching between features space and explanation language might not be appropriate. Decoupling the feature space (edges) from a desired high-level explanation language (such as motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for identity-aware graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the classifier can be queried as a black-box at will. For the sake of computational efficiency we explore a progressive approximation strategy and show how a simple kernel can efficiently approximate explanation scores, thus allowing GRAPHSHAP to scale on scenarios with a large explanation space (i.e. large number of motifs). We showcase GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by few connectomics patterns. △ Less

Submitted 7 July, 2023; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: Accepted by International Joint Conference on Neural Networks 2023 (IJCNN)

arXiv:2202.00640 [pdf, other]

doi 10.1145/3485447.3512143

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Authors: Francesco Fabbri, Yanhao Wang, Francesco Bonchi, Carlos Castillo, Michael Mathioudakis

Abstract: Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a "radicalization pathway". In this paper, we study the problem of mitigating radicaliza… ▽ More Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a "radicalization pathway". In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a "what-to-watch-next" recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the "segregation" score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated to larger chances to get users trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to "rewire", so to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the absorbing random walk theory. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal. △ Less

Submitted 1 February, 2022; originally announced February 2022.

Comments: To appear in the Web conference 2022 (WWW '22)

arXiv:2201.08005 [pdf]

doi 10.1145/3485447.3512191

FreSCo: Mining Frequent Patterns in Simplicial Complexes

Authors: Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Simplicial complexes are a generalization of graphs that model higher-order relations. In this paper, we introduce simplicial patterns -- that we call simplets -- and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is particularly challenging due to the enormous search space and the need for higher-order isomorphism. We show that fi… ▽ More Simplicial complexes are a generalization of graphs that model higher-order relations. In this paper, we introduce simplicial patterns -- that we call simplets -- and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is particularly challenging due to the enormous search space and the need for higher-order isomorphism. We show that finding the occurrences of simplets in a complex can be reduced to a bipartite graph isomorphism problem, in linear time and at most quadratic space. We then propose an anti-monotonic frequency measure that allows us to start the exploration from small simplets and stop expanding a simplet as soon as its frequency falls below the minimum frequency threshold. Equipped with these ideas and a clever data structure, we develop a memory-conscious algorithm that, by carefully exploiting the relationships among the simplices in the complex and among the simplets, achieves efficiency and scalability for our complex mining task. Our algorithm, FreSCo, comes in two flavors: it can compute the exact frequency of the simplets or, more quickly, it can determine whether a simplet is frequent, without having to compute the exact frequency. Experimental results prove the ability of FreSCo to mine frequent simplets in complexes of various size and dimension, and the significance of the simplets with respect to the traditional graph patterns. △ Less

Submitted 26 January, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: To appear at The Web Conference 2022

arXiv:2112.15488 [pdf, ps, other]

doi 10.1145/3494561

Multi-relation Graph Summarization

Authors: Xiangyu Ke, Arijit Khan, Francesco Bonchi

Abstract: Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this paper, we study the… ▽ More Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this paper, we study the novel problem of producing summaries of multi-relation networks, i.e., graphs where multiple edges of different types may exist between any pair of nodes. Multi-relation graphs are an expressive model of real-world activities, in which a relation can be a topic in social networks, an interaction type in genetic networks, or a snapshot in temporal graphs. The first approach that we consider for multi-relation graph summarization is a two-step method based on summarizing each relation in isolation, and then aggregating the resulting summaries in some clever way to produce a final unique summary. In doing this, as a side contribution, we provide the first polynomial-time approximation algorithm based on the k-Median clustering for the classic problem of lossless single-relation graph summarization. Then, we demonstrate the shortcomings of these two-step methods, and propose holistic approaches, both approximate and heuristic algorithms, to compute a summary directly for multi-relation graphs. In particular, we prove that the approximation bound of k-Median clustering for the single relation solution can be maintained in a multi-relation graph with proper aggregation operation over adjacency matrices corresponding to its multiple relations. Experimental results and case studies (on co-authorship networks and brain networks) validate the effectiveness and efficiency of the proposed algorithms. △ Less

Submitted 24 December, 2021; originally announced December 2021.

Comments: To appear, ACM TKDD

arXiv:2112.08237 [pdf, other]

Exposure Inequality in People Recommender Systems: The Long-Term Effects

Authors: Francesco Fabbri, Maria Luisa Croci, Francesco Bonchi, Carlos Castillo

Abstract: People recommender systems may affect the exposure that users receive in social networking platforms, influencing attention dynamics and potentially strengthening pre-existing inequalities that disproportionately affect certain groups. In this paper we introduce a model to simulate the feedback loop created by multiple rounds of interactions between users and a link recommender in a social netwo… ▽ More People recommender systems may affect the exposure that users receive in social networking platforms, influencing attention dynamics and potentially strengthening pre-existing inequalities that disproportionately affect certain groups. In this paper we introduce a model to simulate the feedback loop created by multiple rounds of interactions between users and a link recommender in a social network. This allows us to study the long-term consequences of those particular recommendation algorithms. Our model is equipped with several parameters to control (i) the level of homophily in the network, (ii) the relative size of the groups, (iii) the choice among several state-of-the-art link recommenders, and (iv) the choice among three different user behavior models, that decide which recommendations are accepted or rejected. Our extensive experimentation with the proposed model shows that a minority group, if homophilic enough, can get a disproportionate advantage in exposure from all link recommenders. Instead, when it is heterophilic, it gets under-exposed. Moreover, while the homophily level of the minority affects the speed of the growth of the disparate exposure, the relative size of the minority affects the magnitude of the effect. Finally, link recommenders strengthen exposure inequalities at the individual level, exacerbating the "rich-get-richer" effect: this happens for both the minority and the majority class and independently of their level of homophily. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: To appear in ICWSM 2022

arXiv:2112.03337 [pdf, other]

Dense and well-connected subgraph detection in dual networks

Authors: Tianyi Chen, Francesco Bonchi, David Garcia-Soriano, Atsushi Miyauchi, Charalampos E. Tsourakakis

Abstract: Dense subgraph discovery is a fundamental problem in graph mining with a wide range of applications \cite{gionis2015dense}. Despite a large number of applications ranging from computational neuroscience to social network analysis, that take as input a {\em dual} graph, namely a pair of graphs on the same set of nodes, dense subgraph discovery methods focus on a single graph input with few notable… ▽ More Dense subgraph discovery is a fundamental problem in graph mining with a wide range of applications \cite{gionis2015dense}. Despite a large number of applications ranging from computational neuroscience to social network analysis, that take as input a {\em dual} graph, namely a pair of graphs on the same set of nodes, dense subgraph discovery methods focus on a single graph input with few notable exceptions \cite{semertzidis2019finding,charikar2018finding,reinthal2016finding,jethava2015finding}. In this work, we focus the following problem: given a pair of graphs $G,H$ on the same set of nodes $V$, how do we find a subset of nodes $S \subseteq V$ that induces a well-connected subgraph in $G$ and a dense subgraph in $H$? Our formulation generalizes previous research on dual graphs \cite{Wu+15,WuZLFJZ16,Cui2018}, by enabling the {\em control} of the connectivity constraint on $G$. We propose a novel mathematical formulation based on $k$-edge connectivity, and prove that it is solvable exactly in polynomial time. We compare our method to state-of-the-art competitors; we find empirically that ranging the connectivity constraint enables the practitioner to obtain insightful information that is otherwise inaccessible. Finally, we show that our proposed mining tool can be used to better understand how users interact on Twitter, and connectivity aspects of human brain networks with and without Autism Spectrum Disorder (ASD). △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2112.00626 [pdf, other]

The Effect of People Recommenders on Echo Chambers and Polarization

Authors: Federico Cinus, Marco Minici, Corrado Monti, Francesco Bonchi

Abstract: The effects of social media on critical issues, such as polarization and misinformation, are under scrutiny due to the disruptive consequences that these phenomena can have on our societies. Among the algorithms routinely used by social media platforms, people-recommender systems are of special interest, as they directly contribute to the evolution of the social network structure, affecting the in… ▽ More The effects of social media on critical issues, such as polarization and misinformation, are under scrutiny due to the disruptive consequences that these phenomena can have on our societies. Among the algorithms routinely used by social media platforms, people-recommender systems are of special interest, as they directly contribute to the evolution of the social network structure, affecting the information and the opinions users are exposed to. In this paper, we propose a framework to assess the effect of people recommenders on the evolution of opinions. Our proposal is based on Monte Carlo simulations combining link recommendation and opinion-dynamics models. In order to control initial conditions, we define a random network model to generate graphs with opinions, with tunable amounts of modularity and homophily. We join these elements into a methodology to study the effects of the recommender system on echo chambers and polarization. We also show how to use our framework to measure, by means of simulations, the impact of different intervention strategies. Our thorough experimentation shows that people recommenders can in fact lead to a significant increase in echo chambers. However, this happens only if there is considerable initial homophily in the network. Also, we find that if the network already contains echo chambers, the effect of the recommendation algorithm is negligible. Such findings are robust to two very different opinion dynamics models, a bounded confidence model and an epistemological model. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: To appear in: Proceedings of the International AAAI Conference on Web and Social Media, vol. 16 (ICWSM '22)

ACM Class: I.6; J.4

arXiv:2111.05072 [pdf, other]

doi 10.1145/3490354.3494370

The Evolving Causal Structure of Equity Risk Factors

Authors: Gabriele D'Acunto, Paolo Bajardi, Francesco Bonchi, Gianmarco De Francisci Morales

Abstract: In recent years, multi-factor strategies have gained increasing popularity in the financial industry, as they allow investors to have a better understanding of the risk drivers underlying their portfolios. Moreover, such strategies promise to promote diversification and thus limit losses in times of financial turmoil. However, recent studies have reported a significant level of redundancy between… ▽ More In recent years, multi-factor strategies have gained increasing popularity in the financial industry, as they allow investors to have a better understanding of the risk drivers underlying their portfolios. Moreover, such strategies promise to promote diversification and thus limit losses in times of financial turmoil. However, recent studies have reported a significant level of redundancy between these factors, which might enhance risk contagion among multi-factor portfolios during financial crises. Therefore, it is of fundamental importance to better understand the relationships among factors. Empowered by recent advances in causal structure learning methods, this paper presents a study of the causal structure of financial risk factors and its evolution over time. In particular, the data we analyze covers 11 risk factors concerning the US equity market, spanning a period of 29 years at daily frequency. Our results show a statistically significant sparsifying trend of the underlying causal structure. However, this trend breaks down during periods of financial stress, in which we can observe a densification of the causal network driven by a growth of the out-degree of the market factor node. Finally, we present a comparison with the analysis of factors cross-correlations, which further confirms the importance of causal analysis for gaining deeper insights in the dynamics of the factor system, particularly during economic downturns. Our findings are especially significant from a risk-management perspective. They link the evolution of the causal structure of equity risk factors with market volatility and a worsening macroeconomic environment, and show that, in times of financial crisis, exposure to different factors boils down to exposure to the market risk factor. △ Less

Submitted 9 November, 2021; originally announced November 2021.

Journal ref: ACM International Conference on AI in Finance, 2021

arXiv:2109.13589 [pdf, other]

doi 10.1145/3459637.3482444

Learning Ideological Embeddings from Information Cascades

Authors: Corrado Monti, Giuseppe Manco, Cigdem Aslay, Francesco Bonchi

Abstract: Modeling information cascades in a social network through the lenses of the ideological leaning of its users can help understanding phenomena such as misinformation propagation and confirmation bias, and devising techniques for mitigating their toxic effects. In this paper we propose a stochastic model to learn the ideological leaning of each user in a multidimensional ideological space, by anal… ▽ More Modeling information cascades in a social network through the lenses of the ideological leaning of its users can help understanding phenomena such as misinformation propagation and confirmation bias, and devising techniques for mitigating their toxic effects. In this paper we propose a stochastic model to learn the ideological leaning of each user in a multidimensional ideological space, by analyzing the way politically salient content propagates. In particular, our model assumes that information propagates from one user to another if both users are interested in the topic and ideologically aligned with each other. To infer the parameters of our model, we devise a gradient-based optimization procedure maximizing the likelihood of an observed set of information cascades. Our experiments on real-world political discussions on Twitter and Reddit confirm that our model is able to learn the political stance of the social media users in a multidimensional ideological space. △ Less

Submitted 28 September, 2021; originally announced September 2021.

Comments: Published in CIKM 2021

ACM Class: J.4; G.3

Journal ref: Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021)

arXiv:2109.06049 [pdf, other]

String Diagram Rewrite Theory III: Confluence with and without Frobenius

Authors: Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Paweł Sobociński, Fabio Zanasi

Abstract: In this paper we address the problem of proving confluence for string diagram rewriting, which was previously shown to be characterised combinatorically as double-pushout rewriting with interfaces (DPOI) on (labelled) hypergraphs. For standard DPO rewriting without interfaces, confluence for terminating rewrite systems is, in general, undecidable. Nevertheless, we show here that confluence for DPO… ▽ More In this paper we address the problem of proving confluence for string diagram rewriting, which was previously shown to be characterised combinatorically as double-pushout rewriting with interfaces (DPOI) on (labelled) hypergraphs. For standard DPO rewriting without interfaces, confluence for terminating rewrite systems is, in general, undecidable. Nevertheless, we show here that confluence for DPOI, and hence string diagram rewriting, is decidable. We apply this result to give effective procedures for deciding local confluence of symmetric monoidal theories with and without Frobenius structure by critical pair analysis. For the latter, we introduce the new notion of path joinability for critical pairs, which enables finitely many joins of a critical pair to be lifted to an arbitrary context in spite of the strong non-local constraints placed on rewriting in a generic symmetric monoidal theory. △ Less

Submitted 18 April, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

arXiv:2108.10718 [pdf, other]

doi 10.46298/lmcs-18(4:8)2022

Convexity via Weak Distributive Laws

Authors: Filippo Bonchi, Alessio Santamaria

Abstract: We study the canonical weak distributive law $δ$ of the powerset monad over the semimodule monad for a certain class of semirings containing, in particular, positive semifields. For this subclass we characterise $δ$ as a convex closure in the free semimodule of a set. Using the abstract theory of weak distributive laws, we compose the powerset and the semimodule monads via $δ$, obtaining the monad… ▽ More We study the canonical weak distributive law $δ$ of the powerset monad over the semimodule monad for a certain class of semirings containing, in particular, positive semifields. For this subclass we characterise $δ$ as a convex closure in the free semimodule of a set. Using the abstract theory of weak distributive laws, we compose the powerset and the semimodule monads via $δ$, obtaining the monad of convex subsets of the free semimodule. △ Less

Submitted 22 November, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2012.14778

Journal ref: Logical Methods in Computer Science, Volume 18, Issue 4 (November 23, 2022) lmcs:8389

arXiv:2106.08652 [pdf, other]

Maxmin-Fair Ranking: Individual Fairness under Group-Fairness Constraints

Authors: David Garcia-Soriano, Francesco Bonchi

Abstract: We study a novel problem of fairness in ranking aimed at minimizing the amount of individual unfairness introduced when enforcing group-fairness constraints. Our proposal is rooted in the distributional maxmin fairness theory, which uses randomization to maximize the expected satisfaction of the worst-off individuals. We devise an exact polynomial-time algorithm to find maxmin-fair distributions o… ▽ More We study a novel problem of fairness in ranking aimed at minimizing the amount of individual unfairness introduced when enforcing group-fairness constraints. Our proposal is rooted in the distributional maxmin fairness theory, which uses randomization to maximize the expected satisfaction of the worst-off individuals. We devise an exact polynomial-time algorithm to find maxmin-fair distributions of general search problems (including, but not limited to, ranking), and show that our algorithm can produce rankings which, while satisfying the given group-fairness constraints, ensure that the maximum possible value is brought to individuals. △ Less

Submitted 17 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: In proceedings of KDD 2021

arXiv:2106.08640 [pdf, other]

Counterfactual Graphs for Explainable Classification of Brain Networks

Authors: Carlo Abrate, Francesco Bonchi

Abstract: Training graph classifiers able to distinguish between healthy brains and dysfunctional ones, can help identifying substructures associated to specific cognitive phenotypes. However, the mere predictive power of the graph classifier is of limited interest to the neuroscientists, which have plenty of tools for the diagnosis of specific mental disorders. What matters is the interpretation of the mod… ▽ More Training graph classifiers able to distinguish between healthy brains and dysfunctional ones, can help identifying substructures associated to specific cognitive phenotypes. However, the mere predictive power of the graph classifier is of limited interest to the neuroscientists, which have plenty of tools for the diagnosis of specific mental disorders. What matters is the interpretation of the model, as it can provide novel insights and new hypotheses. In this paper we propose \emph{counterfactual graphs} as a way to produce local post-hoc explanations of any black-box graph classifier. Given a graph and a black-box, a counterfactual is a graph which, while having high structural similarity with the original graph, is classified by the black-box in a different class. We propose and empirically compare several strategies for counterfactual graph search. Our experiments against a white-box classifier with known optimal counterfactual, show that our methods, although heuristic, can produce counterfactuals very close to the optimal one. Finally, we show how to use counterfactual graphs to build global explanations correctly capturing the behaviour of different black-box classifiers and providing interesting insights for the neuroscientists. △ Less

Submitted 18 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: In proceedings of KDD 2021

arXiv:2106.08142 [pdf, other]

doi 10.4230/LIPIcs.CALCO.2021.10

On Doctrines and Cartesian Bicategories

Authors: Filippo Bonchi, Alessio Santamaria, Jens Seeber, Paweł Sobociński

Abstract: We study the relationship between cartesian bicategories and a specialisation of Lawvere's hyperdoctrines, namely elementary existential doctrines. Both provide different ways of abstracting the structural properties of logical systems: the former in algebraic terms based on a string diagrammatic calculus, the latter in universal terms using the fundamental notion of adjoint functor. We prove that… ▽ More We study the relationship between cartesian bicategories and a specialisation of Lawvere's hyperdoctrines, namely elementary existential doctrines. Both provide different ways of abstracting the structural properties of logical systems: the former in algebraic terms based on a string diagrammatic calculus, the latter in universal terms using the fundamental notion of adjoint functor. We prove that these two approaches are related by an adjunction, which can be strengthened to an equivalence by imposing further constraints on doctrines. △ Less

Submitted 15 June, 2021; originally announced June 2021.

ACM Class: F.3; F.4

Journal ref: 9th Conference on Algebra and Coalgebra in Computer Science (CALCO 2021)

arXiv:2105.10946 [pdf, other]

doi 10.4230/LIPIcs.FSTTCS.2021.40

Diagrammatic Polyhedral Algebra

Authors: Filippo Bonchi, Alessandro Di Giorgio, Pawel Sobocinski

Abstract: We extend the theory of Interacting Hopf algebras with an order primitive, and give a sound and complete axiomatisation of the prop of polyhedral cones. Next, we axiomatise an affine extension and prove soundness and completeness for the prop of polyhedra. We extend the theory of Interacting Hopf algebras with an order primitive, and give a sound and complete axiomatisation of the prop of polyhedral cones. Next, we axiomatise an affine extension and prove soundness and completeness for the prop of polyhedra. △ Less

Submitted 24 September, 2021; v1 submitted 23 May, 2021; originally announced May 2021.

arXiv:2104.14686 [pdf, ps, other]

String Diagram Rewrite Theory II: Rewriting with Symmetric Monoidal Structure

Authors: Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski, Fabio Zanasi

Abstract: Symmetric monoidal theories (SMTs) generalise algebraic theories in a way that make them suitable to express resource-sensitive systems, in which variables cannot be copied or discarded at will. In SMTs, traditional tree-like terms are replaced by string diagrams, topological entities that can be intuitively thoughts as diagrams of wires and boxes. Recently, string diagrams have become increasin… ▽ More Symmetric monoidal theories (SMTs) generalise algebraic theories in a way that make them suitable to express resource-sensitive systems, in which variables cannot be copied or discarded at will. In SMTs, traditional tree-like terms are replaced by string diagrams, topological entities that can be intuitively thoughts as diagrams of wires and boxes. Recently, string diagrams have become increasingly popular as a graphical syntax to reason about computational models across diverse fields, including programming language semantics, circuit theory, quantum mechanics, linguistics, and control theory. In applications, it is often convenient to implement the equations appearing in SMTs as rewriting rules. This poses the challenge of extending the traditional theory of term rewriting, which has been developed for algebraic theories, to string diagrams. In this paper, we develop a mathematical theory of string diagram rewriting for SMTs. Our approach exploits the correspondence between string diagram rewriting and double pushout (DPO) rewriting of certain graphs, introduced in the first paper of this series. Such a correspondence is only sound when the SMT includes a Frobenius algebra structure. In the present work, we show how an analogous correspondence may be established for arbitrary SMTs, once an appropriate notion of DPO rewriting (which we call convex) is identified. As proof of concept, we use our approach to show termination of two SMTs of interest: Frobenius semi-algebras and bialgebras. △ Less

Submitted 14 September, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

arXiv:2102.07564 [pdf]

doi 10.1145/3442381.3450073

STruD: Truss Decomposition of Simplicial Complexes

Authors: Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to t… ▽ More A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes. Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that computes directly the n simplices of maximum trussness. By applying STruD to several datasets, we prove its scalability, and provide an analysis of their structure. Finally, we show that the truss decomposition can be seen as a filtration, and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis. △ Less

Submitted 15 February, 2021; originally announced February 2021.

ACM Class: G.2.2

Journal ref: Proceedings of The Web Conference 2021

arXiv:2012.14778 [pdf, ps, other]

doi 10.1007/978-3-030-71995-1_6

Combining Semilattices and Semimodules

Authors: Filippo Bonchi, Alessio Santamaria

Abstract: We describe the canonical weak distributive law $δ\colon \mathcal S \mathcal P \to \mathcal P \mathcal S$ of the powerset monad $\mathcal P$ over the $S$-left-semimodule monad $\mathcal S$, for a class of semirings $S$. We show that the composition of $\mathcal P$ with $\mathcal S$ by means of such $δ$ yields almost the monad of convex subsets previously introduced by Jacobs: the only difference c… ▽ More We describe the canonical weak distributive law $δ\colon \mathcal S \mathcal P \to \mathcal P \mathcal S$ of the powerset monad $\mathcal P$ over the $S$-left-semimodule monad $\mathcal S$, for a class of semirings $S$. We show that the composition of $\mathcal P$ with $\mathcal S$ by means of such $δ$ yields almost the monad of convex subsets previously introduced by Jacobs: the only difference consists in the absence in Jacobs's monad of the empty convex set. We provide a handy characterisation of the canonical weak lifting of $\mathcal P$ to $\mathbb{EM}(\mathcal S)$ as well as an algebraic theory for the resulting composed monad. Finally, we restrict the composed monad to finitely generated convex subsets and we show that it is presented by an algebraic theory combining semimodules and semilattices with bottom, which are the algebras for the finite powerset monad $\mathcal P_f$. △ Less

Submitted 24 January, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Journal ref: Foundations of Software Science and Computation Structures. FOSSACS 2021. Lecture Notes in Computer Science, vol 12650 (2021), pp 102-123. Springer, Cham

arXiv:2012.01847 [pdf, other]

String Diagram Rewrite Theory I: Rewriting with Frobenius Structure

Authors: Filippo Bonchi, Fabio Gadducci, Aleks Kissinger, Pawel Sobocinski, Fabio Zanasi

Abstract: String diagrams are a powerful and intuitive graphical syntax, originated in the study of symmetric monoidal categories. In the last few years, they have found application in the modelling of various computational structures, in fields as diverse as Computer Science, Physics, Control Theory, Linguistics, and Biology. In many such proposals, the transformations of the described systems are modell… ▽ More String diagrams are a powerful and intuitive graphical syntax, originated in the study of symmetric monoidal categories. In the last few years, they have found application in the modelling of various computational structures, in fields as diverse as Computer Science, Physics, Control Theory, Linguistics, and Biology. In many such proposals, the transformations of the described systems are modelled as rewrite rules of diagrams. These developments demand a mathematical foundation for string diagram rewriting: whereas rewrite theory for terms is well-understood, the two-dimensional nature of string diagrams poses additional challenges. This work systematises and expands a series of recent conference papers laying down such foundation. As first step, we focus on the case of rewrite systems for string diagrammatic theories which feature a Frobenius algebra. This situation ubiquitously appear in various approaches: for instance, in the algebraic semantics of linear dynamical systems, Frobenius structures model the wiring of circuits; in categorical quantum mechanics, they model interacting quantum observables. Our work introduces a combinatorial interpretation of string diagram rewriting modulo Frobenius structures, in terms of double-pushout hypergraph rewriting. Furthermore, we prove this interpretation to be sound and complete. In the last part, we also see that the approach can be generalised to model rewriting modulo multiple Frobenius structures. As a proof of concept, we show how to derive from these results a termination strategy for Interacting Bialgebras, an important rewrite theory in the study of quantum circuits and signal flow graphs. △ Less

Submitted 3 February, 2022; v1 submitted 3 December, 2020; originally announced December 2020.

arXiv:2012.00356 [pdf, other]

Better Fewer but Better: Community Search with Outliers

Authors: Francesco Bonchi, Lorenzo Severini, Mauro Sozio

Abstract: Given a set of vertices in a network, that we believe are of interest for the application under analysis, community search is the problem of producing a subgraph potentially explaining the relationships existing among the vertices of interest. In practice this means that the solution should add some vertices to the query ones, so to create a connected subgraph that exhibits some "cohesiveness" pro… ▽ More Given a set of vertices in a network, that we believe are of interest for the application under analysis, community search is the problem of producing a subgraph potentially explaining the relationships existing among the vertices of interest. In practice this means that the solution should add some vertices to the query ones, so to create a connected subgraph that exhibits some "cohesiveness" property. This problem has received increasing attention in recent years: while several cohesiveness functions have been studied, the bulk of the literature looks for a solution subgraphs containing all the query vertices. However, in many exploratory analyses we might only have a reasonable belief about the vertices of interest: if only one of them is not really related to the others, forcing the solution to include all of them might hide the existence of much more cohesive and meaningful subgraphs, that we could have found by allowing the solution to detect and drop the outlier vertex. In this paper we study the problem of community search with outliers, where we are allowed to drop up to $k$ query vertices, with $k$ being an input parameter. We consider three of the most used measures of cohesiveness: the minimum degree, the diameter of the subgraph and the maximum distance with a query vertex. By optimizing one and using one of the others as a constraint we obtain three optimization problems: we study their hardness and we propose different exact and approximation algorithms. △ Less

Submitted 4 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT'20)

arXiv:2011.05353 [pdf, other]

Adaptive Community Search in Dynamic Networks

Authors: Ioanna Tsalouchidou, Francesco Bonchi, Ricardo Baeza-Yates

Abstract: Community search is a well-studied problem which, given a static graph and a query set of vertices, requires to find a cohesive (or dense) subgraph containing the query vertices. In this paper we study the problem of community search in temporal dynamic networks. We adapt to the temporal setting the notion of \emph{network inefficiency} which is based on the pairwise shortest-path distance among a… ▽ More Community search is a well-studied problem which, given a static graph and a query set of vertices, requires to find a cohesive (or dense) subgraph containing the query vertices. In this paper we study the problem of community search in temporal dynamic networks. We adapt to the temporal setting the notion of \emph{network inefficiency} which is based on the pairwise shortest-path distance among all the vertices in a solution. For this purpose we define the notion of \emph{shortest-fastest-path distance}: a linear combination of the temporal and spatial dimensions governed by a user-defined parameter. We thus define the \textsc{Minimum Temporal-Inefficiency Subgraph} problem and show that it is \NPhard. We develop an algorithm which exploits a careful transformation of the temporal network to a static directed and weighted graph, and some recent approximation algorithm for finding the minimum Directed Steiner Tree. We finally generalize our framework to the streaming setting in which new snapshots of the temporal graph keep arriving continuously and our goal is to produce a community search solution for the temporal graph corresponding to a sliding time window. △ Less

Submitted 10 November, 2020; originally announced November 2020.

Comments: IEEE BigData 2020

arXiv:2007.01533 [pdf, other]

doi 10.1016/j.dam.2021.08.032

Finding Densest $k$-Connected Subgraphs

Authors: Francesco Bonchi, David García-Soriano, Atsushi Miyauchi, Charalampos E. Tsourakakis

Abstract: Dense subgraph discovery is an important graph-mining primitive with a variety of real-world applications. One of the most well-studied optimization problems for dense subgraph discovery is the densest subgraph problem, where given an edge-weighted undirected graph $G=(V,E,w)$, we are asked to find $S\subseteq V$ that maximizes the density $d(S)$, i.e., half the weighted average degree of the indu… ▽ More Dense subgraph discovery is an important graph-mining primitive with a variety of real-world applications. One of the most well-studied optimization problems for dense subgraph discovery is the densest subgraph problem, where given an edge-weighted undirected graph $G=(V,E,w)$, we are asked to find $S\subseteq V$ that maximizes the density $d(S)$, i.e., half the weighted average degree of the induced subgraph $G[S]$. This problem can be solved exactly in polynomial time and well-approximately in almost linear time. However, a densest subgraph has a structural drawback, namely, the subgraph may not be robust to vertex/edge failure. Indeed, a densest subgraph may not be well-connected, which implies that the subgraph may be disconnected by removing only a few vertices/edges within it. In this paper, we provide an algorithmic framework to find a dense subgraph that is well-connected in terms of vertex/edge connectivity. Specifically, we introduce the following problems: given a graph $G=(V,E,w)$ and a positive integer/real $k$, we are asked to find $S\subseteq V$ that maximizes the density $d(S)$ under the constraint that $G[S]$ is $k$-vertex/edge-connected. For both problems, we propose polynomial-time (bicriteria and ordinary) approximation algorithms, using classic Mader's theorem in graph theory and its extensions. △ Less

Submitted 3 July, 2020; originally announced July 2020.

Journal ref: Discrete Applied Mathematics 305, 34-47, 2021

arXiv:2006.05176 [pdf, other]

Explainable Classification of Brain Networks via Contrast Subgraphs

Authors: Tommaso Lanciano, Francesco Bonchi, Aristides Gionis

Abstract: Mining human-brain networks to discover patterns that can be used to discriminate between healthy individuals and patients affected by some neurological disorder, is a fundamental task in neuroscience. Learning simple and interpretable models is as important as mere classification accuracy. In this paper we introduce a novel approach for classifying brain networks based on extracting contrast subg… ▽ More Mining human-brain networks to discover patterns that can be used to discriminate between healthy individuals and patients affected by some neurological disorder, is a fundamental task in neuroscience. Learning simple and interpretable models is as important as mere classification accuracy. In this paper we introduce a novel approach for classifying brain networks based on extracting contrast subgraphs, i.e., a set of vertices whose induced subgraphs are dense in one class of graphs and sparse in the other. We formally define the problem and present an algorithmic solution for extracting contrast subgraphs. We then apply our method to a brain-network dataset consisting of children affected by Autism Spectrum Disorder and children Typically Developed. Our analysis confirms the interestingness of the discovered patterns, which match background knowledge in the neuroscience literature. Further analysis on other classification tasks confirm the simplicity, soundness, and high explainability of our proposal, which also exhibits superior classification accuracy, to more complex state-of-the-art methods. △ Less

Submitted 9 June, 2020; originally announced June 2020.

Comments: To be published at KDD 2020

arXiv:2006.01673 [pdf, other]

doi 10.1145/3394486.3403119

Learning Opinion Dynamics From Social Traces

Authors: Corrado Monti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: Opinion dynamics - the research field dealing with how people's opinions form and evolve in a social context - traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, the… ▽ More Opinion dynamics - the research field dealing with how people's opinions form and evolve in a social context - traditionally uses agent-based models to validate the implications of sociological theories. These models encode the causal mechanism that drives the opinion formation process, and have the advantage of being easy to interpret. However, as they do not exploit the availability of data, their predictive power is limited. Moreover, parameter calibration and model selection are manual and difficult tasks. In this work we propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces. Given a set of observables (e.g., actions and interactions between agents), our model can recover the most-likely latent opinion trajectories that are compatible with the assumptions about the process dynamics. This type of model retains the benefits of agent-based ones (i.e., causal interpretation), while adding the ability to perform model selection and hypothesis testing on real data. We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart. We then design an inference algorithm based on online expectation maximization to learn the latent parameters of the model. Such algorithm can recover the latent opinion trajectories from traces generated by the classical agent-based model. In addition, it can identify the most likely set of macro parameters used to generate a data trace, thus allowing testing of sociological hypotheses. Finally, we apply our model to real-world data from Reddit to explore the long-standing question about the impact of backfire effect. Our results suggest a low prominence of the effect in Reddit's political conversation. △ Less

Submitted 2 June, 2020; originally announced June 2020.

Comments: Published at KDD2020

ACM Class: J.4; G.3; I.6

Journal ref: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD2020)

arXiv:2005.01790 [pdf, other]

doi 10.1145/3394231.3397894

Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit

Authors: Joan Massachs, Corrado Monti, Gianmarco De Francisci Morales, Francesco Bonchi

Abstract: We study the emergence of support for Donald Trump in Reddit's political discussion. With almost 800k subscribers, "r/The_Donald" is one of the largest communities on Reddit, and one of the main hubs for Trump supporters. It was created in 2015, shortly after Donald Trump began his presidential campaign. By using only data from 2012, we predict the likelihood of being a supporter of Donald Trump i… ▽ More We study the emergence of support for Donald Trump in Reddit's political discussion. With almost 800k subscribers, "r/The_Donald" is one of the largest communities on Reddit, and one of the main hubs for Trump supporters. It was created in 2015, shortly after Donald Trump began his presidential campaign. By using only data from 2012, we predict the likelihood of being a supporter of Donald Trump in 2016, the year of the last US presidential elections. To characterize the behavior of Trump supporters, we draw from three different sociological hypotheses: homophily, social influence, and social feedback. We operationalize each hypothesis as a set of features for each user, and train classifiers to predict their participation in r/The_Donald. We find that homophily-based and social feedback-based features are the most predictive signals. Conversely, we do not observe a strong impact of social influence mechanisms. We also perform an introspection of the best-performing model to build a "persona" of the typical supporter of Donald Trump on Reddit. We find evidence that the most prominent traits include a predominance of masculine interests, a conservative and libertarian political leaning, and links with politically incorrect and conspiratorial content. △ Less

Submitted 4 May, 2020; originally announced May 2020.

Comments: 10 pages. Published at WebSci20

MSC Class: 91D30 ACM Class: J.4; K.4

Journal ref: Proceedings of the 12th ACM Conference on Web Science (WebSci 2020)

arXiv:2005.01670 [pdf, other]

Presenting convex sets of probability distributions by convex semilattices and unique bases

Authors: Filippo Bonchi, Ana Sokolova, Valeria Vignudelli

Abstract: We prove that every finitely generated convex set of finitely supported probability distributions has a unique base, and use this result to show that the monad of convex sets of probability distributions is presented by the algebraic theory of convex semilattices. We prove that every finitely generated convex set of finitely supported probability distributions has a unique base, and use this result to show that the monad of convex sets of probability distributions is presented by the algebraic theory of convex semilattices. △ Less

Submitted 4 May, 2020; originally announced May 2020.

arXiv:2004.05222 [pdf]

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Authors: Mirco Nanni, Gennady Andrienko, Albert-László Barabási, Chiara Boldrini, Francesco Bonchi, Ciro Cattuto, Francesca Chiaromonte, Giovanni Comandé, Marco Conti, Mark Coté, Frank Dignum, Virginia Dignum, Josep Domingo-Ferrer, Paolo Ferragina, Fosca Giannotti, Riccardo Guidotti, Dirk Helbing, Kimmo Kaski, Janos Kertesz, Sune Lehmann, Bruno Lepri, Paul Lukowicz, Stan Matwin, David Megías Jiménez, Anna Monreale , et al. (14 additional authors not shown)

Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countri… ▽ More The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society. △ Less

Submitted 16 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Revised text. Additional authors

Journal ref: Transactions on Data Privacy 13(1): 61-66 (2020), http://www.tdp.cat/issues16/abs.a389a20.php

arXiv:2003.09453 [pdf, other]

Cartesian bicategories with choice

Authors: Filippo Bonchi, Jens Seeber, Pawel Sobocinski

Abstract: Relational structures are emerging as ubiquitous mathematical machinery in the semantics of open systems of various kinds. Cartesian bicategories are a well-known categorical algebra of relations that has proved especially useful in recent applications. The passage between a category and its bicategory of relations is an important question that has been widely studied for decades. We study an alte… ▽ More Relational structures are emerging as ubiquitous mathematical machinery in the semantics of open systems of various kinds. Cartesian bicategories are a well-known categorical algebra of relations that has proved especially useful in recent applications. The passage between a category and its bicategory of relations is an important question that has been widely studied for decades. We study an alternative construction that yields a cartesian bicategory of relations. Its behaviour is closely related to the axiom of choice, which itself can be expressed in the language of cartesian bicategories. △ Less

Submitted 20 March, 2020; originally announced March 2020.

Showing 1–50 of 102 results for author: Bonchi, F