Temporal fingerprints: Identity matching across fully encrypted domain

Authors: Shahar Somin, Keeley Erhardt, Alex 'Sandy' Pentland

Abstract: Technological advancements have significantly transformed communication patterns, introducing a diverse array of online platforms, thereby prompting individuals to use multiple profiles for different domains and objectives. Enhancing the understanding of cross-domain identity matching capabilities is essential, not only for practical applications such as commercial strategies and cybersecurity measures, but also for theoretical insights into the privacy implications of data disclosure. In this study, we demonstrate that individual temporal data, in the form of the distribution of inter-event times, constitutes an individual temporal fingerprint, allowing for matching profiles across different domains back to their associated real-world entity. We evaluate our methodology on encrypted digital trading platforms within the Ethereum Blockchain and present impressive results in matching identities across these privacy-preserving domains, while outperforming previously suggested models. Our findings indicate that simply knowing when an individual is active, even if information about who they talk to and what they discuss is lacking, poses risks to users' privacy, highlighting the inherent challenges in preserving privacy in today's digital landscape.

Submitted 5 July, 2024; originally announced July 2024.
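
To make the notion of an inter-event time distribution concrete, the following is a minimal sketch of how such a profile could be computed from a list of activity timestamps and compared between two profiles. It is not the authors' model: the function names, the power-of-two binning, and the L1 distance are all assumptions chosen for illustration, since the abstract does not specify the matching procedure.

```python
import math
from collections import Counter


def inter_event_times(timestamps):
    """Return the gaps between consecutive events, after sorting by time."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def log_binned_histogram(gaps):
    """Normalised histogram of gaps over power-of-two bins.

    Bin k collects gaps g with 2**k <= g < 2**(k + 1). Gaps below 1 are
    ignored (an illustrative simplification, not taken from the paper).
    """
    counts = Counter(math.floor(math.log2(g)) for g in gaps if g >= 1)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def l1_distance(p, q):
    """Sum of absolute differences between two binned distributions."""
    bins = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in bins)
```

Under these assumptions, two profiles whose histograms lie close together in L1 distance would be candidate matches. A real evaluation would need a principled distance and a ranking over many candidates.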

arXiv:2404.14643 [pdf, other]

Teaching Network Traffic Matrices in an Interactive Game Environment

Authors: Chasen Milner, Hayden Jananthan, Jeremy Kepner, Vijay Gadepally, Michael Jones, Peter Michaleas, Ritesh Patel, Sandeep Pisharody, Gabriel Wachman, Alex Pentland

Abstract: The Internet has become a critical domain for modern society that requires ongoing efforts for its improvement and protection. Network traffic matrices are a powerful tool for understanding and analyzing networks and are broadly taught in online graph theory educational resources. Network traffic matrix concepts are rarely available in online computer network and cybersecurity educational resources. To fill this gap, an interactive game environment has been developed to teach the foundations of traffic matrices to the computer networking community. The game environment provides a convenient, broadly accessible delivery mechanism that enables making material available rapidly to a wide audience. The core architecture of the game is a facility to add new network traffic matrix training modules via an easily editable JSON file. Using this facility, an initial set of modules was rapidly created covering: basic traffic matrices, traffic patterns, security/defense/deterrence, a notional cyber attack, a distributed denial-of-service (DDoS) attack, and a variety of graph theory concepts. The game environment enables delivery in a wide range of contexts to enable rapid feedback and improvement. The game can be used as a core unit as part of a formal course or as a simple interactive introduction in a presentation.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 9 pages, 10 figures, 52 references; accepted to IEEE GrAPL
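
For readers unfamiliar with the term, a network traffic matrix can be sketched as a table whose entry in row i, column j counts the packets sent from host i to host j. The snippet below is a minimal illustration under that assumption. The function name and the `(source, destination)` packet representation are hypothetical and are not taken from the game or its JSON module format.

```python
def traffic_matrix(packets, hosts):
    """Count packets per (source, destination) pair.

    packets: iterable of (source, destination) host labels.
    hosts:   ordered list of host labels defining row/column order.
    Returns a list of rows, where result[i][j] is the number of packets
    sent from hosts[i] to hosts[j].
    """
    index = {host: i for i, host in enumerate(hosts)}
    matrix = [[0] * len(hosts) for _ in hosts]
    for src, dst in packets:
        matrix[index[src]][index[dst]] += 1
    return matrix


# Example: host "a" sends two packets to "b", and "c" sends one to "a".
example = traffic_matrix([("a", "b"), ("a", "b"), ("c", "a")], ["a", "b", "c"])
```

In this layout a row with many non-zero entries corresponds to one source contacting many destinations, and a column with many non-zero entries corresponds to many sources contacting one destination.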

arXiv:2402.17019 [pdf, other]

Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling

Authors: Hang Jiang, Xiajie Zhang, Robert Mahari, Daniel Kessler, Eric Ma, Tal August, Irene Li, Alex 'Sandy' Pentland, Yoon Kim, Deb Roy, Jad Kabbara

Abstract: Making legal knowledge accessible to non-experts is crucial for enhancing general legal literacy and encouraging civic participation in democracy. However, legal documents are often challenging to understand for people without legal backgrounds. In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts through storytelling, an effective pedagogical tool in conveying complex and abstract concepts. We also introduce a new dataset LegalStories, which consists of 294 complex legal doctrines, each accompanied by a story and a set of multiple-choice questions generated by LLMs. To construct the dataset, we experiment with various LLMs to generate legal stories explaining these concepts. Furthermore, we use an expert-in-the-loop approach to iteratively design multiple-choice questions. Then, we evaluate the effectiveness of storytelling with LLMs through randomized controlled trials (RCTs) with legal novices on 10 samples from the dataset. We find that LLM-generated stories enhance comprehension of legal concepts and interest in law among non-native speakers compared to only definitions. Moreover, stories consistently help participants relate legal concepts to their lives. Finally, we find that learning with stories shows a higher retention rate for non-native speakers in the follow-up assessment. Our work has strong implications for using LLMs in promoting teaching and learning in the legal field and beyond.

Submitted 2 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to ACL 2024

arXiv:2402.02675 [pdf, other]

Verifiable evaluations of machine learning models using zkSNARKs

Authors: Tobin South, Alexander Camuto, Shrey Jain, Shayla Nguyen, Robert Mahari, Christian Paquin, Jason Morton, Alex 'Sandy' Pentland

Abstract: In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. We present a flexible proving system that enables verifiable attestations to be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.

Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

MSC Class: 68T01

arXiv:2312.14158 [pdf, other]

Data Cooperatives for Identity Attestations

Authors: Thomas Hardjono, Alex Pentland

Abstract: Data cooperatives with fiduciary obligations to members provide a useful source of truthful information regarding a given member whose personal data is managed by the cooperative. Since one of the main propositions of the cooperative model is to protect the data privacy of members, we explore the notion of blinded attestations in which the identity of the subject is removed from the attestations issued by the cooperative regarding one of its members. This is performed at the request of the individual member. We propose the use of a legal entity to countersign the blinded attestation, one that has an attorney-client relationship with the cooperative, and which can henceforth become the legal point of contact for inquiries regarding the individual related to the attribute being attested. There are several use-cases for this feature, including the Funds Travel Rule in transactions in digital assets, and the protection of privacy in decentralized social networks.

Submitted 29 October, 2023; originally announced December 2023.

Comments: 15 pages, 5 figures

arXiv:2311.13008 [pdf, other]

zkTax: A pragmatic way to support zero-knowledge tax disclosures

Authors: Alex Berke, Tobin South, Robert Mahari, Kent Larson, Alex Pentland

Abstract: Tax returns contain key financial information of interest to third parties: public officials are asked to share financial data for transparency, companies seek to assess the financial status of business partners, and individuals need to prove their income to landlords or to receive benefits. Tax returns also contain sensitive data such that sharing them in their entirety undermines privacy. We introduce a zero-knowledge tax disclosure system (zkTax) that allows individuals and organizations to make provable claims about select information in their tax returns without revealing additional information, which can be independently verified by third parties. The system consists of three distinct services that can be distributed: a tax authority provides tax documents signed with a public key; a Redact & Prove Service enables users to produce a redacted version of the tax documents with a zero-knowledge proof attesting the provenance of the redacted data; a Verify Service enables anyone to verify the proof. We implement a prototype with a user interface, compatible with U.S. tax forms, and demonstrate how this design could be implemented with minimal changes to existing tax infrastructure. Our system is designed to be extensible to other contexts and jurisdictions. This work provides a practical example of how distributed tools leveraging cryptography can enhance existing government or financial infrastructures, providing immediate transparency alongside privacy without system overhauls.

Submitted 24 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12955 [pdf, other]

Don't forget private retrieval: distributed private similarity search for large language models

Authors: Guy Zyskind, Tobin South, Alex Pentland

Abstract: While the flexible capabilities of large language models (LLMs) allow them to answer a range of queries based on existing learned knowledge, information retrieval to augment generation is an important tool to allow LLMs to answer questions on information not included in pre-training data. Such private information is increasingly being generated in a wide array of distributed contexts by organizations and individuals. Performing such information retrieval using neural embeddings of queries and documents always leaked information about queries and database content unless both were stored locally. We present Private Retrieval Augmented Generation (PRAG), an approach that uses multi-party computation (MPC) to securely transmit queries to a distributed set of servers containing a privately constructed database to return top-k and approximate top-k documents. This is a first-of-its-kind approach to dense information retrieval that ensures no server observes a client's query or can see the database content. The approach introduces a novel MPC friendly protocol for inverted file approximate search (IVF) that allows for fast document search over distributed and private data in sublinear communication complexity. This work presents new avenues through which data for use in LLMs can be accessed and used without needing to centralize or forgo privacy.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.09356 [pdf, other]

LePaRD: A Large-Scale Dataset of Judges Citing Precedents

Authors: Robert Mahari, Dominik Stammbach, Elliott Ash, Alex 'Sandy' Pentland

Abstract: We present the Legal Passage Retrieval Dataset LePaRD. LePaRD is a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task. Legal passage prediction seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD, and find that classification appears to work best. However, we note that legal precedent prediction is a difficult task, and there remains significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available and the whole dataset will be released upon publication.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.14346 [pdf, other]

The Law and NLP: Bridging Disciplinary Disconnects

Authors: Robert Mahari, Dominik Stammbach, Elliott Ash, Alex 'Sandy' Pentland

Abstract: Legal practice is intrinsically rooted in the fabric of language, yet legal practitioners and scholars have been slow to adopt tools from natural language processing (NLP). At the same time, the legal system is experiencing an access to justice crisis, which could be partially alleviated with NLP. In this position paper, we argue that the slow uptake of NLP in legal practice is exacerbated by a disconnect between the needs of the legal community and the focus of NLP researchers. In a review of recent trends in the legal NLP literature, we find limited overlap between the legal NLP community and legal academia. Our interpretation is that some of the most popular legal NLP tasks fail to address the needs of legal practitioners. We discuss examples of legal NLP tasks that promise to bridge disciplinary disconnects and highlight interesting areas for legal NLP research that remain underexplored.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.00522 [pdf, other]

Mapping of Internet "Coastlines" via Large Scale Anonymized Network Source Correlations

Authors: Hayden Jananthan, Jeremy Kepner, Michael Jones, William Arcand, David Bestor, William Bergeron, Chansup Byun, Timothy Davis, Vijay Gadepally, Daniel Grant, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Andrew Morris, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, et al. (3 additional authors not shown)

Abstract: Expanding the scientific tools available to protect computer networks can be aided by a deeper understanding of the underlying statistical distributions of network traffic and their potential geometric interpretations. Analyses of large scale network observations provide a unique window into studying those underlying statistics. Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient anonymized analysis of network traffic on the scale of trillions of events. This work analyzes over 100,000,000,000 anonymized packets from the largest Internet telescope (CAIDA) and over 10,000,000 anonymized sources from the largest commercial honeyfarm (GreyNoise). Neither CAIDA nor GreyNoise actively emits Internet traffic, and both provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Analysis of these observations confirms the previously observed Cauchy-like distributions describing temporal correlations between Internet sources. The Gull lighthouse problem is a well-known geometric characterization of the standard Cauchy distribution and motivates a potential geometric interpretation for Internet observations. This work generalizes the Gull lighthouse problem to accommodate larger classes of coastlines, deriving a closed-form solution for the resulting probability distributions, stating and examining the inverse problem of identifying an appropriate coastline given a continuous probability distribution, identifying a geometric heuristic for solving this problem computationally, and applying that heuristic to examine the temporal geometry of different subsets of network observations. Application of this method to the CAIDA and GreyNoise data reveals a difference of several orders of magnitude between known benign and other traffic, which can lead to potentially novel ways to protect networks.

Submitted 30 September, 2023; originally announced October 2023.

Comments: 9 pages, 7 figures, IEEE HPEC 2023 (accepted)
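
As background for the Gull lighthouse problem mentioned in the abstract, the classical straight-coastline case can be worked out in a few lines. The symbols below (lighthouse position $\alpha$ along the coast, offshore distance $\beta$, emission angle $\theta$) are conventional notation assumed here for illustration. They are not taken from the paper, whose generalized coastlines are not reproduced.

```latex
% Lighthouse at position \alpha along a straight coast, distance \beta offshore.
% A flash is emitted at an angle \theta drawn uniformly from (-\pi/2, \pi/2):
%   p(\theta) = 1/\pi .
% The flash reaches the coast at
\[
  x = \alpha + \beta \tan\theta ,
  \qquad
  \frac{dx}{d\theta} = \beta \sec^{2}\theta
                     = \frac{\beta^{2} + (x - \alpha)^{2}}{\beta} .
\]
% Changing variables from \theta to x gives the Cauchy density
\[
  p(x) = p(\theta)\,\left|\frac{d\theta}{dx}\right|
       = \frac{1}{\pi}\,\frac{\beta}{\beta^{2} + (x - \alpha)^{2}} ,
\]
% which reduces to the standard Cauchy distribution for \alpha = 0, \beta = 1:
\[
  p(x) = \frac{1}{\pi\,(1 + x^{2})} .
\]
```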

arXiv:2309.01806 [pdf, other]

doi 10.1109/HPEC58863.2023.10363471

Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices

Authors: Jeremy Kepner, Michael Jones, Phil Dykstra, Chansup Byun, Timothy Davis, Hayden Jananthan, William Arcand, David Bestor, William Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, Charles Yee, et al. (1 additional author not shown)

Abstract: Defending community-owned cyberspace requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibration procedures on a multi-billion packet dataset using high-performance GraphBLAS anonymized hypersparse matrices. The run-time performance on a real-world data set confirms previously observed real-time processing rates for high-bandwidth links while achieving significant data compression. The output of the analysis demonstrates the effectiveness of these procedures at focusing the traffic matrix and revealing the underlying stable heavy-tail statistical distributions that are necessary for anomaly detection. A simple model of the corresponding probability of detection ($p_{\rm d}$) and probability of false alarm ($p_{\rm fa}$) for these distributions highlights the criticality of network sensor focusing and calibration. Once a sensor is properly focused and calibrated it is then in a position to carry out two of the central tenets of good cybersecurity: (1) continuous observation of the network and (2) minimizing unbrokered network connections.

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2 appendices

arXiv:2307.03401 [pdf, other]

Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories

Authors: Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto, Kaoru Sezaki, Esteban Moro, Alex Pentland

Abstract: Modeling and predicting human mobility trajectories in urban areas is an essential task for various applications. The recent availability of large-scale human movement data collected from mobile devices has enabled the development of complex human mobility prediction models. However, human mobility prediction methods are often trained and tested on different datasets, due to the lack of open-source large-scale human mobility datasets amid privacy concerns, posing a challenge towards conducting fair performance comparisons between methods. To this end, we created an open-source, anonymized, metropolitan scale, and longitudinal (90 days) dataset of 100,000 individuals' human mobility trajectories, using mobile phone location data. The location pings are spatially and temporally discretized, and the metropolitan area is undisclosed to protect users' privacy. The 90-day period is composed of 75 days of business-as-usual and 15 days during an emergency. To promote the use of the dataset, we will host a human mobility prediction data challenge ('HuMob Challenge 2023') using the human mobility dataset, which will be held in conjunction with ACM SIGSPATIAL 2023.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Data descriptor for the Human Mobility Prediction Challenge (HuMob Challenge) 2023

arXiv:2306.13723 [pdf, other]

Human-AI Coevolution

Authors: Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina, Ricardo Baeza-Yates, Albert-Laszlo Barabasi, Frank Dignum, Virginia Dignum, Tina Eliassi-Rad, Fosca Giannotti, Janos Kertesz, Alistair Knott, Yannis Ioannidis, Paul Lukowicz, Andrea Passarella, Alex Sandy Pentland, John Shawe-Taylor, Alessandro Vespignani

Abstract: Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often "unintended" social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.

Submitted 3 May, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.04141 [pdf, other]

doi 10.1126/science.adh4451

Art and the science of generative AI: A deeper dive

Authors: Ziv Epstein, Aaron Hertzmann, Laura Herman, Robert Mahari, Morgan R. Frank, Matthew Groh, Hope Schroeder, Amy Smith, Memo Akten, Jessica Fjeld, Hany Farid, Neil Leach, Alex Pentland, Olga Russakovsky

Abstract: A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of society. Understanding the impact of generative AI - and making policy decisions around it - requires new interdisciplinary scientific inquiry into culture, economics, law, algorithms, and the interaction of technology and creativity. We argue that generative AI is not the harbinger of art's demise, but rather is a new medium with its own distinct affordances. In this vein, we consider the impacts of this new medium on creators across four themes: aesthetics and culture, legal questions of ownership and credit, the future of creative work, and impacts on the contemporary media ecosystem. Across these themes, we highlight key research questions and directions to inform policy and beneficial uses of the technology.

Submitted 7 June, 2023; originally announced June 2023.

Comments: This white paper is an expanded version of Epstein et al 2023 published in Science Perspectives on July 16, 2023 which you can find at the following DOI: 10.1126/science.adh4451

arXiv:2212.00869 [pdf, other]

Flexible social inference facilitates targeted social learning when rewards are not observable

Authors: Robert D. Hawkins, Andrew M. Berdahl, Alex "Sandy" Pentland, Joshua B. Tenenbaum, Noah D. Goodman, P. M. Krafft

Abstract: Groups coordinate more effectively when individuals are able to learn from others' successes. But acquiring such knowledge is not always easy, especially in real-world environments where success is hidden from public view. We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable trajectories of behavior. We compared our social inference model against simpler heuristics in three studies of human behavior in a collective sensing task. In Experiment 1, we found that average performance improves as a function of group size at a rate greater than predicted by non-inferential models. Experiment 2 introduced artificial agents to evaluate how individuals selectively rely on social information. Experiment 3 generalized these findings to a more complex reward landscape. Taken together, our findings provide insight into the relationship between individual social cognition and the flexibility of collective behavior.

Submitted 5 August, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Nature Human Behaviour

arXiv:2210.11053 [pdf, other]

The Network Structure of Unequal Diffusion

Authors: Eaman Jahani, Dean Eckles, Alex 'Sandy' Pentland

Abstract: Social networks affect the diffusion of information, and thus have the potential to reduce or amplify inequality in access to opportunity. We show empirically that social networks often exhibit a much larger potential for unequal diffusion across groups along paths of length 2 and 3 than expected by our random graph models. We argue that homophily alone cannot fully explain the extent of unequal diffusion and attribute this mismatch to unequal distribution of cross-group links among the nodes. Based on this insight, we develop a variant of the stochastic block model that incorporates the heterogeneity in cross-group linking. The model provides an unbiased and consistent estimate of assortativity or homophily on paths of length 2 and provides a more accurate estimate along paths of length 3 than existing models. We characterize the null distribution of its log-likelihood ratio test and argue that the goodness of fit test is valid only when the network is dense. Based on our empirical observations and modeling results, we conclude that the impact of any departure from equal distribution of links to source nodes in the diffusion process is not limited to its first order effects as some nodes will have fewer direct links to the sources. More importantly, this unequal distribution will also lead to second order effects as the whole group will have fewer diffusion paths to the sources.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 47 pages

arXiv:2210.01927 [pdf, other]

doi 10.1007/978-3-031-43129-6_6

Building a healthier feed: Private location trace intersection driven feed recommendations

Authors: Tobin South, Nick Lothian, Alex "Sandy" Pentland

Abstract: The physical environment individuals navigate strongly determines which communities and people matter most to them. These effects drive both personal access to opportunities and the social capital of communities, and can often be observed in the personal mobility traces of individuals. Traditional social media feeds underutilize these mobility-based features, or do so in a privacy-exploitative manner. Here we propose a consent-first private information sharing paradigm for driving social feeds from users' personal private data, specifically using mobility traces. This approach designs the feed to explicitly optimize for integrating the user into the local community and for social capital building through leveraging mobility trace overlaps as a proxy for existing or potential real-world social connections, creating proportionality between whom a user sees in their feed, and whom the user is likely to see in person. These claims are validated against existing social-mobility data, and a reference implementation of the proposed algorithm is built for demonstration. In total, this work presents a novel technique for designing feeds that represent real offline social connections through private set intersections requiring no third party, or public data exposure.

Submitted 20 September, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Journal ref: Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2023. Lecture Notes in Computer Science, vol 14161. Springer, Cham

arXiv:2209.12095 [pdf, other]

Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics

Authors: Yanni Yang, Alex Pentland, Esteban Moro

Abstract: Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent interpretable activity behaviors on how people combine shopping, eating, working, or using their free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics.

Submitted 24 September, 2022; originally announced September 2022.

Comments: 18 pages, 7 figures

arXiv:2207.03652 [pdf, other]

Private independence testing across two parties

Authors: Praneeth Vepakomma, Mohammad Mohammadi Amiri, Clément L. Canonne, Ramesh Raskar, Alex Pentland

Abstract: We introduce $π$-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced in Székely et al. [2007]. We establish both additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.

Submitted 26 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.12915 [pdf, ps, other]

doi 10.36190/2021.51

Disambiguating Disinformation: Extending Beyond the Veracity of Online Content

Authors: Keeley Erhardt, Alex Pentland

Abstract: Following the 2016 US presidential election and the now overwhelming evidence of Russian interference, there has been an explosion of interest in the phenomenon of "fake news". To date, research on false news has centered around detecting content from low-credibility sources and analyzing how this content spreads across online platforms. Misinformation poses clear risks, yet research agendas that overemphasize veracity miss the opportunity to truly understand the Kremlin-led disinformation campaign that shook so many Americans. In this paper, we present a definition for disinformation - a set or sequence of orchestrated, agenda-driven information actions with the intent to deceive - that is useful in contextualizing Russian interference in 2016 and disinformation campaigns more broadly. We expand on our ongoing work to operationalize this definition and demonstrate how detecting disinformation must extend beyond assessing the credibility of a specific publisher, user, or story.

Submitted 26 June, 2022; originally announced June 2022.

Comments: In Workshop Proceedings of the 15th International AAAI Conference on Web and Social Media (2021)

arXiv:2205.14174 [pdf, other]

Private and Byzantine-Proof Cooperative Decision-Making

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit, while communicating over a network with delays. The central idea in this problem is to design algorithms that can efficiently leverage communication to obtain improvements over acting in isolation. In this paper, we investigate the stochastic bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine, i.e., they provide (stochastically) incorrect information. For both these problem settings, we provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) tolerant to byzantine agents. Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems. We test our algorithms on a competitive benchmark of random graphs and demonstrate their superior performance with respect to existing robust algorithms. We hope that our work serves as an important step towards creating distributed decision-making systems that maintain privacy.

Submitted 27 May, 2022; originally announced May 2022.

Comments: Full version of AAMAS 2020 paper uploaded to arXiv

arXiv:2201.06068 [pdf]

Zero Botnets: An Observe-Pursue-Counter Approach

Authors: Jeremy Kepner, Jonathan Bernays, Stephen Buckley, Kenjiro Cho, Cary Conrad, Leslie Daigle, Keeley Erhardt, Vijay Gadepally, Barry Greene, Michael Jones, Robert Knake, Bruce Maggs, Peter Michaleas, Chad Meiners, Andrew Morris, Alex Pentland, Sandeep Pisharody, Sarah Powazek, Andrew Prout, Philip Reiner, Koichi Suzuki, Kenji Takahashi, Tony Tauber, Leah Walker, Douglas Stetson

Abstract: Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the presence of botnets on the Internet, with the aspirational target of zero, is a powerful vision for galvanizing policy action. Setting a global goal, encouraging international cooperation, creating incentives for improving networks, and supporting entities for botnet takedowns are among several policies that could advance this goal. These policies raise significant questions regarding proper authorities/access that cannot be answered in the abstract. Systems analysis has been widely used in other domains to achieve sufficient detail to enable these questions to be dealt with in concrete terms. Defeating botnets using an observe-pursue-counter architecture is analyzed, the technical feasibility is affirmed, and the authorities/access questions are significantly narrowed. Recommended next steps include: supporting the international botnet takedown community, expanding network observatories, enhancing the underlying network science at scale, conducting detailed systems analysis, and developing appropriate policy frameworks.

Submitted 16 January, 2022; originally announced January 2022.

Comments: 26 pages, 13 figures, 2 tables, 72 references, submitted to PlosOne

Report number: Harvard Belfer Center Report (2021 June)

arXiv:2112.04766 [pdf, other]

Adaptive Methods for Aggregated Domain Generalization

Authors: Xavier Thomas, Dhruv Mahajan, Alex Pentland, Abhimanyu Dubey

Abstract: Domain generalization involves learning a classifier from a heterogeneous collection of training sources such that it generalizes to data drawn from similar unknown target domains, with applications in large-scale learning and personalized inference. In many settings, privacy concerns prohibit obtaining domain labels for the training data samples, and instead only an aggregated collection of training points is available. Existing approaches that utilize domain labels to create domain-invariant feature representations are inapplicable in this setting, requiring alternative approaches to learn generalizable classifiers. In this paper, we propose a domain-adaptive approach to this problem, which operates in two steps: (a) we cluster training data within a carefully chosen feature space to create pseudo-domains, and (b) using these pseudo-domains we learn a domain-adaptive classifier that makes predictions using information about both the input and the pseudo-domain it belongs to. Our approach achieves state-of-the-art performance on a variety of domain generalization benchmarks without using domain labels whatsoever. Furthermore, we provide novel theoretical guarantees on domain generalization using cluster information. Our approach is amenable to ensemble-based methods and provides substantial gains even on large-scale benchmark datasets. The code can be found at: https://github.com/xavierohan/AdaClust_DomainBed

Submitted 23 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.12482 [pdf, other]

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Authors: Udari Madhushani, Abhimanyu Dubey, Naomi Ehrich Leonard, Alex Pentland

Abstract: The cooperative bandit problem is becoming increasingly relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.

Submitted 24 November, 2021; originally announced November 2021.

Journal ref: Conference on Neural Information Processing Systems, 2021

arXiv:2109.10523 [pdf, other]

doi 10.1038/s42005-022-00863-w

Investigating and Modeling the Dynamics of Long Ties

Authors: Ding Lyu, Yuan Yuan, Lin Wang, Xiaofan Wang, Alex Pentland

Abstract: Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.

Submitted 2 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: Forthcoming at Communications Physics (Nature portfolio)

MSC Class: 05C85; 62P25; 91B16 ACM Class: J.4

Journal ref: Commun. Phys. 5 (2022) 87

arXiv:2108.07437 [pdf, other]

Social influence leads to the formation of diverse local trends

Authors: Ziv Epstein, Matthew Groh, Abhimanyu Dubey, Alex "Sandy" Pentland

Abstract: How does the visual design of digital platforms impact user behavior and the resulting environment? A body of work suggests that introducing social signals to content can increase both the inequality and unpredictability of its success, but this has only been shown in the context of music listening. To further examine the effect of social influence on media popularity, we extend this research to the context of algorithmically-generated images by re-adapting Salganik et al.'s Music Lab experiment. On a digital platform where participants discover and curate AI-generated hybrid animals, we randomly assign both the knowledge of other participants' behavior and the visual presentation of the information. We successfully replicate the Music Lab's findings in the context of images, whereby social influence leads to an unpredictable winner-take-all market. However, we also find that social influence can lead to the emergence of local cultural trends that diverge from the status quo and are ultimately more diverse. We discuss the implications of these results for platform designers and animal conservation efforts.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 18 pages, to appear in CSCW October 2021

ACM Class: J.4

arXiv:2103.15796 [pdf, other]

Adaptive Methods for Real-World Domain Generalization

Authors: Abhimanyu Dubey, Vignesh Ramanathan, Alex Pentland, Dhruv Mahajan

Abstract: Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training. In our work, we investigate whether it is possible to leverage domain information from the unseen test samples themselves. We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model that takes both the input and its domain into account while making predictions. For unseen domains, our method simply uses a few unlabelled test examples to construct the domain embedding. This enables adaptive classification on any unseen domain. Our approach achieves state-of-the-art performance on various domain generalization benchmarks. In addition, we introduce the first real-world, large-scale domain generalization benchmark, Geo-YFCC, containing 1.1M samples over 40 training, 7 validation, and 15 test domains, orders of magnitude larger than prior work. We show that the existing approaches either do not scale to this dataset or underperform compared to the simple baseline of training a model on the union of data from all training domains. In contrast, our approach achieves a significant improvement.

Submitted 29 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: To appear as an oral presentation in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. v2 corrects double printing of appendix

arXiv:2103.04972 [pdf, ps, other]

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: Reinforcement learning in cooperative multi-agent settings has recently advanced significantly in its scope, with applications in cooperative estimation for advertising, dynamic treatment regimes, distributed control, and federated learning. In this paper, we discuss the problem of cooperative multi-agent RL with function approximation, where a group of agents communicates with each other to jointly solve an episodic MDP. We demonstrate that via careful message-passing and cooperative value iteration, it is possible to achieve near-optimal no-regret learning even with a fixed constant communication budget. Next, we demonstrate that even in heterogeneous cooperative settings, it is possible to achieve Pareto-optimal no-regret learning with limited communication. Our work generalizes several ideas from the multi-agent contextual and multi-armed bandit literature to MDPs and reinforcement learning.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 53 pages including Appendix

arXiv:2010.11425 [pdf, other]

Differentially-Private Federated Linear Bandits

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: The rapid proliferation of decentralized learning systems mandates the need for differentially-private cooperative learning. In this paper, we study this in the context of the contextual linear bandit: we consider a collection of agents cooperating to solve a common contextual bandit, while ensuring that their communication remains private. For this problem, we devise \textsc{FedUCB}, a multiagent private algorithm for both centralized and decentralized (peer-to-peer) federated learning. We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well. Our algorithms provide competitive performance both in terms of pseudoregret bounds and empirical benchmark performance in various multi-agent settings.

Submitted 21 October, 2020; originally announced October 2020.

Comments: 22 pages. Camera-ready for NeurIPS 2020

arXiv:2009.07413 [pdf, other]

Towards a Contract Service Provider Model for Virtual Assets and VASPs

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: We introduce the contract service provider (CSP) model as an analog of the successful Internet ISP model. Our exploration is motivated by the need to seek alternative blockchain service-fee models that depart from the token-for-operations (gas fee) model for smart contracts found on many popular blockchain platforms today. A given CSP community consisting of multiple CSP business entities (VASPs) forms a contract domain which implements well-defined contract primitives, policies and a contract-ledger. The nodes of the members of the CSP community form the blockchain network. We discuss a number of design principles borrowed from the design principles of the Internet Architecture, and we discuss the interoperability of cross-domain (cross-chain) transfers of virtual assets in the context of contract domains.

Submitted 6 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: 33 pages, 8 figures

arXiv:2008.06244 [pdf, other]

Cooperative Multi-Agent Bandits with Heavy Tails

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as \textit{running consensus}, which does not lend itself to robust estimation for heavy-tailed settings. We propose \textsc{MP-UCB}, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for \textsc{MP-UCB} for several problem settings, and also demonstrate its superiority to existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.

Submitted 14 August, 2020; originally announced August 2020.

Comments: 26 pages including appendix, camera-ready for ICML 2020

arXiv:2008.06220 [pdf, other]

Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose \textsc{Coop-KernelUCB}, an algorithm that provides near-optimal bounds on the per-agent regret, and is both computationally and communicatively efficient. For special cases of the cooperative problem, we also provide variants of \textsc{Coop-KernelUCB} that provide optimal per-agent regret. In addition, our algorithm generalizes several existing results in the multi-agent bandit setting. Finally, on a series of both synthetic and real-world multi-agent network benchmarks, we demonstrate that our algorithm significantly outperforms existing benchmarks.

Submitted 14 August, 2020; originally announced August 2020.

Comments: 19 pages including supplement, camera-ready at ICML 2020

arXiv:2006.09437 [pdf, other]

A Study of Compositional Generalization in Neural Models

Authors: Tim Klinger, Dhaval Adjodah, Vincent Marois, Josh Joseph, Matthew Riemer, Alex 'Sandy' Pentland, Murray Campbell

Abstract: Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and relational task structure on which to systematically evaluate them. In this paper, we introduce an environment called ConceptWorld, which enables the generation of images from compositional and relational concepts, defined using a logical domain specific language. We use it to generate images for a variety of compositional structures: 2x2 squares, pentominoes, sequences, scenes involving these objects, and other more complex concepts. We perform experiments to test the ability of standard neural architectures to generalize on relations with compositional arguments as the compositional depth of those arguments increases and under substitution. We compare standard neural networks such as MLP, CNN and ResNet, as well as state-of-the-art relational networks including WReN and PrediNet in a multi-class image classification setting. For simple problems, all models generalize well to close concepts but struggle with longer compositional chains. For more complex tests involving substitutivity, all models struggle, even with short chains. In highlighting these difficulties and providing an environment for further experimentation, we hope to encourage the development of models which are able to generalize effectively in compositional, relational domains.

Submitted 8 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 28 pages

arXiv:2006.01028 [pdf, other]

Interpretable Stochastic Block Influence Model: measuring social influence among homophilous communities

Authors: Yan Leng, Tara Sowrirajan, Alex Pentland

Abstract: Decision-making on networks can be explained by both homophily and social influence. While homophily drives the formation of communities with similar characteristics, social influence occurs both within and between communities. Social influence can be reasoned through role theory, which indicates that the influences among individuals depend on their roles and the behavior of interest. To operationalize these social science theories, we empirically identify the homophilous communities and use the community structures to capture the "roles", which affect the particular decision-making processes. We propose a generative model named Stochastic Block Influence Model and jointly analyze both the network formation and the behavioral influence within and between different empirically-identified communities. To evaluate the performance and demonstrate the interpretability of our method, we study the adoption decisions of microfinance in an Indian village. We show that although individuals tend to form links within communities, there are strong positive and negative social influences between communities, supporting the weak tie theory. Moreover, we find that communities with shared characteristics are associated with positive influence. In contrast, the communities with a lack of overlap are associated with negative influence. Our framework facilitates the quantification of the influences underlying decision communities and is thus a useful tool for driving information diffusion, viral marketing, and technology adoptions.

Submitted 1 June, 2020; originally announced June 2020.

arXiv:2005.14689 [pdf, other]

Wallet Attestations for Virtual Asset Service Providers and Crypto-Assets Insurance

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: The emerging virtual asset service providers (VASP) industry currently faces a number of challenges related to the Travel Rule, notably pertaining to customer personal information, account number and cryptographic key information. VASPs will be handling virtual assets of different forms, where each may be bound to different private-public key pairs on the blockchain. As such, VASPs also face the additional problem of managing their own keys and the customer keys that may reside in a customer wallet. The use of attestation technologies as applied to wallet systems may provide VASPs with suitable evidence relevant to the Travel Rule regarding cryptographic key information and their operational state. Additionally, wallet attestations may provide crypto-asset insurers with strong evidence regarding the key management aspects of a wallet device, thereby providing the insurance industry with measurable levels of assurance that can become the basis for insurers to perform risk assessment on crypto-assets bound to keys in wallets, both enterprise-grade wallets and consumer-grade wallets.

Submitted 29 May, 2020; originally announced May 2020.

Comments: 35 pages; 9 figures

arXiv:2005.12218 [pdf, other]

User behavior and token adoption on ERC20

Authors: Alfredo J. Morales, Shahar Somin, Yaniv Altshuler, Alex 'Sandy' Pentland

Abstract: Cryptocurrencies and Blockchain-based technologies are disrupting all markets. While the potential of such technologies remains to be seen, there is a current need to understand emergent patterns of user behavior and token adoption in order to design future products. In this paper we analyze the social dynamics taking place during one arbitrary day on the ERC20 platform. We characterize the network of token transactions among agents. We show heterogeneous profiles of user behavior, portfolio diversity, and token adoption. While most users are specialized in transacting with a few tokens, those that have diverse portfolios are bridging across large parts of the network and may jeopardize the system stability. We believe this work to be a foundation for unveiling the usage dynamics of crypto-currencies networks.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 12 pages, 4 figures

arXiv:2005.10414 [pdf, other]

Analysis of misinformation during the COVID-19 outbreak in China: cultural, social and political entanglements

Authors: Yan Leng, Yujia Zhai, Shaojing Sun, Yifei Wu, Jordan Selzer, Sharon Strover, Julia Fensel, Alex Pentland, Ying Ding

Abstract: COVID-19 resulted in an infodemic, which could erode public trust, impede virus containment, and outlive the pandemic itself. The evolving and fragmented media landscape is a key driver of the spread of misinformation. Using misinformation identified by Tencent's fact-checking platform and posts on Weibo, our results showed that the evolution of misinformation follows an issue-attention cycle, pertaining to topics such as city lockdown, cures and preventions, and school reopening. Sources of authority weigh in on these topics, but their influence is complicated by people's pre-existing beliefs and cultural practices. Finally, social media has a complicated relationship with established or legacy media systems. Sometimes they reinforce each other, but in general, social media may have a topic cycle of its own making. Our findings shed light on the distinct characteristics of misinformation during the COVID-19 outbreak and offer insights into combating misinformation in China and across the world at large.

Submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.08201 [pdf, other]

ERC20 Transactions over Ethereum Blockchain: Network Analysis and Predictions

Authors: Shahar Somin, Goren Gordon, Alex Pentland, Erez Shmueli, Yaniv Altshuler

Abstract: Following the birth of Bitcoin and the introduction of the Ethereum ERC20 protocol a decade ago, recent years have witnessed a growing number of cryptographic tokens that are being introduced by researchers, private sector companies and NGOs. The ubiquity of such Blockchain-based cryptocurrencies gives birth to a new kind of rising economy, which presents great difficulties to modeling its dynamics using conventional semantic properties. Our work presents the analysis of the dynamical properties of the ERC20 protocol compliant crypto-coins' trading data using a network theory prism. We examine the dynamics of ERC20 based networks over time by analyzing a meta-parameter of the network, the power of its degree distribution. Our analysis demonstrates that this parameter can be modeled as an under-damped harmonic oscillator over time, enabling prediction of network parameters a year forward.

Submitted 17 April, 2020; originally announced April 2020.

arXiv:2004.05222 [pdf]

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Authors: Mirco Nanni, Gennady Andrienko, Albert-László Barabási, Chiara Boldrini, Francesco Bonchi, Ciro Cattuto, Francesca Chiaromonte, Giovanni Comandé, Marco Conti, Mark Coté, Frank Dignum, Virginia Dignum, Josep Domingo-Ferrer, Paolo Ferragina, Fosca Giannotti, Riccardo Guidotti, Dirk Helbing, Kimmo Kaski, Janos Kertesz, Sune Lehmann, Bruno Lepri, Paul Lukowicz, Stan Matwin, David Megías Jiménez, Anna Monreale, et al. (14 additional authors not shown)

Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-granulated geographic scale. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

Submitted 16 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Revised text. Additional authors

Journal ref: Transactions on Data Privacy 13(1): 61-66 (2020), http://www.tdp.cat/issues16/abs.a389a20.php

arXiv:2003.14412 [pdf, other]

Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy

Authors: Alex Berke, Michiel Bakker, Praneeth Vepakomma, Kent Larson, Alex 'Sandy' Pentland

Abstract: Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority. There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.

Submitted 8 April, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

arXiv:2003.12347 [pdf]

Mobile phone data and COVID-19: Missing an opportunity?

Authors: Nuria Oliver, Emmanuel Letouzé, Harald Sterly, Sébastien Delataille, Marco De Nadai, Bruno Lepri, Renaud Lambiotte, Richard Benjamins, Ciro Cattuto, Vittoria Colizza, Nicolas de Cordes, Samuel P. Fraiberger, Till Koebe, Sune Lehmann, Juan Murillo, Alex Pentland, Phuong N Pham, Frédéric Pivetta, Albert Ali Salah, Jari Saramäki, Samuel V. Scarpino, Michele Tizzoni, Stefaan Verhulst, Patrick Vinck

Abstract: This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although its value in similar epidemics has been proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups at national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic.

Submitted 27 March, 2020; originally announced March 2020.

arXiv:1912.08998 [pdf, other]

Understanding Human Judgments of Causality

Authors: Masahiro Kazama, Yoshihiko Suhara, Andrey Bogomolov, Alex 'Sandy' Pentland

Abstract: Discriminating between causality and correlation is a major problem in machine learning, and theoretical tools for determining causality are still being developed. However, people commonly make causality judgments and are often correct, even in unfamiliar domains. What are humans doing to make these judgments? This paper examines differences in human experts' and non-experts' ability to attribute causality by comparing their performances to those of machine-learning algorithms. We collected human judgments by using Amazon Mechanical Turk (MTurk) and then divided the human subjects into two groups: experts and non-experts. We also prepared expert and non-expert machine algorithms based on different training of convolutional neural network (CNN) models. The results showed that human experts' judgments were similar to those made by an "expert" CNN model trained on a large number of examples from the target domain. The human non-experts' judgments resembled the prediction outputs of the CNN model that was trained on only the small number of examples used during the MTurk instruction. We also analyzed the differences between the expert and non-expert machine algorithms based on their neural representations to evaluate the performances, providing insight into the human experts' and non-experts' cognitive abilities.

Submitted 18 December, 2019; originally announced December 2019.

arXiv:1912.06871 [pdf, other]

Privacy-Preserving Claims Exchange Networks for Virtual Asset Service Providers

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: In order for VASPs to fulfill the regulatory requirements from the FATF and the Travel Rule, VASPs need access to truthful information regarding originators, beneficiaries and other VASPs involved in a virtual asset transfer instance. Additionally, in seeking data regarding subjects (individuals or organizations) VASPs are faced with privacy regulations such as the GDPR and CCPA. In this paper we propose a privacy-preserving claims issuance model that carries indicators of the provenance of the data and the algorithms used to derive the claim or assertion. This allows VASPs to obtain originator and beneficiary information without necessarily having access to the private data about these entities. Secondly, we propose the use of a consortium trust network arrangement for VASPs to exchange signed claims about subjects and their public-key information or certificate.

Submitted 2 March, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

Comments: 4 figures

arXiv:1911.10433 [pdf, other]

Empowering Artists, Songwriters & Musicians in a Data Cooperative through Blockchains and Smart Contracts

Authors: Thomas Hardjono, Alex Pentland

Abstract: Over the last decade there has been a continuing decline in social trust on the part of individuals with regard to the handling and fair use of personal data, digital assets and other related rights in general. At the same time, there has been a change in the employment patterns for many people through the emergence of the gig economy. These gig workers include artists, songwriters and musicians in the music industry. We discuss the notion of the data cooperative with fiduciary responsibilities to its members, which is similar in purpose to credit unions in the financial sector. A data cooperative for artists and musicians allows the community to share IT resources, such as data storage, analytics processing, blockchains and distributed ledgers. A cooperative can also employ smart contracts to remedy the various challenges currently faced by the music industry with regard to license tracking management.

Submitted 23 November, 2019; originally announced November 2019.

Comments: 4 figures

arXiv:1911.04027 [pdf, other]

Segregated interactions in urban and online space

Authors: Xiaowen Dong, Alfredo J. Morales, Eaman Jahani, Esteban Moro, Bruno Lepri, Burcin Bozkaya, Carlos Sarraute, Yaneer Bar-Yam, Alex Pentland

Abstract: Urban income segregation is a widespread phenomenon that challenges societies across the globe. Classical studies on segregation have largely focused on the geographic distribution of residential neighborhoods rather than on patterns of social behaviors and interactions. In this study, we analyze segregation in economic and social interactions by observing credit card transactions and Twitter mentions among thousands of individuals in three culturally different metropolitan areas. We show that segregated interaction is amplified relative to the expected effects of geographic segregation in terms of both purchase activity and online communication. Furthermore, we find that segregation increases with difference in socio-economic status but is asymmetric for purchase activity, i.e., the amount of interaction from poorer to wealthier neighborhoods is larger than vice versa. Our results provide novel insights into the understanding of behavioral segregation in human interactions with significant socio-political and economic implications.

Submitted 19 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

arXiv:1910.13983 [pdf, other]

DADI: Dynamic Discovery of Fair Information with Adversarial Reinforcement Learning

Authors: Michiel A. Bakker, Duy Patrick Tu, Humberto Riverón Valdés, Krishna P. Gummadi, Kush R. Varshney, Adrian Weller, Alex Pentland

Abstract: We introduce a framework for dynamic adversarial discovery of information (DADI), motivated by a scenario where information (a feature set) is used by third parties with unknown objectives. We train a reinforcement learning agent to sequentially acquire a subset of the information while balancing accuracy and fairness of predictors downstream. Based on the set of already acquired features, the agent decides dynamically to either collect more information from the set of available features or to stop and predict using the information that is currently available. Building on previous work exploring adversarial representation learning, we attain group fairness (demographic parity) by rewarding the agent with the adversary's loss, computed over the final feature set. Importantly, however, the framework provides a more general starting point for fair or private dynamic information discovery. Finally, we demonstrate empirically, using two real-world datasets, that we can trade off fairness and predictive performance.

Submitted 30 October, 2019; originally announced October 2019.

Comments: Accepted at NeurIPS 2019 HCML Workshop

arXiv:1909.08607 [pdf, other]

Towards a Public Key Management Framework for Virtual Assets and Virtual Asset Service Providers

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: The recent FATF Recommendations define virtual assets and virtual asset service providers (VASP), and require under the Travel Rule that originating VASPs obtain and hold required and accurate originator information and required beneficiary information on virtual asset transfers. In this paper we discuss the notion of key ownership evidence as a core part of originator and beneficiary information required by the FATF Recommendation. We discuss approaches to securely communicate the originator and beneficiary information between VASPs, and review existing standards for public key certificates as applied to VASPs and virtual asset transfers. We propose the notion of a trust network of VASPs in which originator and beneficiary information, including key ownership information, can be exchanged securely while observing individual privacy requirements.

Submitted 18 September, 2019; originally announced September 2019.

Comments: 33 pages, 9 figures

arXiv:1907.06929 [pdf]

doi 10.1109/TCSS.2019.2923216

Assessing Refugees' Integration via Spatio-temporal Similarities of Mobility and Calling Behaviors

Authors: Antonio L. Alfeo, Mario G. C. A. Cimino, Bruno Lepri, Alex 'Sandy' Pentland, Gigliola Vaglini

Abstract: In Turkey the increasing tension, due to the presence of 3.4 million Syrian refugees, demands the formulation of effective integration policies. Moreover, their design requires tools aimed at understanding the integration of refugees despite the complexity of this phenomenon. In this work, we propose a set of metrics aimed at providing insights and assessing the integration of Syrian refugees, by analyzing a real-world Call Detail Records (CDRs) dataset including calls from refugees and locals in Turkey throughout 2017. Specifically, we exploit the similarity between refugees' and locals' spatial and temporal behaviors, in terms of communication and mobility, in order to assess integration dynamics. Together with the already known methods for data analysis, we use a novel computational approach to analyze spatiotemporal patterns: Computational Stigmergy, a bio-inspired scalar and temporal aggregation of samples. Computational Stigmergy associates each sample to a virtual pheromone deposit (mark). Marks in spatiotemporal proximity are aggregated into functional structures called trails, which summarize the spatiotemporal patterns in data and allow computing the similarity between different patterns. According to our results, collective mobility and behavioral similarity with locals have great potential as measures of integration, since they are: (i) correlated with the amount of interaction with locals; (ii) an effective proxy for refugees' economic capacity, thus refugees' potential employment; and (iii) able to capture events that may disrupt the integration phenomena, such as social tensions.

Submitted 16 July, 2019; originally announced July 2019.

Comments: https://ieeexplore.ieee.org/document/8758458

Journal ref: IEEE Transactions on Computational Social Systems, pp 1 - 13, Electronic ISSN: 2329-924X, Date of Publication: 09 July 2019

arXiv:1907.03821 [pdf, other]

Thompson Sampling on Symmetric $α$-Stable Bandits

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric $α$-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to two algorithms for Thompson Sampling in this setting. We prove finite-time regret bounds for both algorithms, and demonstrate through a series of experiments the stronger performance of Thompson Sampling in this setting. With our results, we provide an exposition of symmetric $α$-stable distributions in sequential decision-making, and enable sequential Bayesian inference in applications from diverse fields in finance and complex systems that operate on heavy-tailed features.

Submitted 5 December, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: IJCAI 2019 Camera Ready with appendix, updated Theorem 1

arXiv:1906.09698 [pdf, other]

Gift Contagion in Online Groups: Evidence From Virtual Red Packets

Authors: Yuan Yuan, Tracy Liu, Chenhao Tan, Qian Chen, Alex Pentland, Jie Tang

Abstract: Gifts are important instruments for forming bonds in interpersonal relationships. Our study analyzes the phenomenon of gift contagion in online groups. Gift contagion encourages social bonds by prompting further gifts; it may also promote group interaction and solidarity. Using data on 36 million online red packet gifts on a large social site in East Asia, we leverage a natural experimental design to identify the social contagion of gift giving in online groups. Our natural experiment is enabled by the randomization of the gift amount allocation algorithm on the platform, which addresses the common challenge of causal identifications in observational data. Our study provides evidence of gift contagion: on average, receiving one additional dollar causes a recipient to send 18 cents back to the group within the subsequent 24 hours. Decomposing this effect, we find that it is mainly driven by the extensive margin -- more recipients are triggered to send red packets. Moreover, we find that this effect is stronger for "luckiest draw" recipients, suggesting the presence of a group norm regarding the next red packet sender. Finally, we investigate the moderating effects of group- and individual-level social network characteristics on gift contagion as well as the causal impact of receiving gifts on group network structure. Our study has implications for promoting group dynamics and designing marketing strategies for product adoption.

Submitted 29 August, 2023; v1 submitted 23 June, 2019; originally announced June 2019.

Comments: 46 pages

Showing 1–50 of 109 results for author: Pentland, A