subscribe to arXiv mailings

Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules

Authors: Michael Lingzhi Li, Kosuke Imai

Abstract: A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of i… ▽ More A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.12108 [pdf, other]

Does AI help humans make better decisions? A methodological framework for experimental evaluation

Authors: Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin

Abstract: The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI an alone. We introduce a new methodological framework that can be used to ans… ▽ More The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI an alone. We introduce a new methodological framework that can be used to answer experimentally this question with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.07031 [pdf, other]

The Cram Method for Efficient Simultaneous Learning and Evaluation

Authors: Zeyang Jia, Kosuke Imai, Michael Lingzhi Li

Abstract: We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient tha… ▽ More We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. The cram method also naturally accommodates online learning algorithms, making its implementation computationally efficient. To demonstrate the power of the cram method, we consider the standard policy learning setting where cramming is applied to the same data to both develop an individualized treatment rule (ITR) and estimate the average outcome that would result if the learned ITR were to be deployed. We show that under a minimal set of assumptions, the resulting crammed evaluation estimator is consistent and asymptotically normal. While our asymptotic results require a relatively weak stabilization condition of ML algorithm, we develop a simple, generic method that can be used with any policy learning algorithm to satisfy this condition. Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40% while improving the performance of learned policy. We also apply the cram method to a randomized clinical trial to demonstrate its applicability to real-world problems. Finally, we briefly discuss future extensions of the cram method to other learning and evaluation settings. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2311.02467 [pdf, other]

Individualized Policy Evaluation and Learning under Clustered Network Interference

Authors: Yi Zhang, Kosuke Imai

Abstract: While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover e… ▽ More While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference) where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, the proposed methodology only assumes a semiparametric structural model where each unit's outcome is an additive function of individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology. △ Less

Submitted 4 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

arXiv:2307.08840 [pdf, other]

Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

Authors: Zeyang Jia, Eli Ben-Michael, Kosuke Imai

Abstract: Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodo… ▽ More Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors. △ Less

Submitted 27 May, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.07521 [pdf, other]

doi 10.1126/sciadv.adl2524

Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods

Authors: Christopher T. Kenny, Cory McCartan, Shiro Kuriwaki, Tyler Simko, Kosuke Imai

Abstract: The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our… ▽ More The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measure File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations. △ Less

Submitted 10 February, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 25 pages, 6 figures, 2 tables, plus appendices

Journal ref: Science advances, 10(18) (2024) eadl2524

arXiv:2305.07208 [pdf, other]

doi 10.1162/99608f92.c3c87223

Making Differential Privacy Work for Census Data Users

Authors: Cory McCartan, Tyler Simko, Kosuke Imai

Abstract: The U.S. Census Bureau collects and publishes detailed demographic data about Americans which are heavily used by researchers and policymakers. The Bureau has recently adopted the framework of differential privacy in an effort to improve confidentiality of individual census responses. A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding ra… ▽ More The U.S. Census Bureau collects and publishes detailed demographic data about Americans which are heavily used by researchers and policymakers. The Bureau has recently adopted the framework of differential privacy in an effort to improve confidentiality of individual census responses. A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is critical to understanding any errors introduced in the data, and performing valid statistical inference on published census data. Unfortunately, the current release format of the NMF is difficult to access and work with. We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau for how to release future versions of the NMF. These changes are essential for ensuring transparency of privacy measures and reproducibility of scientific research built on census data. △ Less

Submitted 7 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 figures

Journal ref: Harvard Data Science Review 5(4), 2023

arXiv:2305.05833 [pdf, other]

A Statistical Model of Bipartite Networks: Application to Cosponsorship in the United States Senate

Authors: Adeline Lo, Santiago Olivella, Kosuke Imai

Abstract: Many networks in political and social research are bipartite, with edges connecting exclusively across two distinct types of nodes. A common example includes cosponsorship networks, in which legislators are connected indirectly through the bills they support. Yet most existing network models are designed for unipartite networks, where edges can arise between any pair of nodes. However, using a uni… ▽ More Many networks in political and social research are bipartite, with edges connecting exclusively across two distinct types of nodes. A common example includes cosponsorship networks, in which legislators are connected indirectly through the bills they support. Yet most existing network models are designed for unipartite networks, where edges can arise between any pair of nodes. However, using a unipartite network model to analyze bipartite networks, as often done in practice, can result in aggregation bias and artificially high-clustering -- a particularly insidious problem when studying the role groups play in network formation. To address these methodological problems, we develop a statistical model of bipartite networks theorized to be generated through group interactions by extending the popular mixed-membership stochastic blockmodel. Our model allows researchers to identify the groups of nodes, within each node type in the bipartite structure, that share common patterns of edge formation. The model also incorporates both node and dyad-level covariates as the predictors of group membership and of observed dyadic relations. We develop an efficient computational algorithm for fitting the model, and apply it to cosponsorship data from the United States Senate. We show that legislators in a Senate that was perfectly split along party lines were able to remain productive and pass major legislation by forming non-partisan, power-brokering coalitions that found common ground through their collaboration on low-stakes bills. We also find evidence for norms of reciprocity, and uncover the substantial role played by policy expertise in the formation of cosponsorships between senators and legislation. We make an open-source software package available that makes it possible for other researchers to uncover similar insights from bipartite networks. △ Less

Submitted 27 June, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 41 pages (main text), 6 pages (appendix), 19 pages (online SI)

arXiv:2303.02580 [pdf, other]

Estimating Racial Disparities When Race is Not Observed

Authors: Cory McCartan, Robin Fisher, Jacob Goldin, Daniel E. Ho, Kosuke Imai

Abstract: The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census da… ▽ More The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology. △ Less

Submitted 16 April, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: 28 pages, 9 figures, plus references and appendices

arXiv:2210.08383 [pdf, ps, other]

doi 10.1162/99608f92.abc2c765

Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System

Authors: Christopher T. Kenny, Shiro Kuriwaki, Cory McCartan, Evan T. R. Rosenman, Tyler Simko, Kosuke Imai

Abstract: In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary,… ▽ More In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increases the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: Version accepted to Harvard Data Science Review

Journal ref: Harvard Data Science Review, (Special Issue 2, 2023)

arXiv:2210.08326 [pdf, ps, other]

Distributionally Robust Causal Inference with Observational Data

Authors: Dimitris Bertsimas, Kosuke Imai, Michael Lingzhi Li

Abstract: We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then… ▽ More We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology. △ Less

Submitted 2 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2208.12443 [pdf, other]

Race and ethnicity data for first, middle, and last names

Authors: Evan T. R. Rosenman, Santiago Olivella, Kosuke Imai

Abstract: We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any compar… ▽ More We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any comparable dataset, containing roughly one million first names, 1.1 million middle names, and 1.4 million surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups -- White, Black, Hispanic, Asian, and Other -- and racial/ethnic counts by name are provided for every name in each dictionary. Counts can then be normalized row-wise or column-wise to obtain conditional probabilities of race given name or name given race. These conditional probabilities can then be deployed for imputation in a data analytic task for which ground truth racial and ethnic data is not available. △ Less

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2206.10763 [pdf, other]

doi 10.1038/s41597-022-01808-2

Simulated redistricting plans for the analysis and evaluation of redistricting in the United States

Authors: Cory McCartan, Christopher T. Kenny, Tyler Simko, George Garcia III, Kevin Wang, Melissa Wu, Shiro Kuriwaki, Kosuke Imai

Abstract: This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standa… ▽ More This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data. △ Less

Submitted 20 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: 11 pages, 3 figures

Journal ref: Sci Data (2022) 9, 689

arXiv:2206.10479 [pdf, other]

Policy Learning with Asymmetric Counterfactual Utilities

Authors: Eli Ben-Michael, Kosuke Imai, Zhichao Jiang

Abstract: Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility… ▽ More Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form that consider the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems, and establish that the finite sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension. △ Less

Submitted 28 November, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2205.06129 [pdf, other]

Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements

Authors: Kosuke Imai, Santiago Olivella, Evan T. R. Rosenman

Abstract: Prediction of individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for t… ▽ More Prediction of individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naive Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software WRU. △ Less

Submitted 31 August, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

arXiv:2111.12147 [pdf, other]

kmclib: Automated Inference and Verification of Session Types

Authors: Keigo Imai, Julien Lange, Rumyana Neykova

Abstract: Theories and tools based on multiparty session types offer correctness guarantees for concurrent programs that communicate using message-passing. These guarantees usually come at the cost of an intrinsically top-down approach, which requires the communication behaviour of the entire program to be specified as a global type. This paper introduces kmclib: an OCaml library that supports the developme… ▽ More Theories and tools based on multiparty session types offer correctness guarantees for concurrent programs that communicate using message-passing. These guarantees usually come at the cost of an intrinsically top-down approach, which requires the communication behaviour of the entire program to be specified as a global type. This paper introduces kmclib: an OCaml library that supports the development of correct message-passing programs without having to write any types. The library utilises the meta-programming facilities of OCaml to automatically infer the session types of concurrent programs and verify their compatibility (k-MC). Well-typed programs, written with kmclib, do not lead to communication errors and cannot get stuck. △ Less

Submitted 26 November, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: kmclib is available at https://github.com/keigoi/kmclib

arXiv:2110.14014 [pdf, other]

Measuring and Modeling Neighborhoods

Authors: Cory McCartan, Jacob R. Brown, Kosuke Imai

Abstract: Granular geographic data present new opportunities to understand how neighborhoods are formed, and how they influence politics. At the same time, the inherent subjectivity of neighborhoods creates methodological challenges in measuring and modeling them. We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map. We also propose a statistical model to… ▽ More Granular geographic data present new opportunities to understand how neighborhoods are formed, and how they influence politics. At the same time, the inherent subjectivity of neighborhoods creates methodological challenges in measuring and modeling them. We develop an open-source survey instrument that allows respondents to draw their neighborhoods on a map. We also propose a statistical model to analyze how the characteristics of respondents and local areas determine subjective neighborhoods. We conduct two surveys: collecting subjective neighborhoods from voters in Miami, New York City, and Phoenix, and asking New York City residents to draw a community of interest for inclusion in their city council district. Our analysis shows that, holding other factors constant, White respondents include census blocks with more White residents in their neighborhoods. Similarly, Democrats and Republicans are more likely to include co-partisan areas. Furthermore, our model provides more accurate out-of-sample predictions than standard neighborhood measures. △ Less

Submitted 19 January, 2024; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: 34 pages, 11 figures, and supplementary material

arXiv:2109.11679 [pdf, other]

Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment

Authors: Eli Ben-Michael, D. James Greiner, Kosuke Imai, Zhichao Jiang

Abstract: Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these and other data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic… ▽ More Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these and other data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. For example, algorithmic pre-trial risk assessments, which serve as our motivating application, provide relatively simple, deterministic classification scores and recommendations to help judges make release decisions. How can we use the data based on existing deterministic policies to learn new and better policies? Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic rather than deterministic. We develop a robust optimization approach that partially identifies the expected utility of a policy, and then finds an optimal policy by minimizing the worst-case regret. The resulting policy is conservative but has a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. We extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations. Lastly, we apply the proposed methodology to a unique field experiment on pre-trial risk assessment instruments. We derive new classification and recommendation rules that retain the transparency and interpretability of the existing instrument while potentially leading to better overall outcomes at a lower cost. △ Less

Submitted 15 February, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2105.14197 [pdf, other]

doi 10.1126/sciadv.abk3283

The Impact of the U.S. Census Disclosure Avoidance System on Redistricting and Voting Rights Analysis

Authors: Christopher T. Kenny, Shiro Kuriwaki, Cory McCartan, Evan Rosenman, Tyler Simko, Kosuke Imai

Abstract: The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purp… ▽ More The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purposes. We demonstrate that the injected noise makes it impossible for states to accurately comply with the One Person, One Vote principle. Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition, and that these biases lead to large and unpredictable errors in the analysis of partisan and racial gerrymanders. Finally, we show that the DAS algorithm does not universally protect respondent privacy. Based on the names and addresses of registered voters, we are able to predict their race as accurately using the DAS-protected data as when using the 2010 Census data. Despite this, the DAS-protected data can still inaccurately estimate the number of majority-minority districts. We conclude with recommendations for how the Census Bureau should proceed with privacy protection for the 2020 Census. △ Less

Submitted 20 August, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: 42 pages, 22 figures. New postscript analyzing newly-released DAS-19.61 revision

Journal ref: Science advances, 7(41) (2021) eabk3283

arXiv:2103.00702 [pdf, other]

Dynamic Stochastic Blockmodel Regression for Network Data: Application to International Militarized Conflicts

Authors: Santiago Olivella, Tyler Pratt, Kosuke Imai

Abstract: A primary goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. In the modeling of international militarized conflicts, for instance, scholars hypothesize that membership in geopolitical coalitions shapes the decision to engage in conflict. Such theories explain the ways in which nodal and dyadic characteristics affect the e… ▽ More A primary goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. In the modeling of international militarized conflicts, for instance, scholars hypothesize that membership in geopolitical coalitions shapes the decision to engage in conflict. Such theories explain the ways in which nodal and dyadic characteristics affect the evolution of conflict patterns over time via their effects on group memberships. To aid the empirical testing of these arguments, we develop a dynamic model of network data by combining a hidden Markov model with a mixed-membership stochastic blockmodel that identifies latent groups underlying the network structure. Unlike existing models, we incorporate covariates that predict dynamic node memberships in latent groups as well as the direct formation of edges between dyads. While prior substantive research often assumes the decision to engage in international militarized conflict is independent across states and static over time, we demonstrate that conflict is driven by states' evolving membership in geopolitical blocs. Changes in monadic covariates like democracy shift states between coalitions, generating heterogeneous effects on conflict over time and across states. The proposed methodology, which relies on a variational approximation to a collapsed posterior distribution as well as stochastic optimization for scalability, is implemented through an open-source software package. △ Less

Submitted 25 October, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: 34 pages (main text), 34 pages (supplementary information), 21 figures

arXiv:2012.02845 [pdf, other]

Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment

Authors: Kosuke Imai, Zhichao Jiang, James Greiner, Ryan Halen, Sooahn Shin

Abstract: Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairne… ▽ More Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields some potentially suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decision. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high. △ Less

Submitted 11 December, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2008.06131 [pdf, other]

doi 10.1214/23-AOAS1763

Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans

Authors: Cory McCartan, Kosuke Imai

Abstract: Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to maps with a moderate or large number of districts, incorporate realistic legal cons… ▽ More Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to maps with a moderate or large number of districts, incorporate realistic legal constraints, and accurately and efficiently sample from a selected target distribution. Unfortunately, most existing methods struggle in at least one of these areas. We present a new Sequential Monte Carlo (SMC) algorithm that generates a sample of redistricting plans converging to a realistic target distribution. Because it draws many plans in parallel, the SMC algorithm can efficiently explore the relevant space of redistricting plans better than the existing Markov chain Monte Carlo (MCMC) algorithms that generate plans sequentially. Our algorithm can simultaneously incorporate several constraints commonly imposed in real-world redistricting problems, including equal population, compactness, and preservation of administrative boundaries. We validate the accuracy of the proposed algorithm by using a small map where all redistricting plans can be enumerated. We then apply the SMC algorithm to evaluate the partisan implications of several maps submitted by relevant parties in a recent high-profile redistricting case in the state of Pennsylvania. We find that the proposed algorithm converges faster and with fewer samples than a comparable MCMC algorithm. Open-source software is available for implementing the proposed methodology. △ Less

Submitted 14 February, 2023; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: 19 pages, 7 figures, plus appendices; revised validation section, discussion, and appendices

Journal ref: Annals of Applied Statistics 14(7), 2023

arXiv:2006.10148 [pdf, other]

The Essential Role of Empirical Validation in Legislative Redistricting Simulation

Authors: Benjamin Fifield, Kosuke Imai, Jun Kawahara, Christopher T. Kenny

Abstract: As granular data about elections and voters become available, redistricting simulation methods are playing an increasingly important role when legislatures adopt redistricting plans and courts determine their legality. These simulation methods are designed to yield a representative sample of all redistricting plans that satisfy statutory guidelines and requirements such as contiguity, population p… ▽ More As granular data about elections and voters become available, redistricting simulation methods are playing an increasingly important role when legislatures adopt redistricting plans and courts determine their legality. These simulation methods are designed to yield a representative sample of all redistricting plans that satisfy statutory guidelines and requirements such as contiguity, population parity, and compactness. A proposed redistricting plan can be considered gerrymandered if it constitutes an outlier relative to this sample according to partisan fairness metrics. Despite their growing use, an insufficient effort has been made to empirically validate the accuracy of the simulation methods. We apply a recently developed computational method that can efficiently enumerate all possible redistricting plans and yield an independent uniform sample from this population. We show that this algorithm scales to a state with a couple of hundred geographical units. Finally, we empirically examine how existing simulation methods perform on realistic validation data sets. △ Less

Submitted 17 June, 2020; originally announced June 2020.

Comments: 32 pages, 14 figures

arXiv:2005.10400 [pdf, other]

Principal Fairness for Human and Algorithmic Decision-Making

Authors: Kosuke Imai, Zhichao Jiang

Abstract: Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly acco… ▽ More Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly accounts for the fact that individuals can be impacted by the decision. Furthermore, we explain how principal fairness differs from the existing causality-based fairness criteria. In contrast to the counterfactual fairness criteria, for example, principal fairness considers the effects of decision in question rather than those of protected attributes of interest. We briefly discuss how to approach empirical evaluation and policy learning problems under the proposed principal fairness criterion. △ Less

Submitted 24 March, 2022; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2005.06333 [pdf, other]

Multiparty Session Programming with Global Protocol Combinators

Authors: Keigo Imai, Rumyana Neykova, Nobuko Yoshida, Shoji Yuen

Abstract: Multiparty Session Types (MPST) is a typing discipline for communication protocols. It ensures the absence of communication errors and deadlocks for well-typed communicating processes. The state-of-the-art implementations of the MPST theory rely on (1) runtime linearity checks to ensure correct usage of communication channels and (2) external domain-specific languages for specifying and verifying… ▽ More Multiparty Session Types (MPST) is a typing discipline for communication protocols. It ensures the absence of communication errors and deadlocks for well-typed communicating processes. The state-of-the-art implementations of the MPST theory rely on (1) runtime linearity checks to ensure correct usage of communication channels and (2) external domain-specific languages for specifying and verifying multiparty protocols. To overcome these limitations, we propose a library for programming with global combinators -- a set of functions for writing and verifying multiparty protocols in OCaml. Local behaviours for all processes in a protocol are inferred at once from a global combinator. We formalise global combinators and prove a sound realisability of global combinators -- a well-typed global combinator derives a set of local types, by which typed endpoint programs can ensure type and communication safety. Our approach enables fully-static verification and implementation of the whole protocol, from the protocol specification to the process implementations, to happen in the same language. We compare our implementation to untyped and continuation-passing style implementations, and demonstrate its expressiveness by implementing a plethora of protocols. We show our library can interoperate with existing libraries and services, implementing DNS (Domain Name Service) protocol and the OAuth (Open Authentication) protocol. △ Less

Submitted 23 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: ECOOP 2020

arXiv:2004.05964 [pdf, other]

Keyword Assisted Topic Models

Authors: Shusei Eshima, Kosuke Imai, Tomoya Sasaki

Abstract: In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of their scalability. The unsupervised nature of the models makes them suitable for exploring topics in a corpus without prior knowledge. However, researchers find that these models often fail to measure specific concepts of substantive interest by inadvertently… ▽ More In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of their scalability. The unsupervised nature of the models makes them suitable for exploring topics in a corpus without prior knowledge. However, researchers find that these models often fail to measure specific concepts of substantive interest by inadvertently creating multiple topics with similar content and combining distinct themes into a single topic. In this paper, we empirically demonstrate that providing a small number of keywords can substantially enhance the measurement performance of topic models. An important advantage of the proposed keyword assisted topic model (keyATM) is that the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with a widespread practice of post-hoc topic interpretation and adjustments that compromises the objectivity of empirical findings. In our application, we find that keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models. Finally, we show that keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology. △ Less

Submitted 2 February, 2023; v1 submitted 13 April, 2020; originally announced April 2020.

arXiv:2004.01325 [pdf, other]

doi 10.4204/EPTCS.314.6

Fluent Session Programming in C#

Authors: Shunsuke Kimura, Keigo Imai

Abstract: We propose SessionC#, a lightweight session typed library for safe concurrent/distributed programming. The key features are (1) the improved fluent interface which enables writing communication in chained method calls, by exploiting C#'s out variables, and (2) amalgamation of session delegation with async/await, which materialises session cancellation in a limited form, which we call session inter… ▽ More We propose SessionC#, a lightweight session typed library for safe concurrent/distributed programming. The key features are (1) the improved fluent interface which enables writing communication in chained method calls, by exploiting C#'s out variables, and (2) amalgamation of session delegation with async/await, which materialises session cancellation in a limited form, which we call session intervention. We show the effectiveness of our proposal via a Bitcoin miner application. △ Less

Submitted 2 April, 2020; originally announced April 2020.

Comments: In Proceedings PLACES 2020, arXiv:2004.01062

Journal ref: EPTCS 314, 2020, pp. 61-75

arXiv:2003.07042 [pdf, other]

Gated Texture CNN for Efficient and Configurable Image Denoising

Authors: Kaito Imai, Takamichi Miyata

Abstract: Convolutional neural network (CNN)-based image denoising methods typically estimate the noise component contained in a noisy input image and restore a clean image by subtracting the estimated noise from the input. However, previous denoising methods tend to remove high-frequency information (e.g., textures) from the input. It caused by intermediate feature maps of CNN contains texture information.… ▽ More Convolutional neural network (CNN)-based image denoising methods typically estimate the noise component contained in a noisy input image and restore a clean image by subtracting the estimated noise from the input. However, previous denoising methods tend to remove high-frequency information (e.g., textures) from the input. It caused by intermediate feature maps of CNN contains texture information. A straightforward approach to this problem is stacking numerous layers, which leads to a high computational cost. To achieve high performance and computational efficiency, we propose a gated texture CNN (GTCNN), which is designed to carefully exclude the texture information from each intermediate feature map of the CNN by incorporating gating mechanisms. Our GTCNN achieves state-of-the-art performance with 4.8 times fewer parameters than previous state-of-the-art methods. Furthermore, the GTCNN allows us to interactively control the texture strength in the output image without any additional modules, training, or computational costs. △ Less

Submitted 19 April, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: code is available: https://github.com/mdipcit/GTCNN

arXiv:2003.00790 [pdf]

Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS -- a collection of Technical Notes Part 2

Authors: Robin Bloomfield, Gareth Fletcher, Heidy Khlaaf, Philippa Ryan, Shuji Kinoshita, Yoshiki Kinoshit, Makoto Takeyama, Yutaka Matsubara, Peter Popov, Kazuki Imai, Yoshinori Tsutake

Abstract: This report provides an introduction and overview of the Technical Topic Notes (TTNs) produced in the Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS (Tigars) project. These notes aim to support the development and evaluation of autonomous vehicles. Part 1 addresses: Assurance-overview and issues, Resilience and Safety Requirements, Open Systems Perspective and Formal… ▽ More This report provides an introduction and overview of the Technical Topic Notes (TTNs) produced in the Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS (Tigars) project. These notes aim to support the development and evaluation of autonomous vehicles. Part 1 addresses: Assurance-overview and issues, Resilience and Safety Requirements, Open Systems Perspective and Formal Verification and Static Analysis of ML Systems. This report is Part 2 and discusses: Simulation and Dynamic Testing, Defence in Depth and Diversity, Security-Informed Safety Analysis, Standards and Guidelines. △ Less

Submitted 28 February, 2020; originally announced March 2020.

Comments: Authors of the individual notes are indicated in the text

Report number: Adelard Tigars D5.6 D/1259/138008/7

arXiv:2003.00789 [pdf]

Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS -- a collection of Technical Notes Part 1

Authors: Robin Bloomfield, Gareth Fletcher, Heidy Khlaaf, Philippa Ryan, Shuji Kinoshita, Yoshiki Kinoshit, Makoto Takeyama, Yutaka Matsubara, Peter Popov, Kazuki Imai, Yoshinori Tsutake

Abstract: This report provides an introduction and overview of the Technical Topic Notes (TTNs) produced in the Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS (Tigars) project. These notes aim to support the development and evaluation of autonomous vehicles. Part 1 addresses: Assurance-overview and issues, Resilience and Safety Requirements, Open Systems Perspective and Formal… ▽ More This report provides an introduction and overview of the Technical Topic Notes (TTNs) produced in the Towards Identifying and closing Gaps in Assurance of autonomous Road vehicleS (Tigars) project. These notes aim to support the development and evaluation of autonomous vehicles. Part 1 addresses: Assurance-overview and issues, Resilience and Safety Requirements, Open Systems Perspective and Formal Verification and Static Analysis of ML Systems. Part 2: Simulation and Dynamic Testing, Defence in Depth and Diversity, Security-Informed Safety Analysis, Standards and Guidelines. △ Less

Submitted 28 February, 2020; originally announced March 2020.

Comments: Authors of individual Topic Notes are indicated in the body of the report

Report number: Adelard Tigars D5.6 v2.0 (D/1259/138008/7)

arXiv:1912.00547 [pdf, other]

Large-scale text processing pipeline with Apache Spark

Authors: Alexey Svyatkovskiy, Kosuke Imai, Mary Kroeger, Yuki Shiraito

Abstract: In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has been unable to make an all-pairs comparison between bills due to computational intensity. As a substitute, scholars have studied single topic areas. We provide… ▽ More In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has been unable to make an all-pairs comparison between bills due to computational intensity. As a substitute, scholars have studied single topic areas. We provide an implementation of this analysis workflow as a distributed text processing pipeline with Spark dataframes and Scala application programming interface. We discuss the challenges and strategies of unstructured data processing, data formats for storage and efficient access, and graph processing at scale. △ Less

Submitted 1 December, 2019; originally announced December 2019.

Journal ref: Published in Proceedings of Big NLP workshop at the IEEE Big Data Conference 2016

arXiv:1610.00333 [pdf, other]

doi 10.1007/978-3-319-18812-6_3

5-State Rotation-Symmetric Number-Conserving Cellular Automata are not Strongly Universal

Authors: Katsunobu Imai, Hisamichi Ishizaka, Victor Poupet

Abstract: We study two-dimensional rotation-symmetric number-conserving cellular automata working on the von Neumann neighborhood (RNCA). It is known that such automata with 4 states or less are trivial, so we investigate the possible rules with 5 states. We give a full characterization of these automata and show that they cannot be strongly Turing universal. However, we give example of constructions that a… ▽ More We study two-dimensional rotation-symmetric number-conserving cellular automata working on the von Neumann neighborhood (RNCA). It is known that such automata with 4 states or less are trivial, so we investigate the possible rules with 5 states. We give a full characterization of these automata and show that they cannot be strongly Turing universal. However, we give example of constructions that allow to embed some boolean circuit elements in a 5-states RNCA. △ Less

Submitted 2 October, 2016; originally announced October 2016.

ACM Class: F.1.1

Journal ref: Automata 2014: 31-43

arXiv:1208.2771 [pdf, other]

doi 10.4204/EPTCS.90.21

A Universal Semi-totalistic Cellular Automaton on Kite and Dart Penrose Tilings

Authors: Katsunobu Imai, Takahiro Hatsuda, Victor Poupet, Kota Sato

Abstract: In this paper we investigate certain properties of semi-totalistic cellular automata (CA) on the well known quasi-periodic kite and dart two dimensional tiling of the plane presented by Roger Penrose. We show that, despite the irregularity of the underlying grid, it is possible to devise a semi-totalistic CA capable of simulating any boolean circuit on this aperiodic tiling. In this paper we investigate certain properties of semi-totalistic cellular automata (CA) on the well known quasi-periodic kite and dart two dimensional tiling of the plane presented by Roger Penrose. We show that, despite the irregularity of the underlying grid, it is possible to devise a semi-totalistic CA capable of simulating any boolean circuit on this aperiodic tiling. △ Less

Submitted 13 August, 2012; originally announced August 2012.

Comments: In Proceedings AUTOMATA&JAC 2012, arXiv:1208.2498

ACM Class: F.1.1; F.1.2; F.1.3

Journal ref: EPTCS 90, 2012, pp. 267-278

arXiv:1110.4163 [pdf, other]

doi 10.4204/EPTCS.69.6

Session Type Inference in Haskell

Authors: Keigo Imai, Shoji Yuen, Kiyoshi Agusa

Abstract: We present an inference system for a version of the Pi-calculus in Haskell for the session type proposed by Honda et al. The session type is very useful in checking if the communications are well-behaved. The full session type implementation in Haskell was first presented by Pucella and Tov, which is 'semi-automatic' in that the manual operations for the type representation was necessary. We g… ▽ More We present an inference system for a version of the Pi-calculus in Haskell for the session type proposed by Honda et al. The session type is very useful in checking if the communications are well-behaved. The full session type implementation in Haskell was first presented by Pucella and Tov, which is 'semi-automatic' in that the manual operations for the type representation was necessary. We give an automatic type inference for the session type by using a more abstract representation for the session type based on the 'de Bruijn levels'. We show an example of the session type inference for a simple SMTP client. △ Less

Submitted 18 October, 2011; originally announced October 2011.

Comments: In Proceedings PLACES 2010, arXiv:1110.3853

ACM Class: D.1.1; D.3.3

Journal ref: EPTCS 69, 2011, pp. 74-91

arXiv:0912.4164 [pdf, ps, other]

doi 10.1016/j.comgeo.2010.05.001

Distance k-Sectors Exist

Authors: Keiko Imai, Akitoshi Kawamura, Jiří Matoušek, Daniel Reem, Takeshi Tokuyama

Abstract: The bisector of two nonempty sets P and Q in a metric space is the set of all points with equal distance to P and to Q. A distance k-sector of P and Q, where k is an integer, is a (k-1)-tuple (C_1, C_2, ..., C_{k-1}) such that C_i is the bisector of C_{i-1} and C_{i+1} for every i = 1, 2, ..., k-1, where C_0 = P and C_k = Q. This notion, for the case where P and Q are points in Euclidean plane,… ▽ More The bisector of two nonempty sets P and Q in a metric space is the set of all points with equal distance to P and to Q. A distance k-sector of P and Q, where k is an integer, is a (k-1)-tuple (C_1, C_2, ..., C_{k-1}) such that C_i is the bisector of C_{i-1} and C_{i+1} for every i = 1, 2, ..., k-1, where C_0 = P and C_k = Q. This notion, for the case where P and Q are points in Euclidean plane, was introduced by Asano, Matousek, and Tokuyama, motivated by a question of Murata in VLSI design. They established the existence and uniqueness of the distance trisector in this special case. We prove the existence of a distance k-sector for all k and for every two disjoint, nonempty, closed sets P and Q in Euclidean spaces of any (finite) dimension, or more generally, in proper geodesic spaces (uniqueness remains open). The core of the proof is a new notion of k-gradation for P and Q, whose existence (even in an arbitrary metric space) is proved using the Knaster-Tarski fixed point theorem, by a method introduced by Reem and Reich for a slightly different purpose. △ Less

Submitted 21 December, 2009; originally announced December 2009.

Comments: 10 pages, 5 figures

ACM Class: F.2.2; G.0; F.0

Journal ref: Computational Geometry 43(9):713-720, November 2010

arXiv:0809.0355 [pdf, ps, other]

Simulations between triangular and hexagonal number-conserving cellular automata

Authors: Katsunobu Imai, Bruno Martin

Abstract: A number-conserving cellular automaton is a cellular automaton whose states are integers and whose transition function keeps the sum of all cells constant throughout its evolution. It can be seen as a kind of modelization of the physical conservation laws of mass or energy. In this paper, we first propose a necessary condition for triangular and hexagonal cellular automata to be number-conservin… ▽ More A number-conserving cellular automaton is a cellular automaton whose states are integers and whose transition function keeps the sum of all cells constant throughout its evolution. It can be seen as a kind of modelization of the physical conservation laws of mass or energy. In this paper, we first propose a necessary condition for triangular and hexagonal cellular automata to be number-conserving. The local transition function is expressed by the sum of arity two functions which can be regarded as 'flows' of numbers. The sufficiency is obtained through general results on number-conserving cellular automata. Then, using the previous flow functions, we can construct effective number-conserving simulations between hexagonal cellular automata and triangular cellular automata. △ Less

Submitted 2 September, 2008; originally announced September 2008.

Comments: 11 pages; International Workshop on Natural Computing, Yokohama : Japon (2008)

ACM Class: F.1.1

Showing 1–36 of 36 results for author: Imai, K