-
Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges
Authors:
Usman Gohar,
Zeyu Tang,
Jialu Wang,
Kun Zhang,
Peter L. Spirtes,
Yang Liu,
Lu Cheng
Abstract:
The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairne…
▽ More
The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion
Authors:
Eric V. Strobl,
Shyam Visweswaran,
Peter L. Spirtes
Abstract:
Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain on…
▽ More
Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.
△ Less
Submitted 24 May, 2017;
originally announced May 2017.
-
Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values
Authors:
Eric V. Strobl,
Peter L. Spirtes,
Shyam Visweswaran
Abstract:
The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and co…
▽ More
The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
△ Less
Submitted 9 May, 2017; v1 submitted 13 July, 2016;
originally announced July 2016.
-
Calculation of Entailed Rank Constraints in Partially Non-Linear and Cyclic Models
Authors:
Peter L. Spirtes
Abstract:
The Trek Separation Theorem (Sullivant et al. 2010) states necessary and sufficient conditions for a linear directed acyclic graphical model to entail for all possible values of its linear coefficients that the rank of various sub-matrices of the covariance matrix is less than or equal to n, for any given n. In this paper, I extend the Trek Separation Theorem in two ways: I prove that the same nec…
▽ More
The Trek Separation Theorem (Sullivant et al. 2010) states necessary and sufficient conditions for a linear directed acyclic graphical model to entail for all possible values of its linear coefficients that the rank of various sub-matrices of the covariance matrix is less than or equal to n, for any given n. In this paper, I extend the Trek Separation Theorem in two ways: I prove that the same necessary and sufficient conditions apply even when the generating model is partially non-linear and contains some cycles. This justifies application of constraint-based causal search algorithms such as the BuildPureClusters algorithm (Silva et al. 2006) for discovering the causal structure of latent variable models to data generated by a wider class of causal models that may contain non-linear and cyclic relations among the latent variables.
△ Less
Submitted 17 September, 2013;
originally announced September 2013.
-
Detecting Causal Relations in the Presence of Unmeasured Variables
Authors:
Peter L. Spirtes
Abstract:
The presence of latent variables can greatly complicate inferences about causal relations between measured variables from statistical data. In many cases, the presence of latent variables makes it impossible to determine for two measured variables A and B, whether A causes B, B causes A, or there is some common cause. In this paper I present several theorems that state conditions under which it…
▽ More
The presence of latent variables can greatly complicate inferences about causal relations between measured variables from statistical data. In many cases, the presence of latent variables makes it impossible to determine for two measured variables A and B, whether A causes B, B causes A, or there is some common cause. In this paper I present several theorems that state conditions under which it is possible to reliably infer the causal relation between two measured variables, regardless of whether latent variables are acting or not.
△ Less
Submitted 20 March, 2013;
originally announced March 2013.
-
Causal Inference in the Presence of Latent Variables and Selection Bias
Authors:
Peter L. Spirtes,
Christopher Meek,
Thomas S. Richardson
Abstract:
We show that there is a general, informative and reliable procedure for discovering causal relations when, for all the investigator knows, both latent variables and selection bias may be at work. Given information about conditional independence and dependence relations between measured variables, even when latent variables and selection bias may be present, there are sufficient conditions for reli…
▽ More
We show that there is a general, informative and reliable procedure for discovering causal relations when, for all the investigator knows, both latent variables and selection bias may be at work. Given information about conditional independence and dependence relations between measured variables, even when latent variables and selection bias may be present, there are sufficient conditions for reliably concluding that there is a causal path from one variable to another, and sufficient conditions for reliably concluding when no such causal path exists.
△ Less
Submitted 20 February, 2013;
originally announced February 2013.
-
Directed Cyclic Graphical Representations of Feedback Models
Authors:
Peter L. Spirtes
Abstract:
The use of directed acyclic graphs (DAGs) to represent conditional independence relations among random variables has proved fruitful in a variety of ways. Recursive structural equation models are one kind of DAG model. However, non-recursive structural equation models of the kinds used to model economic processes are naturally represented by directed cyclic graphs with independent errors, a char…
▽ More
The use of directed acyclic graphs (DAGs) to represent conditional independence relations among random variables has proved fruitful in a variety of ways. Recursive structural equation models are one kind of DAG model. However, non-recursive structural equation models of the kinds used to model economic processes are naturally represented by directed cyclic graphs with independent errors, a characterization of conditional independence errors, a characterization of conditional independence constraints is obtained, and it is shown that the result generalizes in a natural way to systems in which the error variables or noises are statistically dependent. For non-linear systems with independent errors a sufficient condition for conditional independence of variables in associated distributions is obtained.
△ Less
Submitted 20 February, 2013;
originally announced February 2013.
-
Semi-Instrumental Variables: A Test for Instrument Admissibility
Authors:
Tianjiao Chu,
Richard Scheines,
Peter L. Spirtes
Abstract:
In a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X. (Pearl (1995), Spirtes et al (2000)). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In…
▽ More
In a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X. (Pearl (1995), Spirtes et al (2000)). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In typical instrumental variable estimation, instruments are chosen based on domain knowledge. There is currently no statistical test for validating a variable as an instrument. In this paper, we introduce the concept of semi-instrument, which generalizes the concept of instrument. We show that in the framework of additive models, under certain conditions, we can test whether a variable is semi-instrumental. Moreover, adding some distribution assumptions, we can test whether two semi-instruments are instrumental. We give algorithms to estimate the p-value that a random variable is semi-instrumental, and the p-value that two semi-instruments are both instrumental. These algorithms can be used to test the experts' choice of instruments, or to identify instruments automatically.
△ Less
Submitted 10 January, 2013;
originally announced January 2013.
-
Learning Measurement Models for Unobserved Variables
Authors:
Ricardo Silva,
Richard Scheines,
Clark Glymour,
Peter L. Spirtes
Abstract:
Observed associations in a database may be due in whole or part to variations in unrecorded (latent) variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share only a single latent common cause,…
▽ More
Observed associations in a database may be due in whole or part to variations in unrecorded (latent) variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share only a single latent common cause, standard search algorithms for causal Bayes nets can infer structural relations between latent variables. We introduce an algorithm for discovering such partitions when they exist. Uniquely among available procedures, the algorithm is (asymptotically) correct under standard assumptions in causal Bayes net search algorithms, requires no prior knowledge of the number of latent variables, and does not depend on the mathematical form of the relationships among the latent variables. We evaluate the algorithm on a variety of simulated data sets.
△ Less
Submitted 19 October, 2012;
originally announced December 2012.
-
Strong Faithfulness and Uniform Consistency in Causal Inference
Authors:
Jiji Zhang,
Peter L. Spirtes
Abstract:
A fundamental question in causal inference is whether it is possible to reliably infer manipulation effects from observational data. There are a variety of senses of asymptotic reliability in the statistical literature, among which the most commonly discussed frequentist notions are pointwise consistency and uniform consistency. Uniform consistency is in general preferred to po…
▽ More
A fundamental question in causal inference is whether it is possible to reliably infer manipulation effects from observational data. There are a variety of senses of asymptotic reliability in the statistical literature, among which the most commonly discussed frequentist notions are pointwise consistency and uniform consistency. Uniform consistency is in general preferred to pointwise consistency because the former allows us to control the worst case error bounds with a finite sample size. In the sense of pointwise consistency, several reliable causal inference algorithms have been established under the Markov and Faithfulness assumptions [Pearl 2000, Spirtes et al. 2001]. In the sense of uniform consistency, however, reliable causal inference is impossible under the two assumptions when time order is unknown and/or latent confounders are present [Robins et al. 2000]. In this paper we present two natural generalizations of the Faithfulness assumption in the context of structural equation models, under which we show that the typical algorithms in the literature (in some cases with modifications) are uniformly consistent even when the time order is unknown. We also discuss the situation where latent confounders may be present and the sense in which the Faithfulness assumption is a limiting case of the stronger assumptions.
△ Less
Submitted 19 October, 2012;
originally announced December 2012.
-
A Transformational Characterization of Markov Equivalence for Directed Acyclic Graphs with Latent Variables
Authors:
Jiji Zhang,
Peter L. Spirtes
Abstract:
Different directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Chickering (1995) provided a transformational characterization of Markov equivalence for DAGs (with no latent variables), which is useful in deriving properties shared by Markov equivalent DAGs, and, with certain generalization, i…
▽ More
Different directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Chickering (1995) provided a transformational characterization of Markov equivalence for DAGs (with no latent variables), which is useful in deriving properties shared by Markov equivalent DAGs, and, with certain generalization, is needed to prove the asymptotic correctness of a search procedure over Markov equivalence classes, known as the GES algorithm. For DAG models with latent variables, maximal ancestral graphs (MAGs) provide a neat representation that facilitates model search. However, no transformational characterization -- analogous to Chickering's -- of Markov equivalent MAGs is yet available. This paper establishes such a characterization for directed MAGs, which we expect will have similar uses as it does for DAGs.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Towards Characterizing Markov Equivalence Classes for Directed Acyclic Graphs with Latent Variables
Authors:
Ayesha R. Ali,
Thomas S. Richardson,
Peter L. Spirtes,
Jiji Zhang
Abstract:
It is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has been done to represent the common aspects of these explanations into one representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of lat…
▽ More
It is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has been done to represent the common aspects of these explanations into one representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables? Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative for ancestral graphs, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class representative is the essential graph for the said DAG
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
A theoretical study of Y structures for causal discovery
Authors:
Subramani Mani,
Peter L. Spirtes,
Gregory F. Cooper
Abstract:
There are several existing algorithms that under appropriate assumptions can reliably identify a subset of the underlying causal relationships from observational data. This paper introduces the first computationally feasible score-based algorithm that can reliably identify causal relationships in the large sample limit for discrete models, while allowing for the possibility that there are unobserv…
▽ More
There are several existing algorithms that under appropriate assumptions can reliably identify a subset of the underlying causal relationships from observational data. This paper introduces the first computationally feasible score-based algorithm that can reliably identify causal relationships in the large sample limit for discrete models, while allowing for the possibility that there are unobserved common causes. In doing so, the algorithm does not ever need to assign scores to causal structures with unobserved common causes. The algorithm is based on the identification of so called Y substructures within Bayesian network structures that can be learned from observational data. An example of a Y substructure is A -> C, B -> C, C -> D. After providing background on causal discovery, the paper proves the conditions under which the algorithm is reliable in the large sample limit.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.
-
Adjacency-Faithfulness and Conservative Causal Inference
Authors:
Joseph Ramsey,
Jiji Zhang,
Peter L. Spirtes
Abstract:
Most causal inference algorithms in the literature (e.g., Pearl (2000), Spirtes et al. (2000), Heckerman et al. (1999)) exploit an assumption usually referred to as the causal Faithfulness or Stability condition. In this paper, we highlight two components of the condition used in constraint-based algorithms, which we call "Adjacency-Faithfulness" and "Orientation-Faithfulness". We point out that a…
▽ More
Most causal inference algorithms in the literature (e.g., Pearl (2000), Spirtes et al. (2000), Heckerman et al. (1999)) exploit an assumption usually referred to as the causal Faithfulness or Stability condition. In this paper, we highlight two components of the condition used in constraint-based algorithms, which we call "Adjacency-Faithfulness" and "Orientation-Faithfulness". We point out that assuming Adjacency-Faithfulness is true, it is in principle possible to test the validity of Orientation-Faithfulness. Based on this observation, we explore the consequence of making only the Adjacency-Faithfulness assumption. We show that the familiar PC algorithm has to be modified to be (asymptotically) correct under the weaker, Adjacency-Faithfulness assumption. Roughly the modified algorithm, called Conservative PC (CPC), checks whether Orientation-Faithfulness holds in the orientation phase, and if not, avoids drawing certain causal conclusions the PC algorithm would draw. However, if the stronger, standard causal Faithfulness condition actually obtains, the CPC algorithm is shown to output the same pattern as the PC algorithm does in the large sample limit. We also present a simulation study showing that the CPC algorithm runs almost as fast as the PC algorithm, and outputs significantly fewer false causal arrowheads than the PC algorithm does on realistic sample sizes. We end our paper by discussing how score-based algorithms such as GES perform when the Adjacency-Faithfulness but not the standard causal Faithfulness condition holds, and how to extend our work to the FCI algorithm, which allows for the possibility of latent variables.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.
-
Discovering Cyclic Causal Models by Independent Components Analysis
Authors:
Gustavo Lacerda,
Peter L. Spirtes,
Joseph Ramsey,
Patrik O. Hoyer
Abstract:
We generalize Shimizu et al's (2006) ICA-based approach for discovering linear non-Gaussian acyclic (LiNGAM) Structural Equation Models (SEMs) from causally sufficient, continuous-valued observational data. By relaxing the assumption that the generating SEM's graph is acyclic, we solve the more general problem of linear non-Gaussian (LiNG) SEM discovery. LiNG discovery algorithms output the distri…
▽ More
We generalize Shimizu et al's (2006) ICA-based approach for discovering linear non-Gaussian acyclic (LiNGAM) Structural Equation Models (SEMs) from causally sufficient, continuous-valued observational data. By relaxing the assumption that the generating SEM's graph is acyclic, we solve the more general problem of linear non-Gaussian (LiNG) SEM discovery. LiNG discovery algorithms output the distribution equivalence class of SEMs which, in the large sample limit, represents the population distribution. We apply a LiNG discovery algorithm to simulated data. Finally, we give sufficient conditions under which only one of the SEMs in the output class is 'stable'.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
Causal discovery of linear acyclic models with arbitrary distributions
Authors:
Patrik O. Hoyer,
Aapo Hyvarinen,
Richard Scheines,
Peter L. Spirtes,
Joseph Ramsey,
Gustavo Lacerda,
Shohei Shimizu
Abstract:
An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; P…
▽ More
An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.