The Logic of Counterfactuals and
the Epistemology of Causal Inference
Abstract
The 2021 Nobel Prize in Economics recognized an epistemology of causal inference based on the Rubin causal model (Rubin 1974), which merits broader attention in philosophy. This model, in fact, presupposes a logical principle of counterfactuals, Conditional Excluded Middle (CEM), the locus of a pivotal debate between Stalnaker (1968) and Lewis (1973) on the semantics of counterfactuals. Proponents of CEM should recognize that this connection points to a new argument for CEM—a Quine-Putnam indispensability argument grounded in the Nobel-winning applications of the Rubin model in health and social sciences. To advance the dialectic, I challenge this argument with an updated Rubin causal model that retains its successes while dispensing with CEM. This novel approach combines the strengths of the Rubin causal model and a causal model familiar in philosophy, the causal Bayes net. The takeaway: deductive logic and inductive inference, often studied in isolation, are deeply interconnected.
1 Introduction
The debates on inductive inference often proceed in isolation from the controversies about deductive logic, which makes sense. When discussing induction, we often tentatively assume a simple deductive framework, like classical logic, as induction alone poses enough challenges and disagreements. But I would like to highlight the intricate relationship between deduction and induction. Specifically, I will explore the connection between the deductive logic of counterfactuals on the one hand, and a very interesting type of induction on the other hand—causal inference. The spotlight will be on the most influential theory of causal inference in health and social sciences. This theory, recognized by the 2021 Nobel Prize in Economics but underdiscussed in philosophy, has been widely used for tasks such as estimating the efficacy of new drugs and the impact of military service on lifetime earnings.
This Nobel Prize-winning theory of causal inference is based on the Rubin causal model (Rubin 1974), also known as the potential outcome framework. It is developed on the assumption of a principle in the deductive logic of counterfactuals:
Conditional Excluded Middle (CEM)
It is logically necessary that
- either $B$ would be the case if $A$ were the case,
- or $B$ would not be the case if $A$ were the case.
This logical principle is not uncontroversial in science. In fact:
- Statistician Dawid (2000) raises concerns about this logical principle, leading him to reject the Rubin causal model and its associated theory of causal inference.
- In contrast, computer scientist Pearl (2000) adopts the opposite stance, suggesting that the success of this theory of causal inference supports the theory itself and, in turn, vindicates its underlying logical principle, CEM.
This debate, mostly pursued in science for now, warrants attention from philosophers. Indeed, Dawid appears unaware that his objection to CEM closely mirrors Lewis’s (1973) critique of Stalnaker’s (1968) adoption of CEM—a classic debate in the philosophy of language. And Pearl comes close to offering an indispensability argument for CEM. I will examine both sides of the debate to illustrate how issues of induction intertwine with those of deduction.
In particular, I will first turn Pearl’s (2000) preliminary defense of CEM into a full-fledged argument. Here is the idea: CEM was already assumed in the early days of the Rubin causal model (Rubin 1974), which found important applications to causal inference in health and social sciences through the work of Imbens & Angrist (1994) and Angrist, Imbens, & Rubin (1996), culminating in the 2021 Nobel Prize in Economics. Notably, even though the assumption of CEM was already challenged in the scientific community more than 20 years ago (Dawid 2000), it has remained central to the Rubin causal model to this day. I will explain in detail why CEM has been here to stay for so long. Thus, CEM appears to be an indispensable part of our best scientific theory of causal inference in health and social sciences. An argument for CEM then emerges: an indispensability argument in the style of Quine (1948) and Putnam (1971), as detailed below (Section 3).
Next, following the good cop/bad cop approach, I will switch sides and undermine the indispensability argument. A new theory of causal inference will be developed to dispense with CEM while preserving the Nobel-Prize-winning applications of the Rubin causal model. The key, somewhat surprisingly, is to combine two causal modeling frameworks: the Rubin causal model, more familiar to health and social scientists, and the causal Bayes net, more familiar in philosophy (Section 4).
In the final section, Section 5, I will conclude the good cop/bad cop dialectic by connecting it to a broader philosophical context, encompassing such topics as the revisability of deductive logic, intertheory relations, and the role of background assumptions in justifying scientific inference.
Before doing all of this, I must first tackle a preliminary task. Given the severe lack of discussion of the Rubin causal model in philosophy, I will have to begin by providing an accessible tutorial on it in Section 2. This tutorial will be accompanied by a fully rigorous version, presented in Appendix A.1.
2 A Gentle Introduction to the Rubin Causal Model
The Rubin causal model has been extensively applied to study various aspects of our medical and economic lives. Think about it: life itself is not unlike a card game.
2.1 Introducing the Card Game
There are cards that determine our fates:
Card #1: What If You Took the Treatment?
Nature gives every individual a card of this form: the back is printed with ‘$T = 1$’ (the supposition that this person takes the treatment), and the face is printed with ‘$Y = 1$’ (cured) or ‘$Y = 0$’ (not cured).
The former case means that this person would be cured if they took the treatment, while the latter means that this person would not be cured if they took the treatment. Thus, this card design already presupposes Conditional Excluded Middle.
There is only one rule for card flipping: any card given to a person is initially face down and will be flipped to reveal the result exactly when the if-clause actually applies to that person.
Similarly, there is also:
Card #2: What If You Didn’t Take the Treatment?
Nature gives every individual a second card: the back is printed with ‘$T = 0$’ (not taking the treatment), and the face is printed with ‘$Y = 1$’ or ‘$Y = 0$’.
Each person’s cards #1 and #2 define that person’s individual treatment effect (ITE): the value of the binary variable $Y$ on card #1 minus its value on card #2. There are three possible cases: $+1$ (cured with the treatment, not cured without it), $0$ (the same result either way), and $-1$ (not cured with the treatment, cured without it).
The average treatment effect (ATE) for a population is defined as the average of the individual treatment effects for all individuals in the population.
A bit of algebra shows that the ATE is equal to the difference between two proportions:
(i) the proportion of ‘$Y = 1$’ cards among all cards of type #1, minus
(ii) the proportion of ‘$Y = 1$’ cards among all cards of type #2.
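Here is that bit of algebra spelled out, borrowing the notation formalized later in Section 3.2 ($Y_i^{T_i=1}$ and $Y_i^{T_i=0}$ for the binary face values of individual $i$’s cards #1 and #2, and $N$ for the population size):

$$\mathrm{ATE} \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathrm{ITE}_i \;=\; \frac{1}{N}\sum_{i=1}^{N} \Big( Y_i^{T_i=1} - Y_i^{T_i=0} \Big) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} Y_i^{T_i=1}}_{\text{proportion (i)}} \;-\; \underbrace{\frac{1}{N}\sum_{i=1}^{N} Y_i^{T_i=0}}_{\text{proportion (ii)}}.$$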
There is a simple and effective way to estimate term (i): randomly flipping some cards of type #1 in the population—or equivalently, randomly selecting some people in the population and forcing them to flip their cards of type #1. Once the faces of those cards are revealed, register the proportion of the occurrences of ‘$Y = 1$’, and use it as an estimate of term (i). Term (ii) can be estimated similarly. This procedure for estimating the ATE is the idea behind randomized controlled trials (RCTs). The problem, however, is that RCTs are often ethically impermissible.
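To see the logic of an RCT in miniature, here is a simulation sketch (entirely illustrative and not from the original text: the population size, cure rates, and sample sizes are invented, and the two card faces are drawn independently):

```python
import random

random.seed(0)
N = 100_000  # hypothetical population size

# Hidden card faces for each individual: (face of card #1, face of card #2),
# where True stands for 'Y = 1' (cured) and False for 'Y = 0' (not cured).
population = [(random.random() < 0.6, random.random() < 0.4) for _ in range(N)]

true_ate = sum(c1 - c2 for c1, c2 in population) / N

# RCT: randomly select people and force them to flip card #1 or card #2.
sample = random.sample(range(N), 10_000)
treated, control = sample[:5_000], sample[5_000:]

est_i = sum(population[i][0] for i in treated) / len(treated)    # estimate of (i)
est_ii = sum(population[i][1] for i in control) / len(control)   # estimate of (ii)

print(f"true ATE  = {true_ate:.3f}")
print(f"estimated = {est_i - est_ii:.3f}")
```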
Fortunately, there is a Nobel-Prize-winning solution, which seeks to estimate, not exactly the ATE, but a closely related causal effect—without forcing anyone to do anything.
2.2 Switching from the ATE to the LATE
Let’s randomly select individuals from the population and then assign each of them to either the treatment or control group by flipping a coin. Here is the thing: anyone in the treatment group is offered the treatment for free, and they decide whether to take it—there is no forcing anyone to do anything. This creates a new type of card:
Card #3: What If You Were Assigned to the Treatment Group?
Nature gives every individual a card of this form: the back is printed with ‘$Z = 1$’ (where $Z = 1$ means assignment to the treatment group), and the face is printed with ‘$T = 1$’ or ‘$T = 0$’.
This determines whether the individual would or would not take the treatment if assigned to the treatment group. Similarly:
Card #4: What If You Were Assigned to the Control Group?
Nature gives every individual a card of this form: the back is printed with ‘$Z = 0$’ (where $Z = 0$ means assignment to the control group), and the face is printed with ‘$T = 1$’ or ‘$T = 0$’.
With the new cards, we can define some subpopulations:
1. Compliers: those who would take the treatment if assigned to the treatment group, and would not if assigned to the control group (namely, those whose card #3 and card #4 are printed with ‘$T = 1$’ and ‘$T = 0$’, respectively).
2. Defiers: those who would do the opposite of what compliers would do.
3. Always-Takers: those who would take the treatment regardless of assignment.
4. Never-Takers: those who would not take the treatment regardless of assignment.
By Conditional Excluded Middle, those four subpopulations jointly exhaust the entire population.
Now, let the target of estimation be, not exactly the ATE, but a closely related quantity, the LATE, short for local average treatment effect. The LATE is defined as the average of the individual treatment effects of just the compliers in the population, or more formally:

$$\mathrm{LATE} \;=\; \frac{1}{\#\text{Compliers}} \sum_{i \,\in\, \text{Compliers}} \mathrm{ITE}_i.$$
Interestingly, when there are no defiers and other conditions are met, it is possible to estimate the LATE without forcing anyone to take the treatment, as will be shown shortly.
2.3 Estimating the LATE
The standard procedure for estimating the LATE is known as instrumental variable estimation. To understand it, we need a theorem, now a classic result in econometrics and statistics (Imbens & Angrist 1994, Angrist, Imbens, & Rubin 1996):
Informal Statement of Theorem 1 (Identification of the LATE). In the card game presented above, which already builds in Conditional Excluded Middle, suppose that the following four assumptions hold:
- (Random Selection) Everyone has an equal probability of being selected.
- (Random Assignment) The selected people are randomly assigned, with equal probabilities, to the treatment or control group.
- (Existence of Compliers) There are compliers.
- (No Defiers) There are no defiers.
Then the LATE can be expressed solely in terms of probabilities over the three observable variables—$Z$, $T$, and $Y$—without counterfactuals. Specifically:

$$\mathrm{LATE} \;=\; \frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(T=1 \mid Z=1) - P(T=1 \mid Z=0)}.$$
To be more precise, this equation holds under assumptions 1-8, formalized in Appendix A.1. The first four of those eight assumptions, including CEM, are built into the card design; the remaining four are stated informally as the bullet points above.
Some explanations are in order. First, the assumption that there are compliers plays a straightforward role by ensuring that the target of estimation, the LATE, is well-defined (i.e., has a nonzero denominator).
The assumption of no defiers plays a more interesting role: to delineate the scope of application. For example, when estimating the causal effect of a newly designed drug not yet available on the market, no one in the control group could take the new drug, which implies that no one is a defier. Another example comes from Angrist’s (1990) now-classic study on the Vietnam War, where “random assignment” refers to the draft lottery, “treatment” to military service, and the “medical result” to lifetime earnings. A defier in this scenario would be someone rather perverse: one who would volunteer for military service if not drafted but would avoid service if drafted. Here, it is also reasonable to assume that no defiers exist. However, in cases where that assumption is implausible, the present theorem provides no guidance on estimating causal effects.
Let’s now turn to $P$, the probability function in use. The probabilities discussed in this paper are restricted to physical objective probabilities. These probabilities might be frequencies (Neyman 1955), propensities (Popper 1959), or primitive physical states posited in science (Sober 2000: sec. 3.2)—to mention just the options developed with classical statistics in mind, which often serves as the background theory for the Rubin causal model. I remain neutral about the metaphysics of physical objective probabilities; the focus of this paper is epistemology.
The first conditional probability in the equation, $P(Y=1 \mid Z=1)$, is defined in the standard way:

$$P(Y = 1 \mid Z = 1) \;=\; \frac{P(Y = 1 \wedge Z = 1)}{P(Z = 1)},$$

where the denominator is the probability that a randomly selected person is assigned to the treatment group ($Z = 1$), and the numerator is the probability that a randomly selected person is assigned to the treatment group and then gets cured ($Y = 1 \wedge Z = 1$). This unknown conditional probability can be easily estimated—by the observed proportion of cured individuals in the treatment group. The other three conditional probabilities can be similarly estimated by observed proportions. This procedure for estimating the conditional probabilities on the right-hand side of the equation, and thus the LATE on the left-hand side, is known as instrumental variable estimation (with the variable $Z$ serving as the so-called instrument).
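To make the procedure concrete, here is a minimal sketch (the function name and data format are mine, for illustration only): each conditional probability in the equation of Theorem 1 is replaced by the corresponding observed proportion, and the two differences are divided.

```python
def wald_estimate(records):
    """Instrumental variable (Wald) estimate of the LATE.

    records: a list of (z, t, y) triples with z, t, y in {0, 1}, one per
    observed individual: group assignment, actual taking, medical result.
    """
    z1 = [(t, y) for z, t, y in records if z == 1]  # treatment group
    z0 = [(t, y) for z, t, y in records if z == 0]  # control group
    p_y_z1 = sum(y for _, y in z1) / len(z1)  # estimates P(Y=1 | Z=1)
    p_y_z0 = sum(y for _, y in z0) / len(z0)  # estimates P(Y=1 | Z=0)
    p_t_z1 = sum(t for t, _ in z1) / len(z1)  # estimates P(T=1 | Z=1)
    p_t_z0 = sum(t for t, _ in z0) / len(z0)  # estimates P(T=1 | Z=0)
    return (p_y_z1 - p_y_z0) / (p_t_z1 - p_t_z0)
```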
This result marks an important achievement. Recall that the LATE is defined in counterfactual terms, using the contents of cards that cannot all be flipped to reveal their faces at the same time—a single person cannot simultaneously take the treatment and not take it. Fortunately, to estimate the LATE, it suffices to observe some proportions in the treatment and control groups and estimate the counterfactual-free, conditional probabilities on the right-hand side of the equation in Theorem 1. It is amazing that an interesting quantity defined in counterfactual terms (the LATE on the left) can be identified with a quantity that depends solely on counterfactual-free probabilities (on the right), which are easy to estimate. Thus, this theorem is also known as an identification result. Many important theorems in statistics and econometrics for causal inference are identification results.
For a rigorous statement of Theorem 1, see Appendix A.1, which seeks to improve upon standard presentations. To be sure, there is a particularly lucid and frequently cited presentation in the statistics article by Angrist, Imbens, & Rubin (1996, Proposition 1), but those authors list only four assumptions, omitting an explicit statement of CEM. In Appendix A.1, I identify eight assumptions in total, including CEM, of course.
This concludes the first task of this paper: a crash course on the Rubin causal model and the identification result for the LATE.
3 Playing Good Cop
The preceding discussion can inspire a new argument for Conditional Excluded Middle. Let me flesh it out, playing the role of the good cop—for now.
3.1 A New Argument for CEM
Why might it be interesting to have a new argument supporting CEM? The reason is that there is a highly influential argument against CEM (Lewis 1973). Let me briefly review it. Consider the following pair of sentences:
- (A) If $i$ took the treatment, $i$ would be cured.
- (B) If $i$ took the treatment, $i$ would not be cured.
CEM requires that the disjunction of (A) and (B) be true in every possible world. To find a counterexample, consider an indeterministic world in which the following holds:
- (C) If $i$ took the treatment, $i$ would have a nontrivial probability $x$ of being cured and a probability $1 - x$ of being not cured, where nontriviality means that $x$ lies strictly between $0$ and $1$.
Then argue as follows that the truth of (C) implies the falsity of both (A) and (B):
Indeterminist Argument Against CEM
1. Assume that (C) is true.
2. By 1, if $i$ took the treatment, $i$ would have a more-than-zero probability of being not cured.
3. So, if $i$ took the treatment, $i$ could be not cured. (This follows from 2, by the inference from ‘would have a more-than-zero probability of being’ to ‘could be’.)
4. Now, suppose for reductio that (A) is true: if $i$ took the treatment, $i$ would be cured.
5. Then, by 3 and 4, we have: if $i$ took the treatment, $i$ would be cured and could be not cured—absurd.
6. So, by the reductio argument from 4 to 5, it follows that (A) is false.
7. By symmetry, (B) is false, too; thus (A) and (B) are both false.
In a nutshell, nontrivial counterfactual probability refutes CEM—or so Lewis (1973) concludes. Hájek (manuscript) further argues that such counterexamples to CEM are pervasive in the actual world we live in.
The above is just round one of the debate. The next round features responses from defenders of CEM, such as Stalnaker (1981). This debate has unfolded across philosophy of language (Williams 2010), metaphysics (Emery 2017), and traditional epistemology (Boylan 2024). (For reviews of this debate, see Loewenstein (2021) and Mandelkern (2022, sec. 17.3.4).) I submit that philosophy of science is also an area where we can explore a new argument for CEM:
Indispensability Argument For CEM
CEM is assumed in our best theory of causal inference in health and social sciences, whose application to instrumental variable estimation underpinned the 2021 Nobel Prize in Economics. Despite the influential challenge raised by statistician Dawid (2000) more than twenty years ago in the scientific community—a challenge very similar to Lewis’s (1973) worry from nontrivial counterfactual probability—CEM has persisted as a core assumption of this theory to this day. Thus, CEM seems indispensable. Given that we should believe in our best theory of causal inference in health and social sciences, and that CEM is an indispensable part of it, it seems that we have no choice but to believe in CEM—for fear of intellectual dishonesty, in Putnam’s (1971) terms.
As just mentioned, the indispensability of CEM is already supported by its persistence in the face of the challenge in the scientific community. This indispensability can be further reinforced by examining the role of CEM in the Rubin causal model, to which I turn now.
3.2 What’s the Role of CEM, Exactly?
CEM has been here to stay for a long time due to its crucial role in proving key lemmas that underpin causal inference. To see this clearly, we need to delve into some formal details of the Rubin causal model. This section is more technical and can be skipped on a first reading.
Let $T_i = 1$ express the proposition that the individual $i$ takes the treatment. Similarly, $Y_i = 1$ expresses that $i$ is cured, and $Z_i = 1$ expresses that $i$ is assigned to the treatment group (rather than the control group). To this notation, we can add superscripts to express counterfactuals, such as the following:
- $Y_i^{T_i = 1} = 1$ means that $i$ would be cured if $i$ took the treatment.
- $T_i^{Z_i = 0} = 0$ means that the individual $i$ would not take the treatment if $i$ were assigned to the control group.
The Rubin causal model makes some logical assumptions:
Assumption (Centering/Consistency). The antecedent of a counterfactual is redundant if it happens to be true; in symbols: if $A$ is true of individual $i$, then $X_i^A = X_i$. (While ‘Centering’ is the standard name for this logical principle in philosophy, the scientific literature uses ‘Consistency’ instead.)
There is another logical assumption, which is the focus of this paper:
Assumption (Conditional Excluded Middle, or CEM). If $Y_i$ is a binary variable, so is the counterfactual variable $Y_i^{T_i = 1}$.
Notably, most presentations in the scientific literature only mention in passing that $Y_i^{T_i = 1}$ is a binary variable, making this assumption look more innocuous than it actually is. The substance of this assumption can be appreciated only by going from the formalism back to the intended interpretation: to say that $Y_i^{T_i = 1}$ is binary is to say that either $Y_i^{T_i = 1} = 1$ or $Y_i^{T_i = 1} = 0$, which means that either $i$ would be cured under the treatment or $i$ would not be cured under the treatment—an instance of CEM.
The four cards for each individual $i$ correspond to the four counterfactual variables $Y_i^{T_i=1}$, $Y_i^{T_i=0}$, $T_i^{Z_i=1}$, and $T_i^{Z_i=0}$, whose values correspond to the faces of the four cards. Thus, the card-based definitions presented above can be formalized as follows. First, the ITE (individual treatment effect) for an individual $i$ is defined by:

$$\mathrm{ITE}_i \;=\; Y_i^{T_i=1} - Y_i^{T_i=0}.$$
The four subpopulations are defined as follows:
- Compliers: $T_i^{Z_i=1} = 1$ and $T_i^{Z_i=0} = 0$;
- Defiers: $T_i^{Z_i=1} = 0$ and $T_i^{Z_i=0} = 1$;
- Always-Takers: $T_i^{Z_i=1} = T_i^{Z_i=0} = 1$;
- Never-Takers: $T_i^{Z_i=1} = T_i^{Z_i=0} = 0$.
The target of estimation is the local average treatment effect for the compliers, which is defined by:

$$\mathrm{LATE} \;=\; \frac{\sum_{i \,\in\, \text{Compliers}} \mathrm{ITE}_i}{\#\text{Compliers}},$$
where the denominator denotes the size of the complier subpopulation. We can then derive a probabilistic formula to express the LATE:
Lemma A. Under the assumption of Random Selection (that everyone has an equal probability of being selected), we have:

$$\mathrm{LATE} \;=\; P\!\left(Y^{T=1} = 1 \,\middle|\, \text{Complier}\right) \;-\; P\!\left(Y^{T=0} = 1 \,\middle|\, \text{Complier}\right).$$

This formula is often treated as a definition in textbooks for convenience (Hernán & Robins 2023), but it is actually a lemma in the rigorous treatment (Imbens & Rubin 2015). If we unpack the conditional probabilities on the right-hand side using the standard definition, there will appear a denominator $P(\text{Complier})$, the probability of selecting a complier from the population, which by Random Selection equals the proportion of compliers in the population. So, in the existing proofs of the theorem for identifying and estimating the LATE, the crux is to find a formula that helps estimate the proportion of compliers in the population. It is in this task that CEM is deeply involved. Let me explain.
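(To spell out the unpacking just mentioned: by the ratio definition of conditional probability,

$$P\!\left(Y^{T=1} = 1 \,\middle|\, \text{Complier}\right) \;=\; \frac{P\!\left(Y^{T=1} = 1 \,\wedge\, \text{Complier}\right)}{P(\text{Complier})},$$

whose denominator, by Random Selection, equals the proportion of compliers in the population—the very quantity whose estimation occupies the rest of this subsection.)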
The existing proofs start with this lemma:
Lemma B. Under the assumption of CEM, the four subpopulations just defined—compliers, defiers, always-takers, and never-takers—are mutually exclusive and jointly exhaustive.
This lemma is easy to prove: mutual exclusion follows immediately from the definitions; joint exhaustion follows immediately from the definitions and the assumption of CEM. Thanks to Lemma B, the following four proportions sum to $1$:
(1) the proportion of compliers in the population;
(2) the proportion of defiers in the population;
(3) the proportion of never-takers in the population;
(4) the proportion of always-takers in the population.
So, to estimate the primary target, (1), it suffices to subtract the estimates of (2)-(4) from $1$. This is the first role played by Lemma B, and hence, by CEM. Then, since (2) is equal to zero by the assumption of No Defiers, it remains to estimate (3) and (4).
To estimate (3), consider the following three quantities:
(a) the proportion of never-takers in the population;
(b) the proportion of never-takers in the treatment group;
(c) the proportion of those who end up not taking the treatment in the treatment group.
Under the assumption of Random Assignment to the treatment or control group, proportion (a) can be estimated by proportion (b), if we can obtain an accurate value for the latter. And we can. The idea is to exploit this lemma:
Lemma C. Under the assumptions of CEM, Centering, and No Defiers, it follows that, within the treatment group, the never-takers are exactly those who end up not taking the treatment. Or in symbols, $Z_i = 1$ implies this equivalence:

$$T_i^{Z_i=1} = T_i^{Z_i=0} = 0 \;\Longleftrightarrow\; T_i = 0.$$
Thanks to this lemma, proportion (b) is equal to proportion (c), which can be easily obtained through observation: simply count the number of individuals not taking the treatment in the treatment group and divide it by the size of the treatment group. To recap: CEM is assumed in Lemma C, which enables us to use proportion (c) as an accurate value of proportion (b), which, by Random Assignment, can then be used as a good estimate of proportion (a). Very ingenious indeed!
The idea behind the proof of Lemma C is also clever, drawing on Lemma B. This is a second role played by Lemma B, and hence, by CEM. Let me present the proof in plain language.
Proof of Lemma C. To prove the ‘⇒’ direction, consider any individual $i$ who is a never-taker in the treatment group. Then, by the assumption of Centering, $i$ does not take the treatment. Now, to prove the ‘⇐’ direction, consider any individual $i$ in the treatment group who ends up not taking the treatment. By Lemma B, which relies on CEM, this person must be one of the following: an always-taker, never-taker, defier, or complier—notably, this is the only place where CEM is employed in this proof. Of those four possibilities, three can be eliminated. Specifically, we can eliminate the possibility that $i$ is a defier, by the assumption of No Defiers. We can also eliminate the possibility that $i$ is an always-taker or complier; for the always-takers and compliers in the treatment group end up taking the treatment by the assumption of Centering, but $i$ does not take the treatment. Thus, the only remaining possibility is that $i$ is a never-taker, as desired. Q.E.D.
Now that we know how to estimate proportion (3), the same trick can be used to estimate (4), the proportion of always-takers in the population, by counting the actual takers in the control group, and by applying a similar lemma with a similar proof. (This lemma, Lemma C′, states that, under the assumptions of CEM, Centering, and No Defiers, within the control group, the always-takers are exactly those who end up taking the treatment; in symbols, $Z_i = 0$ implies this equivalence: $T_i^{Z_i=1} = T_i^{Z_i=0} = 1 \Longleftrightarrow T_i = 1$.) And recall that proportion (2) equals zero. Once estimates of proportions (2), (3), and (4) are obtained as explained above, subtracting them from $1$ yields an estimate of the primary target, (1), the proportion of compliers.
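The bookkeeping just described can be condensed into a short sketch (the function and data format are hypothetical illustrations, not from the original presentation):

```python
def estimate_complier_proportion(records):
    """Estimate proportion (1) from observed (z, t) pairs.

    Relies on the lemmas of this section: under CEM, Centering, and
    No Defiers, non-takers in the treatment group are exactly the
    never-takers (Lemma C), takers in the control group are exactly the
    always-takers (Lemma C'), and the four proportions sum to 1 (Lemma B).
    """
    treat = [t for z, t in records if z == 1]
    ctrl = [t for z, t in records if z == 0]
    never_takers = treat.count(0) / len(treat)   # estimates proportion (3)
    always_takers = ctrl.count(1) / len(ctrl)    # estimates proportion (4)
    defiers = 0.0                                # proportion (2), by No Defiers
    return 1.0 - defiers - never_takers - always_takers
```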
3.3 Wrapping up the Indispensability Argument
I hope the above reconstruction illuminates the deeply involved roles that CEM plays in the Rubin model and in its applications to causal inference. No wonder CEM has remained a core assumption for more than twenty years even after the influential challenge posed by statistician Dawid in 2000. This strongly suggests that CEM is indispensable to our best theory of causal inference in health and social sciences.
I have thus completed my second task: presenting a new argument that proponents of CEM can explore and utilize—an indispensability argument drawn from the 2021 Nobel Prize in Economics. To further the dialectic, it is now time for me to switch sides and assist opponents of CEM.
4 Playing Bad Cop
In my role as the bad cop, I undermine the indispensability argument by showing how the above theory of causal inference can be reformulated without the assumption of CEM. This is akin in spirit to what Field (1980/2016) did to challenge the indispensability argument for mathematical realism; he reformulated Newtonian mechanics without referring to real numbers.
The assumption of CEM is deeply involved in the original card game, as we have seen. So, to remove that assumption, the base game needs an overhaul, to be achieved by two expansion packs.
4.1 An Expansion Pack: Going Stochastic
In the base game, everyone is given only a single card printed with ‘$T = 1$’ on the back, whose face determines whether that person would, or would not, be cured under the treatment. But now imagine that you are given not just one such card, but a deck of them, where 80% are printed with ‘$Y = 1$’ on their faces, and the remaining 20% with ‘$Y = 0$’. Let this deck be thoroughly shuffled, with all faces down initially. What if you took the treatment? Nature would then randomly draw a card from this deck and flip it to reveal your medical result. Consequently, you would have an 80% probability of being cured. (If randomly drawing a card from a deck does not sound chancy enough, replace it with measuring an observable in a quantum-mechanical system.) So, you could be cured and could be not cured, and hence, it is neither true that you would be cured nor true that you would not be cured. CEM is thereby rendered invalid—or so the Lewisians contend.
Let’s generalize. In the base game, every individual is given four cards, answering the following what-if questions:
- What if one took (or didn’t take) the treatment?
- What if one were assigned to the treatment (or control) group?
Now, let each individual’s four cards be replaced by four decks, which provide answers in the following form: ‘If individual $i$ were …, then $i$ would have a probability $x$ of being …’. Such an $x$ is a counterfactual probability—a probability under a counterfactual condition.
So, we now have a stochastic version of the Rubin causal model: single cards are replaced by decks of cards—that is, deterministic outcomes are replaced by counterfactual probabilities. These counterfactual probabilities can then be used to redefine several concepts in the original Rubin causal model.
Start with the ITE (individual treatment effect). Each individual still has an ITE, but it is now redefined as the difference between two counterfactual probabilities, or equivalently, two proportions in decks of cards:
(i) the proportion of ‘$Y = 1$’ cards in $i$’s deck for ‘$T = 1$’, minus
(ii) the proportion of ‘$Y = 1$’ cards in $i$’s deck for ‘$T = 0$’.
In the limiting case where each deck contains only one card, the newly defined ITE reduces to the original ITE.
Subpopulations are redefined, too. Every individual now has a degree of compliance $c_i$, defined by how one’s counterfactual probability of taking the treatment would change if one switched from the control group to the treatment group:
(i) the proportion of ‘$T = 1$’ cards in $i$’s deck for ‘$Z = 1$’, minus
(ii) the proportion of ‘$T = 1$’ cards in $i$’s deck for ‘$Z = 0$’.
The difference between term (i) and term (ii) can be positive, zero, or negative, corresponding to three subpopulations:
- If $c_i > 0$, one is called a complier (in the general sense).
- If $c_i < 0$, one is called a defier (in the general sense).
- If $c_i = 0$, one is called an indifferent-taker, with two special cases: an always-taker, who has (i) = (ii) = 100%, and a never-taker, who has (i) = (ii) = 0%.
As for the target of estimation, the LATE is replaced by a more general concept: a weighted average of the individual treatment effects, where each individual’s weight is proportional to their degree of compliance $c_i$. This new concept is called the degree-of-compliance-weighted average treatment effect, or DATE for short. In symbols:

$$\mathrm{DATE} \;=\; \sum_{i \,\in\, \text{Compliers}} w_i \cdot \mathrm{ITE}_i, \qquad w_i \;=\; \frac{c_i}{\sum_{j \,\in\, \text{Compliers}} c_j},$$

where the denominator in the definition of the weights is a normalizing factor introduced to ensure that the weights sum to $1$.
The present setting is quite general, encompassing the original card game as a limiting case, where every deck contains only one card. In this special case, all compliers are equally compliant, with a maximal degree of compliance ($100\%$ minus $0\%$, i.e., $c_i = 1$), which reduces the DATE to the LATE.
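Spelled out in the notation just introduced: in that deterministic, no-defier limit, $c_i = 1$ for each complier and $c_i = 0$ for everyone else, so the weights become uniform over the compliers and

$$\mathrm{DATE} \;=\; \sum_{i \,\in\, \text{Compliers}} \frac{c_i}{\sum_{j \,\in\, \text{Compliers}} c_j}\,\mathrm{ITE}_i \;=\; \sum_{i \,\in\, \text{Compliers}} \frac{1}{\#\text{Compliers}}\,\mathrm{ITE}_i \;=\; \mathrm{LATE}.$$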
4.2 The Final Expansion Pack: A Causal Bayes Net
The next step is to state a key assumption in instrumental variable estimation, which, when expressed in plain language, asserts the following:
Assumption (Instrumentality, Informal Version). The assignment mechanism (to the treatment/control group) causally influences the medical outcome only through whether an individual takes the treatment. Moreover, there is no common cause shared by the assignment mechanism and the medical outcome.
When this assumption holds, the variable $Z$ is called an instrument. This informal statement is often found in textbooks (Hernán & Robins 2023, sec. 16.1), but interestingly, the standard formalization of this assumption in the Rubin causal model appears quite different, as you can see from the statement of Assumption 2 in Appendix A.1 (see also Hernán & Robins 2023, technical point 16.1).
I propose a more straightforward formalization of this assumption, using the causal structure depicted in Figure 1.
[Figure 1: a causal graph in which $Z$ (assignment) points into $T$ (treatment), $T$ points into $Y$ (medical result), and the confounding variable $C$ points into both $T$ and $Y$.]
This causal structure is an exact representation of the Instrumentality assumption: every path from the $Z$ variable to the $Y$ variable passes through the $T$ variable, and there is no common cause shared by $Z$ and $Y$. The confounding variable, $C$, is set to be as fine-grained as possible to avoid missing any confounding factors: its possible values are the individuals in the population. This suffices to encompass all the social, economic, and health conditions of each individual.
Next, let’s turn this causal graph into a causal Bayes net. (The word ‘Bayes’ can be misleading: despite the established name in the literature, there is nothing inherently Bayesian in causal Bayes nets, also known as causal Bayesian networks. The probabilities in such networks are most naturally interpreted as physical objective probabilities, measuring the propensities or tendencies of causal influences, rather than degrees of belief.) This is done by specifying some probabilities: the probability distribution of each exogenous variable (i.e., $C$ and $Z$), and the conditional probability distribution of each effect variable given its direct cause variables, as shown in Figure 2.
[Figure 2: the causal Bayes net obtained from Figure 1 by specifying the distributions of the exogenous variables $C$ and $Z$, together with the conditional distributions $P(T \mid Z, C)$ and $P(Y \mid T, C)$.]
Those probabilities are defined as follows. First, everyone in the population has an equal probability of being selected, so $P(C = i) = 1/N$, where the value $i$ ranges over the $N$ individuals in the population. Once a person is selected, a coin is flipped to decide whether to assign that person to the treatment or control group, with $P(Z = 1) = 1/2$—or, more generally, with $P(Z = 1)$ being a constant, independent of the individual selected. Finally, the conditional probabilities of effects given direct causes are identified with the appropriate counterfactual probabilities (as shown in Figure 2), whose values are taken from the stochastic version of the Rubin causal model, or equivalently, the stochastic expansion pack to the base game:

$$P(T = 1 \mid Z = z,\, C = i) \;=\; \text{the proportion of ‘$T = 1$’ cards in $i$’s deck for ‘$Z = z$’},$$
$$P(Y = 1 \mid T = t,\, C = i) \;=\; \text{the proportion of ‘$Y = 1$’ cards in $i$’s deck for ‘$T = t$’}.$$
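To fix ideas, here is a minimal sketch of such a causal Bayes net in code (the three-person population and all parameter values are invented for illustration): the joint distribution is just the Causal Markov factorization over $C$, $Z$, $T$, and $Y$.

```python
# Hypothetical parameters for a three-person population (C = 0, 1, 2).
# p_t[z] plays the role of P(T=1 | Z=z, C=i): the counterfactual probability
# of taking the treatment; p_y[t] plays the role of P(Y=1 | T=t, C=i).
population = [
    {"p_t": {0: 0.1, 1: 0.9}, "p_y": {0: 0.2, 1: 0.7}},  # a complier (c_i = 0.8)
    {"p_t": {0: 0.0, 1: 0.6}, "p_y": {0: 0.5, 1: 0.5}},  # a complier with zero ITE
    {"p_t": {0: 0.3, 1: 0.3}, "p_y": {0: 0.4, 1: 0.8}},  # an indifferent-taker
]

def p_joint(i, z, t, y):
    """P(C=i, Z=z, T=t, Y=y), factorized as the net of Figure 2 prescribes."""
    ind = population[i]
    p_c = 1.0 / len(population)                         # Random Selection
    p_z = 0.5                                           # Random Assignment (fair coin)
    p_t = ind["p_t"][z] if t == 1 else 1 - ind["p_t"][z]
    p_y = ind["p_y"][t] if y == 1 else 1 - ind["p_y"][t]
    return p_c * p_z * p_t * p_y
```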
The main idea can be summarized as follows:
Proposal of a New Causal Modeling. While the original Rubin causal model allows only deterministic outcomes for an individual, it is updated with an expansion pack—replacing single cards with decks—to allow stochastic outcomes with nontrivial counterfactual probabilities. These probabilities are then incorporated into an appropriate causal Bayes net.
This is a combination of two frameworks for causal modeling: the Rubin causal model, more familiar to health and social scientists, and the causal Bayes net, more familiar to philosophers and computer scientists. You will soon see that these two causal models are stronger together, at least for the purpose of pursuing freedom from CEM.
4.3 Dispensing with CEM
Finally, we arrive at a new result—a stochastic counterpart to the previous theorem:
Theorem 2 (Identification of the DATE). Suppose that the following assumptions hold:
- (Random Selection) Individuals are randomly selected from the population with equal probabilities.
- (Random Assignment) The selected people are randomly assigned to the treatment or control group with a constant bias strictly between $0$ and $1$—e.g., by flipping a fair coin.
- (Instrumentality*) The true causal model is the causal Bayes net depicted in Figure 2.
- (Existence of Compliers*) There are compliers in the population, in the sense that someone’s degree of compliance is positive.
- (No Defiers*) There are no defiers in the population, in the sense that no one’s degree of compliance is negative.
Then the DATE can be expressed solely in terms of probabilities over the observable variables—$Z$, $T$, and $Y$—without counterfactuals. Specifically:

$$\mathrm{DATE} \;=\; \frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(T=1 \mid Z=1) - P(T=1 \mid Z=0)}.$$
See Appendix A.2 for a proof. The first two assumptions are actually redundant, as they are already encapsulated in the causal Bayes net posited in the third assumption; but they are stated here to highlight the role of randomization. The last three assumptions are labeled with asterisks to distinguish them from their counterparts in the original Rubin causal model, as stated in Appendix A.1.
This new theorem has a notable feature: the right-hand side of the equation for the DATE in the new theorem is identical to that for the LATE in the classic result. Both are expressed as the same combination of conditional probabilities: $\bigl(P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)\bigr) \big/ \bigl(P(T=1 \mid Z=1) - P(T=1 \mid Z=0)\bigr)$. This feature is crucial. Scientists can continue using the same procedure of instrumental variable estimation—estimating the left-hand side by estimating the exact same conditional probabilities on the right-hand side, based on the exact same proportions observed in the treatment and control groups. However, thanks to this new theorem, the old estimation procedure no longer assumes CEM and can be reinterpreted as estimating the new left-hand side: the newly defined causal effect DATE, of which the LATE is merely a limiting case in a deterministic world (at least for Lewisians).
This reinterpretation undermines the indispensability argument. Medical and social scientists have practiced instrumental variable estimation for decades, with the stated goal of estimating the LATE under the assumption of CEM. Yet this well-established practice can now be reinterpreted as actually estimating the DATE all along—without assuming CEM. So, the successes of the original theory for causal effect estimation are preserved in the new theory, which dispenses with CEM. The indispensability argument is thus defused.
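As a numerical sanity check on the identification claim (a sketch reusing the hypothetical net from Section 4.2, with `population` and `p_joint` as defined there, so it inherits all the invented parameters), one can compute both sides of the equation in Theorem 2 and confirm that they coincide:

```python
def p_cond(event, z):
    """P(event | Z=z) in the net above; event is a predicate on (i, t, y)."""
    num = den = 0.0
    for i in range(len(population)):
        for t in (0, 1):
            for y in (0, 1):
                p = p_joint(i, z, t, y)
                den += p
                if event(i, t, y):
                    num += p
    return num / den

# Right-hand side: the Wald ratio of counterfactual-free probabilities.
wald = ((p_cond(lambda i, t, y: y == 1, 1) - p_cond(lambda i, t, y: y == 1, 0))
        / (p_cond(lambda i, t, y: t == 1, 1) - p_cond(lambda i, t, y: t == 1, 0)))

# Left-hand side: the DATE computed directly from the individual parameters.
c = [ind["p_t"][1] - ind["p_t"][0] for ind in population]    # degrees of compliance
ite = [ind["p_y"][1] - ind["p_y"][0] for ind in population]  # individual treatment effects
date = sum(ci * ei for ci, ei in zip(c, ite)) / sum(c)

print(wald, date)  # both should print the same value (about 0.2857)
```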
At this point, proponents of CEM might reply that even if they are compelled to adopt the new theory of causal inference, this would not stop them from holding onto CEM. Indeed, the assumptions of the new theory only involve counterfactual probabilities and do not explicitly refer to the logic of counterfactuals. And Stalnaker (1981) already argued that one can coherently embrace nontrivial counterfactual probabilities and insist on CEM at the same time. The idea is based on a semantic technique known as supervaluation, used to resist Lewis’s (1973) argument that nontrivial counterfactual probability refutes CEM.
Setting aside the details of supervaluation, it suffices to note that I, as the bad cop at this moment, can concede the points that Stalnakerians made in the previous paragraph. Even so, my main point remains: thanks to the new theory of causal inference, CEM is no longer indispensable, even if it might still be optional. This is sufficient to undermine the indispensability argument—the mere optionality of an option is too weak to entail that we should take that option. This concludes my role as the bad cop.
5 Closing
I demonstrated how the Rubin causal model could be used to construct a new argument for Conditional Excluded Middle (CEM)—an indispensability argument. Then, I switched sides and undermined that argument. The assumption of CEM is removed by, first, turning the Rubin causal model into a stochastic version and, second, incorporating it into a causal Bayes net. Where does my heart lie—on the Stalnakerian side supporting CEM or the Lewisian side opposing it? You might have guessed that I lean toward the latter, but that is secondary for now. The more important message is this: while the Nobel Prize-winning theory of causal inference has been largely overlooked in philosophy, it actually offers a rich source of interesting issues for philosophers to explore. Let me mention three.
First of all, the dialectic developed above suggests an interesting case for the revisability of logic. If health and social scientists can be persuaded to abandon CEM, possibly following the new theory of causal inference developed above, it would be an example of how empirical inquiry can drive revisions in deductive logic—precisely the kind of case Quine (1951) envisioned. This would underscore the possibility of revising logic in light of not only empirical inquiries but also practical concerns, such as those in the health and social sciences—a much more relatable example than Putnam’s (1968) proposal to shift from classical to quantum logic.
So much for deductive logic, but there is also something here for theorists of induction. When scientists justify inductive methods, they rely heavily on their contexts of inquiry, including background assumptions. Past discussions have mostly focused on background assumptions that are physical (Longino 1979, Christensen 1997), methodological, or ethical (Reiss 2020), rather than logical. But do scientists have to assume a logical principle like CEM to justify certain causal inferences? As we have seen, the search for an answer is far from trivial. Thus, background assumptions about deductive logic warrant greater attention from theorists of induction.
Last but not least, there is also something for those more interested in scientific modeling than in inference, whether deductive or inductive. Consider the interplay between three approaches to causal modeling:
(1) Rubin causal models (Rubin 1974),
(2) structural equation models (Pearl 2009),
(3) causal Bayes nets (Spirtes et al. 2000).
Pearl (2009) famously argues that the first two approaches—Rubin causal models and structural equation models—are essentially equivalent, with a common commitment to the deterministic generation of outcomes. I am happy to grant him this point. Yet Pearl further argues that the first two (equivalent) approaches can be used to do everything we can do with the third approach—causal Bayes nets—and this is where I disagree with Pearl. The new theorem suggests that, in at least one important application (instrumental variable estimation), causal Bayes nets generalize Rubin causal models while dropping the metaphysical assumption of deterministic outcomes and removing the logical assumption of Conditional Excluded Middle. This prompts a reconsideration of some questions: Which approach to causal modeling is more general? Which are equivalent, and in what sense? These questions would make for an interesting case study on an important topic: intertheory relations, a subject whose case studies have thus far been largely drawn from the natural sciences. (For a review of this subject, see Palacios (2024).) I submit that more attention should be directed to the relations among causal models in computer science and in the health and social sciences. While initial steps have been taken by Markus (2021) and Weinberger (2023), their work does not consider causal Bayes nets. Much more remains to be explored.
The Rubin causal model has been overlooked in philosophy for far too long. I hope to have demonstrated that it offers a rich and promising landscape for exploration. It may be surprising that a core issue in philosophy of language (regarding CEM) is deeply connected to philosophy of health and social sciences.
Acknowledgements
I am indebted to the participants of the Workshop on the Philosophy, Psychology, and Computer Science of Causation (June 24-26, 2023, Kyoto, Japan), the Conference on Causality in Epidemiology (May 2-4, 2024, Linz, Austria), and the Causation session at the 2024 Philosophy of Science Association Biennial Meeting (November 14-17, 2024, New Orleans, LA, USA). I am especially grateful to Christopher Hitchcock, Peng Ding, Jiji Zhang, Frederick Eberhardt, Jun Otsuka, Xiao-Li Meng, Konstantin Genin, Conor Mayo-Wilson, Tom Wysocki, and Jennifer Jhun for stimulating questions and discussions.
Appendices
A.1 The Formalism of the Rubin Causal Model
The Rubin causal model builds on a simple idea: ordinary variables are extended to variables under counterfactual conditions, also known as potential outcomes. Recall that $T_i = 1$ expresses the proposition that individual $i$ takes the treatment. Similarly, $Y_i = 1$ says that $i$ gets cured, and $Z_i = 1$ says that $i$ is assigned to the treatment group (rather than the control group). Given a variable $X_i$, we can use $X_i^A$ to denote a potential outcome, which represents the value of $X_i$ that individual $i$ would have under the counterfactual condition $A$. Here is a substantive assumption:
Assumption 1 (Stable Unit Treatment Value, or SUTVA). The values of the variables of each individual (or unit) are determined independently of the values of the variables of any other individuals. That is, for any variable $X_i$ of individual $i$ and any conditions $A_1, \dots, A_N$ concerning individuals $1$ to $N$, respectively, we have $X_i^{A_1 \wedge \cdots \wedge A_N} = X_i^{A_i}$.
So, this assumption helps simplify the antecedents of counterfactuals. It might be violated in some cases, such as when dealing with a contagious disease in a densely populated community. A further simplification is enabled by the next assumption:
Assumption 2 (Instrumentality). For each individual $i$, $Z_i$ is an instrumental variable in the following sense: the value of $Y_i$ is determined once the value of $T_i$ is determined, independently of the value of $Z_i$. That is, $Y_i^{T_i = t \,\wedge\, Z_i = z} = Y_i^{T_i = t}$, which omits the assignment $Z_i = z$ in the counterfactual condition.
Thanks to the above two assumptions, we now need to consider just four potential outcomes for each individual $i$: $Y_i^{T_i = 1}$, $Y_i^{T_i = 0}$, $T_i^{Z_i = 1}$, and $T_i^{Z_i = 0}$, which correspond to the four cards that $i$ has in the game presented in the tutorial (Section 2).
The design of four cards (as opposed to the four-deck design in my expansion pack) also comes with logical assumptions:
Assumption 3 (Centering/Consistency). An antecedent, if true, is always redundant; that is, for any condition $A$ that is true of individual $i$, it must be that $X_i^A = X_i$.
Assumption 4 (Conditional Excluded Middle, or CEM). If $X_i$ is a binary variable, then the counterfactual variable $X_i^A$ is always a binary variable, too; in other words, either $X_i^A = 1$ or $X_i^A = 0$.
Under the assumption of CEM, the four subpopulations defined below are mutually exclusive and jointly exhaustive (as stated in Lemma B in Section 3.2):
- Compliers: $T_i^{Z_i = 1} = 1$ and $T_i^{Z_i = 0} = 0$;
- Defiers: $T_i^{Z_i = 1} = 0$ and $T_i^{Z_i = 0} = 1$;
- Always-Takers: $T_i^{Z_i = 1} = T_i^{Z_i = 0} = 1$;
- Never-Takers: $T_i^{Z_i = 1} = T_i^{Z_i = 0} = 0$.
Now we can define the individual treatment effect (ITE) for each individual $i$ and the local average treatment effect (LATE) for the compliers:

$$\mathrm{ITE}_i \;=\; Y_i^{T_i = 1} - Y_i^{T_i = 0}, \qquad \mathrm{LATE} \;=\; \frac{\sum_{i \,\in\, \text{Compliers}} \mathrm{ITE}_i}{\#\text{Compliers}}.$$
To make the LATE well-defined, the denominator must be assumed to be nonzero:
Assumption 5 (Existence of Compliers). $T_i^{Z_i = 1} = 1$ and $T_i^{Z_i = 0} = 0$ for some individual $i$.
The design of the cards itself is non-probabilistic. In the Rubin causal model, probabilities arise entirely from how individuals are drawn from the population and assigned to different groups. For simplicity, let the subscript-free notation $P(Y^{T=0} = 1)$ denote the probability of drawing an individual from the population who would be cured without taking the treatment. If everyone has an equal probability $1/N$ of being selected, where $N$ is the population size, then $P(Y^{T=0} = 1)$ is identical to the proportion of those who would be cured without taking the treatment. This exploits a convenient ambiguity of $P$ between probability and proportion. There are two probabilistic assumptions:
Assumption 6 (Random Selection). Everyone in the population has an equal probability of being selected. In other words, the probability

$$P\!\left(Y^{T=1} = y_1 \,\wedge\, Y^{T=0} = y_0 \,\wedge\, T^{Z=1} = t_1 \,\wedge\, T^{Z=0} = t_0\right)$$

is equal to the proportion of the individuals with the corresponding counterfactual properties $Y_i^{T_i=1} = y_1$, $Y_i^{T_i=0} = y_0$, $T_i^{Z_i=1} = t_1$, and $T_i^{Z_i=0} = t_0$.
Assumption 7 (Random Assignment). Any individual, once selected, has a nontrivial probability (say 50%) of being assigned to the treatment/control group, independently of their identity. So, $Z$ is probabilistically independent of the set of all four potential outcomes in use, $Y^{T=1}$, $Y^{T=0}$, $T^{Z=1}$, and $T^{Z=0}$; in symbols:

$$Z \;\perp\!\!\!\perp\; \left(Y^{T=1},\, Y^{T=0},\, T^{Z=1},\, T^{Z=0}\right).$$
There is one final assumption:
Assumption 8 (No Defiers). $T_i^{Z_i = 1} = 0$ and $T_i^{Z_i = 0} = 1$ for no individual $i$.
This assumption is presented last because, in real applications, it is often the one most responsible for delineating the scope of the method of instrumental variable estimation.
Then we have the classic result due to Imbens & Angrist (1994) and Angrist, Imbens, & Rubin (1996):
Theorem 1 (Formal Version). Under assumptions 1-8 as stated above,

$$\mathrm{LATE} \;=\; \frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(T=1 \mid Z=1) - P(T=1 \mid Z=0)}.$$
I believe that this list of assumptions, 1-8, is the most comprehensive one currently available.
A.2 Proof of the Main Result: Theorem 2
Recall that each individual $i$ has an individual treatment effect given by

$$\mathrm{ITE}_i \;=\; P(Y=1 \mid T=1,\, C=i) \;-\; P(Y=1 \mid T=0,\, C=i),$$

with a degree of compliance defined by

$$c_i \;=\; P(T=1 \mid Z=1,\, C=i) \;-\; P(T=1 \mid Z=0,\, C=i).$$

Hence the DATE can be expressed as follows:

$$\mathrm{DATE} \;=\; \sum_{i \,\in\, \text{Compliers}} \frac{c_i}{\sum_{j \,\in\, \text{Compliers}} c_j}\, \mathrm{ITE}_i$$
$$\phantom{\mathrm{DATE}} \;=\; \frac{\sum_{i} c_i \cdot \mathrm{ITE}_i}{\sum_{j} c_j}.$$
The first line is just the definition of the DATE, which is well-defined (with a nonzero denominator) by the assumption of Existence of Compliers*. In the second line, $i$ and $j$ are no longer restricted to compliers but range over all individuals in the population; this is justified by the assumption of No Defiers* and by the fact that indifferent-takers carry zero weights. Now, the goal is to verify this equation:

$$\frac{\sum_{i} c_i \cdot \mathrm{ITE}_i}{\sum_{j} c_j} \;=\; \frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(T=1 \mid Z=1) - P(T=1 \mid Z=0)}.$$
The terms on the right-hand side are to be calculated in turn. I will leverage a defining feature of the causal Bayes net, the Causal Markov Assumption, which asserts that every variable is probabilistically independent of its non-descendants (non-effects) given its parents (direct causes). Start with the first term in the numerator. By applying the Chain Rule, we have:

$$P(Y=1 \mid Z=1) \;=\; \sum_{i} P(C=i \mid Z=1) \sum_{t \in \{0,1\}} P(T=t \mid Z=1,\, C=i)\; P(Y=1 \mid T=t,\, Z=1,\, C=i).$$
The above can be simplified by the Causal Markov Assumption: $C$ is independent of $Z$ (so $P(C=i \mid Z=1) = P(C=i) = 1/N$), and $Y$ is independent of $Z$ given its parents $T$ and $C$ (so $P(Y=1 \mid T=t, Z=1, C=i) = P(Y=1 \mid T=t, C=i)$). Hence:

$$P(Y=1 \mid Z=1) \;=\; \frac{1}{N} \sum_{i} \sum_{t \in \{0,1\}} P(T=t \mid Z=1,\, C=i)\; P(Y=1 \mid T=t,\, C=i).$$
Then, by plugging in the parameters—writing $p_i(z)$ for $P(T=1 \mid Z=z,\, C=i)$ and $q_i(t)$ for $P(Y=1 \mid T=t,\, C=i)$—we have:

$$P(Y=1 \mid Z=1) \;=\; \frac{1}{N} \sum_{i} \Big[\, p_i(1)\, q_i(1) + \big(1 - p_i(1)\big)\, q_i(0) \,\Big].$$
Similarly for the second term in the numerator:

$$P(Y=1 \mid Z=0) \;=\; \frac{1}{N} \sum_{i} \Big[\, p_i(0)\, q_i(1) + \big(1 - p_i(0)\big)\, q_i(0) \,\Big].$$
Now calculate the first term in the denominator:

$$P(T=1 \mid Z=1) \;=\; \sum_{i} P(C=i \mid Z=1)\; P(T=1 \mid Z=1,\, C=i) \;=\; \frac{1}{N} \sum_{i} p_i(1).$$
Similarly for the second term in the denominator:

$$P(T=1 \mid Z=0) \;=\; \frac{1}{N} \sum_{i} p_i(0).$$
To finish off, plug the four terms just calculated into the following, noting that $c_i = p_i(1) - p_i(0)$ and $\mathrm{ITE}_i = q_i(1) - q_i(0)$:

$$\frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(T=1 \mid Z=1) - P(T=1 \mid Z=0)} \;=\; \frac{\frac{1}{N}\sum_i \big(p_i(1) - p_i(0)\big)\big(q_i(1) - q_i(0)\big)}{\frac{1}{N}\sum_j \big(p_j(1) - p_j(0)\big)} \;=\; \frac{\sum_i c_i \cdot \mathrm{ITE}_i}{\sum_j c_j} \;=\; \mathrm{DATE}.$$
Q.E.D.
References
- Angrist, J. D. (1990) “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records”, American Economic Review, 80, 313-336.
- Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996) “Identification of Causal Effects Using Instrumental Variables”, Journal of the American Statistical Association, 91(434), 444-455.
- Boylan, D. (2024) “Counterfactual Skepticism Is (Just) Skepticism”, Philosophy and Phenomenological Research, 108(1), 259-286.
- Christensen, D. (1997) “What Is Relative Confirmation?”, Noûs, 31(3), 370-384.
- Dawid, A. P. (2000) “Causal Inference without Counterfactuals”, Journal of the American Statistical Association, 95(450), 407-424.
- Emery, N. (2017) “The Metaphysical Consequences of Counterfactual Skepticism”, Philosophy and Phenomenological Research, 94(2), 399-432.
- Field, H. (2016) Science without Numbers, Oxford University Press.
- Hájek, A. (unpublished manuscript) “Most Counterfactuals Are False”, URL = https://philarchive.org/rec/HJEMCA
- Hausman, D. M. (2024) “Philosophy of Economics”, in Zalta, E. N. & Nodelman, U. (eds.) The Stanford Encyclopedia of Philosophy (Fall 2024 Edition), URL = https://plato.stanford.edu/archives/fall2024/entries/economics/
- Hernán, M. A. & Robins, J. M. (2023) Causal Inference: What If, Chapman & Hall/CRC.
- Hitchcock, C. (2024) “Causal Models”, in Zalta, E. N. & Nodelman, U. (eds.) The Stanford Encyclopedia of Philosophy (Summer 2024 Edition), URL = https://plato.stanford.edu/archives/sum2024/entries/causal-models/
- Imbens, G. W. & Angrist, J. D. (1994) “Identification and Estimation of Local Average Treatment Effects”, Econometrica, 62, 467-476.
- Imbens, G. W. & Rubin, D. B. (2015) Causal Inference for Statistics, Social, and Biomedical Sciences, Cambridge University Press.
- Lewis, D. K. (1973) Counterfactuals, Blackwell.
- Longino, H. E. (1979) “Evidence and Hypothesis: An Analysis of Evidential Relations”, Philosophy of Science, 46(1), 35-56.
- Mandelkern, M. (2022) “Modals and Conditionals”, in Altshuler, D. (ed.) Linguistics Meets Philosophy, Oxford University Press, pp. 502-533.
- Markus, K. A. (2021) “Causal Effects and Counterfactual Conditionals: Contrasting Rubin, Lewis and Pearl”, Economics & Philosophy, 37(3), 441-461.
- Palacios, P. (2024) “Intertheory Relations in Physics”, in Zalta, E. N. & Nodelman, U. (eds.) The Stanford Encyclopedia of Philosophy (Spring 2024 Edition), URL = https://plato.stanford.edu/archives/spr2024/entries/physics-interrelate/
- Pearl, J. (2000) “Comment on Dawid’s ‘Causal Inference without Counterfactuals’”, Journal of the American Statistical Association, 95(450), 428-431.
- Pearl, J. (2009) Causality, Cambridge University Press.
- Putnam, H. (1968) “Is Logic Empirical?”, in Cohen, R. S. & Wartofsky, M. W. (eds.) Boston Studies in the Philosophy of Science, Vol. 5, D. Reidel, 216-241.
- Putnam, H. (1971) Philosophy of Logic, Routledge.
- Quine, W. V. (1948) “On What There Is”, Review of Metaphysics, 2(5), 21-38.
- Quine, W. V. (1951) “Two Dogmas of Empiricism”, Philosophical Review, 60, 20-43.
- Reiss, J. (2020) “What Are the Drivers of Induction? Towards a Material Theory+”, Studies in History and Philosophy of Science Part A, 83, 8-16.
- Rubin, D. B. (1974) “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies”, Journal of Educational Psychology, 66, 688-701.
- Spirtes, P., Glymour, C. N., & Scheines, R. (2000) Causation, Prediction, and Search, MIT Press.
- Stalnaker, R. C. (1968) “A Theory of Conditionals”, in Harper, W. L., Pearce, G. A., & Stalnaker, R. C. (eds.) Ifs: Conditionals, Belief, Decision, Chance and Time, D. Reidel, 41-55.
- Stalnaker, R. C. (1981) “A Defense of Conditional Excluded Middle”, in Harper, W. L., Pearce, G. A., & Stalnaker, R. C. (eds.) Ifs: Conditionals, Belief, Decision, Chance and Time, D. Reidel, 87-104.
- Weinberger, N. (2023) “Comparing Rubin and Pearl’s Causal Modelling Frameworks: A Commentary on Markus (2021)”, Economics & Philosophy, 39(3), 485-493.
- Williams, J. R. G. (2010) “Defending Conditional Excluded Middle”, Noûs, 44(4), 650-668.