Impact Evaluation Sebastian Galiani November 2006 Causal Inference

Impact EvaluationImpact Evaluation

Sebastian GalianiSebastian Galiani

November 2006November 2006

Causal InferenceCausal Inference

2WBISARHDN WBISARHDN

Motivation The research questions that motivate

most studies in the health sciences are causal in nature. For example:

What is the efficacy of a given drug in Impact: given population? What fraction of deaths from a given disease could have been avoided by a given treatment or policy?


Motivation

The most challenging empirical questions in economics also involve causal-effect relationships:

Does school decentralization improve schools quality?


Motivation Interest in these questions is motivated

by: Policy concerns

Does privatization of water systems improve child health?

Theoretical considerations

Problems facing individual decision makers


Causal Analysis The aim of standard statistical analysis, typified

by likelihood and other estimation techniques, is to infer parameters of a distribution from samples drawn of that distribution.

With the help of such parameters, one can:

1. Infer association among variables,

2. Estimate the likelihood of past and future events,

3. As well as update the likelihood of events in light of new evidence or new measurement.


Causal Analysis These tasks are managed well by standard

statistical analysis as long as experimental conditions remain the same.

Causal analysis goes one step further:

Its aim is to infer aspects of the data generation process.

With the help of such aspects, one can deduce not only the likelihood of events under static conditions, but also the dynamics of events under changing conditions.


Causal Analysis This capability includes:

1. Predicting the effects of interventions2. Predicting the effects of spontaneous changes3. Identifying causes of reported events

This distinction implies that causal and associational concepts do not mix.


Causal AnalysisThe word cause is not in the vocabulary of standard

probability theory.

All Probability theory allows us to say is that two events are mutually correlated, or dependent –meaning that if we find one, we can expect to encounter the other.

Scientists seeking causal explanations for complex phenomena or rationales for policy decisions must therefore supplement the language of probability with a vocabulary for causality.


Causal Analysis

Two languages for causality have been proposed:

1. Structural equation modeling (ESM) (Haavelmo 1943).

2. The Neyman-Rubin potential outcome model (RCM) (Neyman, 1923; Rubin, 1974).


The Rubin Causal Model

Define the population by U. Each unit in U is denoted by u.

For each u U, there is associated a value Y(u) of the variable of interest Y, which we call: the response variable.

Let A be a second variable defined on U. We call A an attribute of the units in U.

11WBISARHDN

The key notion is the potential for exposing or not exposing each unit to the action of a cause:

Each unit has to be potentially exposable to any one of the causes.

Thus, Rubin takes the position that causes are only those things that could be treatments in hypothetical experiments.

An attribute cannot be a cause in an experiment, because the notion of potential exposability does not apply to it.

12WBISARHDN

For simplicity, we assume that there are just two causes or level of treatment.

Let D be a variable that indicates the cause to which each unit in U is exposed:

controltoexposedisuunitif

treatmenttoexposedisuunitif

c

tD

In a controlled study, D is constructed by the experimenter. In an uncontrolled study, it is determined by factors beyond the experimenter’s control.

13WBISARHDN

The values of Y are potentially affected by the particular cause, t or c, to which the unit is exposed.

Thus, we need two response variables:

Yt(u), Yc(u)

Yt is the value of the response that would be observed if the unit were exposed to t and

Yc is the value that would be observed on the same unit if it were exposed to c.

14WBISARHDN

Let D also be expressed as a binary variable:

D = 1 if D = t and D = 0 if D = c

Then, the outcome of each individual can be written as:

Y(U) = D Y1 + (1 – D) Y0

15WBISARHDN

Definition: For every unit u treatment {Du = 1 instead of Du = 0} causes the effect

u = Y1(u) – Y0(u)

This definition of a causal effect assumes that the treatment status of one individual does not affect the potential outcomes of other individuals.

Fundamental Problem of Causal Inference: It is impossible to observe the value of Y1(u) and Y0(u) on the same unit and, therefore, it is impossible to observe the effect of t on u.

Another way to express this problem is to say that we cannot infer the effect of treatment because we do not have the counterfactual evidence i.e. what would have happened in the absence of treatment.

16WBISARHDN

Given that the causal effect for a single unit u cannot be observed, we aim to identify the average causal effect for the entire population or for sub-populations.

The average treatment effect ATE of t (relative to c) over U (or any sub-population) is given by:

ATE =E [Y1(u) – Y0(u)]

= E [Y1(u)] – E [Y0(u)]

(1)01 YY

17WBISARHDN

The statistical solution replaces the impossible-to-observe causal effect of t on a specific unit with the possible-to-estimate average causal effect of t over a population of units.

Although E(Y1) and E(Y0) cannot both be calculated, they can be estimated.

Most econometrics methods attempt to construct from observational data consistent estimates of

01 YandY

18WBISARHDN

Consider the following simple estimator of ATE:

(2)0]D|Y[-1]D|Y[ˆ01

Note that equation (1) is defined for the whole population, whereas equation (2) represents an estimator to be evaluated on a sample drawn from that population

19WBISARHDN

Let equal the proportion of the population that would be assigned to the treatment group.

Decomposing ATE, we have:

0D|YY)1(1D|YY 0101

0100

11

YY]0D|Y[)1(]1D|Y[

]0D|Y[)1(]1D|Y[

}0{}1{ )1( DD

20WBISARHDN

If we assume that

]0D|Y[)1(]0D|Y[

]1D|Y[)1(]1D|Y[

00

11

0]D|Y[1]D|Y[and0]D|Y[1]D|Y[ 0011

0]D|Y[-1]D|Y[ˆ01

0]D|Y[-1]D|Y[ 01

Which is consistently estimated by its sample analog estimator:

21WBISARHDN

Thus, a sufficient condition for the standard estimator to consistently estimate the true ATE is that:

0]D|Y[1]D|Y[and0]D|Y[1]D|Y[ 0011 In this situation, the average outcome under the treatment

and the average outcome under the control do not differ between the treatment and control groups.

In order to satisfy these conditions, it is sufficient that treatment assignment D be uncorrelated with the potential outcome distributions of Y1 and Y2.

The principal way to achieve this uncorrelatedness is through random assignment of treatment.

22WBISARHDN

In most circumstances, there is simply no information available on how those in the control group would have reacted if they had received the treatment instead.

This is the basis for an important insight into the potential biases of the standard estimator (2).

After a bit of algebra, it can be shown that:

ityHeterogeneTreatment

0}{D1}{D

DifferenceBaseline

00 )1(0]D|Y[1]D|Y[ˆ

23WBISARHDN

This equation specifies the two sources of biases that need to be eliminated from estimates of causal effects from observational studies.

1. Selection Bias: Baseline difference. 2. Treatment Heterogeneity.

Most of the methods available only deal with selection bias, simply assuming that the treatment effect is constant in the population or by redefining the parameter of interest in the population.


Treatment on the Treated ATE is not always the parameter of interest.

In a variety of policy contexts, it is the average treatment effect for the treated that is of substantive interest:

TOT =E [Y1(u) – Y0(u)| D = 1] = E [Y1(u)| D = 1] – E [Y0(u)| D = 1]


Treatment on the Treated

The standard estimator (2) consistently estimates TOT if:

0]D|Y[1]D|Y[ 00


Structural Equation Modeling

Structural equation modeling was originally developed by geneticists (Wright 1921) and economists (Haavelmo 1943).


Structural Equations Definition: An equation

y = β x + ε (8)

is said to be structural if it is to be interpreted as follows:

In an ideal experiment where we control X to x and any other set Z of variables (not containing X or Y) to z, the value y of Y is given by β x + ε, where ε is not a function of the settings x and z.

This definition is in the spirit of Haavelmo (1943), who explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment.

28WBISARHDN

Thus, to the often asked question, “Under what conditions can we give causal interpretation to structural coefficients?”

Haavelmo would have answered: Always!

According to the founding father of SEM, the conditions that make the equation y = β x + ε structural are precisely those that make the causal connection between X and Y have no other value but β, and ensuring that nothing about the statistical relationship between x and ε can ever change this interpretation of β.

29WBISARHDN

The average causal effect: The average causal effect on Y of treatment level x is the difference in the conditional expectations:

E(Y|X = x) – E(Y|X = 0)

In the context of dichotomous interventions (x = 1), this causal effect is called the average treatment effect (ATE).


Representing Interventions Consider the structural model M:

z = fz(w)

x = fx(z, )

y = fy(x, u)

We represent an intervention in the model through a mathematical operator denoted d0(x).

d0(x) simulates physical interventions by deleting certain functions from the model, replacing them by a constant X = x, while keeping the rest of the model unchanged.

31WBISARHDN

To emulate an intervention d0(x0) that holds X constant (at X = x0) in model M, replace the equation for x with x = x0, and obtain a new model, Mx0

z = fz(w)

x = x0

y = fy(x, u)

The joint distribution associated with the modified model, denoted P(z, y| d0(x0)) describes the post-intervention (“experimental”) distribution.

From this distribution, one is able to assess treatment efficacy by comparing aspects of this distribution at different levels of x0.


Structural Parameters Definition: The interpretation of a structural

equation as a statement about the behavior of Y under a hypothetical intervention yields a simple definition for the structural parameters.

The meaning of β in the equation y = β x + ε is simply

(x)]d|E[Yx o


Counterfactual Analysis in Structural Models

Consider again model Mxo. Call the solution of Y the potential response of Y to x0.

We denote it as Yx0(u, , w).

This entity can be given a counterfactual interpretation, for it stands for the way an individual with characteristics (u, , w) would respond, had the treatment been x0, rather than the x = fx(z, ) actually received by the individual.

34WBISARHDN

In our example,

Yx0(u, , w) = Yx0(u) = y = fy(x0, u)

• This interpretation of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation modeling and the Rubin potential-outcome framework.

• It ensures us that the end results of the two approaches will be the same.

• Thus, the choice of model is strictly a matter of convenience or insight.


References Judea Pearl (2000): Causality: Models, Reasoning and

Inference, CUP. Chapters 1, 5 and 7. Trygve Haavelmo (1944): “The probability approach in

econometrics”, Econometrica 12, pp. iii-vi+1-115. Arthur Goldberger (1972): “Structural Equations Methods in

the Social Sciences”, Econometrica 40, pp. 979-1002. Donald B. Rubin (1974): “Estimating causal effects of

treatments in randomized and nonrandomized experiments”, Journal of Educational Psychology 66, pp. 688-701.

Paul W. Holland (1986): “Statistics and Causal Inference”, Journal of the American Statistical Association 81, pp. 945-70, with discussion.

Documents

Impact Evaluation Sebastian Galiani November 2006 Causal Inference