8
Biometrics Section 19

Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

The Mathematics of Causal Inference in Statistics

Judea PearlUCLA Computer Science Department

Los Angeles� CA ����� USA

Abstract

The �potential outcome�� or Neyman�Rubin NR modelthrough which statisticians were �rst introduced to causalanalysis su�ers from two fundamental shortcomings �It lacks formal underpinning and � it uses conceptuallyopaque language for expressing causal information� As aresults� investigators �nd it di�cult to discern whethera set of formulae represents a faithful picture of one�sknowledge� and whether such a set is self�consistent orredundant�These shortcomings can be recti�ed using counterfac�

tual semantics based on nonparametric structural equa�tions �Pearl� ����a� which provides both a mathematicalfoundation for the NR analysis and a conceptually trans�parent language for expressing causal knowledge� Thissemantical framework gives rise to a friendly calculus ofcausation that uni�es the graphical� potential outcomeand structural equation approaches and resolves long�standing problems in several of the sciences� These in�clude questions of confounding� causal e�ect estimation�policy analysis� legal responsibility� direct and indirect ef�fects� instrumental variables� surrogate designs� and theintegration of data from experimental and observationalstudies�

KEY WORDS Structural equation models� confound�ing� Rubin causal model� graphical methods� counterfac�tuals� causal e�ects�

�� Introduction

Almost two decades have passed since Paul Holland pub�lished his seminal review paper on the Neyman�RubinNR approach to causal inference �Holland� ������ Ourunderstanding of causal inference has since increased sev�eral folds� due primarily to advances in three areas

�� Nonparametric structural equations

�� Graphical models

�� Symbiosis between counterfactual and graphicalmethods�

These advances are central to the empirical sciences be�cause the research questions that motivate most studies inthe health� social and behavioral sciences are not statisti�cal but causal in nature� For example� what is the e�cacyof a given drug in a given population� Whether data can

prove an employer guilty of hiring discrimination� Whatfraction of past crimes could have been avoided by a givenpolicy� What was the cause of death of a given individ�ual� in a speci�c incident�Remarkably� although much of the conceptual frame�

work and algorithmic tools needed for tackling such prob�lems are now well established� they are hardly known toresearchers in the �eld who could put them into practicaluse� Why�

Solving causal problems mathematically requires cer�tain extensions in the standard mathematical languageof statistics� and these extensions are not generally em�phasized in the mainstream literature and education� Asa result� large segments of the statistical research commu�nity �nd it hard to appreciate and bene�t from the manyresults that causal analysis has produced in the past twodecades�

This paper aims at making these advances more acces�sible to the general research community by� �rst� contrast�ing causal analysis with standard statistical analysis and�second� by comparing and unifying various approaches tocausal analysis�

�� From Associational to Causal Analysis�

Distinctions and Barriers

��� The Basic Distinction� Coping With Change

The aim of standard statistical analysis� typi�ed by re�gression� estimation� and hypothesis testing techniques�is to assess parameters of a distribution from samplesdrawn of that distribution� With the help of such pa�rameters� one can infer associations among variables� es�timate the likelihood of past and future events� as well asupdate the likelihood of events in light of new evidenceor new measurements� These tasks are managed well bystandard statistical analysis so long as experimental con�ditions remain the same� Causal analysis goes one stepfurther� its aim is to infer not only the likelihood of eventsunder static conditions� but also the dynamics of eventsunder changing conditions� for example� changes inducedby treatments or external interventions�

This distinction implies that causal and associationalconcepts do not mix� There is nothing in the joint dis�tribution of symptoms and diseases to tell us that curingthe former would or would not cure the latter� More gen�erally� there is nothing in a distribution function to tellus how that distribution would di�er if external condi�tions were to change�say from observational to experi�

Biometrics Section

19

Kaoru
Text Box
TECHNICAL REPORT R-337 October 2007
Kaoru
Text Box
In 2007 JSM Proceedings of the American Statistical Association, Biometrics Section [CD-ROM], Alexandria, VA: American Statistical Association: pp. 19-26.
Page 2: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

mental setup�because the laws of probability theory donot dictate how one property of a distribution ought tochange when another property is modi�ed� This infor�mation must be provided by causal assumptions whichidentify relationships that remain invariant when exter�nal conditions change�

These considerations imply that the slogan �correla�tion does not imply causation� can be translated into auseful principle one cannot substantiate causal claimsfrom associations alone� even at the population level�behind every causal conclusion there must lie some causalassumption that is not testable in observational studies�

��� Formulating the Basic Distinction

A useful demarcation line that makes the distinction be�tween associational and causal concepts crisp and easy toapply� can be formulated as follows� An associational con�cept is any relationship that can be de�ned in terms of ajoint distribution of observed variables� and a causal con�cept is any relationship that cannot be de�ned from thedistribution alone� Examples of associational conceptsare correlation� regression� dependence� conditional in�dependence� likelihood� collapsibility� risk ratio� odd ra�tio� marginalization� conditionalization� �controlling for��and so on� Examples of causal concepts are randomiza�tion� in�uence� e�ect� confounding� �holding constant��disturbance� spurious correlation� instrumental variables�intervention� explanation� attribution� and so on� Theformer can� while the latter cannot be de�ned in term ofdistribution functions�

This demarcation line is extremely useful in causalanalysis for it helps investigators to trace the assump�tions that are needed for substantiating various types ofscienti�c claims� Every claim invoking causal conceptsmust rely on some premises that invoke such concepts� itcannot be inferred from� or even de�ned in terms statis�tical associations alone�

��� Rami�cations of the Basic Distinction

This principle has far reaching consequences that are notgenerally recognized in the standard statistical literature�Many researchers� for example� are still convinced thatconfounding is solidly founded in standard� frequentiststatistics� and that it can be given an associational def�inition saying roughly �U is a potential confounderfor examining the e�ect of treatment X on outcome Ywhen both U and X and U and Y are not independent��That this de�nition and all its many variants must fail�is obvious from the demarcation line above� �indepen�dence� is an associational concept while confounding isneeded for establishing causal relations� The two do notmix� hence� the de�nition must be false� Therefore� tothe bitter disappointment of generations of epidemiologyresearchers� confounding bias cannot be detected or cor�rected by statistical methods alone� one must make somejudgmental assumptions regarding causal relationships in

the problem before an adjustment e�g�� by strati�cationcan safely correct for confounding bias�Another rami�cation of the sharp distinction between

associational and causal concepts is that any mathemat�ical approach to causal analysis must acquire new nota�tion for expressing causal relations � probability calculusis insu�cient� To illustrate� the syntax of probability cal�culus does not permit us to express the simple fact that�symptoms do not cause diseases�� let alone draw math�ematical conclusions from such facts� All we can say isthat two events are dependent�meaning that if we �ndone� we can expect to encounter the other� but we can�not distinguish statistical dependence� quanti�ed by theconditional probability P disease jsymptom from causaldependence� for which we have no expression in standardprobability calculus� Scientists seeking to express causalrelationships must therefore supplement the language ofprobability with a vocabulary for causality� one in whichthe symbolic representation for the relation �symptomscause disease� is distinct from the symbolic representa�tion of �symptoms are associated with disease��

��� Two Mental Barriers� Untested Assump

tions and New Notation

The preceding two requirements � to commence causalanalysis with untested�� theoretically or judgmentallybased assumptions� and � to extend the syntax of prob�ability calculus� constitute the two main obstacles tothe acceptance of causal analysis among statisticians andamong professionals with traditional training in statistics�Associational assumptions� even untested� are testable

in principle� given su�ciently large sample and su��ciently �ne measurements� Causal assumptions� in con�trast� cannot be veri�ed even in principle� unless one re�sorts to experimental control� This di�erence stands outin Bayesian analysis� Though the priors that Bayesianscommonly assign to statistical parameters are untestedquantities� the sensitivity to these priors tends to dimin�ish with increasing sample size� In contrast� sensitivityto prior causal assumptions� say that treatment does notchange gender� remains substantial regardless of samplesize�This makes it doubly important that the notation we

use for expressing causal assumptions be meaningful andunambiguous so that one can clearly judge the plausibil�ity or inevitability of the assumptions articulated� Statis�ticians can no longer ignore the mental representation inwhich scientists store experiential knowledge� since it isthis representation� and the language used to access thatrepresentation that determine the reliability of the judg�ments upon which the analysis so crucially depends�How does one recognize causal expressions in the sta�

tistical literature� Those versed in the potential�outcomenotation �Neyman� ����� Rubin� ����� Holland� ������can recognize such expressions through the subscripts

�By �untested� I mean untested using frequency data in nonex�perimental studies�

Biometrics Section

20

Page 3: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

that are attached to counterfactual events and variables�e�g� Yxu or Zxy� Some authors use parenthetical ex�pressions� e�g� Y x� u or Zx� y� The expression Yxu�for example� stands for the value that outcome Y wouldtake in individual u� had treatment X been at level x�If u is chosen at random� Yx is a random variable� andone can talk about the probability that Yx would at�tain a value y in the population� written P Yx � y�Alternatively� Pearl ������ used expressions of the formP Y � yjsetX � x or P Y � yjdoX � x to denotethe probability or frequency that event Y � y wouldoccur if treatment condition X � x were enforced uni�formly over the population�� Still a third notation thatdistinguishes causal expressions is provided by graphicalmodels� where the arrows convey causal directionality��

However� few have taken seriously the textbook re�quirement that any introduction of new notation mustentail a systematic de�nition of the syntax and seman�tics that governs the notation� Moreover� in the bulk ofthe statistical literature before ����� causal claims rarelyappear in the mathematics� They surface only in theverbal interpretation that investigators occasionally at�tach to certain associations� and in the verbal descriptionwith which investigators justify assumptions� For exam�ple� the assumption that a covariate is not a�ected by atreatment� a necessary assumption for the control of con�founding �Cox� ������ is expressed in plain English� notin a mathematical expression�

Remarkably� though the necessity of explicit causal no�tation is now recognized by most leaders in the �eld� theuse of such notation has remained enigmatic to most rankand �le researchers� and its potentials still lay grossly un�derutilized in the statistics based sciences� The reasonfor this� I am �rmly convinced� can be traced to the un�friendly and ad�hoc way in which the NR model has beenpresented to the research community�

The next section provides a conceptualization thatovercomes these mental barriers� it o�ers both a friendlymathematical machinery for cause�e�ect analysis and aformal foundation for counterfactual analysis�

�� The Language of Diagrams and Structural

Equations

��� Semantics� Causal Eects and Counterfac

tuals

How can one express mathematically the common un�derstanding that symptoms do not cause diseases� Theearliest attempt to formulate such relationship mathe�

�Clearly� P �Y � yjdo�X � x is equivalent to P �Yx � y�This is what we normally assess in a controlled experiment� withX randomized� in which the distribution of Y is estimated for eachlevel x of X�

�These notational clues should be useful for detecting inade�quate denitions of causal concepts� any denition of confounding�randomization or instrumental variables that is cast in standardprobability expressions� void of graphs� counterfactual subscriptsor do�� operators� can safely be discarded as inadequate�

matically was made in the �����s by the geneticist SewallWright ������� who used a combination of equations andgraphs� For example� if X stands for a disease variableand Y stands for a certain symptom of the disease� Wrightwould write a linear equation

y � �x u �

where x stands for the level or severity of the disease� ystands for the level or severity of the symptom� and u

stands for all factors� other than the disease in question�that could possibly a�ect Y � In interpreting this equationone should think of a physical process whereby Natureexamines the values of x and u and� accordingly� assignsvariable Y the value y � �x u�To express the directionality inherent in this process�

Wright augmented the equation with a diagram� latercalled �path diagram�� in which arrows are drawn fromperceived causes to their perceived e�ects and� moreimportantly� the absence of an arrow makes the empiricalclaim that the value Nature assigns to one variable is notdetermined by the value taken by another�The variables V and U are called �exogenous� � they

represent observed or unobserved background factorsthat the modeler decides to keep unexplained� that is�factors that in�uence but are not in�uenced by the othervariables called �endogenous� in the model�If correlation is judged possible between two exogenous

variables� U and V � it is customary to connect them bya dashed double arrow� as shown in Fig� �b�

V UV U

βX YβX Y

(b)(a)

x = vy = x + uβ

Figure � A simple structural equation model� and itsassociated diagrams� Unobserved exogenous variables areconnected by dashed arrows�

To summarize� path diagrams encode causal assump�tions via missing arrows� representing claims of zero in��uence� and missing double arrows e�g�� between V andU� representing the causal assumption CovU� V ���

The generalization to nonlinear system of equations isstraightforward� For example� the non�parametric inter�pretation of the diagram of Fig� �a corresponds to aset of three functions� each corresponding to one of theobserved variables

z � fZw

x � fXz� v �

y � fY x� u

whereW�V and U are assumed to be jointly independentbut� otherwise� arbitrarily distributed�

Biometrics Section

21

Page 4: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

(a) (b)

W

Z

V

X

U

Y

0x

U

Y

W

Z

V

X

Figure � a The diagram associated with the structuralmodel of Eq� �� b The diagram associated with themodi�ed model of Eq� �� representing the interventiondoX � x��

Remarkably� unknown to most economists and philoso�phers� structural equation models provide a formal inter�pretation and symbolic machinery for analyzing counter�factual relationships of the type �Y would be y had X

been x in situation U � u�� denoted Yxu � y� Here Urepresents the vector of all exogenous variables�The key idea is to interpret the phrase �had X been

x�� as an instruction to modify the original model andreplace the equation for X by a constant x�� yielding

z � fZw

x � x� �

y � fY x� u

the graphical description of which is shown in Fig� �b�This replacement permits the constant x� to di�er from

the actual value of X namely fXz� v without render�ing the system of equations inconsistent� thus yieldinga formal interpretation of counterfactuals in multi�stagemodels� where the dependent variable in one equationmay be an independent variable in another �Balke andPearl� ����ab� Pearl� ����b�� For example� to computethe average causal e�ect of X on Y � i�e�� EYx� we solveEq� � for Y in terms of the exogenous variables� yield�ing Yx� � fY x�� u� and average over U and V � Toanswer more sophisticate questions such as whether Ywould be y� if X were x�� given that in fact Y is y� andX is x�� we need to compute the conditional probabilityP Yx� � y�jY � y�� X � x� which is well de�ned oncewe know the forms of the structural equations and thedistribution of the exogenous variables in the model�This interpretation of counterfactuals� cast as solutions

to modi�ed systems of equations� provides the conceptualand formal link between structural equation models� usedin economics and social science and the Neyman�Rubinpotential�outcome framework to be discussed in Section�� But �rst we discuss two long�standing problems thathave been completely resolved in purely graphical terms�without delving into algebraic techniques�

��� Confounding and Causal Eect Estimation

The central target of most studies in the social andhealth sciences is the elucidation of cause�e�ect relation�ships among variables of interests� for example� treat�

ments� policies� preconditions and outcomes� While goodstatisticians have always known that the elucidation ofcausal relationships from observational studies must beshaped by assumptions about how the data were gener�ated� the relative roles of assumptions and data� and waysof using those assumptions to eliminate confounding biashave been a subject of much controversy� The structuralframework of Section ��� puts these controversies to rest�

Covariate Selection� The back�door criterion

Consider an observational study where we wish to �nd thee�ect of X on Y � for example� treatment on response� andassume that the factors deemed relevant to the problemare structured as in Fig� �� some are a�ecting the re�

Z1

Z3

Z2

Y

X

W

W

W

1

2

3

Figure � Graphical model illustrating the back�door cri�terion� Error terms are not shown explicitly�

sponse� some are a�ecting the treatment and some area�ecting both treatment and response� Some of thesefactors may be unmeasurable� such as genetic trait orlife style� others are measurable� such as gender� age� andsalary level� Our problem is to select a subset of these fac�tors for measurement and adjustment� namely� that if wecompare treated vs� untreated subjects having the samevalues of the selected factors� we get the correct treat�ment e�ect in that subpopulation of subjects� Such a setof factors is called a �su�cient set� or a set �appropri�ate for adjustment�� The problem of de�ning a su�cientset� let alone �nding one� has ba!ed epidemiologists andsocial science for decades see Greenland et al�� �������Pearl �����a� and ������ for review�The following criterion� named �back�door� in

�Pearl� ����a�� provides a graphical method of selectingsuch a set of factors for adjustment� It states that a setS is appropriate for adjustment if two conditions hold

�� No element of S is a descendant of X

�� The elements of S �block� all �back�door� pathsfrom X to Y � namely all paths that end with anarrow pointing to X ��

Based on this criterion we see� for example� that thesets fZ�� Z�� Z�g� fZ�� Z�g� and fW�� Z�g� each is su��cient for adjustment� because each blocks all back�door

�In this criterion� a set S of nodes is said to block a path p ifeither �i p contains at least one arrow�emitting node that is in S�or �ii p contains at least one collision node that is outside S andhas no descendant in S�

Biometrics Section

22

Page 5: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

paths between X and Y � The set fZ�g� however� is notsu�cient for adjustment because� as explained above� itdoes not block the path X � W� � Z� � Z� � Z� �W� � Y �The implication of �nding a su�cient set S is that�

stratifying on S is guaranteed to remove all confoundingbias relative the causal e�ect of X on Y � In other words�it renders the causal e�ect of X on Y identi�able� via

P Y � yjdoX � x

�X

s

P Y � yjX � x� S � sP S � s �

Since all factors on the right hand side of the equation areestimable e�g�� by regression from the pre�interventionaldata� the causal e�ect can likewise be estimated from suchdata without bias�The back�door criterion allows us to write Eq� � di�

rectly� after selecting a su�cient set S from the diagram�without resorting to any algebraic manipulation� Theselection criterion can be applied systematically to dia�grams of any size and shape� thus freeing analysts fromjudging whether �X is conditionally ignorable given S�� aformidable mental task required in the potential�responseframework �Rosenbaum and Rubin� ������ The criterionalso enables the analyst to search for an optimal set ofcovariate�namely� a set S that minimizes measurementcost or sampling variability �Tian et al�� ������

General control of confounding

Adjusting for covariates is only one of many methods thatpermits us to estimate causal e�ects in nonexperimentalstudies� A much more general identi�cation criterion isprovided by the following theorem

Theorem � �Tian and Pearl� �����A su�cient condition for identifying the causal eectP yjdox is that every path between X and any of itschildren traces at least one arrow emanating from a mea�sured variable��

For example� ifW� is the only observed covariate in themodel of Fig� �� then there exists no su�cient set for ad�justment because no set of observed covariates can blockthe paths from X to Y through Z�� yet P yjdox cannevertheless be estimated since the one path from X toW� the only child ofX traces the arrowX � W�� whichemanates from a measured variable� X � In this example�the variable W� acts as a �mediating instrumental vari�able� �Pearl ����b� Chalak and White� ����� and yieldsthe estimand

P Y � yjdoX � x

�X

w�

P W� � wjdoX � xP Y � yjdoW� � w

�X

w

P wjxX

x�

P yjw� x�P x� �

�Before applying this criterion� one may delete from the causalgraph all nodes that are not ancestors of Y �

More recent results extend this theorem by � present�ing a necessary and su�cient condition for identi�cation�Shpitser and Pearl� ������ and � extending the condi�tion from causal e�ects to any counterfactual expression�Shpitser and Pearl� ������ The corresponding unbiasedestimands for these causal quantities� are readable di�rectly from the diagram�

�� The Language of Potential Outcomes

The elementary object of analysis in the potential�outcome framework is the unit�based response variable�denoted Yxu� read �the value that Y would obtain inunit u� had treatment X been x� �Neyman� ����� Rubin������� These subscripted variables are treated as unde��ned quantities� useful for expressing the causal quanti�ties we seek� but are not derived from other quantitiesin the model� In contrast� in the previous section coun�terfactual entities were derived from a set of meaningfulphysical processes� each represented by an equation� andunit was interpreted a vector u of background factors thatcharacterize an experimental unit� Each structural equa�tion model thus provides a compact representation fora huge number of counterfactual claims� guaranteed tobe consistent� The potential outcome framework lackssuch compactness� nor does it provide guarantees thatany given set of claims is consistent�In view of these features� the structural de�nition of

Yxu can be regarded as the formal basis for the poten�tial outcome approach� It interprets the opaque Englishphrase �the value that Y would obtain in unit u� had Xbeen x� in terms of a meaningful mathematical modelthat allows such values to be computed unambiguously�Consequently� important concepts in potential responseanalysis that researchers �nd ill�de�ned or overly esotericoften obtain meaningful and natural interpretation inthe structural semantics� Examples are �unit� �exoge�nous variables� in structural semantics� �principal strat�i�cation� �equivalence classes� in structural semantics�Balke and Pearl� ����a� and �Pearl ����b� �conditionalignorability� �back�door condition� in �Pearl ����a� �as�signment mechanism� P xjdirect�causes of X in struc�tural semantics and so on� The next two subsectionsexamine how assumptions and inferences are handled inthe potential outcome approach�

��� Formulating Assumptions

The distinct characteristic of the potential outcome ap�proach is that� although its primitive objects are unde��ned� hypothetical quantities� the analysis itself is con�ducted almost entirely within the axiomatic frameworkof probability theory� This is accomplished� by postulat�ing a �super� probability function on both hypotheticaland real events� treating the former as �missing data��In other words� if U is treated as a random variablethen the value of the counterfactual Yxu becomes arandom variable as well� denoted as Yx� The potential�

Biometrics Section

23

Page 6: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

outcome analysis proceeds by treating the observed dis�tribution P x�� � � � � xn as the marginal distribution of anaugmented probability function P � de�ned over both ob�served and counterfactual variables� Queries about causale�ects are phrased as queries about the marginal distri�bution of the counterfactual variable of interest� writ�ten P �Yx � y� The new hypothetical entities Yx aretreated as ordinary random variables� for example� theyare assumed to obey the axioms of probability calcu�lus� the laws of conditioning� and the axioms of condi�tional independence� Moreover� these hypothetical enti�ties are not entirely whimsy� but are assumed to be con�nected to observed variables via consistency constraints�Robins� ����� such as

X � x �� Yx � Y� �

which states that� for every u� if the actual value of Xturns out to be x� then the value that Y would take on ifX were x is equal to the actual value of Y � For example� aperson who chose treatment x and recovered� would alsohave recovered if given treatment x by design�The main conceptual di�erence between the two ap�

proaches is that� whereas the structural approach viewsthe subscript x as an operation that changes the distri�bution but keeps the variables the same� the potential�outcome approach views the variable Yx� to be a di�erentvariable� loosely connected to Y through relations suchas ��Pearl �����a� Chapter �� shows� using the structural in�

terpretation of Yxu� that it is indeed legitimate to treatcounterfactuals as jointly distributed random variables inall respects� that consistency constraints like � are au�tomatically satis�ed in the structural interpretation and�moreover� that investigators need not be concerned aboutany additional constraints except the following two�

Yyz � y for all y and z �

Xz � x �� Yxz � Yz for all x and z �

Eq� � ensures that the interventions doY � y resultsin the condition Y � y� regardless of concurrent interven�tions� say doZ � z� that are applied to variables otherthan Y � Equation � generalizes � to cases where Z isheld �xed� at z�To communicate substantive causal knowledge� the

potential�outcome analyst must express causal assump�tions as constraints on P �� usually in the form of condi�tional independence assertions involving counterfactualvariables� In Fig� �a for instance� to communicatethe understanding that a treatment assignment Z israndomized hence independent of both U and V � the

�This completeness result is due to Halpern � ����� who notedthat an additional axiom

fYxz � yg � fZxy � zg �� Yx � y

must hold in non�recursive models� This fundamental axiom maycome to haunt economists and social scientists who blindly applyNR analysis in their elds�

potential�outcome analyst needs to use the independenceconstraint Z��fXz� Yxg� To further formulate the under�standing that Z does not a�ect Y directly� except throughX � the analyst would write a� so called� �exclusion restric�tion� Yxz � Yx� Clearly� no mortal can judge the valid�ity of such assumptions in any real life problem withoutresorting to graphs��

��� Performing Inferences

A collection of assumptions of this type might sometimesbe su�cient to permit a unique solution to the query ofinterest� in other cases� only bounds on the solution canbe obtained� For example� if one can plausibly assumethat a set Z of covariates satis�es the conditional inde�pendence

Yx��X jZ �

an assumption that was termed �conditional ignorabil�ity� by �Rosenbaum and Rubin� ����� then the causal ef�fect P �Yx � y can readily be evaluated to yield

P �Yx � y �X

z

P �Yx � yjzP z

�X

z

P �Yx � yjx� zP z using �

�X

z

P �Y � yjx� zP z using �

�X

z

P yjx� zP z� ��

which is the usual covariate�adjustment formula� as inEq� ��Note that almost all mathematical operations in this

derivation are conducted within the safe con�nes of prob�ability calculus� Save for an occasional application of rule� or �� the analyst may forget that Yx stands for acounterfactual quantity�it is treated as any other ran�dom variable� and the entire derivation follows the courseof routine probability exercises�However� this mathematical illusion comes at the ex�

pense of conceptual clarity� especially at a stage wherecausal assumptions need be formulated� The reader mayappreciate this aspect by attempting to judge whetherthe assumption of conditional ignorability Eq� �� thekey to the derivation of Eq� ��� holds in any familiarsituation� say in the experimental setup of Fig� �a� Thisassumption reads �the value that Y would obtain hadX been x� is independent of X � given Z�� Such assump�tions of conditional independence among counterfactualvariables are not straightforward to comprehend or as�certain� for they are cast in a language far removed fromordinary understanding of cause and e�ect� When coun�terfactual variables are not viewed as byproducts of adeeper� process�based model� it is also hard to ascertain

�Even with the use of graphs the task is not easy� for example�the reader should try to verify whether fZ��XzjY g holds in thesimple model of Fig� ��a�

Biometrics Section

24

Page 7: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

whether all relevant counterfactual independence judg�ments have been articulated� whether the judgments ar�ticulated are redundant� or whether those judgments areself�consistent�The need to express� defend� and manage formidable

counterfactual relationships of this type explains the slowacceptance of causal analysis among epidemiologists andstatisticians� and why economists and social scientistscontinue to use structural equation models instead ofthe potential�outcome alternatives advocated in Holland������� Angrist et al� ������� and Sobel �������On the other hand� the algebraic machinery o�ered

by the potential�outcome notation� once a problem isproperly formalized� can be powerful in re�ning as�sumptions �Angrist et al�� ������ deriving consistent es�timands �Robins� ������ bounding probabilities of causa�tion �Tian and Pearl� ������ and combining data from ex�perimental and nonexperimental studies �Pearl� ����a��

��� Combining Graphs and Algebra � Methods

and Accomplishments�

Pearl �����a� page ���� presents a way of combining thebest features of the two approaches� It is based on en�coding causal assumptions in the language of diagrams�translating these assumptions into potential outcome no�tation� performing the mathematics in the algebraic lan�guage of counterfactuals and� �nally� interpreting the re�sult in plain causal language� Often� the answer desiredcan be obtained directly from the diagram� and no trans�lation is necessary as demonstrated in Section ����This method has scored an impressive list of ac�

complishments� including solutions to the long�standingproblems of legal responsibility �Tian and Pearl� �����Pearl� ������ non�compliance �Balke and Pearl� �����Chickering and Pearl� ������ direct and indirect ef�fects �Pearl� ������ mediating instrumental variables�Pearl� ����b� Brito and Pearl� ������ robustness analysis�Pearl� ������ and the integration of data from experi�mental and observational studies �Tian and Pearl� �����Pearl� ����a�� Detailed descriptions of these results aregiven in the corresponding articles which are available onhbayes�cs�ucla�edu"jp�home�htmli�

�� Conclusions

Statistics is strong in devising ways of describing data andinferring distributional parameters from sample� Causalinference require two addition ingredients a science�friendly language for articulating causal knowledge� anda mathematical machinery for processing that knowl�edge� combining it with data and drawing new causalconclusions about a phenomena� This paper introducesnonparametric structural equations models as a formaland meaningful language for formulating causal knowl�edge and for explicating causal concepts used in sci�enti�c discourse� These include randomization� inter�vention� direct and indirect e�ects� confounding� coun�

terfactuals� and attribution� The algebraic componentof the structural language coincides with the potential�outcome framework� and its graphical component em�braces Wright�s method of path diagrams in its non�parametric version� When uni�ed and synthesized� thetwo components o�er statistical investigators a power�ful methodology for empirical research� The merits ofthis methodology have quickly been recognized by sev�eral research communities �e�g�� Morgan and Winship������ Greenland et al�� ����� Petersen et al�� ����� Chalakand White� ����� and are making their way� past obviouspockets of resistance� to statistical education as well�

Acknowledgments

This research was supported in parts by grants from NSFand NIH�

References

�Angrist et al�� ����� J�D� Angrist� G�W� Imbens� andRubin D�B� Identi�cation of causal e�ects using in�strumental variables with comments� Journal ofthe American Statistical Association� ����� ��������June �����

�Balke and Pearl� ����a� A� Balke and J� Pearl� Coun�terfactual probabilities Computational methods�bounds� and applications� In R� Lopez de Mantarasand D� Poole� editors� Uncertainty in Articial Intelli�gence ��� pages ������ Morgan Kaufmann� San Mateo�CA� �����

�Balke and Pearl� ����b� A� Balke and J� Pearl� Prob�abilistic evaluation of counterfactual queries� In Pro�ceedings of the Twelfth National Conference on Arti�cial Intelligence� volume I� pages �������� MIT Press�Menlo Park� CA� �����

�Balke and Pearl� ����� A� Balke and J� Pearl� Boundson treatment e�ects from studies with imperfect com�pliance� Journal of the American Statistical Associa�tion� ����� ���������� September �����

�Brito and Pearl� ����� C� Brito and J Pearl� Graphi�cal condition for identi�cation in recursive SEM� InProceedings of the Twenty�Third Conference on Un�certainty in Articial Intelligence� pages ������ AUAIPress� Corvallis� OR� �����

�Chalak and White� ����� K� Chalak and H� White� Anextended class of instrumental variables for the esti�mation of causal e�ects� Technical Report DiscussionPaper� UCSD� Department of Economics� July �����

�Chickering and Pearl� ����� D�M� Chickering andJ� Pearl� A clinician�s tool for analyzing non�compliance� Computing Science and Statistics���� �������� �����

Biometrics Section

25

Page 8: Biometrics Section - Eötvös Loránd Universityweb.cs.elte.hu/~lukacs/adatbszem2012tavasz/2007Pearl.pdfHolland can recognize h suc expressions through the subscripts By un tested

�Cox� ����� D�R� Cox� The Planning of Experiments�John Wiley and Sons� NY� �����

�Greenland et al�� ����� S� Greenland� J� Pearl� and J�MRobins� Causal diagrams for epidemiologic research�Epidemiology� ��� ������ �����

�Halpern� ����� J�Y� Halpern� Axiomatizing causal rea�soning� In G�F� Cooper and S� Moral� editors� Uncer�tainty in Articial Intelligence� pages �������� MorganKaufmann� San Francisco� CA� �����

�Holland� ����� P�W� Holland� Statistics and causal in�ference� Journal of the American Statistical Associa�tion� ����� �������� December �����

�Holland� ����� P�W� Holland� Causal inference� pathanalysis� and recursive structural equations models� InC� Clogg� editor� Sociological Methodology� pages �������� American Sociological Association� Washington�D�C�� �����

�Morgan and Winship� ����� S�L� Morgan and C� Win�ship� Counterfactuals and Causal Inference� Methodsand Principles for Social Research �Analytical Meth�ods for Social Research � Cambridge University Press�New York� NY� �����

�Neyman� ����� J� Neyman� On the application of prob�ability theory to agricultural experiments� Essay onprinciples� Section �� Statistical Science� �� �������������

�Pearl� ����a� J� Pearl� Comment Graphical models�causality� and intervention� Statistical Science� � �������� �����

�Pearl� ����b� J� Pearl� Mediating instrumental vari�ables� Technical Report Technical Report R����� Com�puter Science Department� UCLA� �����

�Pearl� ����� J� Pearl� Causal diagrams for empirical re�search� Biometrika� ��� �������� December �����

�Pearl� ����a� J� Pearl� Causality� Models� Reasoning�and Inference� Cambridge University Press� New York������

�Pearl� ����b� J� Pearl� Comment on A�P� Dawid�s�causal inference without counterfactuals� Journal ofthe American Statistical Association� ����� ��������June �����

�Pearl� ����� J� Pearl� Direct and indirect e�ects� In Pro�ceedings of the Seventeenth Conference on Uncertaintyin Articial Intelligence� pages �������� Morgan Kauf�mann� San Francisco� CA� �����

�Pearl� ����� J� Pearl� Statistics and causal inference Areview� Test Journal� ��� �������� December �����

�Pearl� ����� J� Pearl� Robustness of causal claims� Pro�ceedings of the Twentieth Conference Uncertainty inArticial Intelligence� pages �������� �����

�Petersen et al�� ����� M�L� Petersen� S�E� Sinisi� andM�J� van der Laan� Estimation of direct causal e�ects�Epidemiology� ��� �������� �����

�Robins� ����� J�M� Robins� A new approach to causalinference in mortality studies with a sustained expo�sure period � applications to control of the healthyworkers survivor e�ect� Mathematical Modeling�� ���������� �����

�Rosenbaum and Rubin� ����� P� Rosenbaum andD� Rubin� The central role of propensity score inobservational studies for causal e�ects� Biometrica��� ������ �����

�Rubin� ����� D�B� Rubin� Estimating causal e�ects oftreatments in randomized and nonrandomized studies�Journal of Educational Psychology� �� �������� �����

�Shpitser and Pearl� ����� I� Shpitser and J Pearl� Iden�ti�cation of conditional interventional distributions�In Proceedings of the Twenty�Second Conference onUncertainty in Articial Intelligence� pages ��������AUAI Press� Corvallis� OR� �����

�Shpitser and Pearl� ����� I� Shpitser and J Pearl� Whatcounterfactuals can be tested� In Proceedings of theTwenty�Third Conference on Uncertainty in ArticialIntelligence� pages �������� AUAI Press� Vancouver�BC Canada� �����

�Sobel� ����� M�E� Sobel� Causal inference in statisti�cal models of the process of socioeconomic achieve�ment� Sociological Methods � Research� ��� ��������November ����� cited in jp��� book�

�Tian and Pearl� ����� J� Tian and J� Pearl� Probabili�ties of causation Bounds and identi�cation� In Pro�ceedings of the Sixteenth Conference on Uncertainty inArticial Intelligence� pages �������� Morgan Kauf�mann� San Francisco� CA� �����

�Tian and Pearl� ����� J� Tian and J� Pearl� A generalidenti�cation condition for causal e�ects� In Proceed�ings of the Eighteenth National Conference on Arti�cial Intelligence� pages �������� AAAI Press"The MITPress� Menlo Park� CA� �����

�Tian et al�� ����� J� Tian� A� Paz� and J� Pearl� Find�ing minimal separating sets� Technical Report R�����University of California� Los Angeles� CA� �����

�Wright� ����� S� Wright� Correlation and causation�Journal of Agricultural Research� �� �������� �����

Biometrics Section

26