Otis Dudley Duncan Memorial Lecture: THE RESURRECTION OF DUNCANISM

Preview:

DESCRIPTION

Otis Dudley Duncan Memorial Lecture: THE RESURRECTION OF DUNCANISM. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/). OUTLINE. Duncanism = Causally Assertive SEM History: Oppression, Distortion, and Resurrection The Old-New Logic of SEM New Tools - PowerPoint PPT Presentation

Citation preview

11

Otis Dudley DuncanMemorial Lecture:

THE RESURRECTION OF DUNCANISM

Judea PearlUniversity of California

Los Angeles(www.cs.ucla.edu/~judea/)

22

OUTLINE

1. Duncanism = Causally Assertive SEM

2. History: Oppression, Distortion, and Resurrection

3. The Old-New Logic of SEM

4. New Tools

4.1 Local testing

4.2 Non-parametric identification

4.3 Logic of counterfactuals

4.3 Non linear mediation analysis

33

Duncan book

44

Duncan book

55

FINDING INSTRUMENTAL VARIABLES

Can you find a n instrument for identifying b34? (Duncan, 1975)

By inspection: X2 d-separate X1 from V Therefore X1 is a valid instrument

66

DUNCANISM = ASSERTIVE SEM

• [y = bx + u] may be read as "a change in x or u produces a change in y” ... or “x and u are the causes of y” (Duncan, 1975, p. 1)

• “[the disturbance] u stands for all other sources of variation in y” (ibid)

• “doing the model consists largely in thinking about what kind of model one wants and can justify.” (ibid, p. viii)

• Assuming a model for sake of argument, we can express its properties in terms of correlations and (sometimes) find one or more conditions that must hold if the model is true“ (ibid, p. 20).

SEM – A tool for deriving causal conclusion from data and assumptions:

77

HISTORY: BIRTH, OPPRESSION, DISTORTION, RESURRECTION

•Birth and Development (1920 - 1980)Sewell Wright (1921), Haavelmo (1943), Simon (1950), Marschak (1950), Koopmans (1953), Wold and Strotz (1963), Goldberger (1973), Blalock (1964), and Duncan (1969, 1975)

•The regressional assault (1970-1990)Causality is meaningless, therefore, to be meaningful, SEM must be a regression technique, not “causal modeling.” Richard (1980) Cliff (1983), Dempster (1990), Wermuth (1992), and Muthen (1987)    

•The Potential-outcome assault (1985-present) Causality is meaningful but, since SEM is a regression technique, it could not possibly have causal interpretation. Rubin (2004, 2010), Holland (1986) Imbens (2009), andSobel (1996, 2008)

88

REGRESSION VS. STRUCTURAL EQUATIONS(THE CONFUSION OF THE CENTURY)

Regression (claimless, nonfalsifiable): Y = ax + Y

Structural (empirical, falsifiable): Y = bx + uY

Claim: (regardless of distributions): E(Y | do(x)) = E(Y | do(x), do(z)) = bx

Q. When is b a partial regression? b = YX •

A. Shown in the diagram, Slide 40.

The mothers of all questions:Q. When would b equal a?A. When (uY X), read from the diagram

99

THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT)

•“I am speaking, of course, about the equation:What does it mean? The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of {y} given {x}.” (Holland 1995)

•“The use of complicated causal-modeling software [read SEM] rarely yields any results that have any interpretation as causal effects.” (Wilkinson and Task Force 1999 on Statistical Methods in Psychology Journals: Guidelines and Explanations)

}.{ bxay

1010

THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT) (Cont)

•In general (even in randomized studies), the structural and causal parameters are not equal, implying that the structural parameters should not be interpreted as effect.” (Sobel 2008)

•“Using the observed outcome notation entangles the science...Bad! Yet this is exactly what regression approaches, path analyses, directed acyclic graphs, and so forth essentially compel one to do.” (Rubin 1010)

1111

WHY SEM INTERPRETATION IS“SELF-CONTRADICTORY”

D. Freedman JES (1987), p. 114, Fig. 3

"Now try the direct effect of Z on Y:  We intervene by fixing W and X but increasing Z by one unit; this should increase Y by d units.  However, this hypothetical intervention is self-contradictory,because fixing W and increasing Z causes an increase in X.  

The oversight: Fixing X DISABLES equation (7.1);

)2.7(

)1.7(

VdZcXY

UbWaZX

1212

SEM REACTION TOFREEDMAN CRITICS

•It would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect (Muthen, 1987).

•“Causal modeling” is an outdated misnomer (Kelloway, 1998).

•Causality-free, politically-safe vocabulary: “covariance structure” “regression analysis,” or “simultaneous equations.”

•[Causal Modeling] may be somewhat dated, however, as it seems to appear less often in the literature nowadays”(Kline, 2004, p. 9)

Total Surrender:

1313

SEM REACTION TO THE STRUCTURE-PHOBIC   ASSAULT

Galles, Pearl, Halpern (1998 - Logical equivalence)

Heckman-Sobel

Morgan Winship (1997)

Gelman Blog

NONE TO SPEAK OF!

1414

THE RESURRECTION

Why non-parametric perspective?

What are the parameters all about and why we labor to identify them?Can we do without them?

Consider:

Only he who lost the parameters and needs to find substitutes can begin to ask:Do I really need them? What do they really mean? What role do the play?

uxY

),( uxfY

1515

CAUSAL MODEL

(MA)

THE LOGIC OF SEM

A - CAUSAL ASSUMPTIONS

Q Queries of interest

Q(P) - Identified estimands

Data (D)

Q - Estimates of Q(P)

Causal inference

T(MA) - Testable implications

Statistical inference

Goodness of fit

Model testingConditional claims

),|( ADQQ )(Tg

A* - Logicalimplications of A

CAUSAL MODEL

(MA)

1616

TRADITIONAL STATISTICALINFERENCE PARADIGM

Data

Inference

Q(P)(Aspects of P)

PJoint

Distribution

e.g.,Infer whether customers who bought product Awould also buy product B.Q = P(B | A)

1717

What happens when P changes?e.g.,Infer whether customers who bought product Awould still buy A if we were to double the price.

FROM STATISTICAL TO CAUSAL ANALYSIS:1. THE DIFFERENCES

Probability and statistics deal with static relations

Data

Inference

Q(P)(Aspects of P)

P Joint

Distribution

PJoint

Distribution

change

1818

FROM STATISTICAL TO CAUSAL ANALYSIS:1. THE DIFFERENCES

Note: P (v) P (v | price = 2)

P does not tell us how it ought to changeCausal knowledge: what remains invariant

What remains invariant when P changes say, to satisfy P (price=2)=1

Data

Inference

Q(P)(Aspects of P)

P Joint

Distribution

PJoint

Distribution

change

1919

FROM STATISTICAL TO CAUSAL ANALYSIS:1. THE DIFFERENCES (CONT)

CAUSALSpurious correlationRandomization / InterventionConfounding / EffectInstrumental variableExogeneity / IgnorabilityMediation

STATISTICALRegressionAssociation / Independence“Controlling for” / ConditioningOdd and risk ratiosCollapsibility / Granger causalityPropensity score

1. Causal and statistical concepts do not mix.

2.

3.

4.

2020

CAUSALSpurious correlationRandomization / Intervention Confounding / EffectInstrumental variable Exogeneity / IgnorabilityMediation

STATISTICALRegressionAssociation / Independence“Controlling for” / ConditioningOdd and risk ratiosCollapsibility / Granger causality Propensity score

1. Causal and statistical concepts do not mix.

4.

3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.

FROM STATISTICAL TO CAUSAL ANALYSIS:2. MENTAL BARRIERS

2. No causes in – no causes out (Cartwright, 1989)

statistical assumptions + datacausal assumptions causal conclusions }

2121

4. Non-standard mathematics:a) Structural equation models (Wright, 1920; Simon, 1960)

b) Counterfactuals (Neyman-Rubin (Yx), Lewis (x Y))

CAUSALSpurious correlationRandomization / Intervention Confounding / EffectInstrumental variable Exogeneity / IgnorabilityMediation

STATISTICALRegressionAssociation / Independence“Controlling for” / ConditioningOdd and risk ratiosCollapsibility / Granger causalityPropensity score

1. Causal and statistical concepts do not mix.

3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.

FROM STATISTICAL TO CAUSAL ANALYSIS:2. MENTAL BARRIERS

2. No causes in – no causes out (Cartwright, 1989)

statistical assumptions + datacausal assumptions causal conclusions }

2222

Data

Inference

Q(M)(Aspects of M)

Data Generating

Model

M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.

JointDistribution

THE STRUCTURAL MODELPARADIGM

M

“Think Nature, not experiment!”•

2323

STRUCTURALCAUSAL MODELS

Definition: A structural causal model is a 4-tupleV,U, F, P(u), where• V = {V1,...,Vn} are endogenous variables• U = {U1,...,Um} are background variables• F = {f1,..., fn} are functions determining V,

vi = fi(v, u)• P(u) is a distribution over UP(u) and F induce a distribution P(v) over observable variables

Yuxy e.g.,

2424

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence: “Y would be y (in unit u), had X been x,”

denoted Yx(u) = y, means:The solution for Y in a mutilated model Mx, (i.e., the equations

for X replaced by X = x) with input U=u, is equal to y.

U

X (u) Y (u)

M

U

X = x YX (u)

Mx

2525

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence: “Y would be y (in unit u), had X been x,”

denoted Yx(u) = y, means:The solution for Y in a mutilated model Mx, (i.e., the equations

for X replaced by X = x) with input U=u, is equal to y.

)()( uYuY xMx

The Fundamental Equation of Counterfactuals:

2626

),|(),|'(

)()()|(

')(':'

)(:

yxuPyxyYPN

uPyYPyP

yuxYux

yuxYux

In particular:

)(xdo

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence: “Y would be y (in situation u), had X been x,”

denoted Yx(u) = y, means:The solution for Y in a mutilated model Mx, (i.e., the equations

for X replaced by X = x) with input U=u, is equal to y.•

)(),()(,)(:

uPzZyYPzuZyuYu

wxwx

Joint probabilities of counterfactuals:

2727

3-LEVEL HIERARCHY OF CAUSAL MODELS

1. Probabilistic Knowledge P(y | x)Bayesian networks, graphical models

2. Interventional Knowledge P(y | do(x))Causal Bayesian Networks (CBN) (Agnostic graphs, manipulation graphs)

3. Counterfactual Knowledge P(Yx = y,Yx =y)Structural equation models, physics, functional graphs, “Treatment assignment mechanism”

2828

TWO PARADIGMS FOR CAUSAL INFERENCE

Observed: P(X, Y, Z,...)Conclusions needed: P(Yx=y), P(Xy=x | Z=z)...

How do we connect observables, X,Y,Z,…to counterfactuals Yx, Xz, Zy,… ?

N-R modelCounterfactuals areprimitives, new variables

Super-distribution

Structural modelCounterfactuals are derived quantities

Subscripts modify the model and distribution )()( yYPyYP xMx ,...,,,

,...),,...,,(*

yx

zxZYZYX

XYYXP

constrain

2929

ARE THE TWO PARADIGMS EQUIVALENT?

• Yes (Galles and Pearl, 1998; Halpern 1998)

• In the N-R paradigm, Yx is defined by consistency:

• In SCM, consistency is a theorem.

• Moreover, a theorem in one approach is a theorem in the other.

01 )1( YxxYY

3030

Define:

Assume:

Identify:

Estimate:

Test:

THE FIVE NECESSARY STEPSOF CAUSAL ANALYSIS

Express the target quantity Q as a function Q(M) that can be computed from any model M.

Formulate causal assumptions A using some formal language.

Determine if Q is identifiable given A.

Estimate Q if it is identifiable; approximate it, if it is not.

Test the testable implications of A (if any).

3131

THE FIVE NECESSARY STEPS FOR EFFECT ESTIMATION

))(|())(|( 01 xdoYExdoYE ATE

Define:

Assume:

Identify:

Estimate:

Test:

Express the target quantity Q as a function Q(M) that can be computed from any model M.

Formulate causal assumptions A using some formal language.

Determine if Q is identifiable given A.

Estimate Q if it is identifiable; approximate it, if it is not.

Test the testable implications of A (if any).

3232

FORMULATING ASSUMPTIONSTHREE LANGUAGES

1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)

Not too friendly:consistent? complete? redundant? arguable?

ZX Y

3. Structural:

},{

),()(

),()()()(

),()(

XYZ

uYuY

uXuXuXuX

uZuZ

zx

zxz

zzyy

yxx

2. Counterfactuals:

3333

Define:

Assume:

Identify:

Estimate:

IDENTIFYING CAUSAL EFFECTSIN POTENTIAL-OUTCOME FRAMEWORK

Express the target quantity Q as a counterfactual formula, e.g., E(Y(1) – Y(0))

Formulate causal assumptions using the distribution:

Determine if Q is identifiable using P* andY=x Y (1) + (1 – x) Y (0).

Estimate Q if it is identifiable; approximate it, if it is not.

))0(),1(,,|(* YYZYXPP

3434

GRAPHICAL – COUNTERFACTUALS SYMBIOSIS

Every causal graph expresses counterfactuals assumptions, e.g., X Y Z

consistent, and readable from the graph.

• Express assumption in graphs• Derive estimands by graphical or algebraic

methods

)()(, uYuY xzx 1. Missing arrows Y Z

2. Missing arcs Y Z yx ZY

3535

IDENTIFICATION IN SCM

Find the effect of X on Y, P(y|do(x)), given the

causal assumptions shown in G, where Z1,..., Zk are auxiliary variables.

Z6

Z3

Z2

Z5

Z1

X Y

Z4

G

Can P(y|do(x)) be estimated if only a subset, Z, can be measured?

3636

ELIMINATING CONFOUNDING BIASTHE BACK-DOOR CRITERION

P(y | do(x)) is estimable if there is a set Z ofvariables such that Z d-separates X from Y in Gx.

Z6

Z3

Z2

Z5

Z1

X Y

Z4

Z6

Z3

Z2

Z5

Z1

X Y

Z4

Z

Gx G

• Moreover,

(“adjusting” for Z) Ignorability

z z zxP

zyxPzPzxyPxdoyP

)|(),,(

)(),|())(|(

3737

IDENTIFYING TESTABLE IMPLICATIONS

Assumptions advertized in the missing edges are Z1 – Z2, Z1 – Y

W3

W1

Z2

W2

Z1

X Y

Z3

Implying:

},{|

},,{|

312

3211

21

ZZXZ

ZZXYZ

ZZ

''

'

341312

3423211

21

ZcZcXcZ

ZbZbXbYbZ

aZZ

The missing edges imply: z = 0, b1 = 0, and c1 = 0.Software routines for automatic detection of all such tests reported in Kyono (2010)

3838

SEPARATION EQUIVALENCE MODEL EQUIVALENCE

3939

FINDING INSTRUMENTAL VARIABLES

Can you find a n instrument for identifying b34? (Duncan, 1975)

By inspection: X2 d-separate X1 from V Therefore X1 is a valid instrument

4040

W1 W4

X YV2V1

W2 W3

Z T

CONFOUNDING EQUIVALENCEWHEN TWO MEASUREMENTS ARE

EQUALLY VALUABLE

L

Z T?

4141

CONFOUNDING EQUIVALENCEWHEN TWO MEASUREMENTS ARE

EQUALLY VALUABLE

Definition:T and Z are c-equivalent if

Definition (Markov boundary):

Markov boundary Sm of S (relative to X) is the minimal

subset of S that d-separates X from all other members of S. Theorem (Pearl and Paz, 2009) Z and T are c-equivalent iff

1. Zm=Tm, or

2. Z and T are admissible (i.e., satisfy the back-door condition)

t z

yxzPzxyPtPtxyP ,)(),|()(),|(

4242

W1 W4

X YV2V1

W2 W3

Z T

CONFOUNDING EQUIVALENCEWHEN TWO MEASUREMENTS ARE

EQUALLY VALUABLE

Z T

L

4343

W1 W4

X YV2V1

W2 W3

CONFOUNDING EQUIVALENCEWHEN TWO MEASUREMENTS ARE

EQUALLY VALUABLE

Z T

Z T

4444

W1 W4

X YV2V1

W2 W3Z T

CONFOUNDING EQUIVALENCEWHEN TWO MEASUREMENTS ARE

EQUALLY VALUABLE

Z T

4545

BIAS AMPLIFICATIONBY INSTRUMENTAL VARIABLES

W2 W1

X Y

U

W1 {W1, W2}

• Adding W2 to Propensity Score increases bias (if such exists) (Wooldridge, 2009)

• In linear systems – always• In non-linear systems – almost always (Pearl, 2010)• Outcome predictors are safer than treatment predictors

4646

EFFECT DECOMPOSITION(direct vs. indirect effects)

1. Why decompose effects?

2. What is the definition of direct and indirect effects?

3. What are the policy implications of direct and indirect effects?

4. When can direct and indirect effect be estimated consistently from experimental and nonexperimental data?

4747

WHY DECOMPOSE EFFECTS?

1. To understand how Nature works

2. To comply with legal requirements

3. To predict the effects of new type of interventions:

Signal routing, rather than variable fixing

4848

X Z

Y

LEGAL IMPLICATIONSOF DIRECT EFFECT

What is the direct effect of X on Y ?

(averaged over z)

))(),(())(),( 01 zdoxdoYEzdoxdoYE ||(

(Qualifications)

(Hiring)

(Gender)

Can data prove an employer guilty of hiring discrimination?

Adjust for Z? No! No!

4949

z = f (x, u)y = g (x, z, u)

X Z

Y

NATURAL INTERPRETATION OFAVERAGE DIRECT EFFECTS

Natural Direct Effect of X on Y:The expected change in Y, when we change X from x0 to x1 and, for each u, we keep Z constant at whatever value it attained before the change.

In linear models, DE = Controlled Direct Effect

][001 xZx YYE

x

);,( 10 YxxDE

Robins and Greenland (1992) – “Pure”

)( 01 xx

5050

z = f (x, u)y = g (x, z, u)

X Z

Y

DEFINITION OFINDIRECT EFFECTS

Indirect Effect of X on Y:The expected change in Y when we keep X constant, say at x0, and let Z change to whatever value it would have attained had X changed to x1.

In linear models, IE = TE - DE

][010 xZx YYE

x

);,( 10 YxxIE

5151

Z

m2

X Y

m1

)(revIEDETE

DETEmmIE

DE

mmTE

21

21

mediation

disablingby prevented Effect

alone mediationby sustained Effect

DETE

IE

IEDETE WHY

IErevIE )(

Disabling mediation

Disabling direct path

DE

TE - DE

TE

IE

In linear systems

Is NOT equal to:

5252

POLICY IMPLICATIONS OF INDIRECT EFFECTS

f

GENDER QUALIFICATION

HIRING

What is the indirect effect of X on Y?

The effect of Gender on Hiring if sex discriminationis eliminated.

X Z

Y

IGNORE

Deactivating a link – a new type of intervention

5353

1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),

2. And are given by:

3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.

MEDIATION FORMULAS

z

zxdozPxdozPzxdoYEIE

xdozPzxdoYEzxdoYEDE

))](|())(|())[,(|(

)).(|())],(|()),(|([

010

001

5454

MEDIATION FORMULASIN UNCONFOUNDED MODELS

X

Z

Y

)|()|(

])|()|()[,|([

)|()],|(),|([

01

010

001

xYExYETE

xzPxzPzxYEIE

xzPzxYEzxYEDE

z

z

mediation to owed responses of Fraction

mediationby explained responses of Fraction

DETE

IE

5555

COMPUTING THE MEDIATION FORMULA

X

Z

Y

))((

)()1)((

000101

0011100010gghhIE

hgghggDE

XX ZZ YY EE((Y|x,zY|x,z))==gxz EE((Z|xZ|x))==hhxx

nn11 00 00 00

nn22 00 00 11

nn33 00 11 00

nn44 00 11 11

nn55 11 00 00

nn66 11 00 11

nn77 11 11 00

nn88 11 11 11

0021

2 gnn

n

0143

4 gnn

n

1065

6 gnn

n

1187

8 gnn

n

04321

43 hnnnn

nn

18765

87 hnnnn

nn

5656

RAMIFICATION OF THEMEDIATION FORMULA

• DE should be averaged over mediator levels,

IE should NOT be averaged over exposure levels.

• TE-DE need not equal IETE-DE = proportion for whom mediation is necessary

IE = proportion for whom mediation is sufficient

• TE-DE informs interventions on indirect pathways

IE informs intervention on direct pathways.

5757

W

MEASUREMENT BIAS ANDEFFECT RESTORATION

Unobserved Z

X Y

P(w|z)P(y | do(x)) is identifiable from

measurement of W, if P(w | z) is given (Selen, 1986; Greenland & Lash, 2008)

z

z

z

zyxPzwP

zyxPzyxwP

wzyxPwyxPzwPzyxwP

),,()|(

),,(),,|(

),,,(),,()|(),,|(

w

wyxPwzIzyxP ),,(),(),,(

(local independence)

5858

EFFECT RESTORATION IN BINARY MODELS

),|( 1 yxzP

),|( 1 yxwP

1

1 1

undefined undefined

)0|1(

)1|0(

ZWP

ZWP

)()(1

),|(1

1

)()(1

),|(1

1))(|(

00)0|()0,,(

11)1|()1,,(

wPxP

yxwP

wPxP

yxwPxdoyP

wxPwyxP

wxPwyxP

To cell (x,y,Z = 0)

To cell (x,y,Z = 1)

Weight distribution from cell (x,y,W = 1)

W

Z

X Y

5959

EFFECT RESTORATIONIN LINEAR MODELS

W

Z

X Y

c1 c2c3

c0

(a)

V

Z

X Y

c1 c2c3

c0

W

c4

(b)

)()()()()()()()(

0 WVcovXWcovXarvXVcovWVcovYWcovXVcovXYcov

c

zzxwxx

ywxwxy ckk

kc

2320

/

/

Correlated proxies (Cai & Kuroki, 2008)

6060

I TOLD YOU CAUSALITY IS SIMPLE

• Formal basis for causal and counterfactual

inference (complete)

• Unification of the graphical, potential-outcome

and structural equation approaches

• Friendly and formal solutions to

century-old problems and confusions.

He is wise who bases causal inference on an explicit causal structure that is defensible on scientific grounds.

         (Aristotle 384-322 B.C.)

From Charlie Poole

CONCLUSIONS

6161

They will be answered

QUESTIONS???

Recommended