
Statistik för kliniska prövningar (Statistics for clinical trials)



Page 1

Introduction to DAGs: Directed Acyclic Graphs

Metalund and SIMSAM EarlyLife Seminar, 22 March 2013

Jonas Björk

E-mail: [email protected]

(Fleischer & Diez Roux 2008)

Page 2

Introduction to DAGs

• Basic terminology and principles

• Classification of bias in the DAG framework

• Concerns and limitations

• Examples from analysis of neighbourhood health effects

Page 3

DAG – a logical system for causal relationships

Development of Western science is based on two great achievements:

1. the invention of the formal logical system by the Greek philosophers

2. the discovery of the possibility to find out causal relationships by systematic experiment during the Renaissance

(Albert Einstein, 1953; adapted from Pearl 2009)

Page 4

Directed Acyclic Graphs (DAGs)

• Causal diagrams: visualize causal (structural) relationships between variables

• Based on mathematical theory and reasoning

• Used in

– Epidemiology, social science

– Computer science, artificial intelligence

– Economics, business administration

– Cognitive science

– ...

• Help minimize bias by identifying appropriate (and inappropriate) analytical strategies

Page 5

DAGs - nodes and arrows

• Nodes represent variables

– Measured and unmeasured

– Observable and unobservable (compare latent variables in Structural Equation Models, SEMs)

• Directed arrows (single-headed) show direct causal effects

(Hernan, Epidemiology 2004)

Page 6

DAG Language

• Direct effect (only one)

– E affects D directly if there is an arrow from E to D

– E → D

• Indirect effect (can be more than one)

– E affects D indirectly if there is a sequence of directed arrows starting in E and ending in D

– E → M → D

• Children

– Variables directly affected by E

– Descendants: variables directly or indirectly affected by E

• Parents

– Variables that directly affect E

– Ancestors: all variables that affect E directly or indirectly
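The vocabulary above can be checked on a small example. The sketch below is only an illustration: the DAG (nodes E, M, C, D) and the helper names are hypothetical and not taken from the seminar figures.

```python
from collections import deque

# Hypothetical DAG, stored as node -> list of children.
# An arrow X -> Y means "X has a direct effect on Y".
dag = {
    "E": ["M", "D"],  # E -> M and E -> D (direct effect of E on D)
    "M": ["D"],       # M -> D, so E also affects D indirectly via M
    "C": ["E", "D"],  # C -> E and C -> D
    "D": [],
}

def children(graph, node):
    """Variables directly affected by `node`."""
    return set(graph[node])

def parents(graph, node):
    """Variables that directly affect `node`."""
    return {p for p, kids in graph.items() if node in kids}

def _reach(start, step):
    """All nodes reachable from `start` by applying `step` repeatedly."""
    seen, queue = set(), deque(step(start))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(step(n))
    return seen

def descendants(graph, node):
    """Variables directly or indirectly affected by `node`."""
    return _reach(node, lambda n: children(graph, n))

def ancestors(graph, node):
    """Variables that affect `node` directly or indirectly."""
    return _reach(node, lambda n: parents(graph, n))

print("children of E:   ", sorted(children(dag, "E")))
print("descendants of C:", sorted(descendants(dag, "C")))
print("parents of D:    ", sorted(parents(dag, "D")))
print("ancestors of M:  ", sorted(ancestors(dag, "M")))
```

In this toy graph C is a parent of E and an ancestor of M, while D is both a child and a descendant of E.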

Page 7

DAG Language - Example

• Direct effect

• Indirect effect

• Children, Descendants

• Parents, Ancestors

(Fleischer & Diez Roux 2008)

Page 8

Acyclic graphs

Loops are not allowed (figure: a cycle between E and D)

Temporal associations can instead be depicted with time-indexed nodes (figure: nodes E0, D and E1)

Time moves from left to right in the graph
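Whether a graph qualifies as acyclic can be checked mechanically. The sketch below uses Kahn's topological-sort idea; the function name is hypothetical, and the arrow order E0 → D → E1 in the time-indexed graph is an assumption, since only the node labels of the slide figure are recoverable.

```python
def is_acyclic(graph):
    """Kahn's algorithm: the graph is acyclic if every node can be removed
    in an order where it has no remaining incoming arrows."""
    indegree = {n: 0 for n in graph}
    for kids in graph.values():
        for k in kids:
            indegree[k] += 1

    ready = [n for n, deg in indegree.items() if deg == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for k in graph[n]:
            indegree[k] -= 1
            if indegree[k] == 0:
                ready.append(k)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(graph)

# A feedback loop between exposure and outcome: not a DAG.
loop = {"E": ["D"], "D": ["E"]}

# The same feedback unrolled over time (assumed arrow order): a valid DAG.
unrolled = {"E0": ["D"], "D": ["E1"], "E1": []}

print(is_acyclic(loop))
print(is_acyclic(unrolled))
```

Indexing the exposure by time turns the feedback into a chain, which is why the unrolled version passes the check.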

Page 9

Paths (E – Z – D)

• Path

– Sequence of arrows connecting two variables, regardless of the direction of the arrows

– E → Z ← D

– E → Z → D

– E ← Z → D

– E ← Z ← D

• Collider (common effect within a path)

– Variable Z in a path that has two arrows pointing into it

– E → Z ← D

– Blocks (breaks) the information chain between E and D, unless Z is conditioned on

• Unblocked backdoor path from E to D

– Begins with arrow pointing into E

– Ends with arrow pointing into D

– Does not contain a collider

– Unblocked backdoor paths are the origin of confounding
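The path definitions above can be applied programmatically. The following sketch enumerates all paths between E and D in a small hypothetical DAG (the nodes C and Z and all function names are assumptions for illustration), marks colliders, and flags unblocked backdoor paths using the three conditions listed on the slide. No conditioning is considered here.

```python
# Hypothetical DAG as a set of (tail, head) arrows.
edges = {("C", "E"), ("C", "D"), ("E", "D"), ("E", "Z"), ("D", "Z")}

def simple_paths(edges, start, end):
    """All paths from start to end, ignoring arrow direction,
    without visiting any node twice."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for n in sorted(neighbours[path[-1]]):
            if n not in path:
                yield from walk(path + [n])

    yield from walk([start])

def colliders(edges, path):
    """Interior nodes with arrows pointing into them from both neighbours."""
    return [mid for prev, mid, nxt in zip(path, path[1:], path[2:])
            if (prev, mid) in edges and (nxt, mid) in edges]

def is_unblocked_backdoor(edges, path):
    """Begins with an arrow into the first node, ends with an arrow into
    the last node, and contains no collider."""
    into_start = (path[1], path[0]) in edges
    into_end = (path[-2], path[-1]) in edges
    return into_start and into_end and not colliders(edges, path)

for p in simple_paths(edges, "E", "D"):
    print(p, "colliders:", colliders(edges, p),
          "unblocked backdoor:", is_unblocked_backdoor(edges, p))
```

In this toy graph only E ← C → D is an unblocked backdoor path; E → Z ← D is blocked by the collider Z, and E → D is the causal path itself.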

Page 10

DAG Language - Example

• Path

• Collider

• Unblocked backdoor path

(Fleischer & Diez Roux 2008)

Page 11

Common cause

(Hernan, Epidemiology 2004)

If we want the DAG to illustrate the E-D association, all common causes of E and D must be included; otherwise the DAG is not considered causal
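A small numerical illustration of why an omitted common cause matters. The sketch computes exact probabilities for the structure E ← C → D with no arrow from E to D; all probability values and names are assumptions chosen for illustration, not data from the seminar.

```python
from itertools import product

# Assumed probabilities, for illustration only.
P_C = 0.5                       # P(C = 1)
P_E_GIVEN_C = {0: 0.2, 1: 0.8}  # P(E = 1 | C)
P_D_GIVEN_C = {0: 0.1, 1: 0.6}  # P(D = 1 | C); D does not depend on E

def bern(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1 - p

# Joint distribution implied by the DAG  E <- C -> D, keyed by (c, e, d).
joint = {
    (c, e, d): bern(P_C, c) * bern(P_E_GIVEN_C[c], e) * bern(P_D_GIVEN_C[c], d)
    for c, e, d in product((0, 1), repeat=3)
}

def risk(e, c=None):
    """P(D = 1 | E = e), optionally also conditioning on C = c."""
    rows = [(k, p) for k, p in joint.items()
            if k[1] == e and (c is None or k[0] == c)]
    total = sum(p for _, p in rows)
    return sum(p for k, p in rows if k[2] == 1) / total

crude_rd = risk(1) - risk(0)
stratum_rds = [risk(1, c) - risk(0, c) for c in (0, 1)]

print("crude risk difference:", round(crude_rd, 3))
print("risk differences within strata of C:", stratum_rds)
```

Under these assumed numbers the crude risk difference is 0.3 although E has no effect on D, and it vanishes (up to floating-point rounding) within each stratum of the common cause C.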

Page 12

Common effect (common consequence)

Collider on the path between E and D

• Conditioning (“knowing the value of”) can take the form of

– Restriction

– Stratification

– Matching

– Adjustment

(Hernan, Epidemiology 2004)

Conditioning on a common effect creates an association between E and D
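The effect of conditioning on a collider can be shown with exact probabilities. In the sketch below E and D are independent causes of a common effect Z (structure E → Z ← D); all probability values and names are assumptions for illustration only. Restriction to Z = 1 is used as the form of conditioning.

```python
from itertools import product

# Assumed probabilities, for illustration only.
P_E, P_D = 0.3, 0.2  # P(E = 1), P(D = 1); E and D are independent
P_Z_GIVEN_ED = {(0, 0): 0.05, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.9}

def bern(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1 - p

# Joint distribution implied by the DAG  E -> Z <- D, keyed by (e, d, z).
joint = {
    (e, d, z): bern(P_E, e) * bern(P_D, d) * bern(P_Z_GIVEN_ED[(e, d)], z)
    for e, d, z in product((0, 1), repeat=3)
}

def risk(e, z=None):
    """P(D = 1 | E = e), optionally restricted to Z = z."""
    rows = [(k, p) for k, p in joint.items()
            if k[0] == e and (z is None or k[2] == z)]
    total = sum(p for _, p in rows)
    return sum(p for k, p in rows if k[1] == 1) / total

marginal_rd = risk(1) - risk(0)               # no conditioning
restricted_rd = risk(1, z=1) - risk(0, z=1)   # restriction to Z = 1

print("risk difference, whole population:", round(marginal_rd, 3))
print("risk difference, restricted to Z = 1:", round(restricted_rd, 3))
```

With these assumed numbers E and D are unassociated in the whole population, but among those with Z = 1 the risk difference is roughly -0.40, even though neither variable affects the other.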

Page 13

DAG Language - Example

• Common cause

• Common effect

• Conditioning

(Fleischer & Diez Roux 2008)

Page 14

Bias

• Structural association between exposure (E) and outcome (D) that does not arise from a causal effect of E on D

– Reversed causality (information bias?)

– Confounding

– Selection bias

Thus, when bias is present, exposure and outcome will be associated even under the causal null hypothesis

Page 15

Association vs. causation

E-D associations can have three different structural origins according to DAG theory:

1. Cause and effect (watch out for reversed causality)

2. Common cause (confounding)

3. Conditioning on a common effect (selection bias)

(Hernan et al. 2004)

Chance is not a structural source of association!

Page 16

Appropriate design and analytical strategy

1. Design that avoids reversed causality

2. Control confounding by blocking backdoor paths from E to D (conditioning)

3. Identify selection bias introduced by conditioning

Page 17

Small Group Discussion: DAGs

Which variables should we adjust for in order to estimate

1) the total (direct + indirect) effect

2) the direct effect

of neighbourhood violence on CVD? Justify your answers!

(Fleischer & Diez Roux 2008)

• Control confounding by conditioning

• Identify selection bias from conditioning

Page 18

Confounding control in DAGs

There exist formal methods (and software) to

1. Determine the set S of covariates that must be controlled for to remove confounding

2. Determine whether the set S of covariates is minimally sufficient to control for confounding

Have we discovered all unblocked backdoor paths?

Is there redundancy in the set of blocking variables?

(Fleischer & Diez Roux 2008)

Page 19

Minimally sufficient?

Suppose we think that S = {Income, PA} is sufficient to control for confounding when estimating the direct effect.

1. Delete all arrows starting at E (Neighbourhood violence)

2. Connect all pairs of variables that share a child that is in S or has a descendant in S

3. Are there any unblocked backdoor paths from E to D (CVD) that do not pass through S? If so, S is not sufficient.

Page 20

If you still think you can rely on your intuition...

Which variables should we adjust for in order to estimate the effect of E on D? Justify your answer!

(Figure: DAG with nodes Z1, Z2, Z3, Z4, Z5, Z6, E and D)

(Adapted from Pearl 2009, p. 80)
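A question like this one can be checked by machine rather than intuition. The sketch below rests on two stated assumptions. First, the arrows of the figure are not recoverable from this transcript, so the edge list is an assumed structure recalled from the corresponding figure in Pearl (2009) and should be verified against the book. Second, the sufficiency check uses the ancestral moral graph test, a standard equivalent formulation of the back-door criterion, which is not necessarily the exact procedure used in the seminar. All function names are hypothetical.

```python
from itertools import combinations

# ASSUMED arrows (tail, head); verify against Pearl 2009, p. 80.
EDGES = [("Z1", "Z3"), ("Z1", "Z4"), ("Z2", "Z4"), ("Z2", "Z5"),
         ("Z3", "E"), ("Z4", "E"), ("Z4", "D"), ("Z5", "D"),
         ("E", "Z6"), ("Z6", "D")]

def descendants(edges, node):
    """All nodes reachable from `node` along directed arrows."""
    found, stack = set(), [node]
    while stack:
        n = stack.pop()
        for a, b in edges:
            if a == n and b not in found:
                found.add(b)
                stack.append(b)
    return found

def blocks_backdoor(edges, exposure, outcome, S):
    """True if S satisfies the back-door criterion for exposure -> outcome."""
    S = set(S)
    if S & descendants(edges, exposure):
        return False  # adjusting for descendants of the exposure is excluded
    # Step 1: delete all arrows starting at the exposure.
    kept = [(a, b) for a, b in edges if a != exposure]
    parents = {}
    for a, b in kept:
        parents.setdefault(b, set()).add(a)
    # Keep only exposure, outcome, S and their ancestors.
    relevant, stack = set(), [exposure, outcome, *S]
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    # Step 2: drop arrow directions and connect parents sharing a child.
    adjacent = {n: set() for n in relevant}
    for child in relevant:
        pa = parents.get(child, set())
        for p in pa:
            adjacent[p].add(child)
            adjacent[child].add(p)
        for p, q in combinations(pa, 2):
            adjacent[p].add(q)
            adjacent[q].add(p)
    # Step 3: is the outcome reachable from the exposure without entering S?
    seen, stack = set(), [exposure]
    while stack:
        n = stack.pop()
        if n == outcome:
            return False
        if n in seen or n in S:
            continue
        seen.add(n)
        stack.extend(adjacent[n])
    return True

def minimal_sufficient_sets(edges, exposure, outcome, candidates):
    """Sufficient sets that contain no smaller sufficient set."""
    found = []
    for size in range(len(candidates) + 1):
        for S in combinations(sorted(candidates), size):
            if any(set(m) <= set(S) for m in found):
                continue  # a smaller sufficient subset already exists
            if blocks_backdoor(edges, exposure, outcome, S):
                found.append(S)
    return found

candidates = ["Z1", "Z2", "Z3", "Z4", "Z5", "Z6"]
print(minimal_sufficient_sets(EDGES, "E", "D", candidates))
```

Under the assumed arrows, adjusting for Z4 alone is not enough: Z4 is a collider on the path through Z1 and Z2, so conditioning on it opens that path and a second variable is needed to close it again.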

Page 21

DAGs – concerns and limitations

• How much should be included?

– All common causes must be included

– A complete DAG for several exposures and outcomes can be quite messy

• Binary nature

– Effect / no effect

– Effect size, dose-response, magnitude of interaction etc. cannot be depicted

• Assumes a “perfect” study setting

– Correctly specified model, no measurement errors, continuous monitoring of outcome in longitudinal settings etc.

– Limited guidance in the choice of analytical strategy in less perfect settings (e.g. trade-off between confounding and selection bias)

Page 22

DAGs – How much should be included?

(de Jong et al. 2012)

Page 23

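DAGs in longitudinal survey settings

Different types of effects

1. Trigger effect

2. Effect of long-term exposure

3. Effect with long-term impact on outcome

4. Delayed effect

(Figure: DAG over two time points, t - 1 and t, with exposure nodes E(t-1) and E(t), outcome nodes D(t-1) and D(t), a common cause and a common consequence (collider); time runs along the horizontal axis)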

Page 24

Additional Reading

• Pearl J. Causality: models, reasoning and inference. Cambridge University Press 2009 (second edition)

• Fleischer NL & Diez Roux AV. Using directed acyclic graphs to guide analyses of neighbourhood health effects: an introduction. J Epidemiol Community Health 2008;62:842-846

• Hernan et al. A structural approach to selection bias. Epidemiology 2004;15:615-625