5.2.1 dags

Causal Inference in Epidemiology Part 2

PH 250B Fall 2016

Jade Benjamin-Chung, PhD MPH

Colford-Hubbard Research Group

Adapted from Professor Jen Ahern’s 250B slides

Outline

1. What does causal inference entail?
2. Using directed acyclic graphs
   a. DAG basics
   b. Identifying confounding
   c. Understanding selection bias
3. Causal perspective on effect modification
   a. Brief recap of effect modification (EM)
   b. Linking EM in our studies to reality
   c. Types of interaction
   d. Causal interaction / EM
      1. Sufficient cause model (“causal pies”)
      2. Potential outcomes model (“causal types”)
   e. Choosing which measure of interaction to estimate and report
4. Integrating causal concepts into your research

Reality of study design

• We often don’t have ideal data on our population of interest

• The data we collect are incomplete

• Statistics can help us understand correlations or associations between exposures and outcomes

• Typically what we really want to know is if the exposure causes the outcome

What does causal inference entail?

• Careful definition of our estimation goals

• A set of assumptions that allow us to link our observed data to ideal data that would be used to reach our goals

• Causal inference techniques help us
  – Express assumptions about our data in a transparent, mathematical form
  – Provide us with mathematical steps to translate assumptions into quantities that can be estimated with observed data

Pearl, Glymour & Jewell, 2016

Causal inference in your research

1. Define research hypothesis
   – Your hypothesis can include possible effect modification
   – Determine to what extent you aim to make causal inferences using your data
2. Determine study design (trial, cohort, etc.)
3. Draw a DAG
   a. Identify potential confounders
   b. Choose which variables to measure
4. Analyze your data
   a. Control for confounders identified in step 3
   b. Assess effect modification on the additive or multiplicative scale
   c. Make statistical inferences
5. Make scientific inferences about your hypothesis


Causal diagrams as mathematical language

“Graphical methods now provide a powerful symbolic machinery for deriving the consequences of causal assumptions when such assumptions are combined with statistical data.”

Pearl J, 2009, Causality

Directed Acyclic Graphs (DAGs)

• Visually depict assumptions about causal relationships between exposures, outcomes, and other variables

• DAGs depict our knowledge (or beliefs) about the “data generating process”

• DAGs are informed by subject matter knowledge, prior research, and a priori hypotheses

• There is a learning curve on terminology and approach, but practice helps! DAGs can be a very useful tool once you are comfortable with them

How can we use DAGs?

Generally
• Document assumptions about cause-effect relationships
• Explore implications of those assumptions
• Assess how to make causal inferences from both one’s data and one’s assumptions

Today
• To understand selection bias
• To identify confounding

Pearl J, 2009, Causality

DAG construction

• Direct causal relationships between variables are represented by arrows
  – Directed
  – All causal relationships have a direction
  – A given variable cannot be simultaneously a cause and an effect

[Figure: DAG with nodes SES and Prenatal Care]

DAG construction

• There are no feedback loops
  – Acyclic
  – Causes always precede their effects
  – To avoid feedback loops, extend graph over time

[Figure: Malnutrition (M) and Infection (I), redrawn over time with nodes M (t=0), I (t=0), M (t=1), I (t=1)]
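The "no feedback loops" rule can be checked mechanically. The sketch below uses Python's standard `graphlib` module to test whether a set of arrows is acyclic, and shows how a malnutrition/infection loop becomes acyclic once each variable is indexed by time. The helper `unroll_over_time` and the exact arrows it draws (every variable at time t affects every variable at time t+1) are assumptions for illustration, not taken from the slide's figure.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        sorter.add(effect, cause)  # add(node, *predecessors)
    try:
        sorter.prepare()  # raises CycleError when a cycle is present
        return True
    except CycleError:
        return False

def unroll_over_time(variables, n_steps):
    """Hypothetical helper: each variable at time t points to every variable at t+1."""
    return [
        (f"{a} (t={t})", f"{b} (t={t + 1})")
        for t in range(n_steps - 1)
        for a in variables
        for b in variables
    ]

# M = malnutrition, I = infection
feedback = [("M", "I"), ("I", "M")]                 # a feedback loop, not a DAG
unrolled = unroll_over_time(["M", "I"], n_steps=2)  # the same story spread over time

print(is_acyclic(feedback))   # False
print(is_acyclic(unrolled))   # True
```

Because every arrow in the unrolled graph points from an earlier time to a later one, causes always precede their effects and no cycle can form.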

DAG terminology

[Figure: DAG with nodes SES, Prenatal Care, Vitamins, Birth Defects, Difficulty conceiving, and Maternal genetics]

• Parent & Child:
  – Directly connected by an arrow
  – Prenatal care is a “parent” of birth defects
  – Birth defects is a “child” of prenatal care

DAG terminology

[Figure: DAG with nodes SES, Prenatal Care, Vitamins, Birth Defects, Difficulty conceiving, and Maternal genetics]

• Ancestor & Descendant:
  – Connected by a directed path of a series of arrows
  – SES is an “ancestor” of Birth Defects
  – Birth Defects is a “descendant” of SES
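The parent/child and ancestor/descendant vocabulary maps directly onto simple graph traversals. The following is a minimal sketch over a list of (parent, child) arrows. Only "Prenatal Care → Birth Defects" and the existence of a directed path from SES to Birth Defects are stated on the slides; the other arrows in `EDGES` are assumptions made so the example has something to traverse.

```python
# Arrows as (parent, child) pairs. Apart from Prenatal Care -> Birth Defects,
# the arrows below are illustrative assumptions, not read off the slide figure.
EDGES = [
    ("SES", "Prenatal Care"),
    ("Prenatal Care", "Birth Defects"),
    ("Prenatal Care", "Vitamins"),
    ("Vitamins", "Birth Defects"),
    ("Maternal genetics", "Difficulty conceiving"),
    ("Difficulty conceiving", "Prenatal Care"),
    ("Maternal genetics", "Birth Defects"),
]

def children(node, edges):
    """Nodes that `node` points to directly."""
    return {c for p, c in edges if p == node}

def parents(node, edges):
    """Nodes that point directly to `node`."""
    return {p for p, c in edges if c == node}

def descendants(node, edges):
    """Nodes reachable from `node` along a directed path of one or more arrows."""
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop(), edges):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def ancestors(node, edges):
    """Nodes from which `node` is reachable along a directed path."""
    all_nodes = {n for edge in edges for n in edge}
    return {n for n in all_nodes if node in descendants(n, edges)}

print(parents("Birth Defects", EDGES))
print(ancestors("Birth Defects", EDGES))
print(descendants("SES", EDGES))
```

Every parent is also an ancestor, but not the reverse: in this sketch SES is an ancestor of Birth Defects without being one of its parents.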

DAG assumptions

[Figure: two DAGs for the same relationship, Smoking → Cancer and Smoking → Tar → Mutations → Cancer]

• Absence of a directed path from X to Y implies X has no effect on Y
  – Directed paths not in the graph are as important as those in the graph

• Note: Not all intermediate steps between two variables need to be represented
  – Depends on level of detail of the model

DAG assumptions

• DAGs assume that all common causes of exposure and disease are included
  – Common causes that are not observed should still be included
  – These are often denoted with a “U” to indicate they were unmeasured

[Figure: DAG with nodes U (religious beliefs, culture, lifestyle, etc.), Alcohol Use, Smoking, and Heart Disease]

Example

[Figure: a DAG for the effect of Speed on Bicycle Fall, built up over several slides. Nodes added along the way: Bicycle Characteristics, Road/Lane/Path Surface, Bicycle Traffic, Road Grade, Car Traffic, Rider Skill/Experience, Populace Bicycle Awareness, Bicycle Lane/Path]

What are some assumptions we are making?

Bicycle lane/path only has an effect on bicycle falls through its effect on bicycle traffic

Road surface does not affect bicycle traffic

All common causes of speed and bicycle fall are included (even those unmeasured)

Statistical underpinnings of DAGs

• Multiple possible causal models for this DAG:

[Figure: chain DAG X → Y → Z, with UX → X, UY → Y, UZ → Z]

Model 1
X = School funding
Y = SAT Scores
Z = College Acceptance

X = UX
Y = (x/3) + UY
Z = (y/16) + UZ

Model 2
X = Number of hours worked per week
Y = Number of training hours per week
Z = Race completion time

X = UX
Y = 84 - x + UY
Z = (100/y) + UZ

• Both models share the same statistical relationships. For specific values of these variables (lower case x, y, z):

• Z and Y are dependent
• Y and X are dependent
• Z and X are likely dependent
• Z and X are independent conditional on the value of Y

[Figure: chain DAG X → Y → Z, with UX → X, UY → Y, UZ → Z]

X = School funding
Y = SAT Scores
Z = College Acceptance

X = UX
Y = (x/3) + UY
Z = (y/16) + UZ

Pearl, Glymour & Jewell, 2016
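The claim that two different sets of equations imply the same pattern of dependence can be checked by simulation. The sketch below draws samples from both the school funding model and the training/race model and prints pairwise correlations. The distributions chosen for UX, UY, and UZ are assumptions (the slides give only the equations), and correlation is used here only as a rough, linear indicator of dependence.

```python
import random
from math import sqrt

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def simulate(f_y, f_z, draw_ux, sd_uy, sd_uz, n=20_000, seed=0):
    """Draw n samples from X = UX, Y = f_y(x) + UY, Z = f_z(y) + UZ."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = draw_ux(rng)
        y = f_y(x) + rng.gauss(0, sd_uy)
        z = f_z(y) + rng.gauss(0, sd_uz)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs

# Noise distributions and scales below are assumed for illustration only.
school = simulate(lambda x: x / 3, lambda y: y / 16,
                  draw_ux=lambda rng: rng.gauss(0, 30), sd_uy=5, sd_uz=0.5)
race = simulate(lambda x: 84 - x, lambda y: 100 / y,
                draw_ux=lambda rng: rng.uniform(30, 50), sd_uy=2, sd_uz=0.1)

for name, (xs, ys, zs) in {"school": school, "race": race}.items():
    print(name,
          "corr(X,Y)=%.2f" % pearson(xs, ys),
          "corr(Y,Z)=%.2f" % pearson(ys, zs),
          "corr(X,Z)=%.2f" % pearson(xs, zs))
```

The signs and sizes of the correlations differ between the two models, but in both of them every pair of variables along the chain is associated, which is what the shared graph predicts.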


Conditioning on a variable in a DAG

• “Conditioning” on a variable means filtering the data into groups based on the value of a variable

• A box is often drawn around a variable to denote that it is being conditioned on (e.g., in this DAG we condition on Y)

• This is equivalent to stratifying the data or controlling for a variable in a statistical model

[Figure: chain DAG X → Y → Z with UX, UY, UZ, where Y is boxed]
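Conditioning as "filtering into groups" can be made concrete with a small exact calculation. The sketch below builds the joint distribution of a binary chain X → Y → Z and compares P(Z = 1) across values of X, first in the whole population and then within strata of Y. The binary variables and the flip probability of 1/5 are hypothetical choices, not part of the slides.

```python
from fractions import Fraction
from itertools import product

FLIP = Fraction(1, 5)  # assumed probability that a child differs from its parent

def joint():
    """Exact joint distribution P(x, y, z) for a binary chain X -> Y -> Z."""
    dist = {}
    for x, y, z in product((0, 1), repeat=3):
        p_x = Fraction(1, 2)
        p_y = FLIP if y != x else 1 - FLIP
        p_z = FLIP if z != y else 1 - FLIP
        dist[(x, y, z)] = p_x * p_y * p_z
    return dist

def prob_z1(dist, **given):
    """P(Z = 1 | given), where `given` may fix x and/or y."""
    position = {"x": 0, "y": 1}
    # "Conditioning" = keep only the outcomes that match the given values.
    kept = {k: p for k, p in dist.items()
            if all(k[position[name]] == value for name, value in given.items())}
    return sum(p for k, p in kept.items() if k[2] == 1) / sum(kept.values())

dist = joint()
print(prob_z1(dist, x=0), prob_z1(dist, x=1))            # 8/25 17/25
for y in (0, 1):
    print(y, prob_z1(dist, x=0, y=y), prob_z1(dist, x=1, y=y))
```

In the full population Z depends on X, but inside each stratum of Y the two probabilities are identical: once Y is held fixed, X carries no further information about Z.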

DAG configurations

[Figure: three three-node DAGs over X, Y, and Z, one for each configuration]

• Chain

• Fork

• Collider*

* Has special considerations and challenges
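The three configurations differ only in the direction of the two arrows touching the middle variable. As a sketch, the function below classifies a three-node path from a set of (cause, effect) arrows, and `is_open` applies the standard rule for whether association flows along such a path. The function names are hypothetical, and the rule as written ignores descendants of the middle node.

```python
def classify_triple(a, middle, b, edges):
    """Classify the path a - middle - b as 'chain', 'fork', or 'collider'.

    `edges` is a set of (cause, effect) pairs.
    """
    a_to_m, m_to_a = (a, middle) in edges, (middle, a) in edges
    b_to_m, m_to_b = (b, middle) in edges, (middle, b) in edges
    if a_to_m and b_to_m:
        return "collider"   # both arrows point into the middle node
    if m_to_a and m_to_b:
        return "fork"       # the middle node is a common cause
    if (a_to_m and m_to_b) or (b_to_m and m_to_a):
        return "chain"      # the middle node is an intermediate
    raise ValueError("the three nodes are not connected as a path")

def is_open(kind, conditioned_on_middle):
    """Chains and forks are open unless the middle node is conditioned on;
    colliders are closed unless the middle node is conditioned on."""
    return (kind == "collider") == conditioned_on_middle

print(classify_triple("A", "M", "B", {("A", "M"), ("M", "B")}))  # chain
print(classify_triple("A", "M", "B", {("M", "A"), ("M", "B")}))  # fork
print(classify_triple("A", "M", "B", {("A", "M"), ("B", "M")}))  # collider
```

The collider is the odd one out: conditioning on the middle variable blocks a chain or a fork but opens a collider.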


Colliders

[Figure: DAG Diet → BMI ← Cancer]

In this example, BMI is a collider

Colliders

Conditioning on BMI induces an association between diet and cancer

Among those who have had a BMI decrease, there will be a larger proportion of dieters and a larger proportion of people with cancer than in the total population

Colliders

[Figure: DAG X → Z ← Y]

Why does conditioning on a collider induce an association between its parents?

Example: Z = X + Y

Do not condition on Z:

• X = 3 → know nothing about Y

Condition on Z:

• Z = 10, X = 3 → know Y = 7

Thus, X and Y are dependent given that (“conditional on”) Z = 10
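The Z = X + Y example can be worked through exactly by enumerating outcomes. The sketch below assumes X and Y are independent and each uniform on the integers 0 to 9 (the slides do not specify a distribution), then compares what X = 3 tells us about Y with and without conditioning on Z = 10.

```python
from fractions import Fraction
from itertools import product

VALUES = range(10)  # assumed support: X and Y independent, uniform on 0..9

# Every (x, y) pair is equally likely; Z is fully determined by Z = X + Y.
outcomes = [(x, y, x + y) for x, y in product(VALUES, repeat=2)]

def prob_y(y_value, **given):
    """P(Y = y_value | given), where `given` may fix x and/or z."""
    position = {"x": 0, "z": 2}
    kept = [o for o in outcomes
            if all(o[position[name]] == value for name, value in given.items())]
    hits = [o for o in kept if o[1] == y_value]
    return Fraction(len(hits), len(kept))

# Without conditioning on Z, learning X = 3 changes nothing about Y.
print(prob_y(7), prob_y(7, x=3))               # 1/10 1/10

# Conditional on Z = 10, learning X = 3 pins Y down completely.
print(prob_y(7, z=10), prob_y(7, x=3, z=10))   # 1/9 1
```

X and Y are independent in the full set of outcomes, yet within the subset where Z = 10 each value of X corresponds to exactly one value of Y.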


Strengths of DAGs

• Can determine which variables depend on each other in our causal model without knowing the specific functions (e.g., Z = X + Y in the previous slide) connecting them (Pearl, Glymour & Jewell, 2016)

• Allow us to link our causal model to the statistical relationships in our data

• DAGs can incorporate measurement error as well (Hernán & Cole, 2009)

Limitations of DAGs

• Cannot display effect modification easily (example of road surface)

• Arrows in graphs do not provide specific definitions of effects (contrast with counterfactuals)

• Can become extremely complicated when representing real data structures

• Are not designed to capture effects of infectious disease interventions that may impact not only intervention recipients but also non-recipients (e.g., herd effects of vaccines)

DAG limitations

[Figure: example of an extremely complicated DAG]

DAGs

• The DAG itself is not used to analyze data from the study you’ve conducted
  – Informs how study is designed/data are collected
  – Informs how data are analyzed
  – Helps identify which research questions are answerable in a given data set

• The utility of DAGs depends on the accuracy/correctness of the associations we represent in the diagram

Non-parametric structural equation models

• Non-parametric structural equation models (NPSEMs) provide a link between DAGs and counterfactuals and are a way to analyze data

• They encode relationships between variables that can include many possible equations and functional forms

• Non-parametric estimation is used to avoid assumptions of typical SEMs (e.g., linearity)

• Learn more about this topic in PH252D

Example of NPSEM

[Figure: chain DAG X → Y → Z, with UX → X, UY → Y, UZ → Z]

Previous example (Pearl, Glymour & Jewell, 2016):

X = UX
Y = (x/3) + UY
Z = (y/16) + UZ

NPSEM:

X = fX(UX)
Y = fY(X, UY)
Z = fZ(Y, UZ)
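One way to read the NPSEM is as a program whose structure is fixed by the DAG while the functions fX, fY, and fZ are left open. In the sketch below, `simulate_npsem` only knows the order in which variables are generated; the linear functions reproduce the previous example, and the nonlinear set is a hypothetical alternative that fits the same graph. The standard normal draws for the U terms are an assumption.

```python
import random

def simulate_npsem(f_x, f_y, f_z, draw_u, n=5, seed=0):
    """Generate (x, y, z) rows from X = fX(UX), Y = fY(X, UY), Z = fZ(Y, UZ)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u_x, u_y, u_z = draw_u(rng), draw_u(rng), draw_u(rng)
        x = f_x(u_x)
        y = f_y(x, u_y)   # Y may depend on X and UY only
        z = f_z(y, u_z)   # Z may depend on Y and UZ only
        rows.append((x, y, z))
    return rows

# The linear equations from the previous example.
linear = dict(
    f_x=lambda u: u,
    f_y=lambda x, u: x / 3 + u,
    f_z=lambda y, u: y / 16 + u,
)

# A hypothetical nonlinear model with exactly the same arrows.
nonlinear = dict(
    f_x=lambda u: u,
    f_y=lambda x, u: max(x, 0.0) ** 2 + u,
    f_z=lambda y, u: 1 if y + u > 1 else 0,
)

def standard_normal(rng):
    return rng.gauss(0, 1)  # assumed distribution for every U

print(simulate_npsem(**linear, draw_u=standard_normal))
print(simulate_npsem(**nonlinear, draw_u=standard_normal))
```

Both calls go through the same generating code, which is the sense in which a single NPSEM covers many possible equations and functional forms.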
