Causal Inference in Epidemiology Part 2 PH 250B Fall 2016 Jade Benjamin-Chung, PhD MPH Colford-Hubbard Research Group Adapted from Professor Jen Ahern’s 250B slides

5.2.1 dags


Page 1: 5.2.1 dags

Causal Inference in Epidemiology Part 2

PH 250B Fall 2016
Jade Benjamin-Chung, PhD MPH
Colford-Hubbard Research Group

Adapted from Professor Jen Ahern’s 250B slides

Page 2: 5.2.1 dags

Outline

1. What does causal inference entail?
2. Using directed acyclic graphs
  a. DAG basics
  b. Identifying confounding
  c. Understanding selection bias
3. Causal perspective on effect modification
  a. Brief recap of effect modification (EM)
  b. Linking EM in our studies to reality
  c. Types of interaction
  d. Causal interaction / EM
    1. Sufficient cause model (“causal pies”)
    2. Potential outcomes model (“causal types”)
  e. Choosing which measure of interaction to estimate and report
4. Integrating causal concepts into your research

Page 3: 5.2.1 dags

Reality of study design

• We often don’t have ideal data on our population of interest
• The data we collect are incomplete
• Statistics can help us understand correlations or associations between exposures and outcomes
• Typically what we really want to know is if the exposure causes the outcome

Page 4: 5.2.1 dags

What does causal inference entail?

• Careful definition of our estimation goals

• A set of assumptions that allow us to link our observed data to ideal data that would be used to reach our goals

• Causal inference techniques help us
  – Express assumptions about our data in a transparent, mathematical form
  – Provide us with mathematical steps to translate assumptions into quantities that can be estimated with observed data

Pearl, Glymour & Jewell, 2016

Page 5: 5.2.1 dags

Causal inference in your research

1. Define research hypothesis
  – Your hypothesis can include possible effect modification
  – Determine to what extent you aim to make causal inferences using your data
2. Determine study design (trial, cohort, etc.)
3. Draw a DAG
  a. Identify potential confounders
  b. Choose which variables to measure
4. Analyze your data
  a. Control for confounders identified in step 3
  b. Assess effect modification on the additive or multiplicative scale
  c. Make statistical inferences
5. Make scientific inferences about your hypothesis

Page 6: 5.2.1 dags

Outline

1. What does causal inference entail?
2. Using directed acyclic graphs
  a. DAG basics
  b. Identifying confounding
  c. Understanding selection bias
3. Causal perspective on effect modification
  a. Brief recap of effect modification (EM)
  b. Linking EM in our studies to reality
  c. Types of interaction
  d. Causal interaction / EM
    1. Sufficient cause model (“causal pies”)
    2. Potential outcomes model (“causal types”)
  e. Choosing which measure of interaction to estimate and report
4. Integrating causal concepts into your research

Page 7: 5.2.1 dags

Causal diagrams as mathematical language

“Graphical methods now provide a powerful symbolic machinery for deriving the consequences of causal assumptions when such assumptions are combined with statistical data.”

Pearl J, 2009, Causality

Page 8: 5.2.1 dags

Directed Acyclic Graphs (DAGs)

• Visually depict assumptions about causal relationships between exposures, outcomes, and other variables
  – Depicts the “data generating process”
• DAGs depict our knowledge (or beliefs) about the “data generating process”
• DAGs are informed by subject matter knowledge, prior research, and a priori hypotheses
• Learning curve on terminology and approach: practice helps! Can be a very useful tool once you are comfortable with it

Page 9: 5.2.1 dags

How can we use DAGs?

Generally
• Document assumptions about cause-effect relationships
• Explore implications of those assumptions
• Assess how to make causal inferences from both one’s data and one’s assumptions

Today
• To understand selection bias
• To identify confounding

Pearl J, 2009, Causality

Page 10: 5.2.1 dags

DAG construction

• Direct causal relationships between variables are represented by arrows
  – Directed
  – All causal relationships have a direction
  – A given variable cannot be simultaneously a cause and an effect

[Figure: DAG with an arrow between SES and Prenatal Care]

Page 11: 5.2.1 dags

DAG construction

• There are no feedback loops
  – Acyclic
  – Causes always precede their effects
  – To avoid feedback loops, extend graph over time

[Figure: feedback between Malnutrition (M) and Infection (I), redrawn over time with nodes M (t=0), I (t=0), M (t=1), I (t=1)]
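The acyclicity requirement can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to its children (a representation chosen here for illustration, not taken from the slides). The arrows in the time-indexed graph are also an assumption about what the figure shows.

```python
def is_acyclic(graph):
    """Return True if the directed graph (node -> list of children) has no cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph) | {c for cs in graph.values() for c in cs}
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GRAY
        for child in graph.get(node, []):
            if color[child] == GRAY:  # arrow back into the current path: a cycle
                return False
            if color[child] == WHITE and not visit(child):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in nodes if color[n] == WHITE)


# Malnutrition (M) and infection (I) causing each other: a feedback loop.
feedback = {"M": ["I"], "I": ["M"]}

# Same variables indexed by time (arrow set ASSUMED for illustration).
over_time = {"M0": ["M1", "I1"], "I0": ["I1", "M1"]}

print(is_acyclic(feedback))
print(is_acyclic(over_time))
```

Indexing each variable by time means every arrow points from an earlier time to a later one, so no path can return to its starting node.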

Page 12: 5.2.1 dags

DAG terminology

• Parent & Child:
  – Directly connected by an arrow
  – Prenatal care is a “parent” of birth defects
  – Birth defects is a “child” of prenatal care

[Figure: DAG with nodes SES, Prenatal Care, Vitamins, Birth Defects, Difficulty conceiving, Maternal genetics]

Page 13: 5.2.1 dags

DAG terminology

• Ancestor & Descendant:
  – Connected by a directed path of a series of arrows
  – SES is an “ancestor” of Birth Defects
  – Birth Defects is a “descendant” of SES

[Figure: DAG with nodes SES, Prenatal Care, Vitamins, Birth Defects, Difficulty conceiving, Maternal genetics]
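The parent/child and ancestor/descendant definitions translate directly into graph lookups. The sketch below is illustrative only: the slides confirm that Prenatal Care is a parent of Birth Defects and that SES is an ancestor of Birth Defects, but the full edge list is an assumption made here because the figure's arrows are not recoverable from the text. The helper names are hypothetical.

```python
# ASSUMED edge list (cause, effect). Only "Prenatal Care -> Birth Defects" and
# "SES is an ancestor of Birth Defects" are stated in the slides.
EDGES = [
    ("SES", "Prenatal Care"),
    ("Prenatal Care", "Birth Defects"),
    ("Prenatal Care", "Vitamins"),
    ("Vitamins", "Birth Defects"),
    ("Difficulty conceiving", "Prenatal Care"),
    ("Maternal genetics", "Difficulty conceiving"),
    ("Maternal genetics", "Birth Defects"),
]


def parents(node):
    """Nodes with an arrow pointing directly into `node`."""
    return {a for a, b in EDGES if b == node}


def children(node):
    """Nodes that `node` points to directly."""
    return {b for a, b in EDGES if a == node}


def _reach(node, step):
    """Follow `step` (parents or children) repeatedly, collecting every node reached."""
    found, stack = set(), [node]
    while stack:
        for nxt in step(stack.pop()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


def ancestors(node):
    return _reach(node, parents)


def descendants(node):
    return _reach(node, children)


print(sorted(parents("Birth Defects")))
print("SES" in ancestors("Birth Defects"))
print("Birth Defects" in descendants("SES"))
```

Under these assumed edges, SES is an ancestor but not a parent of Birth Defects, because it reaches Birth Defects only through Prenatal Care.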

Page 14: 5.2.1 dags

DAG assumptions

• Absence of a directed path from X to Y implies X has no effect on Y
  – Directed paths not in the graph are as important as those in the graph
• Note: Not all intermediate steps between two variables need to be represented
  – Depends on level of detail of the model

[Figure: two DAGs for the same relationship, one with nodes Smoking and Cancer only, one with nodes Smoking, Tar, Mutations, Cancer]

Page 15: 5.2.1 dags

DAG assumptions

• DAGs assume that all common causes of exposure and disease are included
  – Common causes that are not observed should still be included
  – These are often denoted with a “U” to indicate they were unmeasured

[Figure: DAG with nodes U (religious beliefs, culture, lifestyle, etc.), Alcohol Use, Smoking, Heart Disease]

Page 16: 5.2.1 dags

Example

[Figure: DAG with nodes Speed, Bicycle Fall]

Page 17: 5.2.1 dags

Example

[Figure: DAG with nodes Speed, Bicycle Characteristics, Road/Lane/Path Surface, Bicycle Traffic, Road Grade, Car Traffic, Bicycle Fall]

Page 18: 5.2.1 dags

Example

[Figure: DAG with nodes Speed, Car Traffic, Rider Skill/Experience, Bicycle Characteristics, Road/Lane/Path Surface, Bicycle Traffic, Road Grade, Bicycle Fall]

Page 19: 5.2.1 dags

Example

[Figure: DAG with nodes Speed, Rider Skill/Experience, Bicycle Characteristics, Road/Lane/Path Surface, Bicycle Traffic, Road Grade, Car Traffic, Populace Bicycle Awareness, Bicycle Lane/Path, Bicycle Fall]

Page 20: 5.2.1 dags

Example

[Figure: DAG with nodes Speed, Rider Skill/Experience, Bicycle Characteristics, Bicycle Traffic, Road/Lane/Path Surface, Road Grade, Car Traffic, Populace Bicycle Awareness, Bicycle Lane/Path, Bicycle Fall]

What are some assumptions we are making?

• Bicycle lane/path only has an effect on bicycle falls through its effect on bicycle traffic
• Road surface does not affect bicycle traffic
• All common causes of speed and bicycle fall are included (even those unmeasured)

Page 21: 5.2.1 dags

Statistical underpinnings of DAGs

• Multiple possible causal models for this DAG:

[Figure: DAG X → Y → Z, with exogenous variables UX, UY, UZ]

Model 1
X = School funding, Y = SAT Scores, Z = College Acceptance
$X = U_X$
$Y = \frac{x}{3} + U_Y$
$Z = \frac{y}{16} + U_Z$

Model 2
X = Number of hours worked per week, Y = Number of training hours per week, Z = Race completion time
$X = U_X$
$Y = 84 - x + U_Y$
$Z = \frac{100}{y} + U_Z$

Pearl, Glymour & Jewell, 2016

Page 22: 5.2.1 dags

Statistical underpinnings of DAGs

• Both models share the same statistical relationships:
  – Z and Y are dependent
  – Y and X are dependent
  – Z and X are likely dependent
  – Z and X are independent conditional on Y (that is, within any fixed value of Y)

[Figure: DAG X → Y → Z, with exogenous variables UX, UY, UZ]

X = School funding, Y = SAT Scores, Z = College Acceptance
$X = U_X$
$Y = \frac{x}{3} + U_Y$
$Z = \frac{y}{16} + U_Z$

Pearl, Glymour & Jewell, 2016
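These dependence statements can be checked by simulating the school funding model. This is a minimal sketch under stated assumptions: the slides do not give distributions for UX, UY, UZ, so normal noise with arbitrary scales is assumed here. Partial correlation stands in for "conditional on Y", which is adequate only because this particular model is linear with normal noise.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


rng = random.Random(0)
n = 20000

# Structural equations from the slide; the noise scales are ASSUMED.
X = [rng.gauss(0, 3) for _ in range(n)]          # X = U_X
Y = [x / 3 + rng.gauss(0, 0.5) for x in X]       # Y = x/3 + U_Y
Z = [y / 16 + rng.gauss(0, 0.05) for y in Y]     # Z = y/16 + U_Z

print("corr(X, Y)    =", round(pearson(X, Y), 3))
print("corr(Y, Z)    =", round(pearson(Y, Z), 3))
print("corr(X, Z)    =", round(pearson(X, Z), 3))
print("corr(X, Z | Y) =", round(partial_corr(X, Z, Y), 3))
```

The three marginal correlations should be clearly non-zero, while the correlation between X and Z adjusted for Y should be close to zero, which is the chain pattern listed on the slide.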

Page 23: 5.2.1 dags

Statistical underpinnings of DAGs

• Both models share the same statistical relationships

For specific values of these variables (lower case x, y, z):
  – Z and Y are dependent
  – Y and X are dependent
  – Z and X are likely dependent
  – Z and X are independent conditional on Y (that is, within any fixed value of Y)

[Figure: DAG X → Y → Z, with exogenous variables UX, UY, UZ]

Pearl, Glymour & Jewell, 2016

Page 24: 5.2.1 dags

Conditioning on a variable in a DAG

• “Conditioning” on a variable means filtering the data into groups based on the value of a variable

• A box is often drawn around a variable to denote that it is being conditioned on (e.g., in this DAG we condition on Y)

• This is equivalent to stratifying the data or controlling for a variable in a statistical model

[Figure: DAG on X, Y, Z with exogenous variables UX, UY, UZ; a box around Y marks conditioning]

Page 25: 5.2.1 dags

DAG configurations

• Chain
• Fork
• Collider*

* Has special considerations and challenges

[Figure: the three configurations, each drawn on variables X, Y, Z]

Pearl, Glymour & Jewell, 2016

Page 26: 5.2.1 dags

Colliders

[Figure: DAG Diet → BMI ← Cancer]

In this example, BMI is a collider

Page 27: 5.2.1 dags

Colliders

Conditioning on BMI induces an association between diet and cancer

[Figure: DAG Diet → BMI ← Cancer, shown without and with conditioning on BMI]

Among those who have had a BMI decrease there will be larger numbers of dieters and larger numbers of people with cancer than in the total population

Page 28: 5.2.1 dags

Colliders

[Figure: DAG X → Z ← Y]

Why does conditioning on a collider induce an association between its parents?

Example: Z = X + Y

Do not condition on Z:

• X = 3 → know nothing about Y

Pearl, Glymour & Jewell, 2016

Page 29: 5.2.1 dags

Colliders

[Figure: DAG X → Z ← Y]

Why does conditioning on a collider induce an association between its parents?

Example: Z = X + Y

Do not condition on Z:

• X = 3 → know nothing about Y

Condition on Z:

• Z = 10, X = 3 → know Y = 7

Thus, X and Y are dependent given that (“conditional on”) Z = 10

Pearl, Glymour & Jewell, 2016
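The Z = X + Y example can be reproduced with simulated data. This is a small sketch, assuming X and Y are independent integers drawn uniformly from 1 to 9 (a distribution chosen here for illustration; the slide does not specify one).

```python
import random

rng = random.Random(1)

# X and Y are generated independently (ASSUMED uniform on 1..9); Z is their sum.
data = []
for _ in range(50000):
    x = rng.randint(1, 9)
    y = rng.randint(1, 9)
    data.append((x, y, x + y))

# Without conditioning on Z: knowing X = 3 leaves Y unrestricted.
y_given_x3 = {y for x, y, z in data if x == 3}

# Conditioning on Z = 10 (filtering to that stratum): X = 3 now pins down Y.
y_given_x3_z10 = {y for x, y, z in data if x == 3 and z == 10}

print("Y values seen when X = 3:          ", sorted(y_given_x3))
print("Y values seen when X = 3 and Z = 10:", sorted(y_given_x3_z10))
```

Filtering to Z = 10 is exactly the "conditioning" described earlier: within that stratum, X and Y are tied together even though they were generated independently.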

Page 30: 5.2.1 dags

Strengths of DAGs

• Can determine which variables depend on each other in our causal model without knowing the specific functions (e.g., Z = X + Y in the previous slide) connecting them (Pearl, Glymour & Jewell, 2016)
• Allow us to link our causal model to the statistical relationships in our data
• DAGs can incorporate measurement error as well (Hernan & Cole, 2009)

Page 31: 5.2.1 dags

Limitations of DAGs

• Cannot display effect modification easily (example of road surface)
• Arrows in graphs do not provide specific definitions of effects (contrast with counterfactuals)
• Can become extremely complicated when representing real data structures
• Are not designed to capture effects of infectious disease interventions that may impact not only intervention recipients but also non-recipients (e.g., herd effects of vaccines)

Page 32: 5.2.1 dags

DAG limitations

[Figure: example of an extremely complicated DAG]

Page 33: 5.2.1 dags

DAGs

• The DAG itself is not used to analyze data from the study you’ve conducted
  – Informs how study is designed/data are collected
  – Informs how data are analyzed
  – Helps identify which research questions are answerable in a given data set
• Utility of DAGs depends on the accuracy/correctness of the associations we represent in the diagram

Page 34: 5.2.1 dags

34

Non-parametric structural equation models

• Non-parametric structural equation models (NPSEMs) provide a link between DAGs and counterfactuals and are a way to analyze data

• They encode relationships between variables that can include many possible equations and functional forms

• Non-parametric estimation is used to avoid assumptions of typical SEMs (e.g., linearity)

• Learn more about this topic in PH252D

Page 35: 5.2.1 dags

Example of NPSEM

[Figure: DAG X → Y → Z, with exogenous variables UX, UY, UZ]

Previous example:

X = School funding, Y = SAT Scores, Z = College Acceptance
$X = U_X$
$Y = \frac{x}{3} + U_Y$
$Z = \frac{y}{16} + U_Z$

NPSEM:

X = School funding, Y = SAT Scores, Z = College Acceptance
$X = f_X(U_X)$
$Y = f_Y(X, U_Y)$
$Z = f_Z(Y, U_Z)$

Pearl, Glymour & Jewell, 2016
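One way to read the NPSEM notation is that the DAG fixes which arguments each function takes, while the functions themselves are left open. The sketch below illustrates only that idea and does not perform any non-parametric estimation. The function `simulate_npsem`, the dictionary names, and the distributions in `draw_u` are all hypothetical choices made here; the two sets of equations are the models from the earlier slide.

```python
import random


def simulate_npsem(f_x, f_y, f_z, draw_u, n, seed=0):
    """Generate n rows (x, y, z) from X = f_X(U_X), Y = f_Y(X, U_Y), Z = f_Z(Y, U_Z)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u_x, u_y, u_z = draw_u(rng)
        x = f_x(u_x)
        y = f_y(x, u_y)  # Y depends on X and its own noise only
        z = f_z(y, u_z)  # Z depends on Y and its own noise only
        rows.append((x, y, z))
    return rows


# Two functional forms that fit the same structure (equations from the slides).
school = dict(
    f_x=lambda u_x: u_x,
    f_y=lambda x, u_y: x / 3 + u_y,
    f_z=lambda y, u_z: y / 16 + u_z,
)
race = dict(
    f_x=lambda u_x: u_x,
    f_y=lambda x, u_y: 84 - x + u_y,
    f_z=lambda y, u_z: 100 / y + u_z,
)


def draw_u(rng):
    # ASSUMED noise distributions; the slides leave U_X, U_Y, U_Z unspecified.
    return rng.uniform(20, 60), rng.gauss(0, 1), rng.gauss(0, 1)


for name, model in [("school", school), ("race", race)]:
    rows = simulate_npsem(draw_u=draw_u, n=3, **model)
    print(name, [tuple(round(v, 2) for v in row) for row in rows])
```

Both models run through the same simulator unchanged, which is the sense in which a single NPSEM covers many possible equations and functional forms.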