
29 March 2008, Greenland

The Logic of Biases via

Causal Diagrams

Sander Greenland

Epidemiology and Statistics

University of California


Two definitions of bias:

• Epidemiology: Nonrandom difference between an estimate and the true value of the target parameter; systematic error; invalidity.

• Statistics: Any difference between the average value of an estimator and the true value of the target parameter (e.g., a relative risk)

There are subtle differences between the two; the second definition subsumes other important problems.


Types of bias

Epi categories (overlapping):

• Confounding (nonrandom exposure)

• Selection bias (nonrandom sampling)

• Bias from measurement error

Further statistical categories (often important but overlooked in epidemiology):

• Bias from use of a wrong model form (model-form mis-specification)

• Method invalidity (e.g., stepwise selection)

• Method failure (e.g., sparse-data bias)


• There are many finer divisions of epi bias,

but they obscure the underlying deductive

logic of the biases

• Logic is about conclusions that could be

drawn regardless of the content

• Logical deduction concerns what must

follow from what is assumed

• Deductions can only be hypotheticals of

the form “If we assume this, we can

deduce that…” (some would say this is all

science can offer beyond data)


The easiest way to remember the

logic of epidemiologic biases:

Causal diagrams

• Causal diagrams are schematics for

causal explanations (e.g., “Process P may

have caused bias B”) of possible

associations.

• Diagramming a study can reveal many

avenues for bias that are otherwise

overlooked.


Directed acyclic graphs (DAGs)

and causal diagrams

• A directed acyclic graph shows the factors

in the problem linked by arrows only, with

no feedback loops.

• A graph is a causal diagram if the arrows

are interpreted as links in causal chains

• Causal effects of one variable on another

are transmitted by causal sequences,

which are directed (head-tail) paths:

X → Y → Z means X can affect Z


Example DAG: A and B can affect

any variable except each other

Figure: DAG on A, B, C, F, E, D.


Colliders vs. noncolliders on a path

• Paths are closed at colliders: Associations cannot be transmitted across a collider (→ C ←) on a path unless we stratify (condition) on it or something it affects (such as F in C → F).

• Paths are open (unblocked) at noncolliders: Associations can be transmitted across a noncollider (→ C → or ← C →) on a path unless we completely stratify on it.
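The two rules above can be sketched as a small path checker. This is only an illustration: the arrow set `EDGES` is an assumption, inferred from the paths the later slides list for the example DAG (the arrows themselves are not present in the text), and the function names are hypothetical. Partial conditioning is not modeled; a variable is either fully conditioned or not at all.

```python
# Assumed arrows (tail, head) for the example DAG; inferred, not given in the slides.
EDGES = {("A", "C"), ("B", "C"), ("A", "E"), ("B", "D"),
         ("C", "E"), ("C", "D"), ("C", "F"), ("E", "D")}

def descendants(node, edges):
    """All variables reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found

def is_collider(prev, node, nxt, edges):
    """True if both neighbouring arrows on the path point into `node`."""
    return (prev, node) in edges and (nxt, node) in edges

def path_is_open(path, edges, conditioned=frozenset()):
    """Check every interior variable of `path` against the two rules."""
    conditioned = set(conditioned)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if is_collider(prev, node, nxt, edges):
            # Closed unless we condition on the collider or something it affects.
            if not (({node} | descendants(node, edges)) & conditioned):
                return False
        elif node in conditioned:
            # A noncollider is closed by complete conditioning on it.
            return False
    return True

# E <- A -> C <- B -> D has C as a collider.
print(path_is_open(list("EACBD"), EDGES))          # no conditioning
print(path_is_open(list("EACBD"), EDGES, {"C"}))   # conditioning on the collider
print(path_is_open(list("EACBD"), EDGES, {"F"}))   # conditioning on something C affects
```

Under the assumed arrows, the path EACBD is closed without conditioning and opens when either C or F is conditioned on, while a path such as ECD behaves the opposite way with respect to C.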


Think of associations as signals

flowing through the graph

• A variable can transmit associations along

some open (unblocked) directions but not

along closed (blocked) directions.

• The open and closed directions are

switched by conditioning (stratifying) on

the variable (and may be partially switched

by partially or indirectly conditioning)


Spot the open and closed

directions for C:

Figure: DAG on A, B, C, F, E, D.


Colliders vs. noncolliders on a path

• Associations may be transmitted across a collider (→ C ←) on a path if we stratify (condition) on it or something it affects (such as F in C → F).

• Associations may be transmitted across a noncollider (→ C → or ← C →) on a path if we do not completely stratify on it.

“(C)” = C unobserved, “[C]” = C conditioned



Spot the open and closed

directions for C given C:

Figure: DAG on A, B, [C], F, E, D.



Spot the open and closed

directions for C given F:

Figure: DAG on A, B, C, [F], E, D.


Closed and open paths

• Closed (blocked) path: Closed at some

variable within the path, hence cannot

transmit associations.

• Open (unblocked) path: Open at all

variables within the path, hence can

transmit associations.

Conditioning may open some closed paths

and close some open paths

(C) = C unobserved, [C] = C conditioned


Spot the open and closed paths,

and rank the signal strengths:

Figure: DAG on A, B, C, F, E, D.


Size of associations

• The more steps along a given path, the

more attenuated the signal (the weaker

the transmitted association, a.e.), but

• Distances along distinct paths are not

comparable unless all steps (arrows) in

both paths are assigned a size!


EAC larger than EACD (a.e.), but

can’t say relative to ECD

Figure: DAG on A, B, C, F, E, D.


Spot the open and closed paths

given C:

Figure: DAG on A, B, [C], F, E, D.


Spot the open and closed paths

given F:

Figure: DAG on A, B, C, [F], E, D.


“Control” of bias

• Target path: A path that transmits some

of the effect we want to estimate; must be

a directed path from cause to effect.

• Biasing path: Any other open path

between the cause and effect variables.

• By judicious conditioning, we must close

all biasing paths without closing target

paths or opening new biasing paths.

This isn’t always possible with available data


Confounding

There are many definitions, none universally

accepted. My definition:

• Noncausal association transmitted via

effects on the outcome

This definition appears to correspond best to

the intuitive definitions given since the 19th

century: Confounding is a mixing of the

effect of interest with other effects on the

outcome (Mill, 1843).


Biasing paths I: Confounding paths

and confounders

• Confounding path: Any path capable of

transmitting confounding

• Confounder: Any variable within a

confounding path

• Without conditioning, all biasing paths in a

DAG are confounding paths,

HOWEVER,

• Upon conditioning other kinds of bias arise


Confounding paths from E to D:

EACD, ECBD, ECD

Figure: DAG on A, B, C, F, E, D.


Confounding paths from E to D

after conditioning on C: EACBD

Figure: DAG on A, B, [C], F, E, D.



Confounding paths from E to D

after conditioning on F: EACD, ECBD, ECD, EACBD

Figure: DAG on A, B, C, [F], E, D.



Confounding paths from E to D

after conditioning on A, B, F: ECD

Figure: DAG on [A], [B], C, [F], E, D.



Confounding paths from E to D

after conditioning on B, C: None!

Figure: DAG on A, [B], [C], F, E, D.
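The path lists on these slides can be reproduced mechanically. The sketch below enumerates every path between two variables and keeps the open ones that are not directed (target) paths. The arrow set `EDGES` is an assumption inferred from the listed paths, since the arrows are not present in the text, and all function names are hypothetical.

```python
# Assumed arrows (tail, head) for the example DAG; inferred, not given in the slides.
EDGES = {("A", "C"), ("B", "C"), ("A", "E"), ("B", "D"),
         ("C", "E"), ("C", "D"), ("C", "F"), ("E", "D")}

def descendants(node, edges):
    """All variables reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.add(head)
                stack.append(head)
    return found

def path_is_open(path, edges, conditioned):
    """Open at every interior variable: colliders need conditioning, noncolliders forbid it."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in edges and (nxt, node) in edges:   # collider
            if not (({node} | descendants(node, edges)) & conditioned):
                return False
        elif node in conditioned:                             # noncollider
            return False
    return True

def simple_paths(start, end, edges):
    """All paths without repeated variables, ignoring arrow direction."""
    adjacent = {}
    for tail, head in edges:
        adjacent.setdefault(tail, set()).add(head)
        adjacent.setdefault(head, set()).add(tail)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        for nxt in sorted(adjacent[path[-1]]):
            if nxt == end:
                paths.append(path + [nxt])
            elif nxt not in path:
                stack.append(path + [nxt])
    return paths

def open_biasing_paths(cause, effect, edges, conditioned=()):
    """Open paths from cause to effect that are not directed (target) paths."""
    conditioned = set(conditioned)
    result = []
    for path in simple_paths(cause, effect, edges):
        directed = all((a, b) in edges for a, b in zip(path, path[1:]))
        if not directed and path_is_open(path, edges, conditioned):
            result.append("".join(path))
    return sorted(result)

for given in [(), ("C",), ("F",), ("A", "B", "F"), ("B", "C")]:
    print(given, open_biasing_paths("E", "D", EDGES, given))
```

With the assumed arrows, the five conditioning sets give the same lists as the slides, and `open_biasing_paths("A", "B", EDGES, {"C"})` returns only ACB, the collider path discussed on the next slide.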


Biasing path from A to B: ACB,

which is not a confounding path!

Figure: DAG on A, B, [C], F, E, D.


Selection Bias

There are many definitions, none universally accepted. My definition:

• Noncausal association created by nonrandom selection.

This definition appears to correspond best to the intuitive definitions given in epi texts since the mid-20th century.

• Confounding and selection bias overlap, but one is not always the other. (Using graphs, the distinction is not important.)


Confounding that is not selection

bias: ECD

Figure: DAG on C, F, E, D.


Selection bias that is not

confounding: Berksonian bias

Figure: DAG on E, D, [S], T.

Uncontrollable biasing path: ESD
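A small exact calculation can illustrate how conditioning on selection S opens the path ESD. All numbers below are hypothetical: E and D are taken to be independent in the source population, and both are assumed to raise the probability of selection.

```python
from fractions import Fraction
from itertools import product

# Hypothetical source population in which E and D are independent.
p_e = Fraction(1, 5)    # P(E=1), assumed
p_d = Fraction(1, 10)   # P(D=1), assumed

# Hypothetical selection probabilities P(S=1 | E=e, D=d), keyed by (e, d).
p_select = {(0, 0): Fraction(1, 10), (1, 0): Fraction(1, 2),
            (0, 1): Fraction(1, 2), (1, 1): Fraction(3, 4)}

def joint(e, d):
    """P(E=e, D=d) in the source population under independence."""
    return (p_e if e else 1 - p_e) * (p_d if d else 1 - p_d)

def odds_ratio(cell):
    """E-D odds ratio from a dict of cell probabilities keyed by (e, d)."""
    return (cell[1, 1] * cell[0, 0]) / (cell[1, 0] * cell[0, 1])

population = {(e, d): joint(e, d) for e, d in product((0, 1), repeat=2)}
selected = {key: population[key] * p_select[key] for key in population}

or_population = odds_ratio(population)  # E-D odds ratio before selection
or_selected = odds_ratio(selected)      # E-D odds ratio among the selected, i.e. given [S]
print(or_population, or_selected)
```

With these made-up inputs the odds ratio is 1 in the source population and 3/10 among the selected, so an association appears even though E has no effect on D. The direction and size depend entirely on the assumed selection probabilities.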


Case-control matching is

Intentional selection bias

We must control the matching factor M

to block the bias induced by matching

Figure: DAG on M, E, [S], [D].


M-bias that is both confounding &

selection bias (via EACBD)

Figure: DAG on A, B, C, [S], E, D.


Collider bias: Selection bias and

confounding induced by conditioning

Many variations:

• Berksonian bias

• M-bias

• Confounding produced by control of

intermediates to estimate direct effects, or

by selection affected by intermediates


E has no direct effect on D, but control of

C or F can make it appear so (via ECBD)

Figure: DAG on E, B, [C], F, D.


Instrumental variables:

ED = AED/AE or ED = FAED/FAE

Figure: DAG on A, (B), E, F, D.
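One reading of "ED = AED/AE" is that the effect of E on D equals the A-D association (transmitted along AED) divided by the A-E association. The simulation below sketches this under strong assumptions that are not in the slides: a linear model with made-up coefficients, an unobserved B affecting both E and D, an instrument A affecting D only through E, and F a noisy proxy of A. Covariances stand in for the associations.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def simulate(n=20000, beta=0.5, seed=1):
    """Draw (A, F, E, D) from a hypothetical linear model; B stays unobserved."""
    rng = random.Random(seed)
    A, F, E, D = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                  # instrument
        b = rng.gauss(0, 1)                  # unobserved common cause of E and D
        f = a + rng.gauss(0, 1)              # proxy affected by A
        e = a + b + rng.gauss(0, 1)          # exposure
        d = beta * e + b + rng.gauss(0, 1)   # outcome; true effect of E is beta
        A.append(a); F.append(f); E.append(e); D.append(d)
    return A, F, E, D

A, F, E, D = simulate()
crude_slope = cov(E, D) / cov(E, E)   # confounded by B
iv_from_a = cov(A, D) / cov(A, E)     # "AED / AE"
iv_from_f = cov(F, D) / cov(F, E)     # "FAED / FAE"
print(round(crude_slope, 2), round(iv_from_a, 2), round(iv_from_f, 2))
```

In this setup the crude slope is pulled away from the assumed effect of 0.5 by the unobserved B, while both ratios should land near it, up to sampling error.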


Differential measurement error: Can’t tell

direction of bias without further info

Figure: DAG on A, B, (E), E*, D.


Independent nondifferential error:

bias toward the null in typical cases

Figure: DAG on A, C, (E), E*, D.
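As a sketch of one typical case, consider a binary exposure E measured as E* with a sensitivity and specificity that do not depend on D. The function below computes the expected risk ratio for E* from the true risks. All input values are hypothetical, and the function name is an illustration only.

```python
def observed_risk_ratio(p_exposed, risk_exposed, risk_unexposed,
                        sensitivity, specificity):
    """Expected D risk ratio comparing E*=1 with E*=0 under nondifferential error."""
    # Cohort fractions classified as exposed (E*=1), split by true exposure E.
    true_pos = p_exposed * sensitivity
    false_pos = (1 - p_exposed) * (1 - specificity)
    # Cohort fractions classified as unexposed (E*=0), split by true exposure E.
    false_neg = p_exposed * (1 - sensitivity)
    true_neg = (1 - p_exposed) * specificity
    # Each classified group mixes truly exposed and truly unexposed people,
    # so its risk is a weighted average of the two true risks.
    risk_star_1 = ((true_pos * risk_exposed + false_pos * risk_unexposed)
                   / (true_pos + false_pos))
    risk_star_0 = ((false_neg * risk_exposed + true_neg * risk_unexposed)
                   / (false_neg + true_neg))
    return risk_star_1 / risk_star_0

true_rr = 0.2 / 0.1  # hypothetical true risks of 0.2 (exposed) and 0.1 (unexposed)
observed_rr = observed_risk_ratio(0.3, 0.2, 0.1, sensitivity=0.8, specificity=0.9)
print(true_rr, round(observed_rr, 2))
```

With these inputs the observed risk ratio falls between 1 and the true value of 2, which is the bias toward the null the slide describes. Other settings, such as dependent errors or exposures with more than two levels, need not behave this way.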


Effect-Measure Modification

(Heterogeneity)

Sander Greenland

Epidemiology and Statistics

University of California


The term “interaction” gets used for

several distinct phenomena:

• Biologic interaction (synergy, antagonism,

coaction): One factor changes the physical

mechanism of action of another.

• “Statistical interaction”: Change in a

measure (of effect or association) upon

change in a third factor.

In the 1970s, few researchers understood

the difference. Many still don’t today (e.g.,

in genetics)


Solution: Invent new term for

“statistical interaction”…

Effect Modification (Miettinen, 1974)

Unfortunately, the term still suggests

biologic interaction, so Rothman and

Greenland (1998) call it

Effect-Measure Modification (EMM)

Greenland prefers heterogeneity (of effect

or association), already in use in statistics


Consider the simple case

with no confounding:

Figure: DAG on C, E, D.


Presence & direction of EMM depend

on measure! (Berkson, 1958)

            C=1                   C=0
        E=1      E=0          E=1      E=0
D=1      32       20           10        4
N      10^5     10^5         10^5     10^5

RD:   32-20 = 12 per 10^5    10-4 = 6 per 10^5
RR:   32/20 = 1.6            10/4 = 2.5

NOTE: NO CONFOUNDING PRESENT!
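The arithmetic in the table can be checked directly. The sketch below assumes each of the four groups has a denominator of 10^5 (the reading of "105" taken here) and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

N = 10**5  # assumed denominator for every exposure group

# Cases (D=1) by stratum of C and exposure E, taken from the table.
cases = {1: {"exposed": 32, "unexposed": 20},
         0: {"exposed": 10, "unexposed": 4}}

rd_per_n = {}  # risk difference, expressed per N persons
rr = {}        # risk ratio
for c, row in cases.items():
    risk_exposed = Fraction(row["exposed"], N)
    risk_unexposed = Fraction(row["unexposed"], N)
    rd_per_n[c] = (risk_exposed - risk_unexposed) * N
    rr[c] = risk_exposed / risk_unexposed

for c in (1, 0):
    print(f"C={c}: RD = {rd_per_n[c]} per {N}, RR = {float(rr[c])}")
```

The risk difference is larger in the C=1 stratum (12 vs. 6 per 10^5) while the risk ratio is larger in the C=0 stratum (2.5 vs. 1.6), so the apparent direction of modification reverses with the choice of measure.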


Only the RD has a simple relation

to biologic interaction:

• If RD changes across strata and there is

no bias, this implies biologic interaction

must be present (known in bioassay since

the 1920s)

• Unfortunately, nearly all epi studies

present RRs only, so confusion remains.

• EMM has no bearing on confounding:

Both require C to be a risk factor given E,

but one can be present with the other

absent!


Most epi studies have little power

to detect EMM, hence:

A literature distortion is created:

• Studies examine only the RR

• All of them fail to detect RR modification

• Hence reviewers conclude there is no RR modification (that the RR is homogeneous)

BUT, if they had examined only RD instead,

• All of them would fail to detect RD modification and reviewers would infer that the RD is homogeneous!
