29 March 08 Greenland 1
The Logic of Biases via
Causal Diagrams
Sander Greenland
Epidemiology and Statistics
University of California
Two definitions of bias:
• Epidemiology: Nonrandom difference between an estimate and the true value of the target parameter; systematic error; invalidity.
• Statistics: Any difference between the average value of an estimator and the true value of the target parameter (e.g., a relative risk)
There are subtle differences between the two; the second definition subsumes other important problems.
Types of bias
Epi categories (overlapping):
• Confounding (nonrandom exposure)
• Selection bias (nonrandom sampling)
• Bias from measurement error
Further statistical categories (often important but overlooked in epidemiology):
• Bias from use of a wrong model form (model-form mis-specification)
• Method invalidity (e.g., stepwise selection)
• Method failure (e.g., sparse-data bias)
• There are many finer divisions of epi bias,
but they obscure the underlying deductive
logic of the biases
• Logic is about conclusions that could be
drawn regardless of the content
• Logical deduction concerns what must
follow from what is assumed
• Deductions can only be hypotheticals of
the form “If we assume this, we can
deduce that…” (some would say this is all
science can offer beyond data)
The easiest way to remember the
logic of epidemiologic biases:
Causal diagrams
• Causal diagrams are schematics for
causal explanations (e.g., “Process P may
have caused bias B”) of possible
associations.
• Diagramming a study can reveal many
avenues for bias that are otherwise
overlooked.
Directed acyclic graphs (DAGs)
and causal diagrams
• A directed acyclic graph shows the factors
in the problem linked by arrows only, with
no feedback loops.
• A graph is a causal diagram if the arrows
are interpreted as links in causal chains
• Causal effects of one variable on another
are transmitted by causal sequences,
which are directed (head-tail) paths:
X → Y → Z means X can affect Z
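A minimal Python sketch of this reading of directed paths, assuming a hypothetical chain X → Y → Z plus an unconnected variable W. The variable W and the helper names `can_affect` and `is_acyclic` are illustrative and not from the slides. One variable can affect another only if a directed (head-to-tail) path connects them, and the graph is acyclic when no variable can affect itself.

```python
# Hypothetical graph: each key lists the variables it points to.
GRAPH = {"X": ["Y"], "Y": ["Z"], "Z": [], "W": []}


def can_affect(graph, cause, effect):
    """True if a directed (head-to-tail) path leads from cause to effect."""
    stack, seen = list(graph[cause]), set()
    while stack:
        node = stack.pop()
        if node == effect:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def is_acyclic(graph):
    """True if no variable can affect itself (no feedback loops)."""
    return not any(can_affect(graph, v, v) for v in graph)


print(can_affect(GRAPH, "X", "Z"))  # X -> Y -> Z exists
print(can_affect(GRAPH, "Z", "X"))  # no directed path back
print(is_acyclic(GRAPH))
```

Adding an arrow from Z back to X would create a feedback loop, and `is_acyclic` would then return False.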
Example DAG: A and B can affect
any variable except each other
Diagram: DAG on A, B, C, F, E, D
Colliders vs. noncolliders on a path
• Paths are closed at colliders: Associations cannot be transmitted across a collider (→ C ←) on a path unless we stratify (condition) on it or something it affects (such as F in C → F).
• Paths are open (unblocked) at noncolliders: Associations can be transmitted across a noncollider (→ C → or ← C →) on a path unless we completely stratify on it.
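A small simulation may make the collider rule concrete. It is a sketch under assumed numbers, not part of the original slides: A and B are independent standard normal causes of C (so C is a collider on the path between A and B), and conditioning is approximated by restricting to the stratum C > 1. The seed, sample size, and cutoff are arbitrary choices.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
# C is a common effect of A and B, plus independent noise.
c = [ai + bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]

# Unconditional association between A and B (path closed at C).
corr_all = corr(a, b)

# Association within one stratum of C (path opened by conditioning).
stratum = [(ai, bi) for ai, bi, ci in zip(a, b, c) if ci > 1.0]
corr_stratum = corr([s[0] for s in stratum], [s[1] for s in stratum])

print(f"corr(A, B) overall:       {corr_all:+.3f}")
print(f"corr(A, B) given C > 1:   {corr_stratum:+.3f}")
```

Overall, the A-B correlation should sit near zero. Within the stratum it should be clearly negative, even though neither variable affects the other.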
Think of associations as signals
flowing through the graph
• A variable can transmit associations along
some open (unblocked) directions but not
along closed (blocked) directions.
• The open and closed directions are
switched by conditioning (stratifying) on
the variable (and may be partially switched
by partially or indirectly conditioning)
Spot the open and closed
directions for C:
Diagram: DAG on A, B, C, F, E, D
Colliders vs. noncolliders on a path
• Associations may be transmitted across a
collider (→ C ←) on a path if we stratify
(condition) on it or something it affects
(such as F in C → F).
• Associations may be transmitted across a
noncollider (→ C → or ← C →) on a path if
we do not completely stratify on it.
“(C)” = C unobserved, “[C]” = C conditioned
Spot the open and closed
directions for C given C:
Diagram: DAG on A, B, [C], F, E, D
Spot the open and closed
directions for C given F:
Diagram: DAG on A, B, C, [F], E, D
Closed and open paths
• Closed (blocked) path: Closed at some
variable within the path, hence cannot
transmit associations.
• Open (unblocked) path: Open at all
variables within the path, hence can
transmit associations.
Conditioning may open some closed paths
and close some open paths
(C) = C unobserved, [C] = C conditioned
Spot the open and closed paths,
and rank the signal strengths:
Diagram: DAG on A, B, C, F, E, D
Size of associations
• The more steps along a given path, the
more attenuated the signal (the weaker
the transmitted association, a.e.), but
• Distances along distinct paths are not
comparable unless all steps (arrows) in
both paths are assigned a size!
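One way to see the attenuation claim, under assumptions the slides do not state: in a linear model with standardized variables, the association carried by a single open path is the product of the sizes of its arrows, so each extra step with size below 1 in absolute value shrinks the signal. The arrow sizes below are hypothetical, and the E-C arrow is deliberately left without a size to mirror the point that ECD cannot be ranked until every arrow has one.

```python
from math import prod

# Hypothetical standardized arrow sizes (not from the slides).
# The E-C arrow is intentionally left unassigned.
ARROW_SIZE = {
    frozenset("AE"): 0.5,
    frozenset("AC"): 0.5,
    frozenset("CD"): 0.5,
}


def path_signal(path):
    """Product of arrow sizes along a path written as a string such as 'EAC'."""
    return prod(ARROW_SIZE[frozenset(step)] for step in zip(path, path[1:]))


print(path_signal("EAC"))   # two steps
print(path_signal("EACD"))  # three steps, weaker

try:
    path_signal("ECD")
except KeyError:
    print("ECD cannot be ranked: the E-C arrow has no assigned size")
```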
EAC larger than EACD (a.e.), but
can’t say relative to ECD
Diagram: DAG on A, B, C, F, E, D
Spot the open and closed paths
given C:
Diagram: DAG on A, B, [C], F, E, D
Spot the open and closed paths
given F:
Diagram: DAG on A, B, C, [F], E, D
“Control” of bias
• Target path: A path that transmits some
of the effect we want to estimate; must be
a directed path from cause to effect.
• Biasing path: Any other open path
between the cause and effect variables.
• By judicious conditioning, we must close
all biasing paths without closing target
paths or opening new biasing paths.
This isn’t always possible with available data
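The search for biasing paths can be sketched in Python. The arrow set below is an assumption: it is reconstructed from the path lists quoted elsewhere in these slides (EACD, ECBD, ECD, EACBD, and C → F), because the drawn arrows are not available in this text. The function names are illustrative. A path counts as a target path if it is directed from cause to effect, and as a biasing path if it is any other path that is open given the conditioning set.

```python
# Assumed arrows (parent -> children); reconstructed, not copied from a figure.
CHILDREN = {
    "A": {"C", "E"},
    "B": {"C", "D"},
    "C": {"D", "E", "F"},
    "E": {"D"},
    "D": set(),
    "F": set(),
}


def has_arrow(u, v):
    return v in CHILDREN[u]


def neighbors(v):
    return {w for w in CHILDREN if has_arrow(v, w) or has_arrow(w, v)}


def descendants(v):
    found, stack = set(), list(CHILDREN[v])
    while stack:
        w = stack.pop()
        if w not in found:
            found.add(w)
            stack.extend(CHILDREN[w])
    return found


def simple_paths(start, end, path=None):
    """Yield every non-repeating path, ignoring arrow direction, as a string."""
    path = path or [start]
    if path[-1] == end:
        yield "".join(path)
        return
    for w in sorted(neighbors(path[-1])):
        if w not in path:
            yield from simple_paths(start, end, path + [w])


def is_directed(path):
    return all(has_arrow(u, v) for u, v in zip(path, path[1:]))


def is_open(path, conditioned):
    """Apply the collider / noncollider rule at every interior variable."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = has_arrow(left, mid) and has_arrow(right, mid)
        if collider:
            # Closed unless the collider or something it affects is conditioned.
            if not ({mid} | descendants(mid)) & conditioned:
                return False
        elif mid in conditioned:
            # A noncollider is closed by conditioning on it.
            return False
    return True


def biasing_paths(cause, effect, conditioned=frozenset()):
    conditioned = set(conditioned)
    return {
        p
        for p in simple_paths(cause, effect)
        if not is_directed(p) and is_open(p, conditioned)
    }


print(sorted(biasing_paths("E", "D")))
print(sorted(biasing_paths("E", "D", {"C"})))
print(sorted(biasing_paths("E", "D", {"B", "C"})))
```

Under the assumed arrows, conditioning on C closes the three original biasing paths but opens EACBD, while conditioning on both B and C leaves none open.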
Confounding
There are many definitions, none universally
accepted. My definition:
• Noncausal association transmitted via
effects on the outcome
This definition appears to correspond best to
the intuitive definitions given since the 19th
century: Confounding is a mixing of the
effect of interest with other effects on the
outcome (Mill, 1843).
Biasing paths I: Confounding paths
and confounders
• Confounding path: Any path capable of
transmitting confounding
• Confounder: Any variable within a
confounding path
• Without conditioning, all biasing paths in a
DAG are confounding paths,
HOWEVER,
• Upon conditioning other kinds of bias arise
Confounding paths from E to D:
EACD, ECBD, ECD
Diagram: DAG on A, B, C, F, E, D
Confounding paths from E to D
after conditioning on C: EACBD
Diagram: DAG on A, B, [C], F, E, D
Confounding paths from E to D:
EACD, ECBD, ECD, EACBD
Diagram: DAG on A, B, C, [F], E, D
Confounding paths from E to D:
ECD
Diagram: DAG on [A], [B], C, [F], E, D
Confounding paths from E to D:
None!
Diagram: DAG on A, [B], [C], F, E, D
Biasing path from A to B: ACB,
which is not a confounding path!
Diagram: DAG on A, B, [C], F, E, D
Selection Bias
There are many definitions, none universally accepted. My definition:
• Noncausal association created by nonrandom selection.
This definition appears to correspond best to the intuitive definitions given in epi texts since the mid-20th century.
• Confounding and selection bias overlap, but one is not always the other. (Using graphs, the distinction is not important.)
Confounding that is not selection
bias: ECD
Diagram: DAG on C, F, E, D
Selection bias that is not
confounding: Berksonian bias
Diagram: DAG on E, D, [S], T
Uncontrollable biasing path: ESD
Case-control matching is
Intentional selection bias
We must control the matching factor M
to block the bias induced by matching
Diagram: DAG on M, E, [S], [D]
M-bias that is both confounding &
selection bias (via EACBD)
Diagram: DAG on A, B, C, [S], E, D
Collider bias: Selection bias and
confounding induced by conditioning
Many variations:
• Berksonian bias
• M-bias
• Confounding produced by control of
intermediates to estimate direct effects, or
by selection affected by intermediates
E has no direct effect on D, but control of
C or F can make it appear so (via ECBD)
Diagram: DAG on E, B, [C], F, D
Instrumental variables:
ED = AED/AE or ED = FAED/FAE
Diagram: DAG on A, (B), E, F, D
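Under a linear model, the ratio ED = AED/AE can be read as: effect of E on D = (A-D association) / (A-E association). The simulation below is a sketch with assumed structure and numbers, none of which come from the slides: the instrument A affects D only through E, the unobserved (B) affects both E and D, all coefficients other than the true effect are 1, and the true effect of E on D is set to 0.5.

```python
import random


def cov(xs, ys):
    """Sample covariance (divisor n) of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000
TRUE_EFFECT = 0.5  # hypothetical effect of E on D

a = [rng.gauss(0, 1) for _ in range(n)]  # instrument
b = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder (B)
e = [ai + bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]
d = [TRUE_EFFECT * ei + bi + rng.gauss(0, 1) for ei, bi in zip(e, b)]

# Crude E-D slope: mixes the effect with confounding through (B).
naive = cov(e, d) / cov(e, e)

# Instrumental-variable ratio: A-D association over A-E association.
iv = cov(a, d) / cov(a, e)

print(f"true effect: {TRUE_EFFECT}")
print(f"crude slope: {naive:.3f}")
print(f"IV ratio:    {iv:.3f}")
```

The crude slope should overstate the effect because of the open path through (B), while the ratio should land close to the assumed 0.5. The ratio relies on A having no open path to D other than through E.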
Differential measurement error: Can’t tell
direction of bias without further info
Diagram: DAG on A, B, (E), E*, D
Independent nondifferential error:
bias toward the null in typical cases
Diagram: DAG on A, C, (E), E*, D
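A simulation sketch of bias toward the null in one typical case, under assumptions that are not in the slides: a linear outcome model with slope 1, and a measured exposure E* equal to the true E plus an error that is independent of E, of D, and of everything else. With equal variances for E and the error, the slope on E* is expected to be about half the slope on E.

```python
import random


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 20000

e_true = [rng.gauss(0, 1) for _ in range(n)]          # (E), unobserved
d = [1.0 * ei + rng.gauss(0, 1) for ei in e_true]      # outcome D
e_star = [ei + rng.gauss(0, 1) for ei in e_true]       # E*, measured with error

slope_true = slope(e_true, d)
slope_star = slope(e_star, d)

print(f"slope of D on true E:      {slope_true:.3f}")
print(f"slope of D on measured E*: {slope_star:.3f}")
```

The attenuation here depends on the error being independent and nondifferential. If the error were affected by D or shared a cause with D, the direction of bias could not be read off so simply.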
Effect-Measure Modification
(Heterogeneity)
Sander Greenland
Epidemiology and Statistics
University of California
The term “interaction” gets used for
several distinct phenomena:
• Biologic interaction (synergy, antagonism,
coaction): One factor changes the physical
mechanism of action of another.
• “Statistical interaction”: Change in a
measure (of effect or association) upon
change in a third factor.
In the 1970s, few researchers understood
the difference. Many still don’t today (e.g.,
in genetics)
Solution: Invent new term for
“statistical interaction”…
Effect Modification (Miettinen, 1974)
Unfortunately, the term still suggests
biologic interaction, so Rothman and
Greenland (1998) call it
Effect-Measure Modification (EMM)
Greenland prefers heterogeneity (of effect
or association), already in use in statistics
Consider the simple case
with no confounding:
Diagram: DAG on C, E, D
Presence & direction of EMM depend
on measure! (Berkson, 1958)
              C=1                     C=0
         E=1      E=0            E=1      E=0
D=1       32       20             10        4
N       10^5     10^5           10^5     10^5
RD:   32-20 = 12 per 10^5     10-4 = 6 per 10^5
RR:   32/20 = 1.6             10/4 = 2.5
NOTE: NO CONFOUNDING PRESENT!
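The arithmetic in the table can be checked with a few lines of Python. The counts are the ones on the slide, reading the slide's "105" as a denominator of 10^5 per group. The dictionary layout and function names are illustrative.

```python
N = 10**5  # persons per exposure group, as read from the slide

# Cases (D=1) by stratum of C and exposure E, copied from the table.
CASES = {
    "C=1": {"E=1": 32, "E=0": 20},
    "C=0": {"E=1": 10, "E=0": 4},
}


def risk_difference(stratum):
    """Risk in the exposed minus risk in the unexposed."""
    return stratum["E=1"] / N - stratum["E=0"] / N


def risk_ratio(stratum):
    """Risk in the exposed divided by risk in the unexposed."""
    return (stratum["E=1"] / N) / (stratum["E=0"] / N)


for name, stratum in CASES.items():
    rd = risk_difference(stratum)
    rr = risk_ratio(stratum)
    print(f"{name}: RD = {rd * N:.0f} per 10^5, RR = {rr:.1f}")
```

Moving from C=1 to C=0, the RD falls from 12 to 6 per 10^5 while the RR rises from 1.6 to 2.5, so the direction of modification depends on which measure is examined.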
Only the RD has a simple relation
to biologic interaction:
• If RD changes across strata and there is
no bias, this implies biologic interaction
must be present (known in bioassay since
the 1920s)
• Unfortunately, nearly all epi studies
present RRs only, so confusion remains.
• EMM has no bearing on confounding:
Both require C to be a risk factor given E,
but one can be present with the other
absent!
Most epi studies have little power
to detect EMM, hence:
A literature distortion is created:
• Studies examine only the RR
• All of them fail to detect RR modification
• Hence reviewers conclude there is no RR modification (that the RR is homogeneous)
BUT, if they had examined only RD instead,
• All of them would fail to detect RD modification and reviewers would infer that the RD is homogeneous!