CAMARADES: Bringing evidence to translational medicine
Heterogeneity
Chapters 15 and 16
Introduction to Meta-analysis
Borenstein, Hedges, Higgins &
Rothstein
Heterogeneity
Chapter 15
Overview
• The goal of a synthesis is not simply to compute
a summary effect, but rather to make sense of
the pattern of effects.
– If the effect size is consistent across studies we need
to know that and to consider the implications, and if it
varies we need to know that and consider the
different implications.
• The problem: the observed variation in the estimated
effect sizes is partly spurious, because it includes both true
variation in effect sizes and random error.
• We use the following measures:
– Q statistic (a measure of weighted squared deviations)
– P value (the result of a statistical test based on Q)
– T2 (the between-studies variance)
– T (the between-studies standard deviation)
– I2 (the ratio of true heterogeneity to total observed variation)
• Under the random effects model we allow that true effect
sizes may vary from study to study. We discuss
approaches to identify and then quantify this
heterogeneity.
• When we discuss heterogeneity in effect sizes we mean
the variation in true effect sizes.
• However the variation that we actually observe is partly
spurious, incorporating both (true) heterogeneity and
also random error.
Chapter 16
Identifying and Quantifying
Heterogeneity
True effect size = the effect size in the underlying population, and the effect size
we would observe if the study had an infinitely large sample size (and therefore
no sampling error).
• If all studies in an analysis shared the same true effect size, true heterogeneity would be zero. Because of within-study error we would not expect the observed effects to be identical to each other, but we would expect each to fall within some range of the common effect.
• If the true effect sizes vary from one study to the next, the observed effects vary from one another for 2 reasons:
1. the real heterogeneity in effect size
2. the within-study error
To quantify the heterogeneity we partition the observed variance into these 2 components and then focus on the real heterogeneity in effect sizes.
How do we do this?
1. We compute the total amount of study-to-
study variation actually observed
2. We estimate how much the observed effects
would be expected to vary from each other
if the true effect was actually the same in all
studies
3. The excess variation (if any) is assumed to
reflect real differences in effect sizes (that is,
the heterogeneity)
• Observed effects are
identical in A & B.
• CIs are relatively wide in A
and relatively narrow in B.
• In A, all studies could share
a common effect, with the
observed dispersion falling
within the umbrella of the
CIs.
• The CIs for the B studies
are quite narrow and cannot
comfortably account for the
observed dispersion.
• Similarly, the observed
effects are identical in C &
D where C has wider CIs.
• In C, the effects can be fully
explained by within-study
error, while in D they
cannot.
Dispersion across studies relative
to error within studies
• From A to C both the within-study variance and the observed variance have been
multiplied by 2. Same for B and D.
• The scale has increased but the ratio (observed/within) is unchanged.
• While the effects are more widely dispersed in the 2nd row than in the 1st, this is
not relevant to the purpose of isolating the true dispersion.
• What matters is the ratio of observed to expected dispersion, which is the same in
A and C (and is the same in B and D)
• Q is a statistic that is sensitive to the ratio of the observed variation to the within-study
error, rather than their absolute values.
Computing Q
• The first step in partitioning the heterogeneity is to compute Q, defined as
$$Q = \sum_{i=1}^{k} W_i (Y_i - M)^2$$
» where W_i is the study weight (1/V_i)
» Y_i is the study effect size
» M is the summary effect
» k is the number of studies
• In words, we compute the deviation of each effect size from the mean, square it, weight this by the inverse-variance for that study, and sum these values over all studies to yield the weighted sum of squares (WSS), or Q.
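The computation described in words above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names and the six effect sizes and variances are hypothetical, and M is taken to be the inverse-variance weighted mean. The last step rescales the data (variances multiplied by 2, effects stretched by the square root of 2) to show that Q responds to the ratio of observed to within-study variation rather than to the absolute scale.

```python
def weighted_mean(effects, variances):
    """Summary effect M: inverse-variance weighted mean of the effects."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def q_statistic(effects, variances):
    """Q = sum over studies of W_i * (Y_i - M)^2, with W_i = 1 / V_i."""
    m = weighted_mean(effects, variances)
    return sum((y - m) ** 2 / v for y, v in zip(effects, variances))


# Hypothetical data: six effect sizes with their within-study variances.
effects = [0.10, 0.30, 0.35, 0.45, 0.60, 0.70]
variances = [0.04, 0.03, 0.05, 0.02, 0.04, 0.03]

q = q_statistic(effects, variances)
df = len(effects) - 1

# Double every variance and stretch the effects by sqrt(2): the scale
# changes, but the ratio of observed to within-study variation does not.
scale = 2 ** 0.5
q_scaled = q_statistic([scale * y for y in effects], [2 * v for v in variances])

print(f"Q = {q:.3f} on {df} df; after rescaling Q = {q_scaled:.3f}")
```

The two printed Q values agree up to floating-point rounding.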
The expected value of Q
based on within-study error
• Because Q is a standardised measure, the
expected value does not depend on the metric
of the effect size, but is simply the degrees of
freedom,
$$df = k - 1$$
where k is the number of studies
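One way to check this claim is by simulation. The sketch below is an illustration under stated assumptions, not part of the original material: it assumes normally distributed within-study error with known variances, uses hypothetical variances for k = 6 studies, and gives every study the same true effect. Averaged over many simulated meta-analyses, Q should come out close to k - 1.

```python
import random


def q_statistic(effects, variances):
    """Weighted sum of squared deviations from the inverse-variance mean."""
    weights = [1.0 / v for v in variances]
    m = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - m) ** 2 for w, y in zip(weights, effects))


def mean_q_under_common_effect(variances, true_effect=0.5, n_sim=20000, seed=1):
    """Average Q over simulated meta-analyses in which every study shares
    one true effect, so only within-study error is present."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # Each observed effect = common true effect + within-study error.
        observed = [rng.gauss(true_effect, v ** 0.5) for v in variances]
        total += q_statistic(observed, variances)
    return total / n_sim


# Hypothetical within-study variances for k = 6 studies.
variances = [0.04, 0.03, 0.05, 0.02, 0.04, 0.03]
mean_q = mean_q_under_common_effect(variances)
print(f"mean Q = {mean_q:.2f}, df = {len(variances) - 1}")
```

Changing `true_effect` or rescaling the variances should leave the average essentially unchanged, which is the sense in which the expected value does not depend on the metric.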
The excess variation
• Since Q is the observed WSS and df is the
expected WSS (under the assumption that all
studies share a common effect), the difference,
Q - df, reflects the excess variation, the part that will be
attributed to differences in the true effects from
study to study.
Ratio of observed to expected
variation
A) Observed value of Q = 3
Expected value = 5 (k - 1)
Observed variation is less than
expected based on within-study
error (Q is less than the
degrees of freedom)
B) Observed value of Q = 12
Expected value = 5 (k - 1)
Observed variation is greater
than we would expect based on
within-study error (Q is greater
than the degrees of freedom)
• Q reflects the total dispersion (WSS)
• Q-df reflects the excess dispersion
• Q = 3 in both A and C because these plots share the same ratio
• Q = 12 in both B and D because these plots share the same ratio
(despite the fact that the absolute range of effects is higher in C and
D)
How to derive Tau2 and I2 from
Q and df
We can use Q to estimate the
variance (and standard deviation)
of the true effects: start with Q,
remove the dependence on the
number of studies, and return to the
original metric. These are Tau2 and
Tau.
To estimate what proportion of the
observed variance reflects real
differences among studies (rather than
random error) we will start with Q,
remove the dependence on the
number of studies, and express the
results as a ratio, I2
• Researchers typically ask: is the heterogeneity statistically
significant?
• We test the null hypothesis that all studies share a common effect
size.
• Typically we set alpha at 0.10 or at 0.05, with a p value less than
alpha leading us to reject the null hypothesis, and conclude that the
studies do not share a common effect size.
• This test of significance is sensitive both to the magnitude of the
effect (here, the excess dispersion) and the precision with which this
effect is estimated (here, based on the number of studies).
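A minimal sketch of this test, assuming that under the null hypothesis Q follows a chi-squared distribution with k - 1 degrees of freedom. The function names are illustrative. The upper-tail probability is computed with a standard closed-form series for integer degrees of freedom so that the example needs only the standard library; `scipy.stats.chi2.sf(q, df)` should give the same values where SciPy is available.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with
    integer df, using the standard closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def heterogeneity_p_value(q, k):
    """p-value for Q under the null that all k studies share one true effect."""
    return chi2_sf(q, k - 1)


# Q = 3 and Q = 12 with six studies, the values used for plots A and B.
p_a = heterogeneity_p_value(3.0, 6)
p_b = heterogeneity_p_value(12.0, 6)
print(f"Q = 3,  k = 6: p = {p_a:.3f}")
print(f"Q = 12, k = 6: p = {p_b:.3f}")
```

Because the p-value depends on both Q and the degrees of freedom, the same excess dispersion can be significant with many studies and nonsignificant with few.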
The impact of excess dispersion
Compare plots A vs B, which both have 6
studies.
As the excess dispersion increases (Q
moves from 3.00 in A to 12.00 in B) the p
value moves from 0.70 to 0.035.
Similarly when we compare plots C vs D.
The impact of number of studies
Compare plots A vs C, identical except A
has 6 studies and C has 12 with the
same estimated value of between-study
variation.
With the additional precision the p-value
moves away from zero, from 0.70 (for A)
to 0.83 (for C).
Compare plots B vs D, identical except B
has 6 studies and D has 12 with the
same estimated value of between-
studies variation (Tau2=0.037). With the
added precision the p-value moves
towards zero.
The p-value for the left-hand
column moves towards 1.0
as we add studies, while the
p-value for the right-hand
column moves towards 0.0 as
we add studies.
At left, since Q is less than df
the additional evidence
strengthens the case that the
excess dispersion is zero, and
moves the p-value towards 1.
At right, since Q exceeds df,
the additional evidence
strengthens the case that the
excess dispersion is not zero,
and moves the p-value
towards 0.
Q and its p-value
• A significant p-value provides evidence that the true effects vary, but the converse is not true.
• A nonsignificant p-value should not be taken as evidence that the effect sizes are consistent, since the lack of significance could be due to low power.
• The Q statistic and p-value address only the test of significance and should never be used as surrogates for the amount of true variance.
• A nonsignificant p-value could reflect a trivial amount of observed dispersion, but could also reflect a substantial amount of observed dispersion with imprecise studies.
• Similarly, a significant p-value could reflect a substantial amount of observed dispersion but could also reflect a minor amount of observed dispersion with precise studies.
• The purpose of the test is to assess the viability of the null hypothesis, and not to estimate the magnitude of the true dispersion.
Estimating Tau2
• Tau2 is defined as the variance of the true effect sizes.
• In other words, if we had an infinitely large sample of studies, each itself infinitely large (so that the estimate in each study was the true effect), and computed the variance of these effects, this variance would be Tau2.
• Since we cannot observe the true effects we cannot compute this variance directly; instead we estimate it from the observed effects, with the estimate denoted T2.
• To yield this estimate we start with the difference (Q-df), which represents the dispersion in true effects on a standardised scale. We divide by a quantity (C), which has the effect of putting the measure back into its original metric and also making it an average, rather than a sum, of squared deviations:
$$T^2 = \frac{Q - df}{C}$$
• This means that T2 is in the same metric (squared) as the effect itself,
and also reflects the absolute amount of variation in that scale.
• While the actual variance of the true effects can never be less
than zero, our estimate of this value, T2, can be less than zero if,
because of sampling error, the observed variance is less than we
would expect based on within-study error; in other words, if Q<df. In this
case, T2 is simply set to zero.
• If Q>df then T2 will be positive and it will be based on 2 factors. The
first is the amount of excess variation (Q-df), and the second is the
metric of the effect size index.
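The estimate described above can be sketched as follows. The function name is illustrative, and the definition of C used here (the sum of the weights minus the sum of squared weights divided by the sum of the weights) is the usual method-of-moments form, which the slides themselves do not spell out, so treat it as an assumption of this sketch.

```python
def estimate_tau2(effects, variances):
    """Method-of-moments estimate T^2 = max(0, (Q - df) / C)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    m = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - m) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # C rescales the standardised excess (Q - df) back to the squared
    # effect-size metric (assumed form: sum W - sum W^2 / sum W).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # A negative estimate (Q < df) is set to zero.
    return max(0.0, (q - df) / c)


# Hypothetical data: six effect sizes with their within-study variances.
effects = [0.05, 0.20, 0.35, 0.50, 0.65, 0.90]
variances = [0.02, 0.02, 0.02, 0.02, 0.02, 0.02]

t2 = estimate_tau2(effects, variances)
t = t2 ** 0.5
print(f"T2 = {t2:.4f}, T = {t:.3f}")
```

As a sanity check, two studies with effects 0 and 2 and unit variances have an observed variance of 2, of which 1 is expected from within-study error, and the function returns 1 for the remainder.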
• The impact of the excess variation on our estimate of T2 is evident if we compare A vs B.
– The within-study error is smaller in B. Therefore, while the observed variation is the same in both plots, a
higher proportion of this variation is assumed to be real in B. As we move from A to B, Q moves from
12 to 48.01 and T2 from 0.037 to 0.057.
• The impact of the scale on our estimate of T2 is evident if we compare C vs D.
– Q and df are the same in the 2 plots, which means that the same proportion of the observed variance
will be attributed to between-studies variance. However, the absolute amount of the variance is larger in
D, so this proportion translates into a larger estimate of Tau2. As we move from C to D, T2 moves from
0.037 to 0.096.
T2
• T2 (our estimate for the variance of the true effects) is
used to assign weights under the random effects model,
where the weight assigned to each study is
$$W_i^* = \frac{1}{V_{Y_i}^*}, \qquad V_{Y_i}^* = V_{Y_i} + T^2$$
• In words, the total variance for a study (V*Yi) is the sum
of the within-study variance (VYi) and the between-studies
variance (T2).
• This method of estimating the variance between studies
is known as the method of moments or the DerSimonian
and Laird method.
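A short sketch of how T2 enters the weights. The function names and data are hypothetical, and the value 0.037 for T2 is borrowed from the plots purely as an illustrative input. Adding the same between-studies variance to every study's variance makes the weights more similar to each other, so precise studies dominate less than they would with weights of 1/V alone.

```python
def random_effects_summary(effects, variances, tau2):
    """Return (weights, summary) using W*_i = 1 / (V_i + T^2)."""
    weights = [1.0 / (v + tau2) for v in variances]
    summary = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return weights, summary


def relative_weights(weights):
    """Each study's share of the total weight."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: the first study is much more precise than the rest.
effects = [0.10, 0.30, 0.35, 0.45, 0.60, 0.70]
variances = [0.01, 0.03, 0.05, 0.02, 0.04, 0.03]

w_zero, m_zero = random_effects_summary(effects, variances, tau2=0.0)
w_re, m_re = random_effects_summary(effects, variances, tau2=0.037)

share_zero = relative_weights(w_zero)
share_re = relative_weights(w_re)
print(f"largest share with T2 = 0:     {max(share_zero):.3f}")
print(f"largest share with T2 = 0.037: {max(share_re):.3f}")
```

With T2 set to zero the weights reduce to the inverse of the within-study variances.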
Tau
• Tau2 refers to the actual variance and T2 is our
estimate of this parameter.
• Now we turn to the standard deviation of the true
effect sizes.
• Here, Tau refers to the actual standard deviation
and T is our estimate of this parameter.
• T, the estimate of the standard deviation, is
simply the square root of T2.
The expected distribution of true
effects, based on T.
For example, in plot A the summary effect is 0.41 and T is 0.193. We expect that some 95% of
the true effects will fall in the range of 0.41 plus or minus 1.96 T, or 0.04 to 0.79,
and this is reflected in the bell curve.
Plots A and B have the same observed variance, but differ in the proportion of this
variance that is attributed to real differences in effect size.
In A, the bell curve is relatively narrow and captures only a fraction of the observed
dispersion; the rest is assumed to reflect error. In B, the bell curve is relatively
wide, and captures a larger fraction of the dispersion, since most of the dispersion
is here assumed to be real.
Similarly, in plots C and D the ratio of true to observed variance is the same, but the
observed dispersion is larger in D. The bell curve is wider in D than in C, but in both
cases a comparable proportion of the effects fall within the range of the curve
(because the ratio is the same).
T
• T enables us to talk about the substantive
importance of the dispersion.
• Consider an intervention with a summary effect size of
0.50.
– If T is 0.10, then most of the true effects (95%) fall in the
approximate range of 0.30 to 0.70.
– If T is 0.20, then most of the true effects fall in the
approximate range of 0.10 to 0.90.
– If T is 0.30, then most of the true effects fall in the
approximate range of -0.10 to +1.10.
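The arithmetic behind these ranges is the summary effect plus or minus 1.96 T. The sketch below assumes the true effects are normally distributed and ignores uncertainty in the summary effect and in T itself; the function name is illustrative.

```python
def true_effect_range(summary, t, z=1.96):
    """Approximate range expected to hold about 95% of the true effects,
    assuming they are normally distributed around the summary effect."""
    return summary - z * t, summary + z * t


ranges = {t: true_effect_range(0.50, t) for t in (0.10, 0.20, 0.30)}
for t, (low, high) in ranges.items():
    print(f"T = {t:.2f}: {low:+.2f} to {high:+.2f}")
```

With T = 0.30 the range extends below zero, so an intervention that helps on average could still be harmful in some populations, which is the kind of substantive point T allows us to make.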
I2
• What proportion of the observed variance reflects real differences in effect size?
• The statistic I2 reflects this proportion:
$$I^2 = \left(\frac{Q - df}{Q}\right) \times 100\%$$
• That is, the ratio of excess dispersion to total dispersion. The statistic I2 can also be viewed as a statistic of the form
$$I^2 = \left(\frac{\tau^2}{\tau^2 + V_Y}\right) \times 100\%$$
• That is, the ratio of true heterogeneity to total variance across the observed effect estimates. However, this is not a true definition of I2 because in reality there is not a single V_Y, since the within-study variances vary from study to study.
• The I2 statistic is a descriptive statistic and not an estimate of any underlying quantity.
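As a quick illustration, I2 can be computed directly from Q and df as the excess dispersion divided by the total dispersion. The function name is illustrative, and setting negative values (Q < df) to zero is an assumption of this sketch that mirrors the treatment of T2.

```python
def i_squared(q, df):
    """I^2 = (Q - df) / Q, expressed as a percentage and floored at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


i2_low = i_squared(3.0, 5)    # Q below df: no excess dispersion
i2_high = i_squared(12.0, 5)  # Q above df: part of the dispersion is excess
print(f"Q = 3,  df = 5: I2 = {i2_low:.1f}%")
print(f"Q = 12, df = 5: I2 = {i2_high:.1f}%")
```

With the rounded inputs Q = 12 and df = 5 this gives roughly 58.3%.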
Impact of excess dispersion on I2
• For any df, I2 moves in
tandem with Q. As such, it is
driven entirely by the ratio of
observed dispersion to
within-study dispersion.
• In the top row, both plots A
and B have a Q value of
12.00 with 5 degrees of
freedom. Therefore both
have an I2 of 58.34%.
• Similarly in plots C and D.
The wider scale does not
impact the I2.
The scale of I2 has a range of 0-100%, regardless of the scale used for the meta-
analysis itself. It can be interpreted as a ratio, and has the additional advantage of
being analogous to indices used in psychometrics (where reliability is the ratio of
true to total variance) or regression (where R2 is the proportion of the total variance
that can be explained by covariates). Importantly, I2 is not directly affected by the
number of studies in the analysis.
I2 reflects the extent of
overlap of CIs, which is not
dependent on the actual
location or spread of the
true effects. As such it is
convenient to view I2 as a
measure of inconsistency
across the findings of the
studies, and not as a
measure of the real
variation across the
underlying true effects.
Concluding remarks I2
• I2 allows us to discuss the amount of variance on a relative scale
• We can use I2 to determine what proportion of the observed variance is real
• If I2 is near zero, then almost all of the observed variance is spurious, which means there is nothing to explain.
• If I2 is large, then it would make sense to speculate about reasons for the variance and possibly to apply techniques such as subgroup analysis or meta-regression to try to explain it.
• Benchmarks for I2:
– Low 25%
– Moderate 50%
– High 75%
• I2 indicates what proportion of the observed variation is real but does not address the absolute amount of dispersion.
Comparing the measures of
heterogeneity
• The Q statistic and its p-value serve as a test of significance. Useful because it depends on the
number of studies and is not sensitive to the metric of the effect size index.
• T2 serves as the between-studies variance in the analysis and T serves as our estimate of
the standard deviation of the true effects. Useful because they are sensitive to the metric of
the effect size and they are not sensitive to the number of studies.
• I2 is the ratio of true heterogeneity to total variation in observed effects, a kind of signal to
noise ratio. Useful because it is not sensitive to the metric of the effect size and it is not
sensitive to the number of studies.
• T2 and T reflect the amount of true heterogeneity
(the variance or the standard deviation)
• I2 reflects the proportion of observed dispersion
that is due to this heterogeneity.
Above, I2 is the same in both plots but in A the true effects are clustered in a small range
(T2=0.006) while in B they are dispersed across a wider range (T2=0.037).
I2 reflects only the
proportion of
variance that is
true, and says
nothing about the
absolute value of
this variance.
T2 reflects only the
absolute value of
the true variance
and says nothing
about the
proportion of
observed variance
that is true.
Above, T2 is the same in both plots, but in A it is a large part (I2 = 58.34%) of a small observed
dispersion whereas in B it is a small part (I2 = 16.01%) of a large observed dispersion.
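The first of these contrasts (same I2, different T2) can be reproduced with a small sketch. Everything here is illustrative: the data are hypothetical, the function name is an assumption, and C uses the usual method-of-moments form. Doubling every effect size while multiplying every variance by 4 leaves the ratio of observed to within-study variation untouched, so Q and I2 stay the same while T2 grows fourfold.

```python
def heterogeneity(effects, variances):
    """Return Q, df, T2, T and I2 (%) for a set of study results."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    m = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - m) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Assumed method-of-moments scaling factor: sum W - sum W^2 / sum W.
    c = sum_w - sum(w * w for w in weights) / sum_w
    t2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"Q": q, "df": df, "T2": t2, "T": t2 ** 0.5, "I2": i2}


# Hypothetical data on a narrow scale.
effects = [0.05, 0.20, 0.35, 0.50, 0.65, 0.90]
variances = [0.02] * 6
narrow = heterogeneity(effects, variances)

# Same studies on a scale twice as wide: effects x 2, variances x 4.
wide = heterogeneity([2 * y for y in effects], [4 * v for v in variances])

for label, result in (("narrow", narrow), ("wide", wide)):
    print(f"{label}: I2 = {result['I2']:.1f}%, T2 = {result['T2']:.4f}")
```

Reporting both numbers avoids reading a high I2 as a large absolute spread, or a small T2 as a small share of the observed variance.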
Please note
• T2 is tied to the effect size index while I2 is not. For example, T2 for a
synthesis of risk ratios will be in the metric of log risk ratios while T2
for a synthesis of standardised mean differences will be in the
metric of standardised mean differences. It would not be
meaningful to compare the T2 values for 2 syntheses unless they
were in the same metric.
• By contrast, I2 is on a ratio scale of 0% to 100% and it is possible to
compare this value from different syntheses.
1. The Q statistic and its p-value only address the viability of the null hypothesis and not the amount of excess dispersion.
2. Q is sensitive to relative variance (the kind tracked by I2) and not absolute variance (the kind tracked by T2 and T).
An informative presentation of heterogeneity indices requires both a measure of the magnitude and a measure of uncertainty. Magnitude may be represented by the degree of true variation on the scale of the effect measure (T2) or the degree of inconsistency (I2) or both. Uncertainty over whether apparent heterogeneity is genuine may be expressed using the p-value for Q or using confidence intervals for T2 or I2.
Note that uncertainty around T2 or I2 is often very large. If the studies themselves have poor precision (wide CIs), this could mask the presence of real heterogeneity, resulting in an estimate of zero for T2 and I2. Therefore, it would be a mistake to interpret a T2 or I2 of zero as meaning that the effect sizes are consistent unless this is justified by CIs for T2 and I2 that exclude large values.
Confidence intervals for T2 and I2