BIOSTATISTICS II
Capita Selecta, 2009
Part I Analysis of Variance
Part II Generalized Linear Models
Part III Multiple regression and model building
Part IV Sample size calculations
Part V Measuring agreement
Part VI Systematic review and meta-analysis
Søren Lundbye Christensen
Johannes J. Struijk
Part III Multiple regression & model building
Literature: any serious book on statistics
Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 17.
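Basic model:
$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + e$, or in matrix form $Y = Xb + e$
Best (minimum mean square error) estimator:
$\hat{Y} = X\hat{b}$
Solution for b:
$X^T Y = X^T X \hat{b}$, i.e. $S_{xy} = S_{xx}\hat{b}$, so that $\hat{b} = S_{xx}^{-1} S_{xy}$
We immediately see a problem: if some of the independent variables are linearly related, then the inverse of the covariance matrix $S_{xx}$ does not exist.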
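As a minimal sketch of the least-squares solution, the snippet below solves the normal equations $X^T X \hat{b} = X^T Y$ by Gaussian elimination in plain Python. The function names (`solve_linear_system`, `fit_least_squares`) and the data are hypothetical and chosen only for illustration; linearly related predictors show up as a singular system.

```python
def solve_linear_system(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [a | rhs], copied so the inputs are not modified.
    m = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: predictors are linearly related")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return b_hat solving X^T X b = X^T Y; a column of ones is added for b0."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b_hat = fit_least_squares(rows, y)  # close to [1, 2, 3]
```

In practice a statistics package would be used instead; the point of the sketch is only that the estimate exists when $X^T X$ can be inverted.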
Maximum voluntary contraction (MVC) of the quadriceps muscle as a function of age and height in 41 alcoholics.
Model: $\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Age}$
Squared multiple correlation coefficient: $R^2 = SS_{Reg} / SS_T$ (proportion of variability accounted for)
Interaction:
$\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Age} + b_3 \cdot \text{Height} \cdot \text{Age}$
Note: adjusted $R_a^2 = 1 - (1 - R^2)(n-1)/(n-p-1)$
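The adjusted $R^2$ can be sketched as a one-line function. Here n is taken to be the number of observations and p the number of predictors (not counting the intercept); the $R^2$ value of 0.40 is hypothetical, while n = 41 and p = 3 simply mirror the MVC example with the interaction term.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical R^2; n and p chosen to resemble the interaction model.
r2_adj = adjusted_r_squared(0.40, n=41, p=3)
```

The adjusted value is always at most $R^2$, and the penalty grows as more predictors are added for a fixed n.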
Polynomial regression: $\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Height}^2$
Dichotomous variables
Examples:
- Sex: man / woman
- Liver disease: yes / no
Assign 0’s and 1’s to those variables and use the standard techniques.
Variance inflation factor: $VIF_i = 1/(1 - R_i^2)$
VIF > 10 is a real problem ($R_i^2$ > 90%: more than 90% of the variation in $x_i$ is explained by the other x’s)
Leverage: Cook’s distance (influential points)
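The variance inflation factor can be sketched as follows, assuming $R_i^2$ has already been obtained by regressing $x_i$ on the other predictors. The function name is hypothetical.

```python
def variance_inflation_factor(r2_i):
    """VIF for predictor x_i, given R_i^2 from regressing x_i on the other x's."""
    if not 0.0 <= r2_i < 1.0:
        raise ValueError("R_i^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_i)


# R_i^2 = 0.90 sits exactly at the VIF = 10 threshold mentioned above.
vif_at_threshold = variance_inflation_factor(0.90)
```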
Many variables?
- Step-up (forward)
- Step-down (backward)
- Forward-backward
- Best subset
$F_{1,\,n-q} = \dfrac{SSE(q) - SSE(q+1)}{SSE(q+1)/(n-q)}$
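A minimal sketch of the F statistic used to decide whether adding one more variable is worthwhile. It follows the formula exactly as written here, with 1 and n - q degrees of freedom; the sums of squared errors are hypothetical numbers.

```python
def partial_f(sse_q, sse_q1, n, q):
    """F statistic comparing a model with q variables to one with q + 1.

    sse_q  : residual sum of squares of the smaller model
    sse_q1 : residual sum of squares of the larger model
    """
    return (sse_q - sse_q1) / (sse_q1 / (n - q))


# Hypothetical sums of squared errors, for illustration only.
f_value = partial_f(sse_q=120.0, sse_q1=100.0, n=41, q=1)
```

In a step-up procedure the candidate variable with the largest F would be added, provided F exceeds the chosen critical value.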
Part IV Sample Size Calculation
Literature:
Machin et al., (1997), ”Sample size tables for clinical studies”, Blackwell, Oxford
Altman (1982), ”How large a sample?” In: Statistics in Practice (Eds. Gore & Altman), Blackwell Publishing Ltd., London
Lehr (1992), ”Sixteen s squared over d squared: a relation for crude sample size estimates”, Stat. in Med., 11:1099-1102
Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 18.
Importance of sample size
Common error: a sample that is too small. This gives low power and thus a high risk of a Type II error (no rejection of a false null hypothesis).
A little taxonomy of sample size calculations
- Power – chance of rejecting the null hypothesis if it is false
- Significance level – cut-off level of the p-value below which we reject the null hypothesis
- Variability – e.g., standard deviation for numerical data
- Smallest effect of interest – magnitude of the effect that we want to be able to detect as being statistically significant
Sample size calculations are important for:
- Estimation: effect on confidence intervals
  Examples: estimation of population mean, estimation of correlation coefficient
- Tests: effect on confidence level and power
  Example: 1-sample test
Literature: Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 18.
Methods of Sample Size Calculations
- Do the math
- Special tables
- Nomograms
- Simulation
- Computer software
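Estimation of population mean μ.
Assume sample size = n.
Estimated mean: $M = \frac{1}{n}\sum_{i=1}^{n} X_i$
Estimated variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - M)^2$
Estimated standard error: $SE(M) = S/\sqrt{n}$
Confidence interval: $\left(M - z_{\alpha/2}\,SE(M),\; M + z_{\alpha/2}\,SE(M)\right)$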
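The estimates of the mean, variance, standard error and confidence interval can be sketched with the Python standard library. The sample values are hypothetical, and the normal quantile $z_{\alpha/2}$ is used as in the formulas here (for small n a t quantile would usually be preferred).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) of the normal-based CI for the population mean."""
    n = len(sample)
    m = mean(sample)            # M
    s = stdev(sample)           # S, computed with the n - 1 denominator
    se = s / sqrt(n)            # SE(M) = S / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return m - z * se, m + z * se


# Hypothetical measurements, for illustration only.
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 5.1, 4.9, 5.7]
low, high = mean_confidence_interval(sample)
```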
Width of the confidence interval: $\text{Width} = 2 z_{\alpha/2}\, S/\sqrt{n}$
For a desired width, $W_d$, of the CI we can thus calculate n: $n = \left(2 z_{\alpha/2}\, S / W_d\right)^2$
Thus, n depends on
- confidence level,
- desired width of the confidence interval,
- variance,
- distribution of the data.
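A minimal sketch of the sample size needed for a confidence interval of a desired width, using $n = (2 z_{\alpha/2} S / W_d)^2$. The standard deviation and width are hypothetical inputs, and the result is rounded up because n must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_ci_width(sd, desired_width, confidence=0.95):
    """Smallest n for which the normal-based CI is no wider than desired_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((2 * z * sd / desired_width) ** 2)


# Hypothetical planning values: SD of 10 units, total CI width of 5 units.
n_required = sample_size_for_ci_width(sd=10.0, desired_width=5.0)
```

Halving the desired width quadruples the required sample size, which is visible directly from the squared term.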
Estimation of correlation coefficient ρ.
Assume sample size = n
Estimated correlation = r (has a very nasty distribution)
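Fisher’s z transformation: $z = \frac{1}{2}\ln\frac{1+r}{1-r}$ has an approximately normal distribution with
Mean: $z_\rho = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} + \frac{\rho}{2(n-1)} \approx \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$
SE: $SE(z) = 1/\sqrt{n-3}$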
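Confidence interval: $\left(z - z_{\alpha/2}/\sqrt{n-3},\; z + z_{\alpha/2}/\sqrt{n-3}\right)$
Example: expected r = 0.5; desired 95% CI = [0.4, 0.6]
$z_{0.4} = 0.424$; $z_{0.5} = 0.549$; $z_{0.6} = 0.693$
$z_{0.6} - z_{0.5} = 0.144$; $z_{0.5} - z_{0.4} = 0.126$
The smaller of the two distances is the binding one: $1.96/\sqrt{n-3} = 0.126 \Rightarrow n \approx 246$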
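The correlation example can be reproduced with a short sketch. It applies Fisher's z transformation (`math.atanh` computes ½ ln((1+r)/(1−r))), takes the smaller of the two distances on the z scale, and solves $z_{\alpha/2}/\sqrt{n-3}$ = distance for n. The function name is hypothetical and the result is returned unrounded.

```python
from math import atanh
from statistics import NormalDist


def sample_size_for_correlation_ci(r, lower, upper, confidence=0.95):
    """Approximate n so that the CI for rho lies within [lower, upper]."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Distances to both limits on Fisher's z scale; the smaller one is binding.
    half_width = min(atanh(upper) - atanh(r), atanh(r) - atanh(lower))
    return (z_crit / half_width) ** 2 + 3


# Values from the example: expected r = 0.5, desired 95% CI = [0.4, 0.6].
n_corr = sample_size_for_correlation_ci(0.5, 0.4, 0.6)  # about 246
```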
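Paired-sample test.
Test statistic: $z = \bar{d}/se(\bar{d})$
[Figure: distributions of the test statistic under H0 (centred at 0) and under the alternative (centred at μd/se(d)); the critical value zα coincides with −zβ + μd/se(d), and β is the area of the alternative distribution below the critical value.]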
Altman’s nomogram
Altman (1982), ”How large a sample?”, in Statistics in practice, eds. Gore & Altman, BMA London.
Example: difference of capillary density (per mm²) in the feet of ulcerated patients (better foot minus worse foot):
- Min. diff. to be detected = 4 mm⁻²
- SD(difference) = 6.1
- Standardized difference = 2 × (4/6.1) = 1.31
- Required power = 0.80
- Significance level = 0.05
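Using the formula: $n = (z_\alpha + z_\beta)^2 \, VAR(d) / \mu_d^2$
(this follows from zα = −zβ + μd/se(d) with se(d) = SD(d)/√n)
zα = 1.96 (α = 0.05)
zβ = 0.84 (Power = 80%)
Min. μd = 4.0
VAR(d) = 6.1² = 37.21
n = (1.96 + 0.84)² × 37.21 / 4.0² ≈ 18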
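A minimal sketch of the paired-sample calculation, assuming the relation zα = −zβ + μd/se(d) with se(d) = SD(d)/√n, which rearranges to n = (zα + zβ)² VAR(d)/μd². The function name is hypothetical; the quantiles are taken from the standard normal distribution (two-sided α) and the result is returned unrounded.

```python
from statistics import NormalDist


def paired_sample_size(min_diff, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a mean difference min_diff."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_beta) ** 2 * sd_diff ** 2 / min_diff ** 2


# Capillary density example: minimum difference 4, SD of differences 6.1.
n_pairs = paired_sample_size(min_diff=4.0, sd_diff=6.1)  # about 18
```

The standardized difference used with the nomogram, 2 × (4/6.1), contains the same ratio of minimum difference to standard deviation.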
Part V Measuring agreement
Literature:
Bland, Altman, (1999), ”Measuring agreement in method comparison studies”, Stat Meth Med Res, 8:135-160
Landis, Koch, (1977), ”The measurement of observer agreement for categorical data”, Biometrics, 33:159-174
Methods used in the literature:
Data            Method
Ordinal         Cohen’s kappa
                Spearman’s rank-order correlation coefficient
                Kendall’s tau
                Kendall’s coefficient of concordance
Interval/ratio  Pearson’s correlation coefficient
                Intraclass correlation coefficient
                Tukey’s mean-difference plot (Bland-Altman plot)
Cohen’s kappa (Ordinal data)
Diagnoses by Doctor 1 and Doctor 2, cross-tabulated:
             Schizo-  Bipolar  Other  Row sum
Schizo-         31       4       2       37
Bipolar          6      29       8       43
Other           10       7       3       20
Column sum      47      40      13      100
agreement rate = 0.63
κ = 0.41
σκ = 0.077
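The kappa calculation for the two-doctor table can be sketched as follows. The agreement rate is the diagonal share, the expected agreement comes from the row and column sums, and the standard error uses the simple large-sample approximation sqrt(po(1 − po) / (n(1 − pe)²)); that choice of SE formula is an assumption, made because it reproduces the value 0.077.

```python
from math import sqrt


def cohen_kappa(table):
    """Return (observed agreement, kappa, approximate SE) for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    p_exp = sum(row_sums[i] * col_sums[i] for i in range(k)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se = sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return p_obs, kappa, se


# Cell counts from the diagnosis table (Schizo-, Bipolar, Other).
table = [[31, 4, 2], [6, 29, 8], [10, 7, 3]]
p_obs, kappa, se = cohen_kappa(table)
```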
More than two judges (Ordinal data)
For example: Kendall’s coefficient of concordance (related to Friedman’s two-way ANOVA on ranks)
MGP 2009 - Song
District    1     2     3     4     5    Totals
NJutl       1     2     3     5     4
MJutl       1     2     4     3     5
SJutl       1     2     3     5     4
Sjæll       1     2     3     4     5
Cophn       1     2     4     3     5
Sum         5    10    17    20    23    T = 75
Sumsq      25   100   289   400   529    U = 1343
Kendall’s coefficient of concordance, W
m = number of raters
n = number of classes
$W = \dfrac{U - T^2/n}{m^2 (n^3 - n)/12} = 218 / 250 = 0.872$
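The value W = 218 / 250 can be reproduced with a short sketch. It assumes the usual no-ties formula W = (U − T²/n) / (m²(n³ − n)/12), where T is the total of the column rank sums and U the total of their squares; this formula is an assumption here, chosen because it matches the totals T = 75 and U = 1343.

```python
def kendall_w(ranks):
    """Kendall's W; ranks[i][j] is the rank rater i gives item j (no ties)."""
    m = len(ranks)        # number of raters
    n = len(ranks[0])     # number of items (classes)
    col_sums = [sum(r[j] for r in ranks) for j in range(n)]
    t = sum(col_sums)                    # T
    u = sum(s * s for s in col_sums)     # U
    s_dev = u - t * t / n                # sum of squared deviations of rank sums
    return s_dev / (m * m * (n ** 3 - n) / 12)


# Ranks from the MGP 2009 table: one row per district, one column per song.
ranks = [
    [1, 2, 3, 5, 4],  # NJutl
    [1, 2, 4, 3, 5],  # MJutl
    [1, 2, 3, 5, 4],  # SJutl
    [1, 2, 3, 4, 5],  # Sjæll
    [1, 2, 4, 3, 5],  # Cophn
]
w = kendall_w(ranks)
```

W equals 1 when all raters give identical rankings and approaches 0 when the rankings are unrelated.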
NUMERICAL VARIABLES
Correlation coefficient
Intraclass correlation coefficient
Bland-Altman plot (Tukey plot)
[Figure: automated versus manual measurements, with the identity line.]
Pearson’s product-moment correlation coefficient
- Ignores bias and gain!
- Only for two raters.
Intraclass correlation coefficient (also for multiple raters) = Between pairs variance / Total variance.
k = number of subjects (or measured objects)
n = number of raters (or methods)
This takes into account the systematic difference!
Bland-Altman plot (Tukey mean-difference plot)
[Figure panels: Bias; Proportional error; Heterogeneous variance]
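A minimal sketch of the quantities behind a Bland-Altman plot: for each subject the mean of the two methods (horizontal axis) and their difference (vertical axis), plus the bias and the commonly used 95% limits of agreement, bias ± 1.96 SD of the differences. The measurements and the variable names `manual` and `automated` are hypothetical, and no plotting is done here.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return per-subject means and differences, the bias, and the 95% limits."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic difference between the methods
    sd = stdev(diffs)    # spread of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical paired measurements, for illustration only.
manual = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
automated = [10.0, 11.9, 9.5, 12.6, 10.4, 11.6]
means, diffs, bias, limits = bland_altman(manual, automated)
```

A trend of the differences against the means would suggest proportional error, and a fan shape would suggest heterogeneous variance.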
Part VI Systematic review and meta-analysis
Literature:
Chalmers, Altman, (eds), (1995), ”Systematic reviews”, Br. Med. J. Publ. Group, London
Higgins et al., (2003), ”Measuring inconsistency in meta-analyses”, Br. Med. J., 327:557-560
Cochrane Handbook: at http://www.cochrane.org
Systematic review =
Formalized and stringent process of combining the information from all (published and unpublished) studies of the same health condition.
Why systematic reviews?
- Reduction of information
- Generalization to a wider population
- Consistency by comparing different studies
- Reliability of recommendations
- Power and precision increases
Meta-analysis =
Systematic review with focus on numerical results
Aim: to combine results from individual studies to estimate an overall / average effect of interest (example: the relative risk of getting cancer because of using mobile phones)
Meta-analysis
From a statistical angle, meta-analysis is an application of multifactorial methods:
Multiple studies of the same thing. Combine the results of the studies:
- Treatment / risk factor is one independent factor
- Study is a second independent factor
Meta-analysis
Clear definition of the question / effect of interest.
Example:
- Does lowering serum cholesterol reduce risk of dying from coronary artery disease?
- Does a diet to lower serum cholesterol reduce risk of dying from coronary artery disease?
Should a study in which the attempt to lower cholesterol failed be included?
Meta-analysis – PUBLICATION BIAS
Simple literature search is not good enough!
- Bias towards positive results (sometimes to negative results)
- More positive results in English literature?
- Unpublished studies are important.
Meta-analysis – Example from M. Bland, ch. 17
$\ln(o) = b_0 + b_1 T + b_2 S_1 + \dots + b_5 S_4 + b_6 S_5 + b_7 T S_1 + \dots + b_{11} T S_5$
Systematic review
Example (Mailis-Gagnon et al., (2004), ”Spinal cord stimulation for chronic pain”, The Cochrane Library, issue 3)
1692 papers: only 2 admitted to the review
Result: further study needed(!)
http://thecochranelibrary.com