
BIOSTATISTICS II
Capita Selecta, 2009

Part I: Analysis of Variance
Part II: Generalized Linear Models
Part III: Multiple regression and model building
Part IV: Sample size calculations
Part V: Measuring agreement
Part VI: Systematic review and meta-analysis

Søren Lundbye Christensen
Johannes J. Struijk

Part III: Multiple regression & model building

Literature: any serious book on statistics

Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 17.

Multiple regression & model building

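Basic model:

$$Y = b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k + e, \qquad \text{in matrix form: } Y = Xb + e$$

Best (minimum mean square error) estimator:

$$\hat{Y} = X\hat{b}$$

Solution for b (the normal equations):

$$X^T Y = X^T X\,\hat{b} \quad\Rightarrow\quad \hat{b} = (X^T X)^{-1} X^T Y$$

or, for the slopes $b_1, \dots, b_k$, in terms of the (co)variance matrices:

$$S_{xx}\,\hat{b} = S_{xy} \quad\Rightarrow\quad \hat{b} = S_{xx}^{-1} S_{xy}$$

We immediately see a problem: if some of the independent variables are linearly related, then the inverse of the covariance matrix doesn't exist.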
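The normal equations can be solved directly with NumPy. The sketch below uses hypothetical, noise-free data (not from the slides) so the recovered coefficients are known in advance, and then shows the collinearity problem: once a column is an exact linear combination of others, $X^T X$ loses full rank and has no inverse. `np.linalg.solve` is used instead of an explicit inverse; `np.linalg.lstsq` is the more robust choice in practice.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X^T X b_hat = X^T y for b_hat.

    X must contain a leading column of ones for the intercept b0.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical, noise-free data: y = 2 + 0.5*x1 - 1.5*x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, x2])
y = 2 + 0.5 * x1 - 1.5 * x2
b_hat = ols(X, y)  # approximately [2.0, 0.5, -1.5]

# Adding x3 = x1 + 2*x2 makes the columns linearly dependent:
# X (and hence X^T X) then has rank 3 instead of 4, so (X^T X)^-1 does not exist.
X_collinear = np.column_stack([X, x1 + 2 * x2])
rank = np.linalg.matrix_rank(X_collinear)
```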


Maximum voluntary contraction (MVC) of the quadriceps muscle as a function of age and height in 41 alcoholics.



Model: $\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Age}$

Multiple correlation coefficient R, with $R^2 = SS_{Reg}/SS_T$ (the proportion of the variability accounted for by the model)



Interaction:

$\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Age} + b_3 \cdot \text{Height} \cdot \text{Age}$

Note: adjusted $R_a^2 = 1 - (1 - R^2)\,\dfrac{n-1}{n-p-1}$, where n is the number of observations and p the number of independent variables.
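A minimal sketch of $R^2 = SS_{Reg}/SS_T$ and the adjusted $R_a^2$, comparing the main-effects model with the interaction model. The slides' MVC data are not available here, so the height/age/MVC values below are synthetic stand-ins generated for illustration only. Adding a column (such as the interaction) can never lower $R^2$, which is exactly why the adjusted version penalizes extra terms.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """R^2 = SSReg/SST and adjusted R^2; X includes a leading intercept column."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b_hat
    sst = np.sum((y - y.mean()) ** 2)
    ssreg = np.sum((y_hat - y.mean()) ** 2)
    n = len(y)
    p = X.shape[1] - 1  # number of independent variables (intercept excluded)
    r2 = ssreg / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic stand-ins for height, age and MVC (not the slides' data)
rng = np.random.default_rng(1)
n = 41
height = rng.normal(170, 8, n)
age = rng.normal(45, 10, n)
mvc = 50 + 1.5 * height - 2.0 * age + rng.normal(0, 20, n)

X_main = np.column_stack([np.ones(n), height, age])
X_int = np.column_stack([X_main, height * age])  # adds the Height x Age interaction
r2_main, adj_main = r2_and_adjusted(X_main, mvc)
r2_int, adj_int = r2_and_adjusted(X_int, mvc)
```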


Polynomial regression: $\text{MVC} = b_0 + b_1 \cdot \text{Height} + b_2 \cdot \text{Height}^2$



Dichotomous variables

Examples: sex (man / woman); liver disease (yes / no)

Assign 0’s and 1’s to those variables and use the standard techniques
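With 0/1 coding the regression coefficients have a direct interpretation: $b_0$ is the mean of the group coded 0 and $b_1$ is the difference between the group means. The small sketch below checks this on hypothetical values (invented for illustration).

```python
import numpy as np

# Hypothetical outcome values; group coded 0 (e.g. woman) or 1 (e.g. man)
sex = np.array([0, 0, 0, 1, 1, 1, 1])
y = np.array([20.0, 22.0, 24.0, 30.0, 31.0, 29.0, 34.0])

X = np.column_stack([np.ones(len(y)), sex])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
# b0 = mean of group 0 (22), b1 = mean(group 1) - mean(group 0) = 31 - 22 = 9
```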


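Variance inflation factor: $VIF_i = 1/(1 - R_i^2)$, where $R_i^2$ is the $R^2$ obtained by regressing $x_i$ on the other independent variables.

VIF > 10 is a real problem ($R_i^2$ > 90%: more than 90% of the variability of $x_i$ is explained by the other x's).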
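The VIF definition translates into one auxiliary regression per variable. In this sketch (synthetic data, for illustration only) two independent predictors give VIFs close to 1, while adding a third predictor that is almost the sum of the first two pushes every VIF far above the threshold of 10.

```python
import numpy as np

def vif(X):
    """VIF_i = 1/(1 - R_i^2) for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)
        resid = xi - others @ coef
        r2_i = 1 - (resid @ resid) / np.sum((xi - xi.mean()) ** 2)
        out.append(1 / (1 - r2_i))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.05, size=200)  # nearly a linear combination of x1, x2
vif_indep = vif(np.column_stack([x1, x2]))
vif_collinear = vif(np.column_stack([x1, x2, x3]))
```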

Influential points: leverage and Cook's distance
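Leverage is the diagonal $h_{ii}$ of the hat matrix $H = X(X^TX)^{-1}X^T$, and Cook's distance combines leverage with the residual, $D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_{ii}}{(1-h_{ii})^2}$ with p fitted parameters and $s^2$ the residual mean square. A sketch on synthetic data with one point that is far out in x and off the trend: it has both the highest leverage and the largest Cook's distance, even though its raw residual need not be the largest, because it pulls the fitted line towards itself.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Hat values h_ii and Cook's distances for an OLS fit (X includes intercept column)."""
    n, p = X.shape  # p = number of fitted parameters, intercept included
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_hat
    s2 = (e @ e) / (n - p)  # residual mean square
    cooks = e**2 / (p * s2) * h / (1 - h) ** 2
    return h, cooks

rng = np.random.default_rng(3)
x = np.append(np.arange(10.0), 30.0)          # last point far from the others in x
y = 1 + 2 * x + rng.normal(scale=0.5, size=11)
y[-1] -= 20                                  # ... and it does not follow the trend
X = np.column_stack([np.ones(11), x])
h, cooks = leverage_and_cooks(X, y)
```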


Many variables? Variable selection strategies:

- Step-up (forward)
- Step-down (backward)
- Forward-backward
- Best subset

F-test for adding one more variable:

$$F_{1,\,n-q} = \frac{SSE(q) - SSE(q+1)}{SSE(q+1)/(n-q)}$$
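A sketch of the partial F statistic used at each step of a forward selection. One assumption to flag: the slide does not define how q is counted, so this code takes the denominator degrees of freedom as n minus the number of fitted parameters of the larger model (intercept included); adjust `df_resid` if your convention differs. The data are synthetic; the resulting F is compared with the $F(1, df)$ critical value from a table.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_hat
    return resid @ resid

def partial_f(X_small, X_large, y):
    """F statistic (1 numerator df) for the one extra column in X_large."""
    df_resid = len(y) - X_large.shape[1]  # assumption: n minus parameters of larger model
    f = (sse(X_small, y) - sse(X_large, y)) / (sse(X_large, y) / df_resid)
    return f, df_resid

rng = np.random.default_rng(4)
n = 60
x1, x2, noise_var = rng.normal(size=(3, n))
y = 1 + 2 * x1 + 1 * x2 + rng.normal(size=n)
X0 = np.column_stack([np.ones(n), x1])
f_x2, df = partial_f(X0, np.column_stack([X0, x2]), y)          # x2 matters: large F
f_noise, _ = partial_f(X0, np.column_stack([X0, noise_var]), y)  # irrelevant variable
```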

Part IV: Sample size calculation

Literature:

Machin et al., (1997), ”Sample size tables for clinical studies”, Blackwell, Oxford

Altman (1982), ”How large a sample?” In: Statistics in Practice (Eds. Gore & Altman), Blackwell Publishing Ltd., London

Lehr (1992), ”Sixteen s squared over d squared: a relation for crude sample size estimates”, Stat. in Med., 11:1099-1102

Martin Bland, ”Introduction to medical statistics”, Oxford Univ. Press, 2000, chapter 18.

Sample Size Calculation

Importance of sample size

A common error is a sample that is too small: low power, and hence a high risk of a Type II error (failing to reject a null hypothesis that is false).

Sample Size Calculation

A little taxonomy of sample size calculations

- Power: the probability of rejecting the null hypothesis when it is false
- Significance level: the cut-off level of the p-value below which we reject the null hypothesis
- Variability: e.g., the standard deviation for numerical data
- Smallest effect of interest: the magnitude of the effect that we want to be able to detect as statistically significant

Sample Size Calculations

Sample size calculations are important for:

- Estimation: effect on the width of confidence intervals
  Examples: estimation of a population mean; estimation of a correlation coefficient
- Tests: effect on significance level and power
  Example: 1-sample test



Methods of Sample Size Calculations

- Do the math
- Special tables
- Nomograms
- Simulation
- Computer software
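Simulation is the most general of these methods: generate many hypothetical studies of size n under the assumed effect, analyse each one, and count how often the null hypothesis is rejected. The sketch below assumes normally distributed paired differences (an assumption, not stated in the slides) and uses illustrative values equal to those of the capillary density example in this part (difference 4, SD 6.1, n = 19 pairs); the estimated power should come out near 0.80.

```python
import random
import statistics

def simulated_power(n, mean_diff, sd_diff, z_alpha=1.96, n_sim=4000, seed=1):
    """Fraction of simulated paired studies with |d_bar / se(d_bar)| > z_alpha.

    Assumes the paired differences are normally distributed.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        d = [rng.gauss(mean_diff, sd_diff) for _ in range(n)]
        se = statistics.stdev(d) / n ** 0.5
        if abs(statistics.mean(d) / se) > z_alpha:
            rejections += 1
    return rejections / n_sim

power = simulated_power(n=19, mean_diff=4.0, sd_diff=6.1)
```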


Estimation of population mean μ.

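Assume sample size = n.

Estimated mean: $M = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} X_i$

Estimated variance: $S^2 = \dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n} (X_i - M)^2$

Estimated standard error: $SE(M) = S/\sqrt{n}$

Confidence interval: $\left(M - z_{\alpha/2}\,SE(M),\; M + z_{\alpha/2}\,SE(M)\right)$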


Estimation of population mean μ.

Width of the confidence interval:

$$\text{Width} = 2\,z_{\alpha/2}\,\frac{S}{\sqrt{n}}$$

For a desired width, $W_d$, of the CI we can thus calculate n:

$$n = \left(\frac{2\,z_{\alpha/2}\,S}{W_d}\right)^2$$

Thus, n depends on:

- the confidence level,
- the desired width of the confidence interval,
- the variance,
- the distribution of the data.
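The formula for n can be evaluated directly, rounding up to the next whole subject. This is a sketch under the normal approximation used above; the values S = 10 and $W_d$ = 5 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def n_for_ci_width(sd, width, conf=0.95):
    """Smallest n with 2 * z_{alpha/2} * sd / sqrt(n) <= width (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for a 95% CI
    return math.ceil((2 * z * sd / width) ** 2)

n = n_for_ci_width(sd=10, width=5)  # hypothetical S = 10, desired width W_d = 5
achieved_width = 2 * NormalDist().inv_cdf(0.975) * 10 / math.sqrt(n)
```

Halving the desired width quadruples n, since n is proportional to $1/W_d^2$.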


Estimation of correlation coefficient ρ.

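Assume sample size = n

Estimated correlation = r (its sampling distribution is skewed and awkward to work with, especially when ρ is far from 0)

Fisher's z transformation,

$$z = \frac{1}{2}\ln\frac{1+r}{1-r},$$

has an approximately normal distribution with

Mean: $E(z) \approx \dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}$

SE: $SE(z) = \dfrac{1}{\sqrt{n-3}}$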


Estimation of correlation coefficient ρ.

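Confidence interval:

$$\left(z - \frac{z_{\alpha/2}}{\sqrt{n-3}},\; z + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)$$

Example: expected r = 0.5; desired 95% CI = [0.4, 0.6]

$z_{0.4} = 0.424$; $z_{0.5} = 0.549$; $z_{0.6} = 0.693$

$z_{0.6} - z_{0.5} = 0.144$; $z_{0.5} - z_{0.4} = 0.126$

The narrower side (0.126) must be met:

$$\frac{1.96}{\sqrt{n-3}} = 0.126 \quad\Rightarrow\quad n \approx 246$$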
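The same calculation as a short sketch: transform r and the desired limits with Fisher's z, take the narrower half-width, solve $z_{\alpha/2}/\sqrt{n-3} \le$ half-width for n, and round up. With unrounded z-values this gives n = 247, one more than the hand calculation above, which used rounded values. The resulting interval is back-transformed to the r scale with $r = \tanh z$ as a check.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z = 0.5*ln((1+r)/(1-r)), identical to math.atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def n_for_corr_ci(r, lower, upper, conf=0.95):
    """Smallest n whose CI z(r) +/- z_crit/sqrt(n-3) fits inside [z(lower), z(upper)]."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = min(fisher_z(upper) - fisher_z(r), fisher_z(r) - fisher_z(lower))
    return math.ceil((z_crit / half_width) ** 2 + 3)

n = n_for_corr_ci(0.5, 0.4, 0.6)
# Back-transform the resulting 95% CI to the r scale with r = tanh(z)
half = NormalDist().inv_cdf(0.975) / math.sqrt(n - 3)
ci_r = (math.tanh(fisher_z(0.5) - half), math.tanh(fisher_z(0.5) + half))
```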


Paired-sample test.

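Test statistic: $z = \bar{d}/se(\bar{d})$

[Figure: distribution of the test statistic under H0 (centred at 0) and under the alternative (centred at $\mu_d/se(\bar{d})$), with the critical value $z_\alpha$ and the Type II error probability β marked; at the critical value, $z_\alpha = -z_\beta + \mu_d/se(\bar{d})$.]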


Altman's nomogram: Altman (1982), ”How large a sample?”, in Statistics in practice, eds. Gore & Altman, BMA London.

Example: difference of capillary density (per mm²) in the feet of ulcerated patients (better foot minus worse foot):

- Min. diff. to be detected: 4 mm⁻²
- SD(difference) = 6.1
- Standardized difference = 2 × (4/6.1) = 1.31
- Required power = 0.80
- Significance level = 0.05


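Using the formula $n = (z_\alpha + z_\beta)^2\,VAR(d)/\mu_d^2$, which follows from $z_\alpha = -z_\beta + \mu_d/se(\bar{d})$ with $se(\bar{d}) = \sqrt{VAR(d)/n}$:

$z_\alpha$ = 1.96 (α = 0.05, two-sided)

$z_\beta$ = 0.84 (power = 80%)

Min. $\mu_d$ = 4.0

VAR(d) = 6.1² = 37.21

n = (1.96 + 0.84)² × 37.21 / 4.0² ≈ 18.2, i.e. n ≈ 18 pairs (19 if rounded up)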
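The paired-sample formula as a sketch, using the exact normal quantiles from Python's `statistics.NormalDist` rather than the rounded table values. For the capillary density example it gives about 18.25, so 19 pairs after rounding up, in line with the nomogram and the hand calculation.

```python
import math
from statistics import NormalDist

def n_paired(min_diff, sd_diff, alpha=0.05, power=0.80):
    """Pairs needed: n = (z_alpha + z_beta)^2 * VAR(d) / mu_d^2 (two-sided alpha, normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)          # 0.84 for power = 0.80
    n_exact = (z_alpha + z_beta) ** 2 * sd_diff ** 2 / min_diff ** 2
    return n_exact, math.ceil(n_exact)

n_exact, n = n_paired(min_diff=4.0, sd_diff=6.1)  # capillary density example
```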

Part VMeasuring agreement

Literature:

Bland, Altman, (1999), ”Measuring agreement in method comparison studies”, Stat Meth Med Res, 8:135-160

Landis, Koch, (1977), ”The measurement of observer agreement for categorical data”, Biometrics, 33:159-174

Measuring agreement

Methods used in the literature:

Data              Method
Ordinal           Cohen's kappa
                  Spearman's rank-order correlation coefficient
                  Kendall's tau
                  Kendall's coefficient of concordance
Interval/ratio    Pearson's correlation coefficient
                  Intraclass correlation coefficient
                  Tukey's mean-difference plot (Bland-Altman plot)

Measuring agreement


Cohen's kappa (ordinal data)

                                 Doctor 1
Doctor 2         Schizophrenia   Bipolar   Other   Row sum
Schizophrenia         31             4        2        37
Bipolar                6            29        8        43
Other                 10             7        3        20
Column sum            47            40       13       100

agreement rate = 0.63
κ = 0.41
σκ = 0.077
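Kappa compares the observed agreement $p_o$ (the diagonal) with the agreement $p_e$ expected by chance from the row and column sums: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below reproduces the figures above from the table: $p_o$ = 0.63, $p_e$ = (37·47 + 43·40 + 20·13)/100² = 0.3719 and κ ≈ 0.41. It does not compute the standard error σκ.

```python
def cohens_kappa(table):
    """Observed agreement, chance agreement and Cohen's kappa for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_sums[i] * col_sums[i] for i in range(k)) / n**2
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)

# Rows: Doctor 2, columns: Doctor 1 (kappa is symmetric in the two raters)
table = [[31, 4, 2],
         [6, 29, 8],
         [10, 7, 3]]
p_obs, p_exp, kappa = cohens_kappa(table)
```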

Measuring agreement

More than two judges (ordinal data). For example: Kendall's coefficient of concordance (related to Friedman's two-way ANOVA on ranks)

MGP 2009: ranks given to songs 1 to 5 by five districts

District    Song 1   Song 2   Song 3   Song 4   Song 5   Totals
NJutl          1        2        3        5        4
MJutl          1        2        4        3        5
SJutl          1        2        3        5        4
Sjæll          1        2        3        4        5
Cophn          1        2        4        3        5
Sum            5       10       17       20       23     T = 75
Sumsq         25      100      289      400      529     U = 1343

Measuring agreement

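Kendall's coefficient of concordance, W

m = number of raters (here the 5 districts), n = number of classes (here the 5 songs)

$$W = \frac{12\,S}{m^2(n^3 - n)}, \qquad S = U - \frac{T^2}{n}$$

where T and U are the sum and the sum of squares of the rank sums. Here S = 1343 − 75²/5 = 218 and m²(n³ − n)/12 = 25 · 120/12 = 250, so

W = 218 / 250 = 0.872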
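The computation of W from the rank table, as a sketch that assumes every rater ranks all n items with no ties (true for the MGP table above). It reproduces T = 75, U = 1343 and W = 0.872.

```python
def kendalls_w(ranks):
    """Kendall's W for an m x n table of ranks (m raters, n items ranked 1..n, no ties)."""
    m, n = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(n)]
    T = sum(rank_sums)
    U = sum(s * s for s in rank_sums)
    S = U - T * T / n
    return T, U, 12 * S / (m**2 * (n**3 - n))

ranks = [[1, 2, 3, 5, 4],   # NJutl
         [1, 2, 4, 3, 5],   # MJutl
         [1, 2, 3, 5, 4],   # SJutl
         [1, 2, 3, 4, 5],   # Sjæll
         [1, 2, 4, 3, 5]]   # Cophn
T, U, W = kendalls_w(ranks)
```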

Measuring agreement

NUMERICAL VARIABLES

Correlation coefficient

Intraclass correlation coefficient

Bland-Altman plot (Tukey plot)

[Figure: automated vs. manual measurements, with the identity line]

Measuring agreement

Pearson’s product-moment correlation coefficient

Ignores bias (a systematic offset) and gain (a difference in scale)! Only for two raters.
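A quick illustration of why r says nothing about agreement: if the automated method reads every value 20% too high plus an offset of 5, the two methods never agree, yet r = 1. The values are hypothetical.

```python
import numpy as np

manual = np.array([10.0, 12.0, 15.0, 18.0, 22.0, 25.0])  # hypothetical measurements
auto = 5.0 + 1.2 * manual        # systematic bias (offset 5) and gain (factor 1.2)
r = np.corrcoef(manual, auto)[0, 1]  # 1 up to rounding, despite poor agreement
```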

Measuring agreement

Intraclass correlation coefficient (also for multiple raters) = between-pairs (between-subjects) variance / total variance.

k = number of subjects (or measured objects)
n = number of raters (or methods)

Unlike Pearson's r, this takes the systematic difference between raters into account!
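The slides do not show which ICC formula they use; several variants exist. The sketch below assumes the one-way random-effects version, $ICC = \frac{MSB - MSW}{MSB + (n-1)\,MSW}$, with MSB and MSW the between- and within-subject mean squares from a one-way ANOVA, k subjects and n raters as defined above. It shows the point made on the slide: a constant offset between two raters leaves Pearson's r at 1 but lowers the ICC (here to 9/11).

```python
import numpy as np

def icc_oneway(data):
    """One-way random-effects ICC for a (k subjects x n raters) array.

    Assumed variant: (MSB - MSW) / (MSB + (n - 1) * MSW).
    """
    data = np.asarray(data, dtype=float)
    k, n = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()
    msb = n * np.sum((subject_means - grand_mean) ** 2) / (k - 1)
    msw = np.sum((data - subject_means[:, None]) ** 2) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])               # hypothetical values, rater A
icc_same = icc_oneway(np.column_stack([a, a]))        # perfect agreement
icc_offset = icc_oneway(np.column_stack([a, a + 1]))  # rater B always 1 higher
```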

Measuring agreement

Bland-Altman plot (Tukey mean-difference plot)
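The quantities behind a Bland-Altman plot, as a sketch: each subject is plotted at (mean of the two methods, difference between them), the bias is the mean difference, and the 95% limits of agreement are taken as bias ± 1.96 SD of the differences, which assumes roughly normal differences with constant spread. The measurements are hypothetical; plotting itself is left out.

```python
import numpy as np

def bland_altman(x, y):
    """Plot coordinates, bias and 95% limits of agreement (bias +/- 1.96 SD)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    means = (x + y) / 2   # horizontal axis: mean of the two methods
    diffs = y - x         # vertical axis: difference between the methods
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

manual = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical measurements
auto = [11.0, 14.0, 15.0, 18.0, 19.0]
means, diffs, bias, (lo, hi) = bland_altman(manual, auto)
```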

Measuring agreement

[Figure panels: Bias; Proportional error; Heterogeneous variance]

Part VI: Systematic review and meta-analysis

Literature:

Chalmers, Altman, (eds), (1995), ”Systematic reviews”, Br. Med. J. Publ. Group, London

Higgins et al., (2003), ”Measuring inconsistency in meta-analyses”, Br. Med. J., 327:557-560

Cochrane Handbook: http://www.cochrane.org

Systematic review and meta-analysis

Systematic review =

Formalized and stringent process of combining the information from all (published and unpublished) studies of the same health condition.


Why systematic reviews?

- Reduction of information
- Generalization to a wider population
- Consistency by comparing different studies
- Reliability of recommendations
- Increased power and precision


Meta-analysis =

Systematic review with focus on numerical results

To combine the results from individual studies to estimate an overall (average) effect of interest (example: the relative risk of getting cancer from using mobile phones)
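One common way to combine study results, named plainly here because it is not the method shown in the slides' own example (which uses a regression model with a study factor), is inverse-variance fixed-effect pooling on the log scale, together with Cochran's Q and the $I^2 = \max(0, (Q - df)/Q)$ inconsistency measure of Higgins et al. (2003). The log relative risks and standard errors below are hypothetical.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance pooled estimate, its SE, Cochran's Q and I^2."""
    w = [1 / se**2 for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, q, i2

# Hypothetical log relative risks and standard errors of three studies
pooled, se, q, i2 = fixed_effect_meta([0.1, 0.3, 0.2], [0.1, 0.1, 0.1])
rr = math.exp(pooled)  # pooled relative risk
# Same precision but widely scattered results: large Q and I^2
_, _, q_het, i2_het = fixed_effect_meta([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
```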


Meta-analysis

From a statistical angle, meta-analysis is an application of multifactorial methods:

Multiple studies of the same thing. Combine the results of the studies:

- Treatment / risk factor is one independent factor
- Study is a second independent factor


Meta-analysis

Clear definition of the question / effect of interest. Example:

- Does lowering serum cholesterol reduce the risk of dying from coronary artery disease?
- Does a diet to lower serum cholesterol reduce the risk of dying from coronary artery disease?

Should a study in which the attempt to lower cholesterol failed be included?


Meta-analysis – PUBLICATION BIAS

A simple literature search is not good enough!

- Bias towards positive results (sometimes towards negative results)
- More positive results in the English-language literature?
- Unpublished studies are important.


Meta-analysis – Example from M. Bland, ch. 17





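$$\ln(o) = b_0 + b_1T + b_2S_1 + \dots + b_5S_4 + b_6S_5 + b_7TS_1 + \dots + b_{11}TS_5$$

where o is the odds of the outcome, T the treatment (or risk factor) indicator, $S_1, \dots, S_5$ indicator (0/1) variables for the studies, and $TS_1, \dots, TS_5$ the treatment × study interactions.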

Systematic review

Example (Mailis-Gagnon et al., (2004), ”Spinal cord stimulation for chronic pain”, The Cochrane Library, issue 3):

1692 papers: only 2 admitted to the review. Result: further study needed(!)

http://thecochranelibrary.com
