CCEB Topics in Biostatistics Part 2. Sarah J. Ratcliffe, Ph.D., Center for Clinical Epidemiology and Biostatistics, University of Penn School of Medicine

Topics in Biostatistics: Part II

Page 1: Topics in Biostatistics: Part II

CCEB

Topics in Biostatistics, Part 2

Sarah J. Ratcliffe, Ph.D.
Center for Clinical Epidemiology and Biostatistics
University of Penn School of Medicine

Page 2: Topics in Biostatistics: Part II

Outline

- Hypothesis testing
- Examples
- Interpreting results
- Resources

Page 3: Topics in Biostatistics: Part II

Hypothesis testing

Steps:
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.

Page 4: Topics in Biostatistics: Part II

Hypothesis testing

Steps cont’d:
- Obtain a tabled value for the statistical test.
- Compare the test statistic to the tabled value.
- Calculate a p-value.
- Make a decision: reject, or fail to reject, the null hypothesis (a non-significant result does not mean the null hypothesis is accepted).
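The steps above can be sketched for the simplest case, a one-sample z-test. This is a minimal illustration that assumes the population standard deviation `sigma` is known, so the "tabled" value comes from the standard normal distribution and no degrees of freedom are needed; the function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(data, mu0, sigma, alpha=0.05, sided="two"):
    """Walk through the hypothesis-testing steps for a one-sample z-test.

    Assumes sigma (the population SD) is known, so the reference
    distribution is the standard normal and no df are required.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    n = len(data)
    # Compute the test statistic with the actual data.
    z = (mean(data) - mu0) / (sigma / sqrt(n))
    if sided == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # "tabled" value
        reject = abs(z) > critical                    # compare statistic to table
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif sided == "greater":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
        p_value = 1 - std_normal.cdf(z)
    elif sided == "less":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("sided must be 'two', 'greater' or 'less'")
    # Decision: reject H0, or fail to reject H0 (never "accept" it).
    decision = "reject H0" if reject else "fail to reject H0"
    return z, critical, p_value, decision


# Hypothetical measurements tested against H0: mean = 10, with known sigma = 1.
z, critical, p, decision = one_sample_z_test([10.2, 11.5, 9.8, 12.1, 10.9], mu0=10, sigma=1)
print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p:.4f}: {decision}")
```

Comparing the statistic with the tabled value and comparing the p-value with α always lead to the same decision; the p-value simply adds how far past the threshold the result falls.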

Page 5: Topics in Biostatistics: Part II

Hypothesis testing

Steps:
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.

Page 6: Topics in Biostatistics: Part II

Hypothesis testing: One-sided versus Two-sided

The choice is determined by the alternative hypothesis.
Unidirectional = one-sided

Example: Infected macaques are given vaccine or placebo. Higher viral replication in the vaccine group would show no benefit, so it is not of interest.

H0: vaccine has no beneficial effect on viral-replication levels at 6 weeks after infection.

Ha: vaccine lowers viral-replication levels at 6 weeks after infection.

Page 7: Topics in Biostatistics: Part II

Hypothesis testing: One-sided versus Two-sided

Bi-directional = two-sided

Example:

Infected macaques given vaccine or placebo. Interested in whether vaccine has any effect on viral-replication levels, regardless of direction of effect.

H0: vaccine has no effect on viral-replication levels at 6 weeks after infection.

Ha: vaccine affects viral-replication levels at 6 weeks after infection (in either direction).
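To see how the choice of alternative changes the p-value, here is a sketch assuming a large-sample test whose statistic is standard normal under H0. The z value of -1.8 is hypothetical, not from the macaque study. The one-sided p-value in the direction of Ha is half the two-sided one, which is why the direction must be fixed before the data are examined.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """One-sided and two-sided p-values for a standard-normal test statistic."""
    phi = NormalDist().cdf
    return {
        "less": phi(z),                            # Ha: effect is negative (e.g. vaccine lowers replication)
        "greater": 1 - phi(z),                     # Ha: effect is positive
        "two-sided": 2 * min(phi(z), 1 - phi(z)),  # Ha: any effect, either direction
    }


# Hypothetical z statistic for vaccine minus placebo.
p = p_values_from_z(-1.8)
print(round(p["less"], 4), round(p["two-sided"], 4))  # 0.0359 0.0719
```

At α = .05 the one-sided test would reject H0 here while the two-sided test would not.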

Page 8: Topics in Biostatistics: Part II

Hypothesis testing

Steps:
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.

Page 9: Topics in Biostatistics: Part II

Hypothesis testing: Level of Significance

How many different hypotheses are being examined?

How many comparisons are needed to answer each hypothesis?

Are any interim analyses planned? E.g., test the data and, depending on the results, collect more data and re-test.
=> How many tests will be run in total?

Page 10: Topics in Biostatistics: Part II

Hypothesis testing: Level of Significance

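$\alpha_{\text{total}}$ = desired total Type I error rate (false positives) across all comparisons.

One test: $\alpha_1 = \alpha_{\text{total}}$

Multiple tests / comparisons: if $\alpha_i = \alpha_{\text{total}}$ for each test, then $\sum_i \alpha_i > \alpha_{\text{total}}$

Need to use a smaller α for each test.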
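Why the per-test α must shrink: if every null hypothesis is true and the k tests are independent (an assumption made for this sketch), the chance of at least one false positive is 1 - (1 - α)^k. The sum k·α is an upper bound on that chance that holds even without independence, which is the logic behind dividing α_total by the number of comparisons.

```python
def familywise_error(alpha_each, k):
    """Probability of at least one Type I error among k independent tests,
    each run at level alpha_each, when every null hypothesis is true."""
    return 1 - (1 - alpha_each) ** k


for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: P(>= 1 false positive) = {familywise_error(0.05, k):.3f}, "
          f"sum of alphas = {0.05 * k:.2f}")
```

With ten tests at α = .05 each, the chance of at least one false positive is about 0.40, far above the intended .05.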

Page 11: Topics in Biostatistics: Part II

Hypothesis testing: Level of Significance

Conservative approach: $\alpha_i = \alpha_{\text{total}} / k$, where k is the number of comparisons.

Different α's can be given to each comparison. Formal methods include: Bonferroni, Tukey-Kramer, Scheffé's method, Waller-Duncan.

An O’Brien-Fleming boundary or a Lan-DeMets analog can be used to determine $\alpha_i$ for interim analyses.

Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B, 57:289-300.
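A sketch of two of these ideas: the Bonferroni rule (reject when p ≤ α_total / m) and the Benjamini-Hochberg step-up procedure from the reference above, which controls the false discovery rate rather than the family-wise error rate. The function names and the p-values in the usage lines are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Find the largest rank k with p_(k) <= (k / m) * alpha and reject the
    hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank  # step-up: keep the largest rank that passes
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


p = [0.30, 0.011, 0.005, 0.038, 0.02]  # hypothetical p-values from 5 comparisons
print(bonferroni(p))          # [False, False, True, False, False]
print(benjamini_hochberg(p))  # [False, True, True, True, True]
```

Bonferroni keeps only the smallest p-value here, while Benjamini-Hochberg accepts a controlled proportion of false discoveries in exchange for more power.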

Page 12: Topics in Biostatistics: Part II

Hypothesis testing

Steps:
- Select a one-tailed or two-tailed test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.

Page 13: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

How many samples are being compared?
- One sample
- Two samples
- Multiple samples

Are these samples independent?
- Independent: unrelated subjects in each sample.
- Related: subjects in each sample are related or the same.

Page 14: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

Are your variables continuous or categorical?

If continuous, are the data normally distributed?
- Normality can be assessed using a P-P (or Q-Q) plot. The plot should be approximately a straight line if the data are normal.
- If not normal, can the data be transformed to normality?

Blindly assuming normality can lead to wrong conclusions!
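The Q-Q plot check can be sketched without plotting software by computing the plot's points and their correlation: a value close to 1 means a nearly straight line. This is a minimal sketch under stated assumptions: the plotting positions (i - 0.5)/n are one common convention among several, the correlation is only a rough summary rather than a formal normality test, and the CD14+ CCR8 column from Example 1 is used purely for illustration.

```python
from statistics import NormalDist, mean


def normal_qq_points(data):
    """(theoretical standard-normal quantile, observed value) pairs for a Q-Q plot."""
    n = len(data)
    observed = sorted(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; close to 1 suggests a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


cd14_pos = [5, 9, 13, 2, 5, 0, 6, 21, 5, 36]  # CCR8, CD14+ column from Example 1
points = normal_qq_points(cd14_pos)
print(round(qq_correlation(points), 3))
```

The `points` list can be passed to any plotting tool; a curved pattern, driven here by the large value 36, is what a log transform is meant to straighten.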

Page 15: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

Approximately a straight line = normality assumption okay

Page 16: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

Not a straight line = NOT normal

Can it be transformed to normality?

Page 17: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

The plot of the natural log transform of the data is approximately a straight line = normality assumption okay

Analyze the transformed data NOT the original data.

Page 18: Topics in Biostatistics: Part II

Hypothesis testing: Geometric versus Arithmetic mean

The geometric mean of n positive numerical values is the nth root of the product of the n values.

- The geometric mean is always less than or equal to the arithmetic mean (equal only when all values are the same).
- The geometric mean is better when some values are very large in magnitude and others are small.
- If the geometric mean is used, log-transform the data before analyzing.
- The arithmetic mean of the log-transformed data is the log of the geometric mean of the data. E.g., a t-test on log-transformed data = a test for the location of the geometric mean.

Langley R., Practical Statistics Simply Explained, 1970, Dover Press.
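A quick numerical check of the points above, using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The values are hypothetical positive numbers chosen to span two orders of magnitude.

```python
from math import exp, log
from statistics import geometric_mean, mean

values = [1, 10, 100]  # hypothetical positive values with a wide spread

arithmetic = mean(values)                      # (1 + 10 + 100) / 3 = 37
geometric = geometric_mean(values)             # cube root of 1 * 10 * 100 = 10 (up to rounding)
mean_of_logs = mean([log(v) for v in values])  # arithmetic mean on the log scale

# Back-transforming the mean of the logs recovers the geometric mean.
print(arithmetic, round(geometric, 6), round(exp(mean_of_logs), 6))
```

The single large value pulls the arithmetic mean to 37, while the geometric mean stays at the "typical" order of magnitude.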

Page 19: Topics in Biostatistics: Part II

Source: Richardson & Overbaugh (2005). Basic statistical considerations in virological experiments. Journal of Virology, 79(2): 669-676.

Type of data | No. of samples being compared | Relationship between samples | Underlying distribution of all samples | Potential statistical tests
Binary | 1 | n/a | Binary | One-sample binomial test
Binary | 2 | Independent | Binary | Chi-square test, Fisher's exact test
Binary | >2 | Independent | Binary | Chi-square test
Binary | 2 | Paired | Binary | McNemar's test
Binary | >2 | Related | Binary | Cochran's Q test
Continuous | 1 | n/a | Normal | One-sample t-test for means, one-sample chi-square test for variances
Continuous | 1 | n/a | Non-normal | One-sample Wilcoxon signed-rank test, one-sample sign test
Continuous | 2 | Independent | Normal | Two-sample t-test for means, two-sample F test for variances
Continuous | 2 | Independent | Non-normal | Wilcoxon rank sum test
Continuous | 2 | Paired | Normal | Paired t-test
Continuous | 2 | Paired | Non-normal | Wilcoxon signed-rank test, sign test
Continuous | >2 | Independent | Normal | One-way ANOVA for means, Bartlett's test of homogeneity for variances
Continuous | >2 | Independent | Non-normal | Kruskal-Wallis test
Continuous | >2 | Related | Non-normal | Friedman rank sum test

Page 20: Topics in Biostatistics: Part II

Hypothesis testing: Selecting an Appropriate test

Other tests are available for more complex situations. For example,

Repeated measures ANOVA: >2 measurements taken on each subject; usually interested in time effect.

GEEs / Mixed-effects models: >2 measurements taken on each subject; adjust for other covariates.

Page 21: Topics in Biostatistics: Part II

Hypothesis testing

Steps:
- Select a one-tailed or two-tailed test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Run the test.

Page 22: Topics in Biostatistics: Part II

Example 1

Expression of chemokine receptors on CD14+/CD14- populations of blood monocytes.

Percent of cells positive by FACS.

Page 23: Topics in Biostatistics: Part II

CCR8

Subject | CD14+ | CD14-
1 | 5 | 17
2 | 9 | 25
3 | 13 | 36
4 | 2 | 9
5 | 5 | 18
6 | 0 | 2
7 | 6 | 6
8 | 21 | 30
9 | 5 | 6
10 | 36 | 35
Mean | 10.2 | 18.4
St dev | 10.9 | 12.6
St error | 3.4 | 4.0

Page 24: Topics in Biostatistics: Part II

Example 1 cont’d

Continuous data, 2 samples
=> t-test, if normal, OR
=> Wilcoxon rank sum test (independent samples) or Wilcoxon signed-rank test (paired samples), if non-normal.

Are the samples independent or paired?

If independent, equality of variances can be tested using Levene’s test.
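Levene's test compares spread by running a one-way ANOVA on each observation's absolute deviation from its group's center; the resulting statistic W is compared with an F(k - 1, N - k) tabled value. The sketch below uses only the standard library. The function name and its `center` parameter are our own choices (center="median" gives the Brown-Forsythe variant; SciPy's `scipy.stats.levene` uses the median by default). The CCR8 columns only illustrate the arithmetic: as this example goes on to conclude, these data are actually paired.

```python
from statistics import mean, median


def levene_statistic(*groups, center="mean"):
    """Levene's W statistic for equal variances across k independent groups.

    center="mean" gives Levene's original test; center="median" gives the
    Brown-Forsythe variant. Returns W and its (k - 1, N - k) degrees of freedom.
    """
    if center == "mean":
        center_fn = mean
    elif center == "median":
        center_fn = median
    else:
        raise ValueError("center must be 'mean' or 'median'")
    # Absolute deviation of each observation from its own group's center.
    centers = [center_fn(g) for g in groups]
    z = [[abs(y - c) for y in g] for g, c in zip(groups, centers)]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_means = [mean(zi) for zi in z]
    z_grand = mean([v for zi in z for v in zi])
    # One-way ANOVA on the absolute deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


cd14_pos = [5, 9, 13, 2, 5, 0, 6, 21, 5, 36]   # CCR8, CD14+
cd14_neg = [17, 25, 36, 9, 18, 2, 6, 30, 6, 35]  # CCR8, CD14-
w, df = levene_statistic(cd14_pos, cd14_neg)
print(f"W = {w:.3f} on {df} df")
```

Here W is about 0.79 on (1, 18) df, well below the 5% tabled F(1, 18) value of about 4.41, so these data give no evidence of unequal variances.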

Page 25: Topics in Biostatistics: Part II

Example 1 cont’d

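T-tests in Excel

=TTEST(L6:L15,M6:M15,2,2)

TTEST returns the p-value. Arguments, in order:
- L6:L15: cells containing data from sample 1
- M6:M15: cells containing data from sample 2
- 2: number of tails (1 = one-sided test, 2 = two-sided test)
- 2: type of t-test (1 = paired; 2 = independent, equal variance; 3 = independent, unequal variance)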

Page 26: Topics in Biostatistics: Part II

Page 27: Topics in Biostatistics: Part II

Example 1 cont’d

Possible results for different assumptions:

P-values | Normal (t-tests) | Non-normal (nonparametric tests)
Independent, equal variance | 0.137 |
Independent, unequal variance | 0.137 | 0.105
Paired | 0.010 | 0.013

The nonparametric test for independent samples (Wilcoxon rank sum) has no separate equal- and unequal-variance versions, so a single p-value is reported.

Page 28: Topics in Biostatistics: Part II

Example 1 cont’d

Which result is correct?
- The data are paired.
- The differences for each subject are normally distributed.
=> Paired t-test, p = .0095

There is a statistically significant difference in the percentage of positive cells between the CD14+ and CD14- populations.

Page 29: Topics in Biostatistics: Part II

A graph of the 95% CIs for the means would give the impression there is no difference …

Page 30: Topics in Biostatistics: Part II

When it’s really the differences we are testing.

Page 31: Topics in Biostatistics: Part II

Example 1 cont’d

Note: paired tests don’t always give lower p-values.

A 1-sided test on the CCR5 values would give p-values of:

p = 0.06 (independent samples)
p = 0.11 (paired samples)

WHY?

Page 32: Topics in Biostatistics: Part II

Example 1 cont’d

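The differences have a larger spread than the individual variables. This can happen when the two measurements on a subject are weakly or negatively correlated, because $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$. The paired test also has fewer degrees of freedom (n - 1 rather than 2n - 2).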
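The reason can be checked numerically with the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y): when the two measurements on a subject are not positively correlated, the covariance term no longer shrinks the variance of the differences. The CCR5 values are not listed in the slides, so this sketch uses a hypothetical negatively correlated pair alongside the CCR8 data from this example; the function names are hypothetical.

```python
from statistics import mean, variance


def sample_covariance(x, y):
    """Sample covariance with an n - 1 denominator (matches statistics.variance)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def variance_of_differences(x, y):
    """Return Var(x - y) computed directly and via the covariance identity."""
    direct = variance([a - b for a, b in zip(x, y)])
    identity = variance(x) + variance(y) - 2 * sample_covariance(x, y)
    return direct, identity


# Hypothetical negatively correlated pairs: differences spread MORE than either variable.
x_neg, y_neg = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
print(variance(x_neg), variance_of_differences(x_neg, y_neg))

# CCR8 data from Example 1: positively correlated, so differences spread less.
cd14_pos = [5, 9, 13, 2, 5, 0, 6, 21, 5, 36]
cd14_neg = [17, 25, 36, 9, 18, 2, 6, 30, 6, 35]
print(variance(cd14_pos), variance(cd14_neg), variance_of_differences(cd14_pos, cd14_neg))
```

For the CCR8 data the variance of the differences (62.4) is well below either column's variance, which is why pairing helped there.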

Page 33: Topics in Biostatistics: Part II

Example 2

Does the level of CCR5 expression on PBLs (basal or upregulated using a lentiviral vector) determine the % of entry that occurs via CCR5?

Two viruses: 89.6 and DH12

Page 34: Topics in Biostatistics: Part II

Example 2 cont’d

Figure: CCR5-mediated entry into PBL from 6 donors. % of entry mediated by CCR5 (y-axis) versus % of cells CCR5 positive (x-axis) for viruses 89.6 and DH12, each with a linear fit:
- 89.6: y = 3.7371x - 0.1265, $R^2$ = 0.4473
- DH12: y = 4.1408x + 4.2137, $R^2$ = 0.4333

Page 35: Topics in Biostatistics: Part II

Example 2 cont’d

How do we know if the slope of the line is significantly different from 0?

Perform a t-test on the slope estimate. For simple linear regression, this is the same as the t-test for the correlation coefficient r, where $r = \pm\sqrt{R^2}$ takes the sign of the slope. Both tests have n - 2 degrees of freedom.
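The equivalence of the two t-tests can be checked numerically. This sketch computes the slope t statistic from the residual standard error and the correlation t statistic r·sqrt(n - 2)/sqrt(1 - r²); both have n - 2 degrees of freedom. The six (x, y) pairs are made up for illustration and are not the donor data in the figure.

```python
from math import sqrt
from statistics import mean


def slope_t(x, y):
    """t statistic for H0: slope = 0 in the simple linear regression y = a + b*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b = sqrt(sse / (n - 2) / sxx)
    return b / se_b


def correlation_t(x, y):
    """t statistic for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    return r * sqrt(n - 2) / sqrt(1 - r * r)


x = [0.3, 0.7, 1.0, 1.4, 1.8, 2.3]  # hypothetical % of cells CCR5 positive
y = [2.1, 1.5, 6.0, 4.2, 9.5, 7.3]  # hypothetical % of entry mediated by CCR5
print(round(slope_t(x, y), 6), round(correlation_t(x, y), 6))
```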

Page 36: Topics in Biostatistics: Part II

Example 2 cont’d

Page 37: Topics in Biostatistics: Part II

Interpreting Results

P-values
- Is there a statistically significant result?
- If not, was the sample size large enough to detect a biologically meaningful difference?
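Whether a sample is "large enough" can be judged before the study with a power calculation. A minimal sketch, assuming two independent groups, a common standard deviation and a two-sided test, using the normal approximation n = 2(z_{1-α/2} + z_{1-β})² σ² / Δ² per group, where Δ is the biologically meaningful difference. The function name is hypothetical, and calculators based on the t distribution typically give a slightly larger n for small samples.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`
    between two independent groups with common SD `sigma` (two-sided test),
    using the normal approximation."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # round up to whole subjects


print(n_per_group(delta=1.0, sigma=1.0))  # difference of one SD
print(n_per_group(delta=0.5, sigma=1.0))  # difference of half an SD
```

Halving the difference to be detected roughly quadruples the required sample size, which is why a non-significant result from a small study says little about whether a meaningful effect exists.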

Page 38: Topics in Biostatistics: Part II

Online Resources

Power / sample size calculators
- http://calculators.stat.ucla.edu/powercalc/
- http://www.stat.uiowa.edu/~rlenth/Power/

Free statistical software
- http://members.aol.com/johnp71/javasta2.html#Freebies

Page 39: Topics in Biostatistics: Part II

BECC – Consulting Center

www.cceb.upenn.edu/main/center/becc.html

Hourly fee service:
- Design and analysis strategies for research proposals;
- Selecting and implementing appropriate statistical methods for specific applications to research data;
- Statistical and graphical analysis of data;
- Statistical review of manuscripts.