CVI Statistics Michael LaValley 1/10/2011

CVI Statistics

Michael LaValley, 1/10/2011

The P-value Police?

  • Often researchers see statistics (and statisticians) as barriers to publishing their important work
  • However, good statistics (and statisticians) can help you avoid wasting time and money following false leads

Role of Experimental Design

  • Statistics can only be as good as the data
  • Good data requires thoughtfully designed experiments
  • Some failures of animal experiments to translate to human trials have raised the issue of experimental design of animal studies
      - NXY-059 for Stroke (Gawrylewski 2007)
      - Fluid resuscitation in bleeding trauma patients (Roberts 2002)

Experimental Design

  • A well designed experiment should
      - Produce unbiased comparisons between groups
      - Provide precise estimates
  • Well designed experiments require
      - Clear objectives
      - Planning
      - Sample size large enough to achieve the objectives with good power

Experimental Design

  • Comparison/Control group
      - Concurrent controls
      - Internal control (before and after treatment)
  • Replication
      - Reduce effect of uncontrolled variation
      - Quantify the uncertainty in the results
  • Randomization
      - Computer generated
  • Blocking or stratification
  • Blinding
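As an illustration of computer-generated randomization combined with blocking, here is a minimal Python sketch. The function name `blocked_randomization`, the group labels, the block size of 4, and the mouse identifiers are all hypothetical choices for this example, not part of the talk.

```python
import random

def blocked_randomization(subject_ids, groups=("treatment", "control"),
                          block_size=4, seed=2011):
    """Assign subjects to groups in shuffled, balanced blocks (illustrative sketch)."""
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)  # fixed seed so the allocation list can be regenerated
    per_group = block_size // len(groups)
    assignment = {}
    for start in range(0, len(subject_ids), block_size):
        block = list(groups) * per_group  # balanced set of labels for this block
        rng.shuffle(block)                # random order within the block
        for subject, label in zip(subject_ids[start:start + block_size], block):
            assignment[subject] = label
    return assignment

# Hypothetical subject identifiers
mice = [f"mouse-{i:02d}" for i in range(1, 13)]
allocation = blocked_randomization(mice)
for mouse, group in allocation.items():
    print(mouse, group)
```

Each block of four subjects receives two of each label, so the groups stay balanced even if the experiment stops early.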

Hypothesis Tests

  • Hypothesis tests answer a yes/no question about a population value
  • Example:
      - Quantitative assay for level of antibodies for a virus in mice
      - Does a vaccine have an effect on the levels of antibodies?
  • Null Hypothesis (H0) corresponds to no effect
  • Alternative Hypothesis (HA) indicates that there is an effect

Hypothesis Tests

  • Example:
      - Suppose there are 10 mice available for the experiment
      - Assay the mice for antibodies before and after vaccination
      - Xi is the difference in assay values for mouse number i
      - Is the mean value of the Xi close to 0? If so, that is consistent with no effect
  • µ is the population mean difference
      - Null hypothesis H0: µ=0
      - Alternative hypothesis HA: µ≠0

Hypothesis Tests

  • The goal of a hypothesis test is to reject H0
  • Rejecting H0 indicates that either
      - H0 is wrong
      - A rare event occurred (type I error)
  • We cannot confirm H0 on the basis of a test
      - We may fail to reject H0, but we do not accept H0

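Hypothesis Tests

  • Each test has an associated test statistic
  • For a paired t-test for the mouse vaccine data

    $$T = \frac{\bar{X}}{s / \sqrt{10}}$$

    where $\bar{X}$ is the sample mean of the differences Xi and $s$ is their sample standard deviation
  • For the two-sided alternative HA: µ≠0, we reject H0 when |T| > t*
  • t* is chosen so that Pr(Reject H0 when H0 is true) = α
  • In this case, t* is from a t-distribution with 9 degrees of freedom (number of mice - 1)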
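The paired t statistic can be sketched in a few lines of Python. The differences below are made-up numbers for illustration only, not data from the talk, and the critical value 2.262 is the standard two-sided 0.05 value for 9 degrees of freedom, hard-coded here rather than computed.

```python
import math
import statistics

# Hypothetical before/after assay differences X_i for 10 mice (made-up numbers)
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 0.8]

def paired_t_statistic(diffs):
    """T = mean(diffs) / (sd(diffs) / sqrt(n)) for paired differences."""
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL_9DF = 2.262  # two-sided alpha = 0.05, 9 degrees of freedom

t_stat = paired_t_statistic(differences)
reject_h0 = abs(t_stat) > T_CRITICAL_9DF
print(f"T = {t_stat:.2f}, reject H0: {reject_h0}")
```

With a different number of mice the degrees of freedom change, so the hard-coded critical value would need to be replaced.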

Hypothesis Tests

Values used are from the t distribution with 9 degrees of freedom

Hypothesis Tests

                      Decision
  Truth         Not Reject H0          Reject H0
  H0 True       Right                  Type I Error (α)
  H0 False      Type II Error (β)      Right (Power)

Unfortunately with testing comes the possibility of reaching a wrong conclusion and making an error

Hypothesis Tests

  • Type I Error: rejecting H0 when it is true (false positive finding)
      - Hypothesis tests are set up so that the user specifies the Type I Error rate
      - Significance level α, almost always 0.05
  • Type II Error: failing to reject H0 when it is false (false negative finding)
      - For a fixed sample size, as the Type I error rate is decreased, the rate of Type II error is increased

Hypothesis Tests

  • The significance level is the rate of false positive findings that you are willing to live with
  • Power is the probability of rejecting the null hypothesis when it is false (1 - Type II Error rate)
      - Once the significance level is set, the power for a given true difference and variability is determined by the sample size
      - For the alternative shown in the figure, the power is 76%

Hypothesis Tests

For a two-sided t-test at the 0.05 level with 9 degrees of freedom, we reject the null if T < -2.26 or T > 2.26

76% power if true difference is 3.0
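Power can also be estimated by simulation. The sketch below assumes, hypothetically, normally distributed differences with a standard deviation of √10, so that a true mean difference of 3.0 puts the test statistic near 3 on average. The slide does not state the variability behind its figure, so this is one plausible reading rather than a reconstruction of it.

```python
import math
import random

T_CRITICAL_9DF = 2.262  # two-sided alpha = 0.05, 9 degrees of freedom

def simulated_power(true_mean, sd, n=10, n_sims=10000, seed=1):
    """Fraction of simulated experiments in which |T| exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t_stat = mean / math.sqrt(variance / n)
        if abs(t_stat) > T_CRITICAL_9DF:
            rejections += 1
    return rejections / n_sims

# Assumed standard deviation (not given in the talk)
assumed_sd = math.sqrt(10)
power = simulated_power(true_mean=3.0, sd=assumed_sd)
type_i_rate = simulated_power(true_mean=0.0, sd=assumed_sd)
print(f"estimated power: {power:.2f}, estimated Type I error rate: {type_i_rate:.3f}")
```

Running the same function with a true mean of 0 estimates the Type I error rate, which should land close to the chosen significance level.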

Hypothesis Tests

  • Role of sample size
      - In designing an experiment, one should determine an appropriate sample size for the goals of the experiment
  • Given
      - Expected difference between groups
      - Expected variability of measurements
      - Significance level that will be used
      - Power to be targeted
  • One can determine the sample size to achieve the study goal
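The four inputs listed above map directly onto a sample size formula. The sketch below uses the common normal-approximation formula for comparing two group means, n per group = 2((z₁₋α/₂ + z_power)·σ/δ)². This is a swapped-in textbook approximation, not necessarily what the talk or any particular calculator uses, and it tends to run slightly low for small samples where a t-based calculation is more accurate. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    delta: expected difference between groups
    sigma: expected standard deviation of the measurements
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # significance level
    z_power = std_normal.inv_cdf(power)          # targeted power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical inputs: expected difference of 5 units, standard deviation of 10 units
print(sample_size_per_group(delta=5, sigma=10))
```

Halving the expected difference roughly quadruples the required sample size, which is why the expected effect should be chosen realistically.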

Hypothesis Tests

  • Role of sample size
      - There are software packages and online power calculators available for determining sample size
  • If the sample size is too small for the study goal, the test result is likely to be negative even when a real effect exists (underpowered)
  • If the sample size is too large for the study goal, resources will be wasted

http://www.stat.uiowa.edu/~rlenth/Power/

Hypothesis Tests

  • P-value
      - Smallest level of significance for which you would reject the Null Hypothesis with your data
      - Probability of obtaining data at least as extreme as what was found if the Null Hypothesis were true
      - Provides a measure of the evidence against the Null Hypothesis
  • Small p-values (close to 0) show strong evidence against the null hypothesis
  • Large p-values (close to 1) show only weak evidence against the null hypothesis

Hypothesis Tests

  • If p-value ≤ α then reject H0
  • The p-value is determined by
      - How far the data are from the Null Hypothesis
      - The sample size
  • For a given true difference, the larger the sample, the smaller the p-value tends to be and the greater the power

Hypothesis Test Limitations

  • P-values and hypothesis tests give a dichotomous (significant/not significant) view of study results
  • Statistically significant means that the observed difference is unlikely to be due to chance alone
      - Either H0 is not correct, or
      - The observed data are a rare event, happening no more than (100*α)% of the time when H0 is true

Hypothesis Test Limitations

  • Statistical significance doesn’t mean that the observed difference is important
      - Could find a statistically significant result with a large sample size when the observed difference is small and unimportant
  • Could have a large and important difference between groups with a small sample size and not have statistical significance
      - Would especially be the case for an underpowered study

Confidence Intervals

  • Confidence intervals show the precision of the sample values as estimates of population values
      - Provide a range of population values that are consistent with the study findings
  • Often more informative than the p-values
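A t-based confidence interval for a mean is the sample mean plus or minus a critical value times the standard error. The sketch below shows the arithmetic on made-up values. The critical value is passed in by the caller because it depends on the degrees of freedom, and 2.776 is the standard two-sided 95% value for 4 degrees of freedom.

```python
import math
import statistics

def mean_confidence_interval(data, t_critical):
    """Return (low, high) for the mean: mean +/- t_critical * sd / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    std_err = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    half_width = t_critical * std_err
    return mean - half_width, mean + half_width

# Made-up measurements; 5 values give 4 degrees of freedom
values = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = mean_confidence_interval(values, t_critical=2.776)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

A narrower interval indicates a more precise estimate, and an interval that includes 0 corresponds to a non-significant two-sided test at the matching level.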

Test or Interval Limitations

  • A significance test/confidence interval doesn’t provide a check of the study design
  • Example: in a study of gene expression
      - Cancer tissue samples kept on ice while the normal tissue samples are processed
      - Observed differences in expression may be due to iced/not iced rather than cancer/normal
  • A statistical procedure will never indicate that this is the reason for the result

Role of Data Distribution

  • Particular tests are tuned for data from the normal (Gaussian) distribution
      - Examples: t-test, standard (Pearson) correlation
  • Often it is difficult to be sure that the data come from the normal distribution
      - Plot histograms of data: bell-shaped and symmetric?
      - Plot ordered data values against expected normal values: is a straight line obtained? (called QQplots)
  • Plots require a substantial amount of data to be conclusive
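The QQplot construction can be sketched without a plotting library: sort the data and pair each value with the standard normal quantile expected at its position. The plotting position (i - 0.5)/n used below is one common convention among several, and the data are made-up.

```python
from statistics import NormalDist

def qq_points(data):
    """Pair each ordered data value with its expected standard normal quantile."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

# Made-up measurements
sample = [3.0, 1.0, 2.0, 5.0, 4.0]
points = qq_points(sample)
for expected_quantile, value in points:
    print(f"{expected_quantile:6.2f}  {value:5.1f}")
```

Plotting the second column against the first gives the QQplot. Points falling near a straight line are consistent with normality.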

Role of Data Distribution

  • Some tests are specifically designed to work reasonably well with data from any distribution
      - Called Nonparametric or distribution-free tests
      - Examples
          - Wilcoxon test (alternative to t-test)
          - Spearman correlation (alternative to standard correlation)
      - In some situations these may be less likely to reject the null hypothesis of no difference than tests based on normal data
  • May want to see if nonparametric results are similar to those assuming normality
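To show how a rank-based test sidesteps the normality assumption, here is a minimal sketch of a two-sample Wilcoxon rank-sum test with an exact p-value obtained by enumerating every way to split the pooled ranks. Statistical software typically uses tables or approximations instead, so treat this as an illustration of the idea. The function names and data are hypothetical, and the enumeration is only practical for small groups.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def rank_sum_exact(x, y):
    """Return (rank sum of x, exact two-sided p-value) for small samples."""
    ranks = average_ranks(list(x) + list(y))
    n_x = len(x)
    observed = sum(ranks[:n_x])
    expected = n_x * (len(ranks) + 1) / 2  # mean rank sum under H0
    observed_dev = abs(observed - expected)
    count = 0
    total = 0
    for combo in combinations(ranks, n_x):
        total += 1
        if abs(sum(combo) - expected) >= observed_dev - 1e-9:
            count += 1
    return observed, count / total

# Made-up measurements for two small groups
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
w_stat, p_value = rank_sum_exact(group_a, group_b)
print(f"rank sum = {w_stat}, exact p-value = {p_value}")
```

Because only the ranks enter the calculation, a single extreme measurement cannot dominate the result the way it can in a t-test.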

Example

  • Study question: what is the effect of calcium on blood pressure in African-American men?
  • Experiment: a randomized comparison
      - Treatment group of 10 men received a calcium supplement for 12 weeks
      - Control group of 11 men received a placebo during the same period
  • Outcome is the difference in the seated systolic blood pressure (BP) over the 12-week period

Lyle RM, et al., "Blood pressure and metabolic effects of calcium supplementation in normotensive white and black men," JAMA, 257(1987), pp. 1772-1776

Data Distribution

Histograms by group

QQplot by group

Example

  • These plots aren’t very useful in determining the data distribution
      - Don’t really suggest normality
      - Aren’t conclusively non-normal either
      - Ambiguity is typical with small numbers
  • Should probably look at both t-test and Wilcoxon test
      - If same results: everything is fine
      - If different results: probably trust nonparametric more

Example

  • The t-test is not statistically significant at the 0.05 significance level
      - P-value = 0.12
  • The Wilcoxon test is not statistically significant at the 0.05 significance level
      - P-value = 0.33
  • The test results are consistent in that with either we fail to reject the null hypothesis
  • Important difference? Check the confidence intervals

Example

                   Mean Decrease in BP    95% Confidence Interval
  Calcium Group    5.00                   -1.26 to 11.26
  Control Group    -0.27                  -4.24 to 3.69
  Difference       5.27                   -1.48 to 12.03

Example

  • So we found a 5 mm Hg difference between groups…
      - Might be large enough to be important?
      - But can’t rule out that this finding is due to chance (P-value > α)
  • If 5 mm Hg is worth pursuing, would need to evaluate this in a larger sample
      - Do the power and sample size calculation!
  • If not, pursue more promising therapies

Multiple-Testing

  • Another issue to be aware of is the limits of ordinary statistical significance when doing many tests
  • When we use a significance level of α=0.05, we allow about 5 out of every 100 tests of true null hypotheses to be false positives
  • When tens or hundreds of tests are run, false positive findings are almost guaranteed
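How quickly false positives become "almost guaranteed" can be shown with a short calculation. The sketch assumes the tests are independent and that every null hypothesis is true, a simplification not stated in the talk. Under it, the chance of at least one false positive among m tests is 1 - (1 - α)^m.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 10, 100):
    print(f"{m:4d} tests: P(at least one false positive) = {familywise_error_rate(m):.3f}")
```

Real tests are often correlated, which changes the exact numbers, but the qualitative message stays the same.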

Multiple-Testing

  • An fMRI study using a dead salmon for a subject found several voxels with significant signal change after the salmon was shown 15 pictures
      - http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf
  • Why?
      - Out of 8064 voxels, 16 were significant (0.2% of voxels)

Multiple-Testing

  • Methods exist (and new ones are being continually developed) to deal with multiple testing issues
      - Bonferroni correction
      - Tukey’s method
      - False discovery rates
  • Which method is used is less important than that something is done to account for the number of tests
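Two of the listed approaches are simple enough to sketch directly. The Bonferroni correction compares each p-value with α divided by the number of tests. For false discovery rates, the sketch uses the Benjamini-Hochberg step-up procedure, which is one standard method and an assumption here, since the talk does not name a specific one. The p-values are made-up.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for tests whose p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_rank = rank  # keep the largest rank that meets its threshold
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject

# Made-up p-values from five hypothetical tests
p_values = [0.001, 0.012, 0.025, 0.2, 0.6]
print("Bonferroni:        ", bonferroni_reject(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(p_values))
```

Bonferroni is the stricter of the two in this example, which reflects its aim of controlling the chance of any false positive rather than the proportion of false positives among the findings.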

References

  • Triola MM, Triola MF. Biostatistics for the Biological and Health Sciences. Pearson Education, Inc., 2006
  • Grafen A, Hails R. Modern Statistics for the Life Sciences. Oxford University Press, 2002
  • Broman K. Statistics for Laboratory Scientists I, 2006 (Course Website) http://ocw.jhsph.edu/courses/StatisticsLaboratoryScientistsI/

References

  • Festing M. Principles: the need for better experimental design. TRENDS in Pharmacological Sciences, 24:341-5, 2003
  • Roberts I, Kwan I, Evans P, Haig S. Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation. BMJ, 324:474-6, 2002