CVI Statistics Talk 2011

CVI Statistics

Michael LaValley, 1/10/2011

The P-value Police?

Often researchers see statistics (and statisticians) as barriers to publishing their important work

However, good statistics (and statisticians) can help you avoid wasting time and money following false leads

Role of Experimental Design

Statistics can only be as good as the data
Good data requires thoughtfully designed experiments
Some failures of animal experiments to translate to human trials have raised the issue of experimental design of animal studies
  NXY-059 for stroke (Gawrylewski 2007)
  Fluid resuscitation in bleeding trauma patients (Roberts 2002)

Experimental Design

A well-designed experiment should
  Produce unbiased comparisons between groups
  Provide precise estimates
Well-designed experiments require
  Clear objectives
  Planning
  Sample size large enough to achieve the objectives with good power

Experimental Design

Comparison/Control group
  Concurrent controls
  Internal control (before and after treatment)
Replication
  Reduce effect of uncontrolled variation
  Quantify the uncertainty in the results
Randomization
  Computer generated
Blocking or stratification
Blinding
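Computer-generated randomization within blocks can be sketched as follows. This is a minimal illustration, not a procedure from the talk: the function name, the cage labels, the mouse identifiers, and the seed are all hypothetical, and the cage is assumed to be the blocking factor.

```python
import random

def randomize_within_blocks(blocks, treatments=("treatment", "control"), seed=2011):
    """Assign units to treatments at random, separately within each block."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        # Deal the shuffled units out to the arms in turn, so the arms
        # stay balanced within every block.
        for position, unit in enumerate(shuffled):
            assignment[unit] = treatments[position % len(treatments)]
    return assignment

# Hypothetical mice grouped by cage (the assumed blocking factor)
cages = {
    "cage_A": ["m1", "m2", "m3", "m4"],
    "cage_B": ["m5", "m6", "m7", "m8"],
}
allocation = randomize_within_blocks(cages)
for mouse, arm in sorted(allocation.items()):
    print(mouse, arm)
```

Recording the seed alongside the allocation lets the assignment be regenerated later; blinding would then require that whoever measures the outcome does not see this table.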

Hypothesis Tests

Hypothesis tests answer a yes/no question about a population value
Example:
  Quantitative assay for level of antibodies for a virus in mice
  Does a vaccine have an effect on the levels of antibodies?
Null Hypothesis (H0) corresponds to no effect
Alternative Hypothesis (HA) indicates that there is an effect

Hypothesis Tests

Example:
  Suppose there are 10 mice available for the experiment
  Assay the mice for antibodies before and after vaccination
  Xi is the difference in assay values for mouse number i
  Is the mean value of the Xi close to 0? If so, that is consistent with no effect
  µ is the population mean difference
Null hypothesis H0: µ = 0
Alternative hypothesis HA: µ ≠ 0

Hypothesis Tests

The goal of a hypothesis test is to reject H0
Rejecting H0 indicates that either
  H0 is wrong, or
  A rare event occurred (type I error)
We cannot confirm H0 on the basis of a test
  We may fail to reject H0, but we do not accept H0

Hypothesis Tests

Each test has an associated test statistic
For a paired t-test for the mouse vaccine

  T = \bar{X} / (s / \sqrt{10})

  where \bar{X} is the sample mean of the Xi and s is their sample standard deviation
We reject H0 when |T| > t*
t* is chosen so that Pr(Reject H0 when H0 is true) = α
In this case, t* is from a t-distribution with 9 degrees of freedom (number of mice – 1)
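The paired t statistic for the mouse example is the mean difference divided by its standard error. The sketch below computes it for made-up data: the ten differences are hypothetical, not values from the talk. The cutoff 2.262 is the two-sided 0.05 critical value for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical before/after assay differences Xi for 10 mice (illustrative only)
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 1.1]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)        # sample SD (n - 1 in the denominator)
t_stat = mean_diff / (sd_diff / math.sqrt(n))  # T = mean / (s / sqrt(n))

T_CRIT_DF9 = 2.262  # two-sided 0.05 critical value, 9 degrees of freedom
reject_h0 = abs(t_stat) > T_CRIT_DF9           # two-sided rule: compare |T| with t*
print(f"T = {t_stat:.2f}, reject H0: {reject_h0}")
```

For any other sample size the critical value would have to be looked up again, because the degrees of freedom change with the number of animals.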

Hypothesis Tests

Values used are from the t distribution with 9 degrees of freedom

Hypothesis Tests

                        Decision
Truth       Not Reject H0          Reject H0
H0 True     Right                  Type I Error (α)
H0 False    Type II Error (β)      Right (Power)

Unfortunately, with testing comes the possibility of reaching a wrong conclusion and making an error

Hypothesis Tests

Type I Error – reject H0 when it is true (false positive finding)
  Hypothesis tests are set up so that the user specifies the Type I Error rate
  Significance level α, almost always 0.05
Type II Error – failing to reject H0 when it is false (false negative finding)
  For a fixed sample size, as the Type I error rate is decreased, the rate of Type II error is increased
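A small simulation can illustrate what "the user specifies the Type I Error rate" means in practice. The sketch below assumes normally distributed differences, 10 mice per experiment, and a true mean difference of 0, so every rejection is a false positive. The function names, seed, and number of simulated experiments are arbitrary choices for illustration.

```python
import math
import random
import statistics

T_CRIT_DF9 = 2.262  # two-sided 0.05 critical value for 9 degrees of freedom

def paired_t(differences):
    """Paired t statistic: mean difference divided by its standard error."""
    n = len(differences)
    return statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n))

def false_positive_rate(n_mice=10, n_experiments=10000, seed=1):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # H0 is true here: the differences are pure noise centered at 0
        diffs = [rng.gauss(0.0, 1.0) for _ in range(n_mice)]
        if abs(paired_t(diffs)) > T_CRIT_DF9:
            rejections += 1
    return rejections / n_experiments

type_i_rate = false_positive_rate()
print(f"Simulated Type I error rate: {type_i_rate:.3f}")
```

The simulated rate should land close to the chosen α of 0.05, up to simulation noise.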

Hypothesis Tests

The significance level is the rate of false positive findings that you are willing to live with
Power is the probability of rejecting the null hypothesis when it is false (1 - Type II Error rate)
  Once the significance level is set, the Power for a given true effect and variability is determined by the sample size
  For the alternative shown in the figure, the power is 76%

Hypothesis Tests

For a 0.05 two-sided t-test with 9 degrees of freedom, we reject the null if T < -2.26 or T > 2.26

76% power if true difference is 3.0
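The 76% figure can be checked approximately by simulation. The slide does not state the units of the "true difference"; the sketch below assumes that 3.0 is measured in standard-error units, so that the t statistic is centered near 3. That reading is an assumption, as are the function name, the seed, and the number of simulated experiments.

```python
import math
import random

T_CRIT_DF9 = 2.262  # the slide rounds this cutoff to 2.26

def simulated_power(shift_in_se=3.0, n_mice=10, n_experiments=20000, seed=7):
    """Fraction of simulated experiments that reject H0 when the effect is real."""
    rng = random.Random(seed)
    sigma = 1.0
    # ASSUMPTION: the true difference of 3.0 is expressed in standard-error units
    true_mean = shift_in_se * sigma / math.sqrt(n_mice)
    rejections = 0
    for _ in range(n_experiments):
        diffs = [rng.gauss(true_mean, sigma) for _ in range(n_mice)]
        mean = sum(diffs) / n_mice
        variance = sum((d - mean) ** 2 for d in diffs) / (n_mice - 1)
        t_stat = mean / math.sqrt(variance / n_mice)
        if abs(t_stat) > T_CRIT_DF9:
            rejections += 1
    return rejections / n_experiments

power = simulated_power()
print(f"Simulated power: {power:.2f}")
```

Under that assumption the simulated power comes out in the neighborhood of the quoted value; a different reading of the units would give a different number.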

Hypothesis Tests

Role of sample size
In designing an experiment, one should determine an appropriate sample size for the goals of the experiment
Given
  Expected difference between groups
  Expected variability of measurements
  Significance level that will be used
  Power to be targeted
One can determine the sample size to achieve the study goal
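The four inputs listed above can be turned into a sample size with a standard normal-approximation formula for comparing two group means, n per group = 2 ((z_{1-α/2} + z_{power}) σ / δ)^2. The sketch below is a rough planning aid, not the method used in the talk: the function name and the example numbers are hypothetical, and a t-based calculation from a dedicated power calculator will usually ask for slightly more subjects.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile for the targeted power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)                            # round up to whole subjects

# Hypothetical planning values: detect a difference of 5 units when the SD is 8
print(n_per_group(difference=5.0, sd=8.0))
# Halving the detectable difference roughly quadruples the required sample size
print(n_per_group(difference=1.0, sd=1.0), n_per_group(difference=0.5, sd=1.0))
```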

Hypothesis Tests

Role of sample size
There are software packages and online power calculators available for determining sample size
If the sample size is too small for the study goal, the test result is likely to be negative (underpowered)
If the sample size is too large for the study goal, resources will be wasted

http://www.stat.uiowa.edu/~rlenth/Power/

Hypothesis Tests

P-value
  Smallest level of significance for which you would reject the Null Hypothesis with your data
  Probability of obtaining data at least as extreme as what was found if the Null Hypothesis were true
  Provides a measure of the evidence against the Null Hypothesis
Small p-values (close to 0) show strong evidence against the null hypothesis
Large p-values (close to 1) show only weak evidence against the null hypothesis

Hypothesis Tests

If p-value ≤ α then reject H0
The p-value is determined by
  How far the data are from the Null Hypothesis
  The sample size
For the same observed difference and variability, the larger the sample, the smaller the p-value and the greater the power
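The dependence of the p-value on sample size can be seen by holding the observed mean difference and standard deviation fixed and varying n. The sketch below uses a normal (z) approximation in place of the t-test, which is a simplification swapped in for illustration; the observed values and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p_value(mean_diff, sd, n):
    """Two-sided p-value from a normal approximation to the test statistic."""
    z = mean_diff / (sd / math.sqrt(n))        # distance from H0 in standard errors
    return 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result at least this extreme

# Hypothetical observed mean difference and SD, held fixed while n grows
observed_mean, observed_sd = 1.0, 3.0
p_values = {n: two_sided_p_value(observed_mean, observed_sd, n) for n in (10, 40, 160)}
for n, p in p_values.items():
    print(f"n = {n:3d}: p = {p:.4f}")
```

With these made-up numbers the same observed difference is not significant at n = 10 but is at n = 40 and n = 160.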

Hypothesis Test Limitations

P-values and hypothesis tests give a dichotomous (significant/not significant) view of study results
Statistically significant means that the observed difference is unlikely to be due to chance alone
  Either H0 is not correct or
  The observed data is a rare event – happening no more than (100*α)% of the time

Hypothesis Test Limitations

Statistical significance doesn’t mean that the observed difference is important
  Could find a statistically significant result with a large sample size when the observed difference is small and unimportant
Could have a large and important difference between groups with a small sample size and not have statistical significance
  Would especially be the case for an underpowered study

Confidence Intervals

Confidence intervals show the precision of the sample values as estimates of population values
  Provide a range of population values that are consistent with the study findings
Often more informative than the p-values
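A 95% confidence interval for a mean is the sample mean plus or minus a t critical value times the standard error. The sketch below assumes 10 observations, hence 9 degrees of freedom and a critical value of 2.262; the data and the function name are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, t_crit):
    """Confidence interval for the mean: mean +/- t_crit * (SD / sqrt(n))."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical changes measured on 10 subjects (illustrative only)
changes = [4, -2, 7, 1, 3, 9, -1, 5, 2, 6]
low, high = mean_confidence_interval(changes, t_crit=2.262)  # 9 degrees of freedom
print(f"95% CI for the mean change: {low:.2f} to {high:.2f}")
```

An interval that excludes 0, as this one does, matches a two-sided test rejecting H0 at the 0.05 level, and the width of the interval shows how precisely the mean has been estimated.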

Test or Interval Limitations

A significance test/confidence interval doesn’t provide a check of the study design
Example: in a study of gene expression
  Cancer tissue samples kept on ice while the normal tissue samples are processed
  Observed differences in expression may be due to iced/not iced rather than cancer/normal
A statistical procedure will never indicate that this is the reason for the result

Role of Data Distribution

Particular tests are tuned for data from the normal (Gaussian) distribution
  Examples
    T-test
    Standard (Pearson) correlation
Often it is difficult to be sure that the data come from the normal distribution
  Plot histograms of data – bell-shaped and symmetric?
  Plot ordered data values against expected normal values – is a straight line obtained? (called QQ plots)
Plots require a substantial amount of data to be conclusive
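The points of a QQ plot can be computed without any plotting library: sort the data and pair each value with the normal quantile expected at its rank. The sketch below uses the plotting positions (i - 0.5) / n, which is one common convention among several; the sample values and the function name are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each ordered data value with the normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))  # normal fitted to the sample
    # Plotting positions (i - 0.5) / n are an assumed convention; others exist
    expected = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

# Hypothetical measurements (illustrative only)
sample = [7.1, -4.0, 18.2, 16.9, -3.3, -5.1, 1.4, 10.0, 11.2, -2.2]
points = qq_points(sample)
for expected_value, observed_value in points:
    print(f"{expected_value:7.2f} {observed_value:7.2f}")
```

If the data were close to normal, the two columns would track each other along a straight line; with only 10 values, departures from the line are hard to interpret.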

Role of Data Distribution

Some tests are specifically designed to work reasonably well with data from any distribution
  Called Nonparametric or distribution-free tests
  Examples
    Wilcoxon test (alternative to t-test)
    Spearman correlation (alternative to standard correlation)
  In some situations these may be less likely to reject the null hypothesis of no difference than tests based on normal data
May want to see if nonparametric results are similar to those assuming normality
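Spearman correlation is the Pearson correlation computed on ranks, which is why it is insensitive to the shape of the data distribution. The sketch below implements both from scratch; the helper names and the dose/response values are hypothetical, chosen so that one extreme value distorts the Pearson coefficient.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Standard (Pearson) correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: perfectly monotone, but with one very large response
dose = [1, 2, 3, 4, 5, 6]
response = [1, 4, 9, 16, 25, 400]
print(f"Pearson:  {pearson(dose, response):.2f}")
print(f"Spearman: {spearman(dose, response):.2f}")
```

Because the relationship is perfectly monotone, the rank-based coefficient reaches its maximum while the Pearson coefficient is pulled down by the extreme value.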

Example

Study question: what is the effect of calcium on blood pressure in African-American men?
Experiment: a randomized comparison
  Treatment group of 10 men received a calcium supplement for 12 weeks
  Control group of 11 men received a placebo during the same period
Outcome is the difference in the seated systolic blood pressure (BP) over the 12-week period

Lyle RM, et al., "Blood pressure and metabolic effects of calcium supplementation in normotensive white and black men," JAMA, 257(1987), pp. 1772-1776

Data Distribution

Histograms by group

QQplot by group

Example

These plots aren’t very useful in determining the data distribution
  Don’t really suggest normality
  Aren’t conclusively non-normal either
  Ambiguity is typical with small numbers
Should probably look at both t-test and Wilcoxon test
  If same results – everything is fine
  If different results – probably trust nonparametric more

Example

The t-test is not significant at the 0.05 significance level
  P-value = 0.12
The Wilcoxon test is not statistically significant at the 0.05 significance level
  P-value = 0.33
The test results are consistent in that with either we fail to reject the null hypothesis
Important difference? Check the confidence intervals
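For readers unfamiliar with the Wilcoxon test for two independent groups, the sketch below shows the rank-sum version with a normal approximation to its null distribution. The two groups are hypothetical values with the same group sizes as the example (10 and 11); they are not the study data, so the p-value will not match the one quoted above. The sketch assumes no tied values and applies no continuity correction.

```python
import math
from statistics import NormalDist

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test via the normal approximation (no ties allowed)."""
    combined = sorted(group_a + group_b)
    if len(set(combined)) != len(combined):
        raise ValueError("this sketch does not handle tied values")
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}
    n_a, n_b = len(group_a), len(group_b)
    w = sum(rank_of[v] for v in group_a)                # rank sum of group A
    expected_w = n_a * (n_a + n_b + 1) / 2              # mean of W under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # SD of W under H0
    z = (w - expected_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    return w, z, p_value

# Hypothetical decreases in blood pressure (illustrative only, not the study data)
treated = [6.1, -3.2, 12.4, 9.8, -1.5, 0.7, 4.3, 15.0, -4.8, 2.9]
placebo = [-0.6, 3.5, -2.1, 1.8, -6.4, 5.2, -1.0, 0.2, -3.9, 2.3, -5.5]
w, z, p_value = rank_sum_test(treated, placebo)
print(f"W = {w}, z = {z:.2f}, p = {p_value:.3f}")
```

Because only the ranks enter the calculation, a single extreme measurement cannot dominate the result the way it can in a t-test.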

Example

                 Mean Decrease in BP (mm Hg)    95% Confidence Interval
Calcium Group    5.00                           -1.26 to 11.26
Control Group    -0.27                          -4.24 to 3.69
Difference       5.27                           -1.48 to 12.03
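The interval for the difference can be cross-checked against the t-test result with a little arithmetic. The sketch below assumes the interval is a pooled two-sample t interval with 10 + 11 - 2 = 19 degrees of freedom, for which the two-sided 0.05 critical value is 2.093; the talk does not state which interval method was used, so this is an assumption.

```python
# Values reported in the table
difference = 5.27
ci_low, ci_high = -1.48, 12.03

# ASSUMPTION: pooled two-sample t interval with 10 + 11 - 2 = 19 degrees of freedom
T_CRIT_DF19 = 2.093

half_width = (ci_high - ci_low) / 2          # interval is estimate +/- half_width
standard_error = half_width / T_CRIT_DF19    # back out the standard error
t_stat = difference / standard_error         # implied t statistic

print(f"Implied standard error: {standard_error:.2f}")
print(f"Implied t statistic: {t_stat:.2f} (cutoff {T_CRIT_DF19})")
print(f"Interval includes 0: {ci_low < 0 < ci_high}")
```

Under that assumption the implied t statistic falls below the cutoff, which agrees with an interval that includes 0 and a p-value above 0.05.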

Example

So we found a 5 mm Hg difference between groups…
  Might be large enough to be important?
  But can’t rule out that this finding is due to chance (P-value > α)
If 5 mm Hg is worth pursuing, would need to evaluate this in a larger sample
  Do the power and sample size calculation!
If not, pursue more promising therapies

Multiple-Testing

Another issue to be aware of is the limits of ordinary statistical significance when doing many tests
When we use a significance level of α=0.05, we allow about 5 out of every 100 tests of true null hypotheses to be false positives
When 10s or 100s of tests are run, false positive findings are almost guaranteed
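How quickly false positives become "almost guaranteed" can be quantified. If m tests are independent and every null hypothesis is true, the chance of at least one false positive is 1 - (1 - α)^m. The sketch below evaluates this; independence is an assumption that real sets of tests often violate, and the function name is hypothetical.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive) for independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 10, 100):
    print(f"{m:3d} tests: {chance_of_any_false_positive(m):.3f}")
```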

Multiple-Testing

An fMRI study using a dead salmon for a subject found several voxels with significant signal change after the salmon was shown 15 pictures
  http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf
Why?
  Out of 8064 voxels, 16 were significant (0.2% of voxels)

Multiple-Testing

Methods exist (and new ones are being continually developed) to deal with multiple testing issues
  Bonferroni correction
  Tukey’s method
  False discovery rates
Which method is used is less important than that something is done to account for the number of tests
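Two of the listed approaches are simple enough to sketch directly. Bonferroni compares each p-value with α divided by the number of tests. For false discovery rates, the sketch uses the Benjamini-Hochberg step-up procedure, which is one common FDR method and is chosen here as an illustration; the talk does not name a specific one. The p-values and function names are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject only the tests whose p-value is at most alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p upward
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * q / m:
            largest_k = rank                             # largest rank meeting its threshold
    rejected = [False] * m
    for index in order[:largest_k]:
        rejected[index] = True
    return rejected

# Hypothetical p-values from six tests
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
print("Uncorrected:", [p <= 0.05 for p in p_values])
print("Bonferroni: ", bonferroni_reject(p_values))
print("BH (FDR):   ", benjamini_hochberg_reject(p_values))
```

With these made-up p-values, four tests pass the uncorrected 0.05 cutoff, three survive the FDR procedure, and two survive Bonferroni, which is the stricter of the two corrections.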

References

Triola MM, Triola MF. Biostatistics for the Biological and Health Sciences. Pearson Education, Inc., 2006

Grafen A, Hails R. Modern Statistics for the Life Sciences. Oxford University Press, 2002

Broman K. Statistics for Laboratory Scientists I, 2006 (Course Website) http://ocw.jhsph.edu/courses/StatisticsLaboratoryScientistsI/

References

Festing M. Principles: the need for better experimental design. TRENDS in Pharmacological Sciences, 24:341-5, 2003

Roberts I, Kwan I, Evans P, Haig S. Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation. BMJ, 324:474-6, 2002
