P-values and statistical tests 4. ANOVA Hand-outs available at http://is.gd/statlec Marek Gierliński Division of Computational Biology

P-values and statistical tests - University of Dundee · 2017-04-20 · 1. Introduction Null hypothesis, statistical test, p-value Fisher’s test 2. Contingency tables Chi-square



1. Introduction
Null hypothesis, statistical test, p-value, Fisher’s test

2. Contingency tables
Chi-square test, G-test

3. T-test
One- and two-sample, paired, one-sample variance test

4. ANOVA
One-way, two-way

5. Non-parametric methods 1
Mann-Whitney, Wilcoxon signed-rank, Kruskal-Wallis

6. Non-parametric methods 2
Kolmogorov-Smirnov, permutation, bootstrap

7. Statistical power
Effect size, power in t-test, power in ANOVA

8. Multiple test corrections
Family-wise error rate, false discovery rate, Holm-Bonferroni limit, Benjamini-Hochberg limit, Storey method

9. What’s wrong with p-values?
A lot

One-way ANOVA

• Extension of the t-test to more than 2 groups
• Null hypothesis: all samples came from populations with the same mean
• $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
• The null hypothesis is tested by comparing variances
• ANOVA: ANalysis Of VAriance

Variance between and within groups

• Variance within groups: typical variance in each group
• Variance between groups: how the sample mean varies from group to group

One-way ANOVA

• Null hypothesis: all samples came from populations with the same mean
• $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
• Assumption: they all have a common variance $\sigma^2$

• $n = 34$ data points
• $k = 4$ groups of data
• $n_g$: number of points in group $g$
• $x_{gi}$: body mass, group $g$, mouse $i$
• $\bar{x}_g$: mean in group $g$
• $\bar{x}$: grand mean, across all data points

Variance

• One sample of size $n$
• Sample variance, where $M$ is the sample mean and $x_i - M$ is the residual:

$$SD_{n-1}^2 = \frac{1}{n-1} \sum_i (x_i - M)^2$$

• Generalized variance: mean square

$$MS = \frac{SS}{\nu}$$

• where
  ◦ $SS$: sum of squared residuals
  ◦ $\nu$: number of degrees of freedom

Variance within groups

• Variance within groups is

$$MS_W = \frac{SS_W}{\nu_W}$$

• Sum of squared residuals (each residual is a data point minus its group mean):

$$SS_W = \sum_{g=1}^{k} \sum_{i=1}^{n_g} (x_{gi} - \bar{x}_g)^2$$

• Number of degrees of freedom:

$$\nu_W = \sum_{g=1}^{k} (n_g - 1)$$

• $MS_W$ estimates the common variance, $\sigma^2$, regardless of the null hypothesis

Mice data:

          SS    ν    MS
Within    524   30   17.5

Variance between groups

• Variance between groups is

$$MS_B = \frac{SS_B}{\nu_B}$$

• Sum of squared residuals (each residual is a group mean minus the grand mean):

$$SS_B = \sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x})^2$$

• Number of degrees of freedom:

$$\nu_B = k - 1$$

• $MS_B$ estimates the common variance, $\sigma^2$, only when the null hypothesis is true

Mice data:

          SS    ν    MS
Within    524   30   17.5
Between   623    3   208
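The within- and between-group quantities can be computed directly from their definitions. The following is a minimal sketch in R, not the lecture's own code: the helper name `anova_by_hand`, the group labels, the seed and the simulated masses are all illustrative assumptions (only the group sizes 12, 9, 8 and 5 are taken from the mice example).

```r
# Hypothetical helper: one-way ANOVA quantities straight from the definitions.
anova_by_hand <- function(x, g) {
  g <- factor(g)
  n_g    <- tapply(x, g, length)            # n_g: points per group
  mean_g <- tapply(x, g, mean)              # group means
  grand  <- mean(x)                         # grand mean over all data points

  ss_w <- sum((x - ave(x, g))^2)            # squared residuals around group means
  ss_b <- sum(n_g * (mean_g - grand)^2)     # squared residuals of group means
  nu_w <- sum(n_g - 1)                      # within-group degrees of freedom
  nu_b <- nlevels(g) - 1                    # between-group degrees of freedom

  list(ss_w = ss_w, nu_w = nu_w, ms_w = ss_w / nu_w,
       ss_b = ss_b, nu_b = nu_b, ms_b = ss_b / nu_b,
       f = (ss_b / nu_b) / (ss_w / nu_w))
}

# Simulated data (assumption): four groups drawn from one normal population
set.seed(1)
g <- rep(c("A", "B", "C", "D"), times = c(12, 9, 8, 5))
x <- rnorm(length(g), mean = 20, sd = 5)

res <- anova_by_hand(x, g)
tab <- summary(aov(x ~ factor(g)))[[1]]     # R's built-in table, for comparison
res
tab
```

The hand-computed sums of squares and degrees of freedom should agree with the `Sum Sq` and `Df` columns that `aov()` reports.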

F test

• $MS_W$ estimates the common variance, $\sigma^2$, regardless of the null hypothesis
• $MS_B$ estimates the common variance, $\sigma^2$, only when the null hypothesis is true
• Test for equality of variances: F-test

$$F = \frac{MS_B}{MS_W}$$

• Degrees of freedom: $\nu_B$, $\nu_W$
• If $H_0$ is true, we expect $F \approx 1$

Mice data:

          SS    ν    MS
Within    524   30   17.5
Between   623    3   208

$F = 11.9$
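As a worked check of the arithmetic, the numbers on this slide can be plugged into R. This sketch uses only the rounded sums of squares shown above. The variable names are illustrative, and `pf()` is the standard F-distribution function in base R.

```r
# Rounded values from the slide
ss_w <- 524; nu_w <- 30     # within groups
ss_b <- 623; nu_b <- 3      # between groups

ms_w <- ss_w / nu_w         # mean square within
ms_b <- ss_b / nu_b         # mean square between
f_stat <- ms_b / ms_w       # F statistic, about 11.9

# Upper-tail probability of F(3, 30): chance of F at least this large under H0
p_value <- pf(f_stat, df1 = nu_b, df2 = nu_w, lower.tail = FALSE)

c(MS_W = ms_w, MS_B = ms_b, F = f_stat, p = p_value)
```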

Null distribution

• Population of British mice: normal, $\mu = 20$ g, $\sigma = 5$ g
• Select four samples of size 12, 9, 8 and 5
• Compute $F = MS_B / MS_W$
• Repeat 100,000 times and build the distribution of $F$

Null distribution = F-distribution

• The distribution of $F$ under the null hypothesis is the F-distribution with 3 and 30 degrees of freedom, $F(3, 30)$
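The simulation described on these slides can be sketched in a few lines of R. This is an illustration under stated assumptions rather than the code used for the lecture: the seed is arbitrary, and the number of repetitions is reduced from the 100,000 on the slide to 5,000 to keep the run short.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
sizes <- c(12, 9, 8, 5)               # sample sizes from the slide
group <- factor(rep(seq_along(sizes), times = sizes))
n_sim <- 5000                         # assumption: fewer than the slide's 100,000

sim_f <- replicate(n_sim, {
  # H0 is true by construction: every group comes from the same population
  x <- rnorm(sum(sizes), mean = 20, sd = 5)
  anova(lm(x ~ group))[1, "F value"]
})

# Compare the simulated distribution with the theoretical F(3, 30)
quantile(sim_f, 0.95)                 # empirical 95th percentile
f_crit <- qf(0.95, df1 = 3, df2 = 30) # theoretical 95th percentile
f_crit

# Fraction of simulated F values above the theoretical 5% cut-off
frac_above <- mean(sim_f > f_crit)
frac_above
```

If the simulated null distribution follows $F(3, 30)$, the last fraction should come out close to 0.05.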

Effect vs. no effect

             MS_W (g²)   MS_B (g²)   F      p
No effect    16.4        15.8        0.96   0.42
Effect       17.5        208         11.9   3×10^-5

ANOVA assumptions

• Normality: data in each group are distributed normally
  ◦ ANOVA is quite robust against non-normality
  ◦ if strongly not normal (e.g. log-normal), transform to normality
  ◦ if this fails, use the non-parametric Kruskal-Wallis test
• Independence: groups are independent
  ◦ dependence: e.g., observations of the same subjects over time
  ◦ if groups are not independent, ANOVA is not appropriate, use other methods
• Equality of variances: groups sampled from populations with the same variance
  ◦ sometimes called homogeneity of variances, or homoscedasticity /ˌhoʊmoʊskədæˈstɪsɪti/
  ◦ if variances are not equal, use Welch’s approximate test
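The normality route described above (check, transform, fall back to Kruskal-Wallis) might look as follows in R. This is a sketch on simulated, deliberately log-normal data. The data frame `mice_sim`, its group labels and the distribution parameters are assumptions, and testing the model residuals with `shapiro.test()` is one common way of checking normality, not necessarily the one intended in the lecture.

```r
set.seed(7)
# Simulated, deliberately log-normal masses (assumption, not the lecture data)
mice_sim <- data.frame(
  Country = factor(rep(c("A", "B", "C", "D"), times = c(12, 9, 8, 5))),
  Mass    = rlnorm(34, meanlog = 3, sdlog = 0.4)
)

# Step 1: check normality of the residuals of the raw-scale model
fit_raw <- aov(Mass ~ Country, data = mice_sim)
shapiro.test(residuals(fit_raw))

# Step 2: transform to normality (log-normal data become normal on the log scale)
fit_log <- aov(log(Mass) ~ Country, data = mice_sim)
shapiro.test(residuals(fit_log))
summary(fit_log)

# Step 3: if no transformation helps, use the non-parametric alternative
kw <- kruskal.test(Mass ~ Country, data = mice_sim)
kw
```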

Test to compare variances

• Null hypothesis: samples come from populations with equal variances
• $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
• Like ANOVA, except data $x_{gi}$ are replaced by absolute residuals $R_{gi}$:
  ◦ Levene’s test: $R_{gi} = |x_{gi} - \bar{x}_g|$, deviation from the group mean
  ◦ Brown-Forsythe test: $R_{gi} = |x_{gi} - \tilde{x}_g|$, deviation from the group median
• Test statistic:

$$W = \frac{MS_B}{MS_W}$$

• Mice data: $MS_B = 6.40$ g², $MS_W = 6.89$ g², $W = 0.930$, $p = 0.44$
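Because the test is just an ANOVA on residuals, it can be reproduced with base R alone. The sketch below implements the Brown-Forsythe variant (absolute deviations from the group median) on simulated data. The data, seed and variable names are assumptions for illustration.

```r
set.seed(3)
g <- factor(rep(c("A", "B", "C", "D"), times = c(12, 9, 8, 5)))
x <- rnorm(length(g), mean = 20, sd = 5)    # simulated masses (assumption)

# Absolute deviations from the group medians replace the data
z <- abs(x - ave(x, g, FUN = median))

# Ordinary one-way ANOVA on the deviations gives the test statistic W
bf <- anova(lm(z ~ g))
w_stat  <- bf[1, "F value"]
p_value <- bf[1, "Pr(>F)"]
c(W = w_stat, p = p_value)
```

Replacing `median` with `mean` in the call to `ave()` gives Levene’s original version of the test.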

What if variances are not equal?

• B. L. Welch developed an approximate test
• Welch, B. L. (1951), “On the comparison of several mean values: an alternative approach”, Biometrika, 38, 330–336
• Skip the details...
• Mice data:

               F       ν1   ν2      p
ANOVA          11.89   3    30      2.7×10^-5
Welch’s test   28.95   3    15.96   10^-6
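In R both tests are available through `oneway.test()`, switched by its `var.equal` argument. The sketch below runs them side by side on simulated data with deliberately unequal spreads. The group labels, standard deviations and seed are assumptions for illustration.

```r
set.seed(11)
g <- factor(rep(c("A", "B", "C", "D"), times = c(12, 9, 8, 5)))
sd_g <- c(A = 2, B = 5, C = 8, D = 3)       # assumed unequal standard deviations
x <- rnorm(length(g), mean = 20, sd = sd_g[as.character(g)])

classic <- oneway.test(x ~ g, var.equal = TRUE)    # standard ANOVA F-test
welch   <- oneway.test(x ~ g, var.equal = FALSE)   # Welch's approximate test

# Welch's test adjusts the denominator degrees of freedom, usually to a
# non-integer value, instead of using n - k
rbind(
  classic = c(classic$statistic, classic$parameter, p = classic$p.value),
  welch   = c(welch$statistic,   welch$parameter,   p = welch$p.value)
)
```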

Post-hoc analysis: Tukey’s test

• A multiple t-test
• Finds differences and p-values for each pair of categories
• Post-hoc test, you need ANOVA first
• Skip the details...

Difference between group means (row minus column), with the adjusted p-value in parentheses:

           Scottish           Welsh              N.Irish
Welsh      -1.1 (0.95)
N.Irish    -12.9 (0.00003)*   -11.9 (0.0001)*
English    -4.9 (0.05)        -3.9 (0.20)        8.0 (0.006)*

How to do it in R?

# ANOVA
> mice <- read.table('http://tiny.cc/mice_1way', header=TRUE)
> mice.aov <- aov(Mass ~ Country, data=mice)
> summary(mice.aov)
            Df Sum Sq Mean Sq F value   Pr(>F)
Country      3  622.7  207.56   11.89 2.67e-05 ***
Residuals   30  523.9   17.46
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Tukey’s Honest Significant Differences
> TukeyHSD(mice.aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Mass ~ Country, data = mice)

$Country
                       diff          lwr       upr     p adj
N.Irish-English   -8.001667 -14.04998948 -1.953344 0.0059422
Scottish-English   4.947222  -0.06331043  9.957755 0.0539580
Welsh-English      3.858333  -1.32806069  9.044727 0.2023039
Scottish-N.Irish  12.948889   6.61101070 19.286767 0.0000277
Welsh-N.Irish     11.860000   5.38219594 18.337804 0.0001394
Welsh-Scottish    -1.088889  -6.61022696  4.432449 0.9494897

> mice
    Country Mass
1   English 16.5
2   English 21.3
3   English 12.4
4   English 11.2
5   English 23.7
6   English 20.2
7   English 17.4
8   English 23.0
9   English 15.6
10  English 26.5
11  English 21.8
12  English 18.9
13 Scottish 19.7
14 Scottish 29.3
15 Scottish 27.1
16 Scottish 24.8
17 Scottish 22.4
18 Scottish 27.6
19 Scottish 25.7
20 Scottish 23.9
21 Scottish 15.4
22    Welsh 29.6
23    Welsh 20.7
24    Welsh 28.4
25    Welsh 19.8
...

How to do it in R?

# Levene’s test for equality of variances
> library(lawstat)
> levene.test(mice$Mass, mice$Country)
	modified robust Brown-Forsythe Levene-type test based on the absolute deviations from the median

data:  mice$Mass
Test Statistic = 0.92948, p-value = 0.4386

# Welch’s test for unequal variances
> oneway.test(Mass ~ Country, mice, var.equal=FALSE)
	One-way analysis of means (not assuming equal variances)

data:  Mass and Country
F = 28.95, num df = 3.00, denom df = 15.96, p-value = 1.084e-06

Two-way ANOVA

ANOVA as a linear model (one-way)

• Each measurement is a group mean plus noise, where $i$ is the group and $j$ the replicate:

$$x_{ij} = \mu_i + \epsilon_{ij}$$

• Null hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
• Equivalently, each group mean is the grand mean $\mu$ plus a group effect $\alpha_i$:

$$x_{ij} = \mu + \alpha_i + \epsilon_{ij}$$

• Null hypothesis: $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_k = 0$, that is, $\forall i: \alpha_i = 0$
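The equivalence between one-way ANOVA and a linear model can be checked in R by fitting the same data with `aov()` and `lm()`. This is a sketch on simulated data (groups, seed and values are assumptions). Note that R's default parametrisation differs slightly from the slide: with treatment contrasts the intercept is the mean of the first group rather than the grand mean, and the other coefficients are differences from that group.

```r
set.seed(5)
g <- factor(rep(c("A", "B", "C", "D"), times = c(12, 9, 8, 5)))
x <- rnorm(length(g), mean = 20, sd = 5)    # simulated masses (assumption)

fit_aov <- aov(x ~ g)                       # classical ANOVA interface
fit_lm  <- lm(x ~ g)                        # the same model as a linear model

f_aov <- summary(fit_aov)[[1]][1, "F value"]
f_lm  <- anova(fit_lm)[1, "F value"]
c(aov = f_aov, lm = f_lm)                   # identical F statistics

# Coefficients under R's default treatment contrasts:
# (Intercept) = mean of group A; gB, gC, gD = differences from group A
coef(fit_lm)
```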

ANOVA as a linear model (two-way)

• Each measurement is a cell mean plus noise, where $i$ is the column, $j$ the row and $k$ the replicate:

$$x_{ijk} = \mu_{ij} + \epsilon_{ijk}$$

• The cell mean is the grand mean $\mu$ plus a column effect $\alpha_i$, a row effect $\beta_j$ and an interaction effect $\gamma_{ij}$:

$$x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

• Column means are equal: $H_0^{col}: \mu_{1\cdot} = \mu_{2\cdot} = \dots = \mu_{n_c\cdot}$ or $\forall i: \alpha_i = 0$
• Row means are equal: $H_0^{row}: \mu_{\cdot 1} = \mu_{\cdot 2} = \dots = \mu_{\cdot n_r}$ or $\forall j: \beta_j = 0$
• There is no interaction between rows and columns: $H_0^{int}: \forall i, j: \gamma_{ij} = 0$

More mice!

Two-way ANOVA: two variables

How to do it in R?

# 2-way ANOVA
# (Country*Colour alone already expands to both main effects plus the interaction)
> mice <- read.table('http://tiny.cc/mice_2way', header=TRUE)
> mice.lm <- lm(Mass ~ Country + Colour + Country*Colour, mice)
> anova(mice.lm)
Analysis of Variance Table

Response: Mass
               Df  Sum Sq Mean Sq F value    Pr(>F)
Country         3  809.68 269.893 11.9366 3.598e-06 ***
Colour          1   59.87  59.873  2.6480    0.1092
Country:Colour  3  107.39  35.797  1.5832    0.2034
Residuals      57 1288.80  22.611
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Null hypotheses: all three true

• $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
• $\mathbf{A} = \begin{pmatrix} 0 & 0 & 0 & 0 \end{pmatrix}$, $\mathbf{B} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\boldsymbol{\Gamma} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
• p-values: columns 0.39, rows 0.24, interaction 0.97

Null hypotheses: columns not equal

• $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
• $\mathbf{A} = \begin{pmatrix} 0 & 10 & -10 & 0 \end{pmatrix}$, $\mathbf{B} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\boldsymbol{\Gamma} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
• p-values: columns 2×10^-[exponent illegible], rows 0.41, interaction 0.48

Null hypotheses: rows not equal

• $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
• $\mathbf{A} = \begin{pmatrix} 0 & 0 & 0 & 0 \end{pmatrix}$, $\mathbf{B} = \begin{pmatrix} 10 \\ 0 \end{pmatrix}$, $\boldsymbol{\Gamma} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
• p-values: columns 0.69, rows 10^-12, interaction 0.30

Null hypotheses: interaction

• $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
• $\mathbf{A} = \begin{pmatrix} 0 & 0 & 0 & 0 \end{pmatrix}$, $\mathbf{B} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\boldsymbol{\Gamma} = \begin{pmatrix} -10 & 10 & 0 & 0 \\ 10 & -10 & 0 & 0 \end{pmatrix}$
• p-values: columns 0.60, rows 0.88, interaction 10^-1[second digit illegible]
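The four scenarios above can be explored by simulating data from the two-way model and running the ANOVA. The sketch below is an illustration only: the helper `simulate_two_way`, the grand mean of 20, the noise level of 5, the 8 replicates per cell and the seed are all assumptions, since the slides do not state them. Only the effect vectors and matrix come from the "columns not equal" scenario.

```r
# Hypothetical helper: draw data from x_ijk = mu + alpha_i + beta_j + gamma_ij + noise
simulate_two_way <- function(alpha, beta, gamma, mu = 20, sigma = 5, n_rep = 8) {
  design <- expand.grid(replicate_id = seq_len(n_rep),
                        row = seq_along(beta),
                        col = seq_along(alpha))
  # gamma is indexed as a rows x columns matrix, as laid out on the slides
  cell_mean <- mu + alpha[design$col] + beta[design$row] +
    gamma[cbind(design$row, design$col)]
  data.frame(Row = factor(design$row),
             Col = factor(design$col),
             x   = rnorm(nrow(design), mean = cell_mean, sd = sigma))
}

set.seed(2)
alpha <- c(0, 10, -10, 0)                 # column effects: columns not equal
beta  <- c(0, 0)                          # no row effects
gamma <- matrix(0, nrow = 2, ncol = 4)    # no interaction

dat <- simulate_two_way(alpha, beta, gamma)
tab <- anova(lm(x ~ Col * Row, data = dat))
tab
```

Changing `alpha`, `beta` or `gamma` to the values of the other scenarios shows which of the three p-values responds to which effect.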

Time-course experiments

• Obesity study in mice
• Two groups:
  ◦ untreated
  ◦ treated with a drug
• Feed them a lot
• Observe body mass over time
• Is there a difference between the two groups?

Time-course experiments

• You can do ANOVA
• $p = 5 \times 10^{-5}$
• But:
  ◦ Data are correlated
  ◦ ANOVA doesn’t recognize numerical variables (time)
  ◦ You don’t know where the change is

> dat <- read.table('http://tiny.cc/time_course', header=TRUE)
> dat.lm <- lm(Mass ~ Treatment + Time + Treatment*Time, dat)
> anova(dat.lm)
               Df  Sum Sq Mean Sq F value    Pr(>F)
Treatment       1  85.538  85.538 20.1508 4.481e-05
Time            7 272.465  38.924  9.1694 3.825e-07
Treatment:Time  7 230.738  32.963  7.7652 2.907e-06

Time-course experiments

• What about a t-test at each time point?
• Works well!
• Three time points are significantly different
• But: misses point-to-point correlation

Figure: individual t-test results, with the BH 0.05 limit
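A per-time-point analysis with a Benjamini-Hochberg correction could be sketched as below. The data are simulated, and the group sizes (4 untreated, 3 treated), the 8 time points, the growth rates and the seed are assumptions chosen only to resemble the experiment. `p.adjust()` with `method = "BH"` is the base R implementation of the correction.

```r
set.seed(9)
times <- 1:8                                  # assumed 8 time points

# Simulated body masses: one row per mouse, one column per time point
untreated <- sapply(times, function(t) rnorm(4, mean = 20 + 1.5 * t, sd = 2))
treated   <- sapply(times, function(t) rnorm(3, mean = 20 + 0.5 * t, sd = 2))

# One two-sample t-test per time point
p_raw <- sapply(seq_along(times), function(j) {
  t.test(untreated[, j], treated[, j])$p.value
})

# Benjamini-Hochberg correction across the time points
p_bh <- p.adjust(p_raw, method = "BH")

data.frame(time = times, p_raw = p_raw, p_bh = p_bh)
which(p_bh < 0.05)                            # time points below the BH 0.05 limit
```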

Better approach: build a model

• First: understand your data
• Build a model and reduce time-course curves to just one number
• Do a t-test or similar test on these numbers
• Very simple: area under each curve
• This gives us 4 vs. 3 areas

Compare area under the curve
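Reducing each curve to its area and comparing the two sets of areas might be done as in the sketch below. The trapezoidal rule is one simple choice for the area and is an assumption here, as are the helper name `auc_trapezoid`, the simulated curves, the seed, and the assignment of 4 mice to the untreated group and 3 to the treated group.

```r
# Hypothetical helper: trapezoidal area under one time-course curve
auc_trapezoid <- function(time, y) {
  n <- length(y)
  sum(diff(time) * (y[-n] + y[-1]) / 2)
}

set.seed(9)
times <- 1:8                                  # assumed 8 time points

# Simulated curves: one row per mouse, one column per time point
untreated <- t(replicate(4, 20 + 1.5 * times + rnorm(length(times), sd = 2)))
treated   <- t(replicate(3, 20 + 0.5 * times + rnorm(length(times), sd = 2)))

# One number per mouse: the area under its curve
auc_untreated <- apply(untreated, 1, auc_trapezoid, time = times)
auc_treated   <- apply(treated,   1, auc_trapezoid, time = times)

# Compare the 4 vs. 3 areas with a two-sample t-test
t.test(auc_untreated, auc_treated)
```

Each mouse contributes a single value, so the correlation between neighbouring time points no longer violates the independence assumption of the test.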

Chi-square or G-test vs. ANOVA

Fisher’s test / Chi-square test / G-test
• Experiment outcome: category
• Table contains counts

      WT              KO1             KO2             KO3
G1    50, 54, 48      61, 75, 69      78, 77, 80      43, 34, 49
S     172, 180, 172   175, 168, 166   162, 167, 180   178, 173, 168
G2    55, 50, 63      45, 41, 38      47, 49, 43      59, 50, 45

ANOVA
• Experiment outcome: measurement (could be counts)
• Table contains measurements

        English        Scottish           Welsh              N.Irish
White   19.1, 20, 21   22.3, 21.2, 25.6   18.1, 19.2, 22.7   15.6, 16.7, 15
Black   21.1, 20, 20.5 21.1, 27.5, 23     22.5, 18.5, 19     19.1, 17.7, 13.5
Grey    20, 21, 17     18.6, 20.1, 19.7   15, 18, 22         12, 18.1, 20.3

G-test or ANOVA?

Bacterial antibiotic resistance
• Four strains
• Grown in normal medium and two antibiotic concentrations
• Dilution plating, count colonies

                WT           KO1          KO2           KO3
No antibiotic   77, 51, 92   50, 83, 16   70, 111, 78   121, 147, 110
Conc. 1         83, 51, 40   66, 18, 49   95, 109, 52   75, 116, 109
Conc. 2         11, 7, 31    69, 41, 21   85, 51, 60    95, 128, 116

• Outcome is measurement, not category
• This is not a contingency table!
• Perhaps ANOVA
• Need to check normality
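One way to follow this suggestion is to enter the colony counts from the table and fit a two-way ANOVA, then check the residuals for normality. The sketch below is an illustration of that idea, not an analysis from the lecture. The column names (`Treatment`, `Strain`, `Count`) and level labels are assumptions, and the Shapiro-Wilk test on residuals is one possible normality check.

```r
# Colony counts from the table, row by row (3 replicates per cell)
counts <- c(
  77, 51, 92,   50, 83, 16,   70, 111, 78,   121, 147, 110,   # no antibiotic
  83, 51, 40,   66, 18, 49,   95, 109, 52,    75, 116, 109,   # conc. 1
  11,  7, 31,   69, 41, 21,   85,  51, 60,    95, 128, 116    # conc. 2
)
strains    <- c("WT", "KO1", "KO2", "KO3")
treatments <- c("None", "Conc1", "Conc2")

bact <- data.frame(
  Treatment = factor(rep(treatments, each = 12), levels = treatments),
  Strain    = factor(rep(rep(strains, each = 3), times = 3), levels = strains),
  Count     = counts
)

# Two-way ANOVA with interaction, treating the counts as measurements
fit <- lm(Count ~ Strain * Treatment, data = bact)
tab <- anova(fit)
tab

# Normality check on the residuals before trusting the p-values
shapiro.test(residuals(fit))
```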

Hand-outs available at http://tiny.cc/statlec