8/9/2019 Anova Biometry
One-way ANOVA
• Motivating Example
• Analysis of Variance
• Model & Assumptions
• Data Estimates of the Model
• Analysis of Variance
• Multiple Comparisons
• Checking Assumptions
• One-way ANOVA Transformations
Motivating Example: Treating Anorexia Nervosa
Analysis of Variance
• Analysis of Variance is a widely used statistical technique that partitions the total variability in our data into components of variability that are used to test hypotheses.
• In One-way ANOVA, we wish to test the hypothesis: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
against: $H_a$: Not all population means are the same.
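Model & Assumptions
• The model for the observed response is given by:
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
where $x_{ij}$ is the jth observation in group i, µ is the overall mean, $\alpha_i = \mu_i - \mu$ is the effect of treatment i, and $\varepsilon_{ij}$ is the random error.
• We assume that the errors are independent and normally distributed with constant variance, i.e., $\varepsilon_{ij} \sim N(0, \sigma^2)$.
• This implies that the populations being sampled are also normally distributed with equal variances: $x_{ij} \sim N(\mu_i, \sigma^2)$.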
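To make the model concrete, the sketch below simulates data from it. The values of µ, the α_i, σ and the group size are hypothetical choices for illustration only. They are not values from the anorexia study.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
mu = 10.0                            # overall mean
alpha = np.array([-2.0, 0.0, 2.0])   # treatment effects alpha_i (sum to zero here)
sigma = 3.0                          # common error standard deviation
n_per_group = 20

rng = np.random.default_rng(seed=1)

# x_ij = mu + alpha_i + eps_ij with independent eps_ij ~ N(0, sigma^2), so group i
# is normal with mean mu_i = mu + alpha_i and the SAME variance sigma^2
groups = [mu + a + rng.normal(0.0, sigma, size=n_per_group) for a in alpha]

for i, g in enumerate(groups, start=1):
    print(f"group {i}: mean = {g.mean():.2f}, sd = {g.std(ddof=1):.2f}")
```

The sample means scatter around µ + α_i, and the sample SDs scatter around the common σ. This is the situation the equal-variance assumption describes.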
Analysis of Variance
• In ANOVA, we compare the between-group variation with the within-group variation to assess whether there is a difference in the population means.
• Thus, by comparing these two measures of variance (spread) with one another, we are able to detect if there are true differences among the underlying group population means.
Analysis of Variance
• If the variation between the sample means is large, relative to the variation within the samples, then we would be likely to detect significant differences among the sample means.
Between Group Variation is Large Compared to Within Group Variation
Here we would almost certainly reject the null hypothesis.
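Analysis of Variance
Between-group variation is large compared to the within-group variation.
[Figure: three populations with means µ1, µ2, µ3 spread around the overall mean µ. Each treatment effect is the distance $\alpha_i = \mu_i - \mu$ (e.g., $\alpha_2 = \mu_2 - \mu$), and each error is the distance of an observation from its population mean (e.g., $\varepsilon_{3j} = x_{3j} - \mu_3$).]
If we sampled from these populations, we would expect to reject $H_0$.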
Analysis of Variance
• If the variation between the sample means is small, relative to the variation within the samples, then there would be considerable overlap of observations in the different samples, and we would be unlikely to detect any differences among the population means.
Between Group Variation is Small Compared to Within Group Variation
Here we would fail to reject the null hypothesis.
Analysis of Variance
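[Figure: three populations with $\mu = \mu_1 = \mu_2 = \mu_3$, so all $\alpha_i = 0$. Errors are still measured from each population mean (e.g., $\varepsilon_{2j} = x_{2j} - \mu_2$).]
If we sampled from these populations, we would not expect to reject $H_0$.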
Analysis of Variance
• If we consider all of the data together, regardless of which sample the observation belongs to, we can measure the overall total variability in the data by:
$SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot\cdot})^2$
where $\bar{x}_{\cdot\cdot}$ is the grand mean of all $N = n_1 + \cdots + n_k$ observations.
• This is the Total Sum of Squares (SS_Total).
• If we divide this sum of squares by its degrees of freedom (N − 1), we will have a measure of variance.
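The following Python sketch computes SS_Total on a tiny made-up data set (k = 3 groups of 3; the numbers are hypothetical). It shows that dividing by N − 1 gives the ordinary sample variance of the pooled data:

```python
import numpy as np

# Hypothetical toy data: k = 3 groups with n_i = 3 observations each
groups = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]

all_x = np.concatenate(groups)   # ignore group membership
N = all_x.size
grand_mean = all_x.mean()

# SS_Total: squared deviations of every observation from the grand mean
ss_total = np.sum((all_x - grand_mean) ** 2)
var_total = ss_total / (N - 1)   # same as the sample variance of the pooled data

print(ss_total, var_total)  # 60.0 7.5
```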
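Analysis of Variance
• Now, the deviation of every observation from the overall (grand) mean can be partitioned as:
$x_{ij} - \bar{x}_{\cdot\cdot} = (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}) + (x_{ij} - \bar{x}_{i\cdot})$
• Squaring and summing across all observations, we get the following. The cross-product term vanishes because the deviations within each group sum to zero.
$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i\cdot})^2$
The first term on the right measures variation due to the fact that different treatments are used. The second term measures error variation, i.e., variation in the response when the same treatment is applied.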
The first term on the right is the Treatment Sum of Squares (SS_Treat), or Between Group Sum of Squares. The second is the Error Sum of Squares (SS_Error), or Within Group Sum of Squares. So SS_Total = SS_Treat + SS_Error.
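The identity SS_Total = SS_Treat + SS_Error can be checked numerically. This sketch uses hypothetical toy data. It computes the between-group term in its $n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$ form, which equals the double sum because every observation in group i contributes the same squared deviation:

```python
import numpy as np

# Hypothetical toy data: k = 3 groups with n_i = 3 observations each
groups = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]

all_x = np.concatenate(groups)
grand_mean = all_x.mean()

ss_total = np.sum((all_x - grand_mean) ** 2)
# Between groups: group i contributes n_i * (group mean - grand mean)^2
ss_treat = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within groups: deviations of each observation from its own group mean
ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)

print(ss_total, ss_treat, ss_error)  # 60.0 54.0 6.0
```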
Analysis of Variance
• To convert Sums of Squares (SS) into comparable measures of variance, we need to divide the SS by their respective degrees of freedom.
• This gives us mean squares (MS), which are measures of variance:
$MS_{Treat} = SS_{Treat}/df_{Treat} = SS_{Treat}/(k - 1)$
$MS_{Error} = SS_{Error}/df_{Error} = SS_{Error}/(N - k)$
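Here is a minimal sketch of the mean squares on hypothetical toy data. It also shows a useful way to read MS_Error: it is the pooled within-group variance, a weighted average of the group sample variances with weights n_i − 1.

```python
import numpy as np

# Hypothetical toy data: k = 3 groups with n_i = 3 observations each
groups = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]
k = len(groups)
N = sum(g.size for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_treat = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_treat = ss_treat / (k - 1)  # df_Treat = k - 1
ms_error = ss_error / (N - k)  # df_Error = N - k

# Pooled variance: weighted average of the group variances, weights n_i - 1
pooled = sum((g.size - 1) * g.var(ddof=1) for g in groups) / (N - k)
print(ms_treat, ms_error, pooled)  # 27.0 1.0 1.0
```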
Analysis of Variance
• The expected values of the mean squares over repeated sampling are:
$E(MS_{Treat}) = \sigma^2 + \sum_{i=1}^{k} n_i\alpha_i^2/(k - 1)$
$E(MS_{Error}) = \sigma^2$
• Thus MS_Error is an estimate of σ², the within-group variance:
$\hat{\sigma}^2 = MS_{Error}$ and $\hat{\sigma} = \sqrt{MS_{Error}}$
• If all the α_i are 0, then the ratio of the expected mean squares is σ²/σ² = 1. If some of the α_i are not 0, then E(MS_Treat) > E(MS_Error), so the F-ratio tends to be larger than 1.
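The expected mean squares can be checked by Monte Carlo simulation. This sketch assumes a balanced design (n observations per group, so $\sum n_i\alpha_i^2 = n\sum\alpha_i^2$). It sets µ = 0, since adding a constant changes neither mean square, and uses hypothetical values for k, n, σ and the α_i:

```python
import numpy as np

def mean_squares(x):
    """MS_Treat and MS_Error for simulated balanced data of shape (reps, k, n)."""
    reps, k, n = x.shape
    group_means = x.mean(axis=2)                    # shape (reps, k)
    grand_means = group_means.mean(axis=1)          # equal n, so this is the grand mean
    ss_treat = n * np.sum((group_means - grand_means[:, None]) ** 2, axis=1)
    ss_error = np.sum((x - group_means[:, :, None]) ** 2, axis=(1, 2))
    return ss_treat / (k - 1), ss_error / (k * n - k)

rng = np.random.default_rng(seed=2)
k, n, sigma, reps = 3, 10, 2.0, 20_000   # hypothetical settings

results = {}
for label, alpha in (("H0", np.zeros(k)), ("H1", np.array([-1.0, 0.0, 1.0]))):
    # mu = 0: a common constant cancels out of both mean squares
    x = alpha[None, :, None] + rng.normal(0.0, sigma, size=(reps, k, n))
    ms_treat, ms_error = mean_squares(x)
    theory_treat = sigma**2 + n * np.sum(alpha**2) / (k - 1)
    results[label] = (ms_treat.mean(), ms_error.mean(), theory_treat)
    print(f"{label}: avg MS_Treat = {ms_treat.mean():.2f} (theory {theory_treat:.2f}), "
          f"avg MS_Error = {ms_error.mean():.2f} (theory {sigma**2:.2f})")
```

Under H0 both averages sit near σ² = 4. Under the alternative only MS_Treat moves, to about 4 + 10·2/2 = 14.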
Analysis of Variance
• Our test statistic is the F-ratio (or F-statistic), which compares these two mean squares:
$F_0 = MS_{Treat}/MS_{Error}$
Note that the greater the natural variability within the groups (as estimated by MS_Error), the larger the effects (α_i) will need to be (as estimated by MS_Treat) for us to detect any significant differences.
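To see how within-group variability affects the F-ratio, the sketch below uses a hypothetical helper `f_ratio` on made-up data. It keeps the three group means fixed at 5, 8 and 2 and only widens the spread inside each group:

```python
import numpy as np

def f_ratio(groups):
    """One-way ANOVA F = MS_Treat / MS_Error for a list of 1-D arrays."""
    all_x = np.concatenate(groups)
    N, k = all_x.size, len(groups)
    grand = all_x.mean()
    ss_treat = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return (ss_treat / (k - 1)) / (ss_error / (N - k))

# Same group means (5, 8, 2) in both cases; only the within-group spread changes
tight = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]
spread = [np.array([2.0, 5.0, 8.0]), np.array([5.0, 8.0, 11.0]), np.array([-1.0, 2.0, 5.0])]

print(f_ratio(tight))   # 27.0
print(f_ratio(spread))  # 3.0
```

Tripling the within-group deviations multiplies MS_Error by 9 while MS_Treat is unchanged, so F drops from 27 to 3.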
Analysis of Variance
• Traditionally, the Analysis of Variance calculations have been presented in an ANOVA Table.
• The format of the table is:

Source of    Degrees of   Sum of      Mean        F-Ratio               P-value
Variation    Freedom      Squares     Square
Treatment    k − 1        SS_Treat    MS_Treat    MS_Treat / MS_Error   Tail Area
Error        N − k        SS_Error    MS_Error
Total        N − 1        SS_Total

The Degrees of Freedom and Sum of Squares columns add up: (k − 1) + (N − k) = N − 1 and SS_Treat + SS_Error = SS_Total.
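The table can be assembled directly from the formulas above. The helper `anova_table` and the data are hypothetical. The result is cross-checked against SciPy's `scipy.stats.f_oneway`, which should report the same F-statistic and p-value:

```python
import numpy as np
from scipy import stats

def anova_table(groups):
    """One-way ANOVA table rows: (source, df, SS, MS, F, p); blank cells are None."""
    all_x = np.concatenate(groups)
    N, k = all_x.size, len(groups)
    grand = all_x.mean()
    ss_treat = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    df_treat, df_error = k - 1, N - k
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    f0 = ms_treat / ms_error
    p = stats.f.sf(f0, df_treat, df_error)  # upper-tail area of F(k - 1, N - k)
    return [
        ("Treatment", df_treat, ss_treat, ms_treat, f0, p),
        ("Error", df_error, ss_error, ms_error, None, None),
        ("Total", N - 1, ss_treat + ss_error, None, None, None),
    ]

def fmt(v):
    return "" if v is None else f"{v:.4g}"

# Hypothetical groups of unequal size
groups = [np.array([4.1, 5.3, 6.0, 5.5]), np.array([7.2, 8.8, 9.1]),
          np.array([1.9, 2.4, 3.3, 2.0, 2.7])]
rows = anova_table(groups)
print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}{'p':>12}")
for source, df, ss, ms, f0, p in rows:
    print(f"{source:<10}{df:>4}{fmt(ss):>10}{fmt(ms):>10}{fmt(f0):>10}{fmt(p):>12}")
```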
Motivating Example
# 0 = $0%&$'"(%&)* = (&+'
Analysis of Variance
• A large F-statistic provides evidence against $H_0$, while a small F-statistic indicates that the data and $H_0$ are compatible.
• To calculate a p-value to test $H_0$, we compare the F-statistic we obtained from our data to the distribution it would have under a true $H_0$, i.e., an F-distribution with (k − 1) and (N − k) degrees of freedom.
• Note that $F_0$ can never be negative and only large values count against $H_0$, so this is always a one-tailed (upper-tail) test.
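Because only large values of F count against H0, the p-value is the upper-tail area. Rejecting when p < α is therefore equivalent to rejecting when F0 exceeds the upper-α critical value. The sketch below uses SciPy's F-distribution with hypothetical numbers (F0 = 27 on 2 and 6 df):

```python
from scipy import stats

# Hypothetical values for illustration: F0 = 27 on df1 = k - 1 = 2 and df2 = N - k = 6
f0, df1, df2 = 27.0, 2, 6
alpha = 0.05

p_value = stats.f.sf(f0, df1, df2)         # upper-tail area P(F >= F0) under H0
f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical value for a level-alpha test

print(f"p-value = {p_value:.4f}, critical F = {f_crit:.3f}")
print("reject H0" if f0 > f_crit else "fail to reject H0")
```

With 2 numerator df the tail area has the closed form $(1 + 2F_0/df_2)^{-df_2/2}$, which here is $10^{-3}$.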
Analysis of Variance
When $H_0$ is true, $F_0 \sim F_{df_1,\,df_2}$, with $df_1 = k - 1$ and $df_2 = N - k$.
Let's say our observed value for F was $F_0$. The p-value is the area under the F-density to the right of $F_0$.
[Figure: density curve of an example F-distribution; horizontal axis F, vertical axis density.]
Multiple Comparisons
• A significant F-test tells us that at least two of the underlying population means are different, but it does not tell us which ones differ from the others.
• We need extra tests to compare all the means, which we call Multiple Comparisons.
• We look at the difference between every pair of group population means, as well as the confidence interval for each difference.
• When we have k groups, there are
$\binom{k}{2} = \frac{k(k-1)}{2} = \frac{k^2 - k}{2}$
possible pairwise comparisons.
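A quick check of the pair count (the helper name `n_pairwise` is ours, for illustration):

```python
from itertools import combinations
from math import comb

def n_pairwise(k):
    """Number of pairwise comparisons among k group means: k(k - 1)/2."""
    return k * (k - 1) // 2

for k in (3, 4, 5, 10):
    pairs = list(combinations(range(1, k + 1), 2))   # (1, 2), (1, 3), ...
    print(f"k = {k}: {n_pairwise(k)} comparisons; first few: {pairs[:3]}")
```

The count grows roughly as k²/2, so ten groups already give 45 comparisons.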
Multiple Comparisons
• If we estimate each comparison separately with 95% confidence, the overall error rate will be greater than 5%.
• So, using ordinary pairwise comparisons (i.e., lots of individual t-tests), we tend to find too many significant differences between our sample means.
• We need to modify our intervals so that they simultaneously contain the true differences with 95% confidence across the entire set of comparisons.
• The modified intervals are known as: simultaneous confidence intervals OR multiple comparison procedures.
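To see how fast the overall error rate grows, the sketch below uses 1 − (1 − α)^L for L separate tests at level α = 0.05. This formula assumes the tests are independent, which pairwise comparisons are not. Treat it as a rough illustration rather than the exact rate:

```python
def familywise_error(L, alpha=0.05):
    """P(at least one false rejection among L tests at level alpha),
    ASSUMING the L tests are independent (only an approximation here)."""
    return 1 - (1 - alpha) ** L

for k in (3, 4, 6):
    L = k * (k - 1) // 2
    print(f"k = {k}: L = {L} comparisons, overall error rate about {familywise_error(L):.3f}")
```

Even with only three groups, the chance of at least one spurious "significant" pair is roughly 14% rather than 5%.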
Multiple Comparisons
• First, the Bonferroni correction.
• Instead of using $t_{df,\,\alpha/2}$ as our multiplier for the confidence interval (df = N − k), we use $t_{df,\,\alpha/(2L)}$, where L is the total number of possible pairwise comparisons (i.e., $L = k(k-1)/2$).
• That is, we divide α/2 by the number of tests to be done (α/2L).
• The Bonferroni inequality holds whether or not the comparisons are independent. However, it ignores the dependence among the pairwise comparisons, so this adjustment is too conservative (intervals will be too wide; i.e., it finds too few significant differences).
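Here is a sketch of Bonferroni-adjusted intervals for all pairwise differences µ_i − µ_j. It uses hypothetical toy data, df = N − k, and the usual standard error $\sqrt{MS_{Error}(1/n_i + 1/n_j)}$:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical toy data (k = 3 groups, n_i = 3)
groups = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]
k = len(groups)
N = sum(g.size for g in groups)
df = N - k                                     # error degrees of freedom
ms_error = sum(np.sum((g - g.mean()) ** 2) for g in groups) / df

alpha = 0.05
L = k * (k - 1) // 2
t_unadj = stats.t.ppf(1 - alpha / 2, df)       # t_{df, alpha/2}
t_bonf = stats.t.ppf(1 - alpha / (2 * L), df)  # t_{df, alpha/(2L)}

intervals = {}
for i, j in combinations(range(k), 2):
    diff = groups[i].mean() - groups[j].mean()
    se = np.sqrt(ms_error * (1 / groups[i].size + 1 / groups[j].size))
    intervals[(i + 1, j + 1)] = (diff - t_bonf * se, diff + t_bonf * se)

print(f"multiplier: unadjusted {t_unadj:.3f}, Bonferroni {t_bonf:.3f}")
for (i, j), (lo, hi) in intervals.items():
    print(f"mu{i} - mu{j}: ({lo:.2f}, {hi:.2f})")
```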
Multiple Comparisons
• Second, we have Tukey Intervals.
• The calculation of Tukey Intervals is quite complicated, but it overcomes the problems of the unadjusted pairwise comparisons finding too many significant differences (i.e., confidence intervals that are too narrow) and the Bonferroni correction finding too few significant differences (i.e., confidence intervals that are too wide).
We will use Tukey Intervals.
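The slides leave the Tukey calculation to JMP. The following sketch shows what is involved. The Tukey (Tukey-Kramer, for unequal n_i) multiplier replaces t with q/√2, where q is the upper-α quantile of the studentized range distribution for k means and N − k df. It assumes SciPy 1.7 or later for `scipy.stats.studentized_range`, and the data are hypothetical:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical toy data (k = 3 groups, n_i = 3)
groups = [np.array([4.0, 5.0, 6.0]), np.array([7.0, 8.0, 9.0]), np.array([1.0, 2.0, 3.0])]
k = len(groups)
N = sum(g.size for g in groups)
df = N - k
ms_error = sum(np.sum((g - g.mean()) ** 2) for g in groups) / df
alpha = 0.05

# Tukey multiplier: studentized-range quantile for k means and df error df, over sqrt(2)
tukey_mult = stats.studentized_range.ppf(1 - alpha, k, df) / np.sqrt(2)

# For comparison: unadjusted and Bonferroni t multipliers
L = k * (k - 1) // 2
t_unadj = stats.t.ppf(1 - alpha / 2, df)
t_bonf = stats.t.ppf(1 - alpha / (2 * L), df)
print(f"multipliers: unadjusted {t_unadj:.3f}, Tukey {tukey_mult:.3f}, Bonferroni {t_bonf:.3f}")

for i, j in combinations(range(k), 2):
    diff = groups[i].mean() - groups[j].mean()
    half = tukey_mult * np.sqrt(ms_error * (1 / groups[i].size + 1 / groups[j].size))
    print(f"mu{i + 1} - mu{j + 1}: {diff:+.2f}, Tukey 95% CI ({diff - half:.2f}, {diff + half:.2f})")
```

For these data the Tukey multiplier falls between the unadjusted and Bonferroni multipliers, which matches the description above.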
Tukey Pairwise Comparisons
Select Compare Means > All Pairs, Tukey HSD
Here we see that only Behavioral and Standard therapies differ in terms of mean weight gain. We estimate that those in Behavioral therapy will gain between 2 lbs. and 13 lbs. more, on average, than those in Standard therapy.
Checking Assumptions: Independence
• The observations within each sample must be independent of one another.
• The samples must be taken from independent populations.
Checking Assumptions: Equality of Variance
• Equality of variance is very important in One-way ANOVA.
• We check equality of variance using Levene's, Bartlett's, Brown-Forsythe, or O'Brien's Tests.
• If the assumption of equal population variances is not satisfied (small p-value from these tests), we can try transforming the data or use Welch's ANOVA, which allows the variances to be unequal.
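Several of these tests are available in SciPy:
• `scipy.stats.levene` with `center='mean'` is Levene's original test.
• `center='median'` gives the Brown-Forsythe variant.
• `scipy.stats.bartlett` is Bartlett's test.

O'Brien's test is not in SciPy, and to our knowledge SciPy has no Welch ANOVA function either. The `welch_anova` helper below is therefore our own sketch of Welch's (1951) formula. The data are simulated and hypothetical.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for unequal variances; returns (F, df1, df2, p)."""
    k = len(groups)
    n = np.array([g.size for g in groups], dtype=float)
    means = np.array([g.mean() for g in groups])
    w = n / np.array([g.var(ddof=1) for g in groups])   # weights n_i / s_i^2
    W = w.sum()
    weighted_mean = np.sum(w * means) / W
    a = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)
    lam = np.sum((1 - w / W) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) * lam / (k**2 - 1)
    f0 = a / b
    df1, df2 = k - 1, (k**2 - 1) / (3 * lam)
    return f0, df1, df2, stats.f.sf(f0, df1, df2)

rng = np.random.default_rng(seed=3)
# Hypothetical groups with clearly unequal spreads
groups = [rng.normal(10, 1, 15), rng.normal(11, 3, 20), rng.normal(12, 6, 25)]

print("Levene (mean-centred):  ", stats.levene(*groups, center="mean").pvalue)
print("Brown-Forsythe (median):", stats.levene(*groups, center="median").pvalue)
print("Bartlett:               ", stats.bartlett(*groups).pvalue)
print("Welch ANOVA (F, df1, df2, p):", welch_anova(*groups))
```

With only two groups, Welch's ANOVA reduces to the Welch two-sample t-test (F = t²), which makes a convenient sanity check.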
Checking Assumptions: Equality of Variance
• For many data sets, we often find there is a relationship between the centre of the data and the spread of the data:
• In particular, samples with low means (or medians) often have small spread, while samples with large means (or medians) often have large spread (or vice versa).
• The positive relationship between the mean and variance (or between the median and midspread) in different samples is often true for data that have right-skewed distributions.
Checking Assumptions: Equality of Variance
• If the variance of the samples is increasing as the sample means increase, a log or square root transformation is often used.
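The sketch below illustrates the mean-spread relationship and how transformations affect it. It uses simulated right-skewed (lognormal) samples as a hypothetical example. On the raw scale the SD grows with the mean. The square root reduces that growth, and for this kind of data the log removes it almost entirely:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
# Hypothetical right-skewed samples (lognormal) with increasing centres
groups = [rng.lognormal(mean=m, sigma=0.5, size=200) for m in (1.0, 2.0, 3.0)]

def sd_ratio(transform):
    """Largest / smallest group SD after applying `transform` to every group."""
    sds = [np.std(transform(g), ddof=1) for g in groups]
    return max(sds) / min(sds)

for label, transform in (("raw", lambda x: x), ("sqrt", np.sqrt), ("log", np.log)):
    means = [round(float(np.mean(transform(g))), 2) for g in groups]
    print(f"{label:>4}: group means {means}, max/min SD = {sd_ratio(transform):.2f}")
```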
One-way ANOVA Transformations
• We can transform our response variable if we detect problems with the equality of variance or normality assumptions.
• However, as in the two-sample situation, we can only use a log transformation if we wish to be able to back-transform and interpret our confidence intervals meaningfully.
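The log is special because a difference of means on the log scale back-transforms to a ratio of geometric means. When the logged data are roughly symmetric, this estimates the ratio of medians. A difference on the square-root scale has no comparably simple meaning after squaring. Here is a sketch with a pooled two-sample t interval on simulated, hypothetical data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
# Hypothetical positive, right-skewed responses for two treatments
a = rng.lognormal(mean=2.0, sigma=0.6, size=30)
b = rng.lognormal(mean=1.6, sigma=0.6, size=25)

la, lb = np.log(a), np.log(b)
df = la.size + lb.size - 2
# Pooled variance on the log scale
sp2 = ((la.size - 1) * la.var(ddof=1) + (lb.size - 1) * lb.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / la.size + 1 / lb.size))
diff = la.mean() - lb.mean()
t_mult = stats.t.ppf(0.975, df)
lo, hi = diff - t_mult * se, diff + t_mult * se

# Back-transform: a difference of log means becomes a RATIO of geometric means
ratio = np.exp(diff)
print(f"log-scale 95% CI for the difference: ({lo:.3f}, {hi:.3f})")
print(f"geometric-mean ratio a/b = {ratio:.3f}, 95% CI ({np.exp(lo):.3f}, {np.exp(hi):.3f})")
```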