Upload
lillie-shepperson
View
235
Download
1
Embed Size (px)
Citation preview
One-Way ANOVAOne-Way ANOVA
One-Way Analysis of VarianceOne-Way Analysis of Variance
One-Way ANOVAOne-Way ANOVA
The one-way analysis of variance is used The one-way analysis of variance is used to test the claim that three or more to test the claim that three or more population means are equalpopulation means are equal
This is an extension of the two This is an extension of the two independent samples t-testindependent samples t-test
One-Way ANOVAOne-Way ANOVA
The The responseresponse variable is the variable variable is the variable you’re comparingyou’re comparing
The The factorfactor variable is the categorical variable is the categorical variable being used to define the groupsvariable being used to define the groups We will assume We will assume kk samples (groups) samples (groups)
The The one-wayone-way is because each value is is because each value is classified in exactly one wayclassified in exactly one way Examples include comparisons by gender, Examples include comparisons by gender,
race, political party, color, etc.race, political party, color, etc.
One-Way ANOVAOne-Way ANOVA
Conditions or AssumptionsConditions or Assumptions The data are randomly sampledThe data are randomly sampled The variances of each sample are assumed The variances of each sample are assumed
equalequal The residuals are normally distributedThe residuals are normally distributed
One-Way ANOVAOne-Way ANOVA
The null hypothesis is that the means are all The null hypothesis is that the means are all equalequal
The alternative hypothesis is that at least one The alternative hypothesis is that at least one of the means is differentof the means is different Think about the Sesame StreetThink about the Sesame Street®® game where game where
three of these things are kind of the same, but one three of these things are kind of the same, but one of these things is not like the other. They don’t all of these things is not like the other. They don’t all have to be different, just one of them.have to be different, just one of them.
0 1 2 3:
kH
One-Way ANOVAOne-Way ANOVA
The statistics classroom is divided into The statistics classroom is divided into three rows: front, middle, and backthree rows: front, middle, and back
The instructor noticed that the further the The instructor noticed that the further the students were from him, the more likely students were from him, the more likely they were to miss class or use an instant they were to miss class or use an instant messenger during classmessenger during class
He wanted to see if the students further He wanted to see if the students further away did worse on the examsaway did worse on the exams
One-Way ANOVAOne-Way ANOVA
The ANOVA doesn’t test that one mean is less The ANOVA doesn’t test that one mean is less than another, only whether they’re all equal or than another, only whether they’re all equal or at least one is different.at least one is different.
0:
F M BH
One-Way ANOVAOne-Way ANOVA
A random sample of the students in each A random sample of the students in each row was takenrow was taken
The score for those students on the The score for those students on the second exam was recordedsecond exam was recorded Front:Front: 82, 83, 97, 93, 55, 67, 5382, 83, 97, 93, 55, 67, 53 Middle:Middle: 83, 78, 68, 61, 77, 54, 69, 51, 6383, 78, 68, 61, 77, 54, 69, 51, 63 Back:Back: 38, 59, 55, 66, 45, 52, 52, 6138, 59, 55, 66, 45, 52, 52, 61
One-Way ANOVAOne-Way ANOVA
The summary statistics for the grades of each row The summary statistics for the grades of each row are shown in the table beloware shown in the table below
RowRow FrontFront MiddleMiddle BackBack
Sample sizeSample size 77 99 88
MeanMean 75.7175.71 67.1167.11 53.5053.50
St. DevSt. Dev 17.6317.63 10.9510.95 8.968.96
VarianceVariance 310.90310.90 119.86119.86 80.2980.29
One-Way ANOVAOne-Way ANOVA
VariationVariation Variation is the sum of the squares of the Variation is the sum of the squares of the
deviations between a value and the mean of deviations between a value and the mean of the valuethe value
Sum of Squares is abbreviated by SS and Sum of Squares is abbreviated by SS and often followed by a variable in parentheses often followed by a variable in parentheses such as SS(B) or SS(W) so we know which such as SS(B) or SS(W) so we know which sum of squares we’re talking aboutsum of squares we’re talking about
One-Way ANOVAOne-Way ANOVA
Are all of the values identical?Are all of the values identical? No, so there is some variation in the dataNo, so there is some variation in the data This is called the total variationThis is called the total variation Denoted SS(Total) for the total Sum of Denoted SS(Total) for the total Sum of
Squares (variation)Squares (variation) Sum of Squares is another name for variationSum of Squares is another name for variation
One-Way ANOVAOne-Way ANOVA
Are all of the sample means identical?Are all of the sample means identical? No, so there is some variation between the No, so there is some variation between the
groupsgroups This is called the between group variationThis is called the between group variation Sometimes called the variation due to the Sometimes called the variation due to the
factorfactor Denoted SS(B) for Sum of Squares (variation) Denoted SS(B) for Sum of Squares (variation)
between the groupsbetween the groups
One-Way ANOVAOne-Way ANOVA
Are each of the values within each group Are each of the values within each group identical?identical? No, there is some variation within the groupsNo, there is some variation within the groups This is called the within group variationThis is called the within group variation Sometimes called the error variationSometimes called the error variation Denoted SS(W) for Sum of Squares Denoted SS(W) for Sum of Squares
(variation) within the groups(variation) within the groups
One-Way ANOVAOne-Way ANOVA
There are two sources of variationThere are two sources of variation the variation between the groups, SS(B), or the variation between the groups, SS(B), or
the variation due to the factorthe variation due to the factor the variation within the groups, SS(W), or the the variation within the groups, SS(W), or the
variation that can’t be explained by the factor variation that can’t be explained by the factor so it’s called the error variationso it’s called the error variation
One-Way ANOVAOne-Way ANOVA
Here is the basic one-way ANOVA tableHere is the basic one-way ANOVA table
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween
WithinWithin
TotalTotal
One-Way ANOVAOne-Way ANOVA
Grand MeanGrand Mean The grand mean is the average of all the The grand mean is the average of all the
values when the factor is ignoredvalues when the factor is ignored It is a weighted average of the individual It is a weighted average of the individual
sample meanssample means
1 1 2 2
1 2
k k
k
n x n x n xx
n n n
1
1
k
i iik
ii
n xx
n
One-Way ANOVAOne-Way ANOVA
Grand Mean for our example is 65.08Grand Mean for our example is 65.08
7 75.71 9 67.11 8 53.50
7 9 81562
2465.08
x
x
x
One-Way ANOVAOne-Way ANOVA
Between Group Variation, SS(B)Between Group Variation, SS(B) The between group variation is the variation The between group variation is the variation
between each sample mean and the grand between each sample mean and the grand meanmean
Each individual variation is weighted by the Each individual variation is weighted by the sample sizesample size
2 2 2
1 1 2 2 k kSS B n x x n x x n x x
21
k
i ii
SS B n x x
One-Way ANOVAOne-Way ANOVA
The Between Group Variation for our example is The Between Group Variation for our example is SS(B)=1902SS(B)=1902
I know that doesn’t round to be 1902, but if you I know that doesn’t round to be 1902, but if you don’t round the intermediate steps, then it does. don’t round the intermediate steps, then it does. My goal here is to show an ANOVA table from My goal here is to show an ANOVA table from MINITAB and it returns 1902.MINITAB and it returns 1902.
2 2 2
7 75.71 65.08 9 67.11 65.08 8 53.50 65.08SS B
1900.8376 1902SS B
One-Way ANOVAOne-Way ANOVA
Within Group Variation, SS(W)Within Group Variation, SS(W) The Within Group Variation is the weighted total of The Within Group Variation is the weighted total of
the individual variationsthe individual variations The weighting is done with the degrees of freedomThe weighting is done with the degrees of freedom The df for each sample is one less than the sample The df for each sample is one less than the sample
size for that sample.size for that sample.
One-Way ANOVAOne-Way ANOVA
Within Group VariationWithin Group Variation
2
1
k
i ii
SS W df s
2 2 2
1 1 2 2 k kSS W df s df s df s
One-Way ANOVAOne-Way ANOVA
The within group variation for our example The within group variation for our example is 3386is 3386
6 310.90 8 119.86 7 80.29SS W
3386.31 3386SS W
One-Way ANOVAOne-Way ANOVA
After filling in the sum of squares, we have …After filling in the sum of squares, we have …
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween 19021902
WithinWithin 33863386
TotalTotal 52885288
One-Way ANOVAOne-Way ANOVA
Degrees of Freedom, dfDegrees of Freedom, df A degree of freedom occurs for each value that can A degree of freedom occurs for each value that can
vary before the rest of the values are predeterminedvary before the rest of the values are predetermined For example, if you had six numbers that had an For example, if you had six numbers that had an
average of 40, you would know that the total had to average of 40, you would know that the total had to be 240. Five of the six numbers could be anything, be 240. Five of the six numbers could be anything, but once the first five are known, the last one is fixed but once the first five are known, the last one is fixed so the sum is 240. The df would be 6-1=5so the sum is 240. The df would be 6-1=5
The df is often one less than the number of valuesThe df is often one less than the number of values
One-Way ANOVAOne-Way ANOVA
The between group df is one less than the The between group df is one less than the number of groupsnumber of groups We have three groups, so df(B) = 2We have three groups, so df(B) = 2
The within group df is the sum of the individual The within group df is the sum of the individual df’s of each groupdf’s of each group The sample sizes are 7, 9, and 8The sample sizes are 7, 9, and 8 df(W) = 6 + 8 + 7 = 21df(W) = 6 + 8 + 7 = 21
The total df is one less than the sample sizeThe total df is one less than the sample size df(Total) = 24 – 1 = 23df(Total) = 24 – 1 = 23
One-Way ANOVAOne-Way ANOVA
Filling in the degrees of freedom gives this …Filling in the degrees of freedom gives this …
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween 19021902 22
WithinWithin 33863386 2121
TotalTotal 52885288 2323
One-Way ANOVAOne-Way ANOVA
VariancesVariances The variances are also called the Mean of the The variances are also called the Mean of the
Squares and abbreviated by MS, often with an Squares and abbreviated by MS, often with an accompanying variable MS(B) or MS(W)accompanying variable MS(B) or MS(W)
They are an average squared deviation from the They are an average squared deviation from the mean and are found by dividing the variation by the mean and are found by dividing the variation by the degrees of freedomdegrees of freedom
MS = SS / dfMS = SS / df
VariationVariance
df
One-Way ANOVAOne-Way ANOVA
MS(B)MS(B) = 1902 / 2= 1902 / 2 = 951.0= 951.0 MS(W)MS(W) = 3386 / 21= 3386 / 21 = 161.2= 161.2 MS(T)MS(T) = 5288 / 23= 5288 / 23 = 229.9= 229.9
Notice that the MS(Total) is NOT the sum of Notice that the MS(Total) is NOT the sum of MS(Between) and MS(Within).MS(Between) and MS(Within).
This works for the sum of squares SS(Total), This works for the sum of squares SS(Total), but not the mean square MS(Total)but not the mean square MS(Total)
The MS(Total) isn’t usually shownThe MS(Total) isn’t usually shown
One-Way ANOVAOne-Way ANOVA
Completing the MS gives …Completing the MS gives …
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween 19021902 22 951.0951.0
WithinWithin 33863386 2121 161.2161.2
TotalTotal 52885288 2323 229.9229.9
One-Way ANOVAOne-Way ANOVA
Special VariancesSpecial Variances The MS(Within) is also known as the pooled The MS(Within) is also known as the pooled
estimate of the variance since it is a weighted estimate of the variance since it is a weighted average of the individual variancesaverage of the individual variances
Sometimes abbreviated Sometimes abbreviated
The MS(Total) is the variance of the response The MS(Total) is the variance of the response variable.variable.
Not technically part of ANOVA table, but useful none the Not technically part of ANOVA table, but useful none the lessless
2
ps
One-Way ANOVAOne-Way ANOVA
F test statisticF test statistic An F test statistic is the ratio of two sample An F test statistic is the ratio of two sample
variancesvariances The MS(B) and MS(W) are two sample The MS(B) and MS(W) are two sample
variances and that’s what we divide to find F.variances and that’s what we divide to find F. F = MS(B) / MS(W)F = MS(B) / MS(W)
For our data, F = 951.0 / 161.2 = 5.9For our data, F = 951.0 / 161.2 = 5.9
One-Way ANOVAOne-Way ANOVA
Adding F to the table …Adding F to the table …
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween 19021902 22 951.0951.0 5.95.9
WithinWithin 33863386 2121 161.2161.2
TotalTotal 52885288 2323 229.9229.9
One-Way ANOVAOne-Way ANOVA
The F test is a right tail testThe F test is a right tail test The F test statistic has an F distribution The F test statistic has an F distribution
with df(B) numerator df and df(W) with df(B) numerator df and df(W) denominator dfdenominator df
The p-value is the area to the right of the The p-value is the area to the right of the test statistictest statistic
P(FP(F2,212,21 > 5.9) = 0.009 > 5.9) = 0.009
One-Way ANOVAOne-Way ANOVA
Completing the table with the p-valueCompleting the table with the p-value
SourceSource SSSS dfdf MSMS FF pp
BetweenBetween 19021902 22 951.0951.0 5.95.9 0.0090.009
WithinWithin 33863386 2121 161.2161.2
TotalTotal 52885288 2323 229.9229.9
One-Way ANOVAOne-Way ANOVA
The p-value is 0.009, which is less than The p-value is 0.009, which is less than the significance level of 0.05, so we reject the significance level of 0.05, so we reject the null hypothesis.the null hypothesis.
The null hypothesis is that the means of The null hypothesis is that the means of the three rows in class were the same, but the three rows in class were the same, but we reject that, so at least one row has a we reject that, so at least one row has a different mean.different mean.
One-Way ANOVAOne-Way ANOVA
There is enough evidence to support the There is enough evidence to support the claim that there is a difference in the mean claim that there is a difference in the mean scores of the front, middle, and back rows scores of the front, middle, and back rows in class.in class.
The ANOVA doesn’t tell which row is The ANOVA doesn’t tell which row is different, you would need to look at different, you would need to look at confidence intervals or run post hoc tests confidence intervals or run post hoc tests to determine thatto determine that