One Factor ANOVA
March 5, 2019
Contents
• One-factor ANOVA
• Example 1: Exam scores
• The Numerator of the ANOVA - variability between the means
• The Denominator of the ANOVA - variability within each group
• The F-statistic - the ratio of variability between and within.
• Finding the critical value of F with table E
• The Summary Table
• Example 2: Preferred temperature for weather sensitivity
• Effect Size
• Using R to conduct a one factor ANOVA
• Questions
• Answers
Happy birthday to Merideth Kirry!
One-factor ANOVA
The ANOVA ("ANalysis Of VAriance") is a hypothesis test for the difference between two or more means. This tutorial will describe how to compute the simplest ANOVA - the '1-way' or '1-factor' ANOVA for independent measures.
Here’s how to get to the 1-factor ANOVA with the flow chart:
[Flow chart: decision tree for choosing a hypothesis test (z-test, t-tests, tests for correlations, χ² tests and ANOVAs), beginning at "START HERE". The 1-factor ANOVA is in Ch 20.]
The null hypothesis for the ANOVA is that all samples are drawn from populations with equal means and equal variances. The alternative hypothesis is that the population means are not all the same, but the variances are still the same.
The logic behind the ANOVA is to compare two estimates of the population variance. One is related to the variability between the means and the other is related to the variability of scores within each group. If the null hypothesis is true, these two estimates should be on average the same, so their ratio, called the F-ratio, should be around one. If the population means differ, then the F-statistic will become larger than one on average. If the measured value of F exceeds a critical value determined by α, then we reject the null hypothesis that the population means are the same.
We’ll show how it works with an example.
Example 1: Exam scores
Suppose a professor teaches the same small graduate course over three years (2017, 2018 and 2019). You want to know if the course grades have varied significantly over these 3 years. We’ll run an ANOVA using an alpha value of α = 0.05. Here are the scores:

2017   2018   2019
88.4   84     76.6
77.5   81.3   69.6
83.5   80.2   85.4
88.2   72.3   84.4
84.7   87.3   74.3
86.2   67.5   78
82.1   66.8   80.7
89.3   78.2   83.5
85.3   73     73.6
Here is a table of useful statistics from this list:
        2017       2018       2019
n       9          9          9
mean    85.02      76.73      78.46
SS      108.8156   417.0001   236.9624
s       3.6881     7.2198     5.4425
sem     1.2294     2.4066     1.8142
Also, the mean of all scores, called the ’grand mean’, is 80.0704.
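The table of statistics can be reproduced from the raw scores. Here is a minimal sketch in base R; the variable names (scores, stats and so on) are chosen here for illustration and are not taken from the course script.

```r
# Exam scores from Example 1, one vector per year
scores <- list(
  "2017" = c(88.4, 77.5, 83.5, 88.2, 84.7, 86.2, 82.1, 89.3, 85.3),
  "2018" = c(84, 81.3, 80.2, 72.3, 87.3, 67.5, 66.8, 78.2, 73),
  "2019" = c(76.6, 69.6, 85.4, 84.4, 74.3, 78, 80.7, 83.5, 73.6)
)

n   <- sapply(scores, length)                             # sample sizes
m   <- sapply(scores, mean)                               # group means
SS  <- sapply(scores, function(x) sum((x - mean(x))^2))   # sums of squared deviations
s   <- sapply(scores, sd)                                 # standard deviations
sem <- s / sqrt(n)                                        # standard errors of the mean

# Grand mean: the mean of all 27 scores
grand_mean <- mean(unlist(scores))

stats <- rbind(n = n, mean = m, SS = SS, s = s, sem = sem)
print(round(stats, 4))
print(round(grand_mean, 4))
```

The printed values should agree with the table above up to rounding.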
We’ll use the numbers from this table to visualize the data using a bar graph with error bars representing standard errors of the mean, just like we did with the independent-measures t-test:

[Figure: bar graph of the mean exam score for 2017, 2018 and 2019, with error bars showing the standard error of the mean.]
We can use the same rule of thumb as we did for the independent measures t-test to estimate the significance of the differences between pairs of means. Remember, if the error bars overlap, then we will almost always fail to reject the null hypothesis for a one-tailed test with α = .05. Since this is the most liberal of all tests, we will also fail to reject for a two-tailed test or for any smaller α. But remember, we can’t make separate t-tests on each pair of means - the probability of at least one type I error will be greater than α.
Looking at the graph, you might guess that there is a statistically significant difference between the means.
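A bar graph like the one described above can be sketched with base R graphics, using the means and standard errors from the table of statistics. The axis limits and labels are choices made here for illustration; 'barplot' returns the x-positions of the bars, which 'arrows' then uses to draw the error bars.

```r
# Means and standard errors of the mean from the table of statistics
means <- c("2017" = 85.02, "2018" = 76.73, "2019" = 78.46)
sem   <- c(1.2294, 2.4066, 1.8142)

# Draw the bars; xpd = FALSE clips them to the plotting region
# because the y-axis does not start at zero
mids <- barplot(means, ylim = c(72, 88), xpd = FALSE,
                xlab = "Exam", ylab = "score")

# Error bars: one standard error above and below each mean
arrows(mids, means - sem, mids, means + sem,
       angle = 90, code = 3, length = 0.05)
```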
The Numerator of the ANOVA - variability between the means
The numerator of the ANOVA’s F-statistic reflects how different the means are from each other. Under the null hypothesis, this number, called the 'variance between', or MSbet, is an estimate of the population variance.
We first calculate the sums of squared deviation of each mean from the grand mean, scaled by the sample size. This is called SSbet:
$$
\begin{aligned}
SS_{bet} &= \sum n(\bar{X} - \bar{\bar{X}})^2 \\
&= (9)(85.02 - 80.0704)^2 + (9)(76.73 - 80.0704)^2 + (9)(78.46 - 80.0704)^2 \\
&= 220.4869 + 100.4244 + 23.3405 = 344.2518
\end{aligned}
$$
The degrees of freedom for SSbet is the number of groups minus one: 3 - 1 = 2.
We can now calculate MSbet which is SSbet divided by its degrees of freedom:
$$MS_{bet} = \frac{SS_{bet}}{df_{bet}} = \frac{344.2518}{2} = 172.1259$$
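The same arithmetic can be checked in a few lines of R. This sketch uses the rounded means and grand mean from the table of statistics, exactly as the hand calculation does; the variable names are chosen here for illustration.

```r
# Group sizes, group means and grand mean from the table of statistics
n          <- c(9, 9, 9)
means      <- c(85.02, 76.73, 78.46)
grand_mean <- 80.0704

# Sum of squared deviations of each mean from the grand mean,
# weighted by the sample size of each group
SS_bet <- sum(n * (means - grand_mean)^2)

# Degrees of freedom: number of groups minus one
df_bet <- length(means) - 1

# Variance between
MS_bet <- SS_bet / df_bet

print(round(c(SS_bet = SS_bet, df_bet = df_bet, MS_bet = MS_bet), 4))
```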
The Denominator of the ANOVA - variability within each group
Another estimate of the population variance is the variance of the scores within each group. This number, called the 'variance within' or MSw, is calculated by first adding up the sum of the squared deviation of each score from its own group mean:

$$SS_w = \sum_j \sum_i (X_i - \bar{X}_j)^2$$

The table above already gives us the sum of squared deviations for each group, so the total sum of squares within is:
SSw = 108.8156 + 417.0001 + 236.9624 = 762.7781
The degrees of freedom for SS within is the number of scores minus the number of groups: 9 + 9 + 9 - 3 = 24. The variance within is SS within divided by its degrees of freedom:

$$MS_w = \frac{SS_w}{df_w} = \frac{762.7781}{24} = 31.7824$$
The F-statistic - the ratio of variability between and within.
Finally, we’ll calculate the observed value of F which is the ratio of MSbet and MSw:
$$F = \frac{MS_{bet}}{MS_w} = \frac{172.1259}{31.7824} = 5.4158$$
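As a check on the denominator and the ratio, here is a short R sketch that starts from the SS values in the table of statistics and the value of MSbet computed above. The variable names are chosen here for illustration.

```r
# Sum of squared deviations within each group (from the table of statistics)
SS_groups <- c(108.8156, 417.0001, 236.9624)
n         <- c(9, 9, 9)

# SS within and its degrees of freedom (number of scores minus number of groups)
SS_w <- sum(SS_groups)
df_w <- sum(n) - length(n)

# Variance within
MS_w <- SS_w / df_w

# Variance between, as computed in the previous section
MS_bet <- 172.1259

# Observed F: the ratio of the two variance estimates
F_obs <- MS_bet / MS_w

print(round(c(SS_w = SS_w, df_w = df_w, MS_w = MS_w, F = F_obs), 4))
```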
Finding the critical value of F with table E
The distribution of the F-statistic is a family of curves that varies both with dfbet and dfw. Table E provides critical values of F for α = .05 and α = .01 (bold). The rows of the table correspond to dfw and the columns are for dfbet. For our example, dfbet is 2 and dfw is 24. Here’s the relevant part of the table (for each dfw, the first line is for α = .05 and the second line, bold in Table E, is for α = .01):

dfw|dfb   1      2      3
23        4.28   3.42   3.03
          7.88   5.66   4.76
24        4.26   3.4    3.01
          7.82   5.61   4.72
25        4.24   3.39   2.99
          7.77   5.57   4.68

Looking at the column for dfbet = 2 and the row for dfw = 24, we see that the critical value for F with α = 0.05 (the entry that is not in bold) is 3.4.
Remember, both the numerator and the denominator for the ANOVA are measures of the population variance under the null hypothesis. If the null hypothesis is false, then the means will have greater variability, so the numerator will grow large compared to the denominator. Therefore, a large value of F is evidence against the null hypothesis. Specifically, if our observed value of F is greater than the critical value, then we reject the null hypothesis and conclude that there is a significant difference between the means.
For our example, since our observed value of F (5.4158) is greater than the critical value of F (3.4), we reject the null hypothesis.
Using APA format, we state: "There is a significant difference between the means of the 3 exams, F(2,24) = 5.4158, p < 0.05"
We can also use the F-calculator in the Excel spreadsheet to find the actual p-value:
Convert α to F
dfb   dfw   α      F
2     24    0.05   3.4

Convert F to α
dfb   dfw   F        α
2     24    5.4158   0.0115

Using this p-value we can conclude using APA format: "There is a significant difference between the means of the 3 exams, F(2,24) = 5.4158, p = 0.0115"
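The same two conversions can be done in R with the base functions 'qf' (from α to the critical value of F) and 'pf' (from an observed F to a p-value). This is a sketch of an alternative to the table and the spreadsheet, not the course's own calculator.

```r
df_bet <- 2
df_w   <- 24
alpha  <- 0.05
F_obs  <- 5.4158

# Critical value: the F that cuts off the upper alpha of the distribution
F_crit <- qf(1 - alpha, df_bet, df_w)

# p-value: the area under the F distribution above the observed F
p_value <- pf(F_obs, df_bet, df_w, lower.tail = FALSE)

print(round(F_crit, 2))
print(round(p_value, 4))

# Decision rule: reject the null hypothesis if F_obs exceeds F_crit
reject <- F_obs > F_crit
print(reject)
```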
The Summary Table
The statistics used to generate the F statistic for an ANOVA are traditionally reported in a summary table like this:
SS df MS F Fcrit p-value
Between 344.2518 2 172.1259 5.4158 3.4 0.0115
Within 762.7781 24 31.7824
Total 1107.1563 26
In this next example we’ll use this table to keep track of our values.
Example 2: Preferred temperature for weather sensitivity
At the beginning of the quarter I surveyed you for your preferred outdoor temperature. I also asked you how much weather affected your mood with the options of Not at all, Just a little, A fair amount and Very much. Let’s see if there is a significant difference between the preferred temperatures across these 4 options. We’ll use α = 0.05 again. Here’s a table of statistics:

        Not at all   Just a little   A fair amount   Very much
n       8            26              20              36
mean    68.38        70.96           71.5            73.82
SS      673.8752     1510.9616       1027            1813.1504
s       9.8116       7.7742          7.352           7.1975
sem     3.4689       1.5246          1.644           1.1996

Totals:
n            90
grand mean   71.9933
SStotal      5281.956
Here’s a graph of the means with error bars representing the standard error of the means:
[Figure: bar graph of mean preferred temperature (F) for each answer to "How much does weather affect your mood?" (Not at all, Just a little, A fair amount, Very much), with error bars showing the standard error of the mean.]
We can easily fill in some of the entries in the summary table. I’ve told you that SStotal = 5281.956. The df for SStotal is the total sample size minus 1: 90 - 1 = 89.
SSw is just the sum of SSw for each cell: 673.8752 + 1510.9616 + 1027 + 1813.1504 = 5024.9872. The df for SSw is the total sample size minus the number of groups: 90 - 4 = 86.

          SS          df   MS   F   Fcrit   p-value
Between
Within    5024.9872   86
Total     5281.956    89

SSbet can be hard to calculate, but since we know that SStotal = SSw + SSbet, we know that SSbet = 5281.956 - 5024.9872 = 256.9688. The degrees of freedom is 4 - 1 = 3. (The calculations that follow use SSbet = 257.2008; the small difference is presumably due to rounding in the table of statistics.)
Just like for the last example, the variance within and between are just the SS divided by their df’s:
$$MS_{bet} = \frac{SS_{bet}}{df_{bet}} = \frac{257.2008}{3} = 85.7336$$

$$MS_w = \frac{SS_w}{df_w} = \frac{5024.9872}{86} = 58.4301$$
Finally, the F statistic is their ratio:
$$F = \frac{MS_{bet}}{MS_w} = \frac{85.7336}{58.4301} = 1.4673$$
The table now looks like this:
          SS          df   MS        F        Fcrit   p-value
Between   257.2008    3    85.7336   1.4673
Within    5024.9872   86   58.4301
Total     5281.956    89
The critical value of F for α = 0.05 and dfbet = 3 and dfw = 86 can be found in table E.
Since dfw = 86 is not in the table, we’ll use 80:

dfw|dfb   3
80        2.72
          4.04

If you use the F calculator in the Excel spreadsheet you’ll see that the p-value is 0.2291:

Convert α to F
dfb   dfw   α      F
3     86    0.05   2.71

Convert F to α
dfb   dfw   F        α
3     86    1.4673   0.2291
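Using the row for dfw = 80 when the actual dfw is 86 gives a slightly larger critical value, so the table lookup is a little conservative. A short sketch with the base R functions 'qf' and 'pf' shows the two critical values side by side; the variable names are chosen here for illustration.

```r
alpha  <- 0.05
df_bet <- 3
F_obs  <- 1.4673

# Critical value using the nearest smaller dfw that appears in table E
F_crit_table <- qf(1 - alpha, df_bet, 80)

# Critical value using the actual dfw
F_crit_exact <- qf(1 - alpha, df_bet, 86)

# p-value for the observed F with the actual degrees of freedom
p_value <- pf(F_obs, df_bet, 86, lower.tail = FALSE)

print(round(c(table = F_crit_table, exact = F_crit_exact), 2))
print(round(p_value, 4))
```

Either way the observed F falls below the critical value, so the decision is the same.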
The final summary table is:
          SS          df   MS        F        Fcrit   p-value
Between   257.2008    3    85.7336   1.4673   2.72    0.2291
Within    5024.9872   86   58.4301
Total     5281.956    89

Since the observed value of F (1.4673) is not greater than the critical value of F (2.72), we fail to reject the null hypothesis.
Using APA format, we state: "There is not a significant difference in preferred outdoor temperature across the 4 levels of the survey about how weather affects mood, F(3,86) = 1.4673, p = 0.2291."
Note that the ANOVA doesn’t tell us anything about exactly how preferred temperature varies with how weather affects mood. Looking at the bar graph, it appears that there is an upward trend so that the more weather affects mood, the higher the preferred temperature. But to make statistical conclusions like this requires a post-hoc analysis that allows you to compare specific means to other means.
Effect Size
There are several measures of effect size for an ANOVA, and most software packages will spit out more than one. Probably the most common is η², or 'eta squared', which is the proportion of total variance in your data that is attributed to the effect driving the difference between the means. It’s simply the ratio of SSbet to SStotal:

$$\eta^2 = \frac{SS_{bet}}{SS_{total}}$$
From our last example:
$$\eta^2 = \frac{257.201}{5281.96} = 0.0487$$

If SSbet = 0, then η² = 0. This is as small as η² can get. It happens when there are no differences between the means, so all of the variability in your data is attributed to variability within each group.
If SSw = 0, then SSbet = SStotal, which makes η² = 1. This is as big as η² can get. It happens when the means are different, but there is no variability within each group.
η² is simple and commonly used, but tends to overestimate effect size for larger numbers of treatments. Just so you know, for a one-factor ANOVA η² is the same as 'partial eta-squared', so some software reports it under that name.
A rule of thumb for interpreting the size of η² is:

0.01   small
0.06   medium
0.14   large

Note that these values of small, medium and large are very different from those used to interpret effect size for t-tests (Cohen’s d).
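Since η² is just a ratio of two entries in the summary table, it takes one line to compute. The sketch below applies it to both examples, using the SS values from their summary tables; the function name eta_squared is chosen here for illustration.

```r
# Eta squared: proportion of total variability attributed to differences between means
eta_squared <- function(SS_bet, SS_total) {
  SS_bet / SS_total
}

# Example 2: preferred temperature and weather sensitivity
eta2_temperature <- eta_squared(257.2008, 5281.956)

# Example 1: exam scores
eta2_exams <- eta_squared(344.2518, 1107.1563)

print(round(c(temperature = eta2_temperature, exams = eta2_exams), 4))
```

By the rule of thumb above, the effect for preferred temperature falls between small and medium, while the effect for the exam scores is well above the value for large.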
Using R to conduct a one factor ANOVA
Computing a one factor ANOVA is easy using the 'aov' function. The tricky part is pulling out the values from the output, and using ggplot to plot bar graphs and error bars. The following R script conducts the test in Example 2 on preferred temperature and how much weather affects mood.
The R commands shown below can be found here: OneFactorANOVA.R
# 1-factor ANOVA
#
# Conducting a 1-factor ANOVA from our survey data is easy using the 'aov' function.
# Interpreting and plotting the results will require two libraries, 'ggplot2' for
# plotting and 'broom' for cleaning up the output of 'aov'.
#
# A CRAN mirror has to be specified when installing from a script; otherwise
# install.packages stops with "trying to use CRAN without setting a mirror".
install.packages("ggplot2", repos = "https://cloud.r-project.org")
[Figure: bar graph of mean preferred temperature (F) for each answer to "How much does weather affect your mood?" (Not at all, Just a little, A fair amount, Very much), with error bars.]
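The survey data for Example 2 are not reproduced in this tutorial, so here is a minimal sketch of the same kind of analysis with 'aov' using the exam scores from Example 1 and only base R. The data frame name exam and its column names are chosen here for illustration. Because 'aov' works from the unrounded group means, its SS between and F differ slightly from the hand calculation, which used means rounded to two decimal places.

```r
# Exam scores from Example 1 in 'long' format: one row per score
exam <- data.frame(
  score = c(88.4, 77.5, 83.5, 88.2, 84.7, 86.2, 82.1, 89.3, 85.3,   # 2017
            84, 81.3, 80.2, 72.3, 87.3, 67.5, 66.8, 78.2, 73,       # 2018
            76.6, 69.6, 85.4, 84.4, 74.3, 78, 80.7, 83.5, 73.6),    # 2019
  year  = factor(rep(c("2017", "2018", "2019"), each = 9))
)

# Fit the one-factor ANOVA: score as a function of year
fit <- aov(score ~ year, data = exam)

# The first element of the summary is the ANOVA table
# (row 1 is 'between', row 2 is 'within', labelled Residuals)
tab <- summary(fit)[[1]]
print(tab)

# Pull out individual values by column name
F_obs   <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
print(round(c(F = F_obs, p = p_value), 4))
```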
Questions
Your turn again. Here are 7 practice questions based on the class survey, followed by their answers.
1) From our survey we can calculate caffeine consumption as a function of how much you said you liked math. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 8 31 50 6
mean 2.06 0.77 1.45 1.17
SS 45.2188 58.9626 137.125 16.8334
ntotal 95
grand mean 1.2624
SStotal 272.5327
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in caffeine consumption across the 4 groups of students who vary in their preference for math?
2) From our survey we can calculate the number of siblings as a function of introversion. Here is a table of statistics based on our survey:
agree disagree neutral strongly agree
n 25 23 32 15
mean 1.36 1.39 1.63 1.87
SS 35.76 29.4783 51.5008 15.7335
ntotal 95
grand mean 1.5368
SStotal 135.6211
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in number of siblings across the 4 introversion groups?
3) From our survey we can calculate how much you drink across voting preferences.
Here is a table of statistics based on our survey:
Democrat I never (or can’t) vote Republican
n 62 21 7
mean 2.76 1.67 4.57
SS 1303.3712 192.6669 157.7143
ntotal 90
grand mean 2.6444
SStotal 1700.6222
Calculate the standard errors of the mean for each of the 3 groups.
Make a bar graph of the means for each of the 3 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in drinks per week across the 3 groups of voting preference for students?
4) From our survey we can calculate how well you think you’ll do on Exam 1 as a function of how much you think you’ll like this class. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 8 31 50 6
mean 77 84.58 87.98 88.33
SS 1054 3811.5484 2222.98 433.3334
ntotal 95
grand mean 85.9684
SStotal 8460.9053
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in predicted Exam 1 score across the 4 groups of how much you think you’ll like Psych 315?
5) Suppose you want to know if how much students drink varies with how much they exercise. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 6 26 46 17
mean 1.83 1.42 2.2 5.94
SS 80.8334 136.3464 339.24 928.9412
ntotal 95
grand mean 2.6316
SStotal 1722.1053
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in alcoholic drinks across the 4 groups of students who exercise?
6) From our survey we can compute how many hours per week students play video games as a function of how much they exercise. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 6 26 46 17
mean 0.17 0.62 0.86 0.88
SS 0.8334 32.1544 47.3316 17.7648
ntotal 95
grand mean 0.7526
SStotal 101.4368
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in hours of video game playing per week across the 4 groups of students who exercise?
7) From our survey we can see if there is a difference in the heights of students across different levels of exercise. Here is a table of statistics based on our survey:
Just a little A fair amount Very much
n 26 46 17
mean 64.54 66.35 66
SS 280.4616 592.435 186
ntotal 89
grand mean 65.7528
SStotal 1114.5618
Calculate the standard errors of the mean for each of the 3 groups.
Make a bar graph of the means for each of the 3 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in height across the 3 groups of students who exercise?
Answers
1)
$$SS_{bet} = (8)(2.06 - 1.2624)^2 + (31)(0.77 - 1.2624)^2 + (50)(1.45 - 1.2624)^2 + (6)(1.17 - 1.2624)^2 = 14.4164$$
$$MS_{bet} = \frac{14.4164}{3} = 4.8055$$
$$SS_w = SS_{total} - SS_{bet} = 272.533 - 14.4164 = 258.14$$
$$MS_w = \frac{258.1398}{91} = 2.84$$
$$F = \frac{4.8055}{2.8367} = 1.69$$
Fcrit = 2.72 (with dfbet = 3, dfw = 80 and α = 0.05)
SS df MS F Fcrit p-value
Between 14.4164 3 4.8055 1.694 2.72 0.1739
Within 258.1398 91 2.8367
Total 272.5327 94
We fail to reject H0.
There is not a significant difference in mean caffeine consumption across the 4 groups of students who vary in their preference for math, F(3,91) = 1.69, p = 0.1739.
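All seven questions follow the same steps, so they can be checked with one small function that works from the table of statistics. This is a sketch under stated assumptions: the function name anova_from_stats and its arguments are made up here, SSw is taken as the sum of the group SS values (which reproduces the 258.1398 in the summary table), and the critical value comes from 'qf' with the actual dfw, so it can be slightly smaller than the table E value for dfw = 80.

```r
# One-factor ANOVA from summary statistics (n, mean and SS for each group)
anova_from_stats <- function(n, means, SS, grand_mean, alpha = 0.05) {
  k      <- length(n)
  df_bet <- k - 1
  df_w   <- sum(n) - k
  SS_bet <- sum(n * (means - grand_mean)^2)
  SS_w   <- sum(SS)
  MS_bet <- SS_bet / df_bet
  MS_w   <- SS_w / df_w
  F_obs  <- MS_bet / MS_w
  list(
    sem     = sqrt(SS / (n - 1)) / sqrt(n),   # standard error of each mean
    SS_bet  = SS_bet, df_bet = df_bet, MS_bet = MS_bet,
    SS_w    = SS_w,   df_w   = df_w,   MS_w   = MS_w,
    F       = F_obs,
    F_crit  = qf(1 - alpha, df_bet, df_w),
    p_value = pf(F_obs, df_bet, df_w, lower.tail = FALSE)
  )
}

# Question 1: caffeine consumption by how much you like math
q1 <- anova_from_stats(
  n          = c(8, 31, 50, 6),
  means      = c(2.06, 0.77, 1.45, 1.17),
  SS         = c(45.2188, 58.9626, 137.125, 16.8334),
  grand_mean = 1.2624,
  alpha      = 0.05
)

print(round(unlist(q1[c("SS_bet", "MS_bet", "SS_w", "MS_w", "F", "p_value")]), 4))
print(q1$F > q1$F_crit)   # TRUE would mean we reject H0
```

The other questions can be checked the same way by swapping in their n, mean, SS, grand mean and α.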
[Figure: bar graph of mean caffeine consumption for the 4 groups of students who vary in their preference for math (Not at all, Just a little, A fair amount, Very much), with error bars.]
2)
$$SS_{bet} = (25)(1.36 - 1.5368)^2 + (23)(1.39 - 1.5368)^2 + (32)(1.63 - 1.5368)^2 + (15)(1.87 - 1.5368)^2 = 3.2204$$
$$MS_{bet} = \frac{3.2204}{3} = 1.0735$$
$$SS_w = SS_{total} - SS_{bet} = 135.621 - 3.2204 = 132.473$$
$$MS_w = \frac{132.4726}{91} = 1.46$$
$$F = \frac{1.0735}{1.4557} = 0.74$$
Fcrit = 2.72 (with dfbet = 3, dfw = 80 and α = 0.05)
SS df MS F Fcrit p-value
Between 3.2204 3 1.0735 0.7374 2.72 0.5324
Within 132.4726 91 1.4557
Total 135.6211 94
We fail to reject H0.
There is not a significant difference in mean number of siblings across the 4 introversion groups, F(3,91) = 0.74, p = 0.5324.
[Figure: bar graph of mean number of siblings for the 4 introversion groups (agree, disagree, neutral, strongly agree), with error bars.]
3)
$$SS_{bet} = (62)(2.76 - 2.6444)^2 + (21)(1.67 - 2.6444)^2 + (7)(4.57 - 2.6444)^2 = 46.7226$$
$$MS_{bet} = \frac{46.7226}{2} = 23.3613$$
$$SS_w = SS_{total} - SS_{bet} = 1700.62 - 46.7226 = 1653.75$$
$$MS_w = \frac{1653.7524}{87} = 19.01$$
$$F = \frac{23.3613}{19.0086} = 1.23$$
Fcrit = 4.88 (with dfbet = 2, dfw = 80 and α = 0.01)
SS df MS F Fcrit p-value
Between 46.7226 2 23.3613 1.229 4.88 0.2976
Within 1653.7524 87 19.0086
Total 1700.6222 89
We fail to reject H0.
There is not a significant difference in mean drinks per week across the 3 groups of voting preference for students, F(2,87) = 1.23, p = 0.2976.
[Figure: bar graph of mean drinks per week for the 3 voting preference groups (Democrat, I never (or can't) vote, Republican), with error bars.]
4)
$$SS_{bet} = (8)(77 - 85.9684)^2 + (31)(84.58 - 85.9684)^2 + (50)(87.98 - 85.9684)^2 + (6)(88.33 - 85.9684)^2 = 939.005$$
$$MS_{bet} = \frac{939.0045}{3} = 313.0015$$
$$SS_w = SS_{total} - SS_{bet} = 8460.91 - 939.005 = 7521.86$$
$$MS_w = \frac{7521.8618}{91} = 82.66$$
$$F = \frac{313.0015}{82.6578} = 3.79$$
Fcrit = 4.04 (with dfbet = 3, dfw = 80 and α = 0.01)
SS df MS F Fcrit p-value
Between 939.0045 3 313.0015 3.7867 4.04 0.0131
Within 7521.8618 91 82.6578
Total 8460.9053 94
We fail to reject H0.
There is not a significant difference in mean predicted Exam 1 score across the 4 groups of how much you think you’ll like Psych 315, F(3,91) = 3.79, p = 0.0131.
[Figure: bar graph of mean predicted Exam 1 score for the 4 groups of how much you think you'll like Psych 315 (Not at all, Just a little, A fair amount, Very much), with error bars.]
5)
$$SS_{bet} = (6)(1.83 - 2.6316)^2 + (26)(1.42 - 2.6316)^2 + (46)(2.2 - 2.6316)^2 + (17)(5.94 - 2.6316)^2 = 236.665$$
$$MS_{bet} = \frac{236.6652}{3} = 78.8884$$
$$SS_w = SS_{total} - SS_{bet} = 1722.11 - 236.665 = 1485.36$$
$$MS_w = \frac{1485.361}{91} = 16.32$$
$$F = \frac{78.8884}{16.3226} = 4.83$$
Fcrit = 4.04 (with dfbet = 3, dfw = 80 and α = 0.01)
SS df MS F Fcrit p-value
Between 236.6652 3 78.8884 4.8331 4.04 0.0036
Within 1485.361 91 16.3226
Total 1722.1053 94
We reject H0.
There is a significant difference in mean alcoholic drinks across the 4 groups of students who exercise, F(3,91) = 4.83, p = 0.0036.
[Figure: bar graph of mean alcoholic drinks for the 4 groups of students who exercise (Not at all, Just a little, A fair amount, Very much), with error bars.]
6)
$$SS_{bet} = (6)(0.17 - 0.7526)^2 + (26)(0.62 - 0.7526)^2 + (46)(0.86 - 0.7526)^2 + (17)(0.88 - 0.7526)^2 = 3.3002$$
$$MS_{bet} = \frac{3.3002}{3} = 1.1001$$
$$SS_w = SS_{total} - SS_{bet} = 101.437 - 3.3002 = 98.0842$$
$$MS_w = \frac{98.0842}{91} = 1.08$$
$$F = \frac{1.1001}{1.0778} = 1.02$$
Fcrit = 2.72 (with dfbet = 3, dfw = 80 and α = 0.05)
SS df MS F Fcrit p-value
Between 3.3002 3 1.1001 1.0207 2.72 0.3873
Within 98.0842 91 1.0778
Total 101.4368 94
We fail to reject H0.
There is not a significant difference in mean hours of video game playing per week across the 4 groups of students who exercise, F(3,91) = 1.02, p = 0.3873.
[Figure: bar graph of mean hours of video game playing per week for the 4 groups of students who exercise (Not at all, Just a little, A fair amount, Very much), with error bars.]
7)
$$SS_{bet} = (26)(64.54 - 65.7528)^2 + (46)(66.35 - 65.7528)^2 + (17)(66 - 65.7528)^2 = 55.6876$$
$$MS_{bet} = \frac{55.6876}{2} = 27.8438$$
$$SS_w = SS_{total} - SS_{bet} = 1114.56 - 55.6876 = 1058.9$$
$$MS_w = \frac{1058.8966}{86} = 12.31$$
$$F = \frac{27.8438}{12.3128} = 2.26$$
Fcrit = 3.11 (with dfbet = 2, dfw = 80 and α = 0.05)
SS df MS F Fcrit p-value
Between 55.6876 2 27.8438 2.2614 3.11 0.1104
Within 1058.8966 86 12.3128
Total 1114.5618 88
We fail to reject H0.
There is not a significant difference in mean height across the 3 groups of students who exercise, F(2,86) = 2.26, p = 0.1104.
[Figure: bar graph of mean height for the 3 groups of students who exercise (Just a little, A fair amount, Very much), with error bars.]