STAT 430 (Fall 2017): Tutorial 5
Two-way ANOVA
Luyao Lin
October 17/19, 2017
Department of Statistics and Actuarial Science, Simon Fraser University
Outline
Two-way ANOVA
• two treatment factors
• equal sample sizes
• unequal sample sizes
Battery Data Description
Brief Background: An engineer is designing a battery for use in a device
that will be subjected to some extreme variations in temperature. The
only design parameter that he can select at this point is the plate
material for the battery, and he has three possible choices. He also knows
from experience that temperature will affect the effective battery life, so
he also includes temperature as a factor in the battery life experiment.
The engineer decides to test all three plate materials at three
temperature levels: 15, 70, and 125 °F, because these temperature levels
are consistent with the product end-use environment.
Data Description
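Table 1: Life data (in hours) for the Battery Design Experiment
Material | Temperature 15 °F | Temperature 70 °F | Temperature 125 °F | $\bar y_{i\cdot\cdot}$
1 | 130, 155, 74, 180 (134.75) | 34, 40, 80, 75 (57.25) | 20, 70, 82, 58 (57.5) | 83.17
2 | 150, 188, 159, 126 (155.75) | 136, 122, 106, 115 (119.75) | 25, 70, 58, 45 (49.5) | 108.33
3 | 138, 110, 168, 160 (144.00) | 174, 120, 150, 139 (145.75) | 96, 104, 82, 60 (85.5) | 125.08
$\bar y_{\cdot j\cdot}$ | 144.83 | 107.58 | 64.17 | 105.53
Note: the numbers in parentheses are the cell averages, i.e., the average of the four observations at each combination of levels of the two factors. The last column gives the material averages, the last row gives the temperature averages, and 105.53 is the overall average $\bar y_{\cdot\cdot\cdot}$.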
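As a check on Table 1, here is a minimal Python sketch that stores the data and recomputes the cell, material, and temperature averages. The dictionary name `DATA` and its layout (material → temperature → replicates) are illustrative choices, not part of the original material.

```python
from statistics import fmean

# Battery life (hours): DATA[material][temperature in °F] -> 4 replicates
DATA = {
    1: {15: [130, 155, 74, 180], 70: [34, 40, 80, 75], 125: [20, 70, 82, 58]},
    2: {15: [150, 188, 159, 126], 70: [136, 122, 106, 115], 125: [25, 70, 58, 45]},
    3: {15: [138, 110, 168, 160], 70: [174, 120, 150, 139], 125: [96, 104, 82, 60]},
}
TEMPS = [15, 70, 125]

# Cell averages (the numbers in parentheses)
cell_means = {(m, t): fmean(DATA[m][t]) for m in DATA for t in TEMPS}
# Material (row) averages over all temperatures
row_means = {m: fmean(y for t in TEMPS for y in DATA[m][t]) for m in DATA}
# Temperature (column) averages over all materials
col_means = {t: fmean(y for m in DATA for y in DATA[m][t]) for t in TEMPS}
grand_mean = fmean(y for m in DATA for t in TEMPS for y in DATA[m][t])

for m in DATA:
    cells = "  ".join(f"{cell_means[(m, t)]:7.2f}" for t in TEMPS)
    print(f"Material {m}: {cells}  | row mean {row_means[m]:.2f}")
print("Column means:", {t: round(v, 2) for t, v in col_means.items()})
print(f"Grand mean: {grand_mean:.2f}")
```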
Statistical Models:
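Cell-means model:
$y_{ijt} = \mu + \tau_{ij} + \varepsilon_{ijt}$,
Two-way complete model:
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
Two-way main-effects model:
$y_{ijt} = \mu + \alpha_i + \beta_j + \varepsilon_{ijt}$,
where
• $i = 1, \dots, a = 3$; $j = 1, \dots, b = 3$; $t = 1, \dots, n\ (= r)$, with $n = 4$ in the battery data;
• $\mu$ is the overall mean;
• $\varepsilon_{ijt}$ is still the random error, with $\varepsilon_{ijt} \sim N(0, \sigma^2)$ independently, as in one-way ANOVA.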
Cell-means Model
$y_{ijt} = \mu + \tau_{ij} + \varepsilon_{ijt}$,
• with the constraint $\sum_{i=1}^{a}\sum_{j=1}^{b}\tau_{ij} = 0$. Why is this necessary?
• Similar to one-way ANOVA:
$y_{ijt} \sim N(\mu + \tau_{ij}, \sigma^2)$
• Each combination of the two treatment factors is considered as a new treatment
• so we have $ab$ treatments in total
• The null hypothesis:
$H_0: \tau_{ij} = 0$ for all $i, j$
The alternative hypothesis:
$H_a$: at least one $\tau_{ij} \neq 0$
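Cell-means model = two-way complete model
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
if $\tau_{ij} = \alpha_i + \beta_j + (\alpha\beta)_{ij}$
• again we assume
$y_{ijt} \sim N(\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \sigma^2)$
• with the constraints $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_i (\alpha\beta)_{ij} = 0$ for each $j$, and $\sum_j (\alpha\beta)_{ij} = 0$ for each $i$
• the cell-means and two-way complete models are equivalent because, given the parameters of one, you can derive the parameters of the other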
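To make the equivalence concrete, here is one way to go from the cell-means parameters to the two-way complete parameters. It uses the constraint $\sum_i\sum_j \tau_{ij} = 0$ and writes $\bar\tau_{i\cdot}$, $\bar\tau_{\cdot j}$, $\bar\tau_{\cdot\cdot}$ for the row, column, and overall averages of the $\tau_{ij}$ (notation introduced only for this sketch):

```latex
\begin{aligned}
\alpha_i &= \bar\tau_{i\cdot}, \qquad \beta_j = \bar\tau_{\cdot j}, \qquad
(\alpha\beta)_{ij} = \tau_{ij} - \bar\tau_{i\cdot} - \bar\tau_{\cdot j},\\
\alpha_i + \beta_j + (\alpha\beta)_{ij} &= \tau_{ij},\\
\textstyle\sum_i \alpha_i &= a\,\bar\tau_{\cdot\cdot} = 0, \qquad
\textstyle\sum_i (\alpha\beta)_{ij} = a\,\bar\tau_{\cdot j} - a\,\bar\tau_{\cdot\cdot} - a\,\bar\tau_{\cdot j} = 0 .
\end{aligned}
```

The remaining constraints ($\sum_j \beta_j = 0$ and $\sum_j (\alpha\beta)_{ij} = 0$) follow in the same way; going the other direction, setting $\tau_{ij} = \alpha_i + \beta_j + (\alpha\beta)_{ij}$ gives the cell-means form directly.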
Comparing them
Cell-means model:
$y_{ijt} = \mu + \tau_{ij} + \varepsilon_{ijt}$,
• there are 9 treatment combinations in total → 9 parameters $\tau_{ij}$
• with the constraint $\sum_i\sum_j \tau_{ij} = 0$,
• we need to estimate 8 unknown $\tau_{ij}$
Two-way complete model:
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
• 2 unknowns for the $\alpha_i$
• 2 unknowns for the $\beta_j$
• 4 unknowns for the interactions $(\alpha\beta)_{ij}$
• these add up to 8
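The same count holds for any $a$ and $b$: the $ab$ cell parameters minus one constraint match the free parameters of the two-way complete model term by term.

```latex
\underbrace{ab - 1}_{\text{cell-means}}
= \underbrace{(a-1)}_{\alpha_i} + \underbrace{(b-1)}_{\beta_j} + \underbrace{(a-1)(b-1)}_{(\alpha\beta)_{ij}},
\qquad a = b = 3:\; 9 - 1 = 2 + 2 + 4 = 8 .
```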
Do they answer the same questions?
Cell-means model:
$y_{ijt} = \mu + \tau_{ij} + \varepsilon_{ijt}$,
$H_0: \tau_{11} = \tau_{12} = \dots = \tau_{ab} = 0$ versus
$H_a$: at least one $\tau_{ij} \neq 0$
⇒ if one is trying to see which combination gives the "best" outcome, the cell-means model is good enough.
Two-way complete model:
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
For the interactions:
$H_0^{AB}: (\alpha\beta)_{ij} = 0$ for all $i, j$ versus
$H_a^{AB}$: at least one $(\alpha\beta)_{ij} \neq 0$
For the first main effect $\alpha_i$:
$H_0^{A}: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0$ versus
$H_a^{A}$: at least one $\alpha_i \neq 0$
For the second main effect $\beta_j$:
$H_0^{B}: \beta_1 = \beta_2 = \dots = \beta_b = 0$ versus
$H_a^{B}$: at least one $\beta_j \neq 0$
⇒ if one is trying to learn about the effect of each treatment factor, the two-way complete model should be chosen.
Two-way main-effects model
$y_{ijt} = \mu + \alpha_i + \beta_j + \varepsilon_{ijt}$,
• with two constraints: $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$
• 2 unknown parameters for the $\alpha_i$
• 2 unknown parameters for the $\beta_j$
• 4 unknown parameters in total
• compared with the two-way complete model, there are 4 fewer parameters
• because there are no interaction terms
Which one to use?
$y_{ijt} = \mu + \alpha_i + \beta_j + \varepsilon_{ijt}$ or
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$
It depends on two things:
• whether the interaction effect is large
• the sample size
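Interaction effect
⇒ Use an interaction plot to check: roughly parallel lines suggest little interaction, while clearly non-parallel lines suggest an interaction. Also consider the variability of the data, since some departure from parallel lines can be due to random error alone.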
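Without plotting software, the information in an interaction plot can still be inspected numerically. The sketch below (stdlib only; the names `profiles` and `changes` are illustrative) prints, for each material, the cell means across temperatures and the change between adjacent temperature levels. With no interaction, these changes would be roughly the same for every material.

```python
from statistics import fmean

# Battery life (hours): DATA[material][temperature in °F] -> 4 replicates
DATA = {
    1: {15: [130, 155, 74, 180], 70: [34, 40, 80, 75], 125: [20, 70, 82, 58]},
    2: {15: [150, 188, 159, 126], 70: [136, 122, 106, 115], 125: [25, 70, 58, 45]},
    3: {15: [138, 110, 168, 160], 70: [174, 120, 150, 139], 125: [96, 104, 82, 60]},
}
TEMPS = [15, 70, 125]

# One "line" of the interaction plot per material: cell means across temperatures
profiles = {m: [fmean(DATA[m][t]) for t in TEMPS] for m in DATA}
# Change in mean life between adjacent temperature levels (slope of each segment)
changes = {m: [p[k + 1] - p[k] for k in range(len(p) - 1)] for m, p in profiles.items()}

for m in DATA:
    line = " -> ".join(f"{v:.2f}" for v in profiles[m])
    print(f"Material {m}: {line}   changes: {[round(c, 2) for c in changes[m]]}")
```

Material 1 drops by 77.5 hours between 15 °F and 70 °F while material 3 barely changes, so the profiles are not parallel; whether that departure is larger than random error alone would produce is what the interaction F test assesses.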
Model selection: it also depends on the sample size
• When the sample size is small, the two-way main-effects model might be the only choice
• An extreme case: only one observation for each treatment combination (Section 6.7.1)
Source | df
Temperature | b − 1 = 2
Material | a − 1 = 2
Interaction | (a − 1)(b − 1) = 4
Error | ab(n − 1) = 0
Total | abn − 1 = 8
What does it mean for the error degrees of freedom to be 0? With n = 1 there is no replication within any cell, so under the two-way complete model we cannot estimate σ², and hence cannot form any F test.
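The degrees-of-freedom bookkeeping can be checked with a small helper. `anova_df` is a hypothetical name used only here, and it assumes a balanced design with n observations per cell.

```python
def anova_df(a: int, b: int, n: int) -> dict:
    """Degrees of freedom for the two-way complete model (hypothetical helper).

    Assumes a balanced design: a levels of A, b levels of B, n observations per cell.
    """
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


one_per_cell = anova_df(3, 3, 1)  # the extreme case: one observation per cell
battery = anova_df(3, 3, 4)       # the battery experiment, n = 4
print(one_per_cell)  # Error df is 0: nothing left to estimate sigma^2
print(battery)
```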
ANOVA table for two-way complete model
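When the sample sizes for all cells are equal (n observations per cell):
SST = SSA + SSB + SSAB + SSE
Source | d.f. | Sum of squares
Total | $nab - 1$ | $SST = \sum_{t=1}^{n}\sum_{j=1}^{b}\sum_{i=1}^{a}(y_{ijt} - \bar y_{\cdot\cdot\cdot})^2 = \sum_{t=1}^{n}\sum_{j=1}^{b}\sum_{i=1}^{a} y_{ijt}^2 - nab\,\bar y_{\cdot\cdot\cdot}^2$
A | $a - 1$ | $SSA = nb\sum_{i=1}^{a}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2 = nb\sum_{i=1}^{a}\bar y_{i\cdot\cdot}^2 - nab\,\bar y_{\cdot\cdot\cdot}^2$
B | $b - 1$ | $SSB = na\sum_{j=1}^{b}(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})^2 = na\sum_{j=1}^{b}\bar y_{\cdot j\cdot}^2 - nab\,\bar y_{\cdot\cdot\cdot}^2$
AB | $(a-1)(b-1)$ | $SSAB = n\sum_{j=1}^{b}\sum_{i=1}^{a}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2$
Error | $ab(n-1)$ | $SSE = \sum_{t=1}^{n}\sum_{j=1}^{b}\sum_{i=1}^{a}(y_{ijt} - \bar y_{ij\cdot})^2 = SST - SSA - SSB - SSAB$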
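The definitional formulas translate directly into code. A minimal stdlib-only sketch, assuming a balanced layout stored as `cells[i][j]` = list of the n observations at level i of A and level j of B; `two_way_ss` is a hypothetical helper name. Applied to the battery data (A = material, B = temperature), it reproduces the sums of squares used in this tutorial.

```python
from statistics import fmean


def two_way_ss(cells):
    """Sums of squares for the two-way complete model (hypothetical helper).

    Assumes equal n per cell; cells[i][j] lists the observations at A=i, B=j.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    ybar_ij = [[fmean(cells[i][j]) for j in range(b)] for i in range(a)]
    ybar_i = [fmean(ybar_ij[i]) for i in range(a)]                       # row means
    ybar_j = [fmean(ybar_ij[i][j] for i in range(a)) for j in range(b)]  # column means
    ybar = fmean(ybar_i)                                                 # grand mean
    sst = sum((y - ybar) ** 2 for row in cells for cell in row for y in cell)
    ssa = n * b * sum((m - ybar) ** 2 for m in ybar_i)
    ssb = n * a * sum((m - ybar) ** 2 for m in ybar_j)
    ssab = n * sum((ybar_ij[i][j] - ybar_i[i] - ybar_j[j] + ybar) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((y - ybar_ij[i][j]) ** 2
              for i in range(a) for j in range(b) for y in cells[i][j])
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse}


# Battery data: rows = materials 1..3, columns = 15, 70, 125 °F
battery = [
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
]
ss = two_way_ss(battery)
for name, value in ss.items():
    print(f"{name}: {value:.2f}")
```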
Manual computation of SS for Battery Life Data
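$SS(\text{Total}) = \sum_{t=1}^{4}\sum_{j=1}^{3}\sum_{i=1}^{3}(y_{ijt} - \bar y_{\cdot\cdot\cdot})^2 = \sum_{t=1}^{4}\sum_{j=1}^{3}\sum_{i=1}^{3} y_{ijt}^2 - 4\times 3\times 3\times \bar y_{\cdot\cdot\cdot}^2$
$\quad = 130^2 + 155^2 + \dots + 60^2 - 36 \times 105.53^2 = 77646.97$
$SS(\text{Material}) = 4\times 3\sum_{i=1}^{3}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2 = 12\sum_{i=1}^{3}\bar y_{i\cdot\cdot}^2 - 36\,\bar y_{\cdot\cdot\cdot}^2$
$\quad = 12\times[83.17^2 + \dots + 125.08^2] - 36 \times 105.53^2 = 10683.72$
$SS(\text{Temp}) = 4\times 3\sum_{j=1}^{3}(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})^2 = 12\sum_{j=1}^{3}\bar y_{\cdot j\cdot}^2 - 36\,\bar y_{\cdot\cdot\cdot}^2$
$\quad = 12\times[144.83^2 + \dots + 64.17^2] - 36 \times 105.53^2 = 39118.72$
$SS(\text{Interaction}) = 4\sum_{j=1}^{3}\sum_{i=1}^{3}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2$
$\quad = 4\times[(134.75 - 83.17 - 144.83 + 105.53)^2 + \dots + (85.5 - 125.08 - 64.17 + 105.53)^2] = 9613.78$
$SS(\text{Error}) = SS(\text{Total}) - SS(\text{Material}) - SS(\text{Temp}) - SS(\text{Interaction}) = 18230.75$
Note: the results above are obtained with unrounded means (for example $\bar y_{\cdot\cdot\cdot} = 3799/36$); plugging the two-decimal means into the shortcut formulas gives noticeably different values, because squaring large numbers amplifies rounding error.
Note: I will use R to show that SS(Error) indeed satisfies this relationship.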
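The note above mentions checking SS(Error) in R; as an alternative, here is a Python sketch (the choice of language is ours, and the variable names are illustrative) that computes SS(Error) directly from its definition, compares it with SS(Total) − SS(Material) − SS(Temp) − SS(Interaction), and shows how much the shortcut formula for SS(Material) moves when two-decimal means are used instead of exact ones.

```python
from statistics import fmean

# Battery data: rows = materials 1..3, columns = 15, 70, 125 °F, 4 replicates per cell
cells = [
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
]
obs = [y for row in cells for cell in row for y in cell]
grand = fmean(obs)
row_means = [fmean([y for cell in row for y in cell]) for row in cells]
col_means = [fmean([y for row in cells for y in row[j]]) for j in range(3)]
cell_means = [[fmean(cell) for cell in row] for row in cells]

ss_total = sum((y - grand) ** 2 for y in obs)
ss_mat = 12 * sum((m - grand) ** 2 for m in row_means)
ss_temp = 12 * sum((m - grand) ** 2 for m in col_means)
ss_int = 4 * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(3) for j in range(3))
# SS(Error) from its definition: squared deviations from the cell means
ss_err_direct = sum((y - cell_means[i][j]) ** 2
                    for i in range(3) for j in range(3) for y in cells[i][j])
ss_err_by_subtraction = ss_total - ss_mat - ss_temp - ss_int

# Shortcut formula 12 * sum(ybar_i^2) - 36 * ybar^2 with exact vs. two-decimal means
shortcut_exact = 12 * sum(m ** 2 for m in row_means) - 36 * grand ** 2
shortcut_rounded = 12 * sum(round(m, 2) ** 2 for m in row_means) - 36 * round(grand, 2) ** 2

print(f"SS(Error) direct:            {ss_err_direct:.2f}")
print(f"SS(Error) by subtraction:    {ss_err_by_subtraction:.2f}")
print(f"SS(Material), exact means:   {shortcut_exact:.2f}")
print(f"SS(Material), rounded means: {shortcut_rounded:.2f}")
```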
ANOVA Table for Battery Life Data
With all the SS's calculated, the rest of the table is straightforward: each MS is SS/df, and $F_0$ = MS/MSE.
Source | SS | df | MS | F0 | P-value
Temperature | 39119 | 2 | 19559.4 | 28.9677 | 1.909e-07
Material | 10684 | 2 | 5341.9 | 7.9114 | 0.001976
Interaction | 9614 | 4 | 2403.4 | 3.5595 | 0.018611
Error | 18231 | 27 | 675.2 | |
Total | 77647 | 35 | | |
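The MS and $F_0$ columns can be reproduced from the SS and df columns. A short sketch; the p-values need the F distribution (with SciPy available, `scipy.stats.f.sf(F0, df, 27)` would return them, but that dependency is an assumption and is not used here).

```python
# SS and df from the battery-life analysis (two-way complete model)
table = {
    "Temperature": (39118.72, 2),
    "Material": (10683.72, 2),
    "Interaction": (9613.78, 4),
}
ss_error, df_error = 18230.75, 27
ms_error = ss_error / df_error  # MSE, the estimate of sigma^2

results = {}
for source, (ss, df) in table.items():
    ms = ss / df                              # mean square
    results[source] = (ms, ms / ms_error)     # (MS, F0)
    print(f"{source:12s} MS = {ms:9.1f}  F0 = {ms / ms_error:.4f}")
print(f"{'Error':12s} MS = {ms_error:9.1f}")
```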
Hypotheses to be tested:
• Testing for interaction
• SSE
• SSAB
• Test statistics
• Rejection region
• Testing for main effects
Testing for interaction: SSE
• Sum of Squares for the error
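• For each observation $y_{ijt}$, the error (residual) is $y_{ijt} - \hat y_{ijt}$.
• What is $\hat y_{ijt}$?
• The least-squares estimate is $\hat y_{ijt} = \bar y_{ij\cdot}$ (Section 6.4.1)
• So the sum of squares for error is
$SSE = \sum_{t=1}^{n}\sum_{j=1}^{b}\sum_{i=1}^{a}(y_{ijt} - \bar y_{ij\cdot})^2$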
Testing for interaction: SSAB
(Page 153) Definition: sum of squares for the interaction,
$SSAB = n\sum_{j=1}^{b}\sum_{i=1}^{a}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2$
• Least-squares estimate: $\widehat{(\alpha\beta)}_{ij} = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}$
• $SSAB = SSE_0^{AB} - SSE$
• $SSE_0^{AB}$ is the sum of squares for error when $H_0^{AB}$ (no interaction) is true
• $SSE$ is the sum of squares for error under the two-way complete model
A larger SSAB means that adding the interaction terms explains more of the variation in the response ⇒ the interaction is important.
Testing for interaction: SSAB
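• $SSAB = SSE_0^{AB} - SSE$
• $SSE_0^{AB}$ is the sum of squares for error under the main-effects model
$y_{ijt} = \mu + \alpha_i + \beta_j + \varepsilon_{ijt}$,
$SSE_0^{AB} = \sum_{j=1}^{b}\sum_{i=1}^{a}\sum_{t=1}^{n}(y_{ijt} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2$
• $SSE$ is the sum of squares for error under the two-way complete model
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
$SSE = \sum_{j=1}^{b}\sum_{i=1}^{a}\sum_{t=1}^{n}(y_{ijt} - \bar y_{ij\cdot})^2$
A larger SSAB means that adding the interaction terms explains more of the variation in the response ⇒ the interaction is important.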
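A quick numerical check of $SSAB = SSE_0^{AB} - SSE$ on the battery data, computing both error sums of squares from the residual formulas above (stdlib-only sketch; variable names are illustrative, and the shortcut means assume equal n per cell).

```python
from statistics import fmean

# Battery data: rows = materials 1..3, columns = 15, 70, 125 °F
cells = [
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
]
a, b, n = len(cells), len(cells[0]), len(cells[0][0])
cell_means = [[fmean(c) for c in row] for row in cells]
row_means = [fmean(r) for r in cell_means]  # equal n: mean of the cell means
col_means = [fmean(cell_means[i][j] for i in range(a)) for j in range(b)]
grand = fmean(row_means)

# Residual SS under the main-effects model: fitted value ybar_i.. + ybar_.j. - ybar...
sse0_ab = sum((y - row_means[i] - col_means[j] + grand) ** 2
              for i in range(a) for j in range(b) for y in cells[i][j])
# Residual SS under the two-way complete model: fitted value ybar_ij.
sse = sum((y - cell_means[i][j]) ** 2
          for i in range(a) for j in range(b) for y in cells[i][j])
ssab = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(a) for j in range(b))
print(f"SSE0_AB - SSE = {sse0_ab - sse:.2f},  SSAB = {ssab:.2f}")
```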
Testing for interaction: Test statistics and Rejection region
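From another point of view:
$E(MS_{AB}) = \sigma^2 + \dfrac{n\sum_i\sum_j (\alpha\beta)_{ij}^2}{(a-1)(b-1)}$, $\quad E(MSE) = \sigma^2$
So the ratio $ms_{AB}/ms_E$ tends to be close to 1 when $H_0^{AB}$ is true and larger than 1 when it is false.
Reject $H_0^{AB}$ if $\dfrac{ms_{AB}}{ms_E} > F_{(a-1)(b-1),\,N-ab,\,\alpha}$, where $N = nab$ is the total number of observations.
⇒ Reject the null hypothesis when the F statistic is large!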
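Equivalently, one can compute the p-value $P(F_{(a-1)(b-1),\,N-ab} > ms_{AB}/ms_E)$ and reject when it is below α. Statistical software does this directly (e.g. `scipy.stats.f.sf` in Python); the stdlib-only sketch below instead evaluates the F upper tail through the regularized incomplete beta function, $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$, using the standard continued-fraction expansion (modified Lentz's method). It is an illustrative implementation, not a production-grade one; the function names are hypothetical.

```python
import math

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Interaction test for the battery data: F0 = 3.5595 on (4, 27) df
p_interaction = f_sf(3.5595, 4, 27)
p_temperature = f_sf(28.9677, 2, 27)
print(f"interaction p-value: {p_interaction:.6f}")  # compare with the ANOVA table
print(f"temperature p-value: {p_temperature:.3e}")
```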
Testing for Main effects
(Page 155)
"In this book, we take the view that the main effect of A would not be tested unless the hypothesis of no interaction were first accepted."
• Our goal in testing the main effect of A is to see whether factor A has any effect on the response
• Choice 1: the levels of A (averaged over the levels of B) have the same average effect on the response:
$H_0^{A}: \alpha_1^* = \alpha_2^* = \dots = \alpha_a^* = 0$, where $\alpha_i^* = \alpha_i + \overline{(\alpha\beta)}_{i\cdot}$
• Choice 2: the response depends only on B:
$H_0^{A+AB}$: {both $H_0^{A}$ and $H_0^{AB}$ are true}
Testing for Main effects: Two choices
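• Choice 1:
$H_0^{A}: \alpha_1^* = \alpha_2^* = \dots = \alpha_a^* = 0$, where $\alpha_i^* = \alpha_i + \overline{(\alpha\beta)}_{i\cdot} = \alpha_i$ under the constraint $\sum_j (\alpha\beta)_{ij} = 0$
• Choice 2:
$H_0^{A+AB}$: {both $H_0^{A}$ and $H_0^{AB}$ are true}, i.e. the model reduces to
$y_{ijt} = \mu + \beta_j + \varepsilon_{ijt}$
• Choices 1 and 2 are equivalent when there is no interaction (see page 155 for the reason)
• Otherwise they are different tests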
Testing for Main effects: no interaction and equal sample sizes
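$SSA = nb\sum_{i=1}^{a}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2 = nb\sum_{i=1}^{a}\bar y_{i\cdot\cdot}^2 - nab\,\bar y_{\cdot\cdot\cdot}^2$
$SSA = SSE_0^{A} - SSE$, where $SSE_0^{A}$ denotes the sum of squares for error when $H_0^{A}$ is true.
Again,
$E(MS_A) = \sigma^2 + \dfrac{bn\sum_i \alpha_i^2}{a-1}$, $\quad E(MSE) = \sigma^2$
Reject $H_0^{A}$ if $\dfrac{ms_A}{ms_E} > F_{a-1,\,N-ab,\,\alpha}$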
Testing for Main effects: unequal sample sizes
Type I and Type III sums of squares
• They are the same if the sample sizes are equal
• Otherwise, Type III compares the SSE of the full model with the SSE of the reduced model obtained by dropping only the tested term
• Type I compares the SSE of the model containing the terms already entered with the SSE after the tested term is added; it is also called the "sequential" sum of squares
• In the Type I calculation, the order in which the terms enter the model matters
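To see why the order matters with unequal sample sizes, here is a small sketch on hypothetical toy data (not from the battery experiment): the response depends only on B, but the cells are unbalanced, so the levels of A and B are correlated. Each SS is a difference of residual sums of squares from least-squares fits, solved with plain Gaussian elimination on the normal equations; the helper names `solve`, `sse`, and `dummies` are illustrative.

```python
def solve(mat, rhs):
    """Solve mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def sse(columns, y):
    """Residual SS from the least-squares fit of y on an intercept plus `columns`."""
    rows = [[1.0] + [col[k] for col in columns] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    return sum((yk - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
               for r, yk in zip(rows, y))


def dummies(levels):
    """Indicator columns for all but the first (baseline) level of a factor."""
    uniq = sorted(set(levels))
    return [[1.0 if v == u else 0.0 for v in levels] for u in uniq[1:]]


# Hypothetical unbalanced data: (level of A, level of B, number of observations)
layout = [(1, 1, 3), (1, 2, 1), (2, 1, 1), (2, 2, 3)]
a_levels, b_levels, y = [], [], []
for a, b, count in layout:
    for _ in range(count):
        a_levels.append(a)
        b_levels.append(b)
        y.append(10.0 if b == 1 else 20.0)  # response depends on B only

A, B = dummies(a_levels), dummies(b_levels)
ss_a_first = sse([], y) - sse(A, y)       # Type I SS for A when A enters first
ss_a_after_b = sse(B, y) - sse(B + A, y)  # Type I SS for A when B enters first
print(f"A entered first:   {ss_a_first:.2f}")
print(f"A entered after B: {ss_a_after_b:.2f}")
```

Entered first, A picks up part of B's effect (a Type I SS of 50 here); once B is already in the model, A adds nothing. In this main-effects model the second quantity is also the Type III SS for A.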
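Model parameter estimation: least squares (equal sample sizes)
• two-way complete model:
$y_{ijt} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijt}$,
$\hat y_{ijt} = \hat\mu + \hat\alpha_i + \hat\beta_j + \widehat{(\alpha\beta)}_{ij} = \bar y_{\cdot\cdot\cdot} + (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})$
$\hat y_{ijt} = \bar y_{ij\cdot}$
• two-way main-effects model:
$y_{ijt} = \mu + \alpha_i + \beta_j + \varepsilon_{ijt}$
$\hat y_{ijt} = \bar y_{\cdot\cdot\cdot} + (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}) + (\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})$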
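A sketch computing these least-squares estimates for the battery data (balanced, n = 4) and the fitted value for material 1 at 15 °F under each model; variable names are illustrative.

```python
from statistics import fmean

# Battery data: rows = materials 1..3, columns = 15, 70, 125 °F
cells = [
    [[130, 155, 74, 180], [34, 40, 80, 75], [20, 70, 82, 58]],
    [[150, 188, 159, 126], [136, 122, 106, 115], [25, 70, 58, 45]],
    [[138, 110, 168, 160], [174, 120, 150, 139], [96, 104, 82, 60]],
]
a, b = len(cells), len(cells[0])
cell_means = [[fmean(c) for c in row] for row in cells]
row_means = [fmean(r) for r in cell_means]  # equal n: mean of the cell means
col_means = [fmean(cell_means[i][j] for i in range(a)) for j in range(b)]
mu_hat = fmean(row_means)

alpha_hat = [m - mu_hat for m in row_means]
beta_hat = [m - mu_hat for m in col_means]
ab_hat = [[cell_means[i][j] - row_means[i] - col_means[j] + mu_hat for j in range(b)]
          for i in range(a)]

# Fitted values: the complete model reproduces the cell means; the main-effects model does not
fit_complete = [[mu_hat + alpha_hat[i] + beta_hat[j] + ab_hat[i][j] for j in range(b)]
                for i in range(a)]
fit_main = [[mu_hat + alpha_hat[i] + beta_hat[j] for j in range(b)] for i in range(a)]

print("alpha_hat:", [round(v, 2) for v in alpha_hat])
print("beta_hat:", [round(v, 2) for v in beta_hat])
print("material 1 at 15 F, complete model:", round(fit_complete[0][0], 2))
print("material 1 at 15 F, main-effects model:", round(fit_main[0][0], 2))
```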
Summary
• three models
• two-way complete VS cell-means
• two-way main-effects VS two-way complete
• Definition of SSAB, SSA, SSB, SSE
• test the interaction first, then the main effects
• when sample sizes are not equal: Type I and Type III sums of squares
Next time
• Check the assumptions
• Contrasts
• Multiple Comparisons
• SAS example