Topic 25: Inference for Two-Way ANOVA


Outline

• Two-way ANOVA

–Data, models, parameter estimates

• ANOVA table, EMS

• Analytical strategies

• Regression approach

Data

• Response written Yijk where

– i denotes the level of the factor A

– j denotes the level of the factor B

–k denotes the kth observation in cell (i,j)

• i = 1, . . . , a levels of factor A

• j = 1, . . . , b levels of factor B

• k = 1, . . . , n observations in cell (i,j)

Cell means model

• Yijk = μij + εijk

–where μij is the theoretical mean or expected value of all observations in cell (i,j)

– the εijk are iid N(0, σ²)

– This means the Yijk ~ N(μij, σ²) and are independent

Factor effects model

• μij = μ + αi + βj + (αβ)ij

• Consider μ to be the overall mean

• αi is the main effect of A

• βj is the main effect of B

• (αβ)ij is the interaction between A and B

Constraints for this interpretation

• α. = Σiαi = 0 (df = a-1)

• β. = Σjβj = 0 (df = b-1)

• (αβ).j = Σi (αβ)ij = 0 for all j

• (αβ)i. = Σj (αβ)ij = 0 for all i

df = (a-1)(b-1)
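For a balanced design (equal n in every cell), the least squares estimates under these sum-to-zero constraints take a simple form. The block below is a sketch of those standard estimators; the bar-and-dot notation for sample means (for example, Ȳij. for the mean of cell (i,j)) is an assumption introduced here and does not appear on the slides.

```latex
\begin{align*}
\hat{\mu} &= \bar{Y}_{\cdot\cdot\cdot} \\
\hat{\alpha}_i &= \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot} \\
\hat{\beta}_j &= \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot} \\
\widehat{(\alpha\beta)}_{ij} &= \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot}
\end{align*}
```

Each set of estimates sums to zero over its index, which matches the constraints listed above.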

SAS GLM Constraints

• αa = 0 (1 constraint)

• βb = 0 (1 constraint)

• (αβ)aj = 0 for all j (b constraints)

• (αβ)ib = 0 for all i (a constraints)

• The total is 1+1+a+b-1 = a+b+1 (the constraint (αβ)ab = 0 is counted twice in the last two bullets above)

Parameters and constraints

• The cell means model has ab parameters for the means

• The factor effects model has (1+a+b+ab) parameters

– An intercept (1)

– Main effect of A (a)

– Main effect of B (b)

– Interaction of A and B (ab)

Factor effects model

• There are 1+a+b+ab parameters

• There are 1+a+b constraints

• There are ab unconstrained parameters (or sets of parameters), the same number of parameters for the means in the cell means model

• While certain parameters depend on choice of constraints, others do not

KNNL Example

• KNNL p 833

• Y is the number of cases of bread sold

• A is the height of the shelf display, a=3 levels: bottom, middle, top

• B is the width of the shelf display, b=2 levels: regular, wide

• n=2 stores for each of the 3x2 treatment combinations

Proc GLM with solution

proc glm data=a1;
  class height width;
  model sales=height width height*width / solution;
  means height*width;
run;

Solution output

Intercept    44.0 B
height 1     -1.0 B
height 2     25.0 B
height 3      0.0 B
width 1      -4.0 B
width 2       0.0 B

Solution output

height*width 1 1    6.0 B
height*width 1 2    0.0 B
height*width 2 1    0.0 B
height*width 2 2    0.0 B
height*width 3 1    0.0 B
height*width 3 2    0.0 B

Means

height width Mean
1      1     45 = 44 - 1 - 4 + 6
1      2     43 = 44 - 1 + 0 + 0
2      1     65 = 44 + 25 - 4 + 0
2      2     69 = 44 + 25 + 0 + 0
3      1     40 = 44 + 0 - 4 + 0
3      2     44 = 44 + 0 + 0 + 0

Based on estimates from the previous two pages
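The arithmetic on this slide can be checked mechanically. The following is a minimal Python sketch, assuming only the estimates printed in the two "Solution output" slides; the names `cell_mean` and `fitted` are illustrative and not part of the SAS program.

```python
# Estimates printed by the /solution option (copied from the slides).
# Under the SAS GLM constraints, the last level of each factor is set to 0.
intercept = 44.0
height_eff = {1: -1.0, 2: 25.0, 3: 0.0}
width_eff = {1: -4.0, 2: 0.0}
inter_eff = {(1, 1): 6.0}  # every other height*width estimate is 0

def cell_mean(h, w):
    """Fitted mean for cell (h, w): intercept + height + width + interaction."""
    return intercept + height_eff[h] + width_eff[w] + inter_eff.get((h, w), 0.0)

fitted = {(h, w): cell_mean(h, w) for h in (1, 2, 3) for w in (1, 2)}
for (h, w), m in fitted.items():
    print(f"height={h} width={w} mean={m:.1f}")
```

With these constraints the intercept is the mean of the last cell (height 3, width 2), and every other estimate is a difference relative to that cell.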

Check normality

Alternative way to form QQplot

proc glm data=a1;
  class height width;
  model sales=height width height*width;
  output out=a2 r=resid;
proc rank data=a2 out=a3 normal=blom;
  var resid;
  ranks zresid;

Normal Quantile plot

proc sort data=a3;
  by zresid;
symbol1 v=circle i=sm70;
proc gplot data=a3;
  plot resid*zresid / frame;
run;

The plot

Note, dfE is only 6

ANOVA Table

Source  df           SS    MS    F
A       a-1          SSA   MSA   MSA/MSE
B       b-1          SSB   MSB   MSB/MSE
AB      (a-1)(b-1)   SSAB  MSAB  MSAB/MSE
Error   ab(n-1)      SSE   MSE
Total   abn-1        SSTO
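For reference, the sums of squares in this table can be written out for the balanced case. This is a sketch of the standard definitions; the bar-and-dot notation for sample means is an assumption introduced here, since the slides do not spell it out.

```latex
\begin{align*}
SSA  &= nb \sum_{i} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \\
SSB  &= na \sum_{j} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \\
SSAB &= n \sum_{i}\sum_{j} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \\
SSE  &= \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \\
SSTO &= \sum_{i}\sum_{j}\sum_{k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 = SSA + SSB + SSAB + SSE
\end{align*}
```

The additive decomposition in the last line holds when the cell size n is constant.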

Expected Mean Squares

• E(MSE) = σ²

• E(MSA) = σ² + nb(Σi αi²)/(a-1)

• E(MSB) = σ² + na(Σj βj²)/(b-1)

• E(MSAB) = σ² + n(ΣiΣj (αβ)ij²)/((a-1)(b-1))

• Here, αi, βj, and (αβ)ij are defined with the usual factor effects constraints

An analytical strategy

• Run the model with main effects and the two-way interaction

• Plot the data, the means, and look at the normal quantile plot and residual plots

• If assumptions seem reasonable, check the significance of test for the interaction

AB interaction not significant

• If the AB interaction is not statistically significant

–Possibly rerun the analysis without the interaction (See pooling §19.10)

–Potential Type II errors when pooling

–For a main effect with more than two levels that is significant, use the means statement with the Tukey multiple comparison procedure

GLM Output

Source DF SS   MS    F     Pr > F
Model  5  1580 316   30.58 0.0003
Error  6  62   10.33
Total  11 1642

Note that there are 6 cells in this design, so the model has 6 - 1 = 5 df.

Output ANOVA

Type I or Type III

Source DF SS   MS  F     Pr > F
height 2  1544 772 74.71 <.0001
width  1  12   12  1.16  0.3226
h*w    2  24   12  1.16  0.3747

Note that the Type I and Type III analyses are the same because the cell size n is constant
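The sums of squares and F statistics in this output can be reproduced from the six cell means alone, because the design is balanced. The following Python sketch assumes the cell means from the "Means" slide and takes SSE = 62 from the GLM output rather than recomputing it from the raw data. The helper `p_value_2df` is an illustration that relies on a closed form for the upper tail of an F distribution with 2 numerator df; it does not apply to the 1-df width test.

```python
# Cell means keyed by (height, width); n = 2 stores per cell.
means = {(1, 1): 45.0, (1, 2): 43.0, (2, 1): 65.0,
         (2, 2): 69.0, (3, 1): 40.0, (3, 2): 44.0}
a, b, n = 3, 2, 2
sse, dfe = 62.0, a * b * (n - 1)  # SSE copied from the GLM output

grand = sum(means.values()) / (a * b)
row = {i: sum(means[i, j] for j in range(1, b + 1)) / b for i in range(1, a + 1)}
col = {j: sum(means[i, j] for i in range(1, a + 1)) / a for j in range(1, b + 1)}

# Balanced-design sums of squares.
ssa = n * b * sum((row[i] - grand) ** 2 for i in row)
ssb = n * a * sum((col[j] - grand) ** 2 for j in col)
ssab = n * sum((means[i, j] - row[i] - col[j] + grand) ** 2 for (i, j) in means)

mse = sse / dfe
f_a = (ssa / (a - 1)) / mse
f_b = (ssb / (b - 1)) / mse
f_ab = (ssab / ((a - 1) * (b - 1))) / mse

def p_value_2df(f, df2):
    """Upper-tail p-value for F(2, df2); valid only when the numerator df is 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

print("SSA, SSB, SSAB:", ssa, ssb, ssab)
print("F:", round(f_a, 2), round(f_b, 2), round(f_ab, 2))
print("p (interaction):", round(p_value_2df(f_ab, dfe), 4))
```

Because both height and the interaction have 2 numerator df here, the closed form covers those two tests; a general F distribution routine would be needed for other cases.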

Rerun without interaction

proc glm data=a1;
  class height width;
  model sales=height width;
  means height / tukey lines;
run;

ANOVA output

Source DF MS  F     Pr > F
height 2  772 71.81 <.0001
width  1  12  1.12  0.3216

MS(height) and MS(width) have not changed. The MSE, the F* values, and the P-values have changed because of pooling.

Comparison of MSEs

Model                      Source DF SS MS
Model with interaction     Error  6  62 10.33
Model without interaction  Error  8  86 10.75

Little change in MSE here; pooling is often done only when the error df is small

Pooling SS

• Data = Model + Residual

• When we remove a term from the 'model', we put this variation and the associated df into the 'residual'

• This is called pooling

• A benefit is that we have more df for error and a simpler model

• Potential Type II errors

• Beneficial only in small experiments

Pooling SSE and SSAB

• For model with interaction

• SSAB=24, dfAB=2

• SSE=62, dfE=6

•MSE=10.33

• For the model with main effects only

• SSE=62+24=86, dfE=6+2=8

•MSE=10.75
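The pooling arithmetic, and its effect on the F statistics for the main effects, can be sketched in a few lines of Python. All inputs are copied from the GLM output on the earlier slides; the variable names are illustrative.

```python
# Values from the model with interaction.
ss_ab, df_ab = 24.0, 2
ss_e, df_e = 62.0, 6
ms_height, ms_width = 772.0, 12.0

# Pooling: the interaction SS and df move into the error term.
ss_e_pooled = ss_e + ss_ab
df_e_pooled = df_e + df_ab

mse_full = ss_e / df_e
mse_pooled = ss_e_pooled / df_e_pooled

# Main-effect mean squares are unchanged; only the denominator changes.
f_height = ms_height / mse_pooled
f_width = ms_width / mse_pooled

print(f"MSE with interaction:    {mse_full:.2f} on {df_e} df")
print(f"MSE without interaction: {mse_pooled:.2f} on {df_e_pooled} df")
print(f"F(height) = {f_height:.2f}, F(width) = {f_width:.2f}")
```

Here the interaction mean square (12) is slightly larger than the original MSE, so the pooled MSE rises a little and the F statistics shrink slightly, while the error df increase from 6 to 8.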

Tukey Output

Grouping Mean   N height
A        67.000 4 2
B        44.000 4 1
B        42.000 4 3

Plot of the means

Regression Approach

• Similar to what we did for one-way

• Use a-1 variables for A

• Use b-1 variables for B

• Multiply each of the a-1 variables for A times each of the b-1 variables for B to get (a-1)(b-1) variables for AB

• You can use the test statement in Proc reg to perform F tests

Create Variables

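data a4;
  set a1;
  * Effect coding for height (a=3) with level 3 coded -1;
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  * Effect coding for width (b=2) with level 2 coded -1;
  X3 = (width eq 1) - (width eq 2);
  * Interaction columns are products of the main-effect columns;
  X13 = X1*X3;
  X23 = X2*X3;
run;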
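To see what this coding produces, the following Python sketch mirrors the DATA step for each of the six cells. The function name `effect_code` and the helpers are illustrative assumptions; only the coding rules themselves come from the SAS code.

```python
def effect_code(height, width):
    """Return (X1, X2, X3, X13, X23) for one cell, mirroring the DATA step."""
    x1 = (height == 1) - (height == 3)
    x2 = (height == 2) - (height == 3)
    x3 = (width == 1) - (width == 2)
    return (x1, x2, x3, x1 * x3, x2 * x3)

cells = [(h, w) for h in (1, 2, 3) for w in (1, 2)]
design = {cell: effect_code(*cell) for cell in cells}

for cell, row in design.items():
    print(cell, row)

def column_sum(c):
    """Sum of column c over the six cells (each cell has the same n)."""
    return sum(design[cell][c] for cell in cells)

def dot(c1, c2):
    """Inner product of two columns over the six cells."""
    return sum(design[cell][c1] * design[cell][c2] for cell in cells)

print("column sums:", [column_sum(c) for c in range(5)])
print("X1.X3 =", dot(0, 2), " X1.X13 =", dot(0, 3), " X3.X23 =", dot(2, 4))
```

With equal cell sizes every column sums to zero, so the intercept estimates the unweighted mean of the cell means, and the columns for one effect are orthogonal to the columns for the other effects. The two height columns X1 and X2 are not orthogonal to each other.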

Run Proc Reg

proc reg data=a4;

model sales= X1 X2 X3 X13 X23 / ss1;

height: test X1, X2;

width: test X3;

interaction: test X13, X23;

run;

SAS Output

Analysis of Variance

Source          DF Sum of Squares Mean Square F Value Pr > F
Model           5  1580.00000     316.00000   30.58   0.0003
Error           6  62.00000       10.33333
Corrected Total 11 1642.00000

Same basic ANOVA table

SAS Output

Parameter Estimates

Variable  DF Parameter Estimate Standard Error t Value Pr > |t| Type I SS
Intercept 1  51.00000           0.92796        54.96   <.0001   31212
X1        1  -7.00000           1.31233        -5.33   0.0018   8.00000
X2        1  16.00000           1.31233        12.19   <.0001   1536.0000
X3        1  -1.00000           0.92796        -1.08   0.3226   12.00000
X13       1  2.00000            1.31233        1.52    0.1783   18.00000
X23       1  -1.00000           1.31233        -0.76   0.4749   6.00000

SS Results

• SS(Height) = SS(X1)+SS(X2|X1)

1544 = 8.0 + 1536

• SS(Width) = SS(X3|X1,X2)

12 = 12

• SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1, X2,X3,X13)

24 = 18 + 6

Test Results

Test height Results for Dependent Variable sales

Source      DF Mean Square F Value Pr > F
Numerator   2  772.0000    74.71   <.0001
Denominator 6  10.33333

Test interaction Results for Dependent Variable sales

Source      DF Mean Square F Value Pr > F
Numerator   2  12.000      1.16    0.3747
Denominator 6  10.333

Test width Results for Dependent Variable sales

Source      DF Mean Square F Value Pr > F
Numerator   1  12.0000     1.16    0.3226
Denominator 6  10.3333

Interpreting Estimates

μ̂1. = 51 + (-7) = 44

μ̂2. = 51 + 16 = 67

μ̂3. = 51 - (-7) - 16 = 42

μ̂.1 = 51 + (-1) = 50

μ̂.2 = 51 - (-1) = 52

μ̂11 = 51 + (-7) + (-1) + 2 = 45

μ̂22 = 51 + 16 - (-1) - (-1) = 69
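The hand calculations on this slide can be checked by plugging the effect-coded variables into the fitted regression equation. The following Python sketch assumes the parameter estimates from the Proc Reg output and the coding used in the data step; the function and dictionary names are illustrative.

```python
# Estimates from the Parameter Estimates table: intercept, X1, X2, X3, X13, X23.
b0, b1, b2, b3, b13, b23 = 51.0, -7.0, 16.0, -1.0, 2.0, -1.0

def fitted_cell_mean(height, width):
    """Fitted mean for one cell using the same coding as the data step."""
    x1 = (height == 1) - (height == 3)
    x2 = (height == 2) - (height == 3)
    x3 = (width == 1) - (width == 2)
    return b0 + b1 * x1 + b2 * x2 + b3 * x3 + b13 * x1 * x3 + b23 * x2 * x3

cell = {(h, w): fitted_cell_mean(h, w) for h in (1, 2, 3) for w in (1, 2)}

# Marginal means are unweighted averages of the cell means (balanced design).
height_mean = {h: (cell[h, 1] + cell[h, 2]) / 2 for h in (1, 2, 3)}
width_mean = {w: sum(cell[h, w] for h in (1, 2, 3)) / 3 for w in (1, 2)}

print("cell means:  ", cell)
print("height means:", height_mean)
print("width means: ", width_mean)
```

The height means agree with the Tukey output (44, 67, 42), and the cell means agree with the means reported by Proc GLM, which is expected because the regression and the factor effects model are the same fit under different parameterizations.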

Last slide

• Finish reading KNNL Chapter 19

• Topic25.sas contains the SAS commands for these slides

• We will now focus more on the strategies needed to handle an ANOVA with two or more factors