Topic 24: Two-Way ANOVA

Outline

• Two-way ANOVA

–Data

–Cell means model

–Parameter estimates

–Factor effects model

Two-Way ANOVA

• The response variable Y is continuous

• There are two categorical explanatory variables or factors

Data for two-way ANOVA

• Y is the response variable

• Factor A with levels i = 1 to a

• Factor B with levels j = 1 to b

• Yijk is the kth observation in cell (i,j)

• In Chapter 19, we assume equal sample size in each cell (nij = n)

KNNL Example

• KNNL p 833

• Y is the number of cases of bread sold

• A is the height of the shelf display, a=3 levels: bottom, middle, top

• B is the width of the shelf display, b=2 levels: regular, wide

• n=2 stores for each of the 3x2=6 treatment combinations (nT=12)

Read the data

/* read the bread sales data: one row per store */
data a1;
  infile '../data/ch19ta07.txt';
  input sales height width;
run;

proc print data=a1;
run;

The data

Obs   sales   height   width
  1      47        1       1
  2      43        1       1
  3      46        1       2
  4      40        1       2
  5      62        2       1
  6      68        2       1
  7      67        2       2
  8      71        2       2
  9      41        3       1
 10      39        3       1
 11      42        3       2
 12      46        3       2
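If the file ch19ta07.txt is not at hand, the same 12 observations can be entered inline. This is only a sketch: it uses the values printed in the listing and builds the same data set a1 with DATALINES instead of INFILE.

```sas
/* Sketch: build a1 from the printed observations instead of the text file */
data a1;
  input sales height width;   /* height: 1=bottom, 2=middle, 3=top; width: 1=regular, 2=wide */
  datalines;
47 1 1
43 1 1
46 1 2
40 1 2
62 2 1
68 2 1
67 2 2
71 2 2
41 3 1
39 3 1
42 3 2
46 3 2
;
run;
```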

Notation

• For Yijk we use

– i to denote the level of the factor A

– j to denote the level of the factor B

–k to denote the kth observation in cell (i,j)

• i = 1, . . . , a levels of factor A

• j = 1, . . . , b levels of factor B

• k = 1, . . . , n observations in cell (i,j)

Model

• We assume that the response variable observations are

–Normally distributed

•With a mean that may depend on the levels of the factors A and B

•With a constant variance

– Independent

Cell Means Model

• Yijk = μij + εijk

– where μij is the theoretical mean or expected value of all observations in cell (i,j)

– the εijk are iid N(0, σ2)

• This means Yijk ~ N(μij, σ2), independent

• The parameters of the model are

– μij, for i = 1 to a and j = 1 to b

– σ2

Estimates

• Estimate μij by the mean of the observations in cell (i,j):

$$\hat{\mu}_{ij} = \bar{Y}_{ij.} = \Big(\sum_k Y_{ijk}\Big) / n$$

• For each (i,j) combination, we can get an estimate of the variance:

$$s_{ij}^2 = \sum_k (Y_{ijk} - \bar{Y}_{ij.})^2 / (n-1)$$

• We need to combine these to get an estimate of σ2

Pooled estimate of σ2

• In general we pool the sij2, using weights proportional to the df, nij - 1

• The pooled estimate is s2 = (Σ (nij-1)sij2) / (Σ(nij-1))

• Here, nij = n, so s2 = (Σsij2) / (ab), which is the average sample variance
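As a check on the pooling formula, the six cell standard deviations that PROC GLM reports for the bread data (n = 2 per cell) can be squared and averaged. The arithmetic below uses only those reported values (2.82842712² = 8, 4.24264069² = 18, 1.41421356² = 2):

```latex
\begin{aligned}
\sum_{i,j} s_{ij}^2 &= 8 + 18 + 18 + 8 + 2 + 8 = 62 \\
s^2 &= \frac{62}{ab} = \frac{62}{6} \approx 10.33
\end{aligned}
```

This agrees with the error mean square (10.333333) in the ANOVA output.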

Run proc glm

proc glm data=a1;
  class height width;
  model sales = height width height*width;   /* main effects plus interaction */
  means height width height*width;           /* marginal and cell means */
run;

Output

Class Level Information

Class    Levels   Values
height        3   1 2 3
width         2   1 2

Number of Observations Read   12
Number of Observations Used   12

Means statement height

Level of height   N   sales Mean   sales Std Dev
1                 4   44.0000000      3.16227766
2                 4   67.0000000      3.74165739
3                 4   42.0000000      2.94392029

Means statement width

Level of width   N   sales Mean   sales Std Dev
1                6   50.0000000      12.0664825
2                6   52.0000000      13.4313067

Means statement ht*w

Level of height   Level of width   N   sales Mean   sales Std Dev
1                 1                2   45.0000000      2.82842712
1                 2                2   43.0000000      4.24264069
2                 1                2   65.0000000      4.24264069
2                 2                2   69.0000000      2.82842712
3                 1                2   40.0000000      1.41421356
3                 2                2   44.0000000      2.82842712
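The cell means reported by the MEANS statement are also the estimates of μij in the cell means model. One way to fit that model directly, offered as a sketch rather than as the approach used in these slides, is to drop the intercept and keep only the height*width term; NOINT and SOLUTION are standard options of the MODEL statement in PROC GLM.

```sas
/* Sketch: cell means model, one parameter per (height, width) cell */
proc glm data=a1;
  class height width;
  model sales = height*width / noint solution;   /* no intercept: estimates are the cell means */
run;
```

Under this parameterization the six parameter estimates should equal the cell means. Because of NOINT, the overall model F test and R-square are computed against an uncorrected total, so they should not be expected to match the output of the model with main effects and interaction.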

Code the factor levels

/* hw labels each cell: B/M/T = bottom/middle/top, R/W = regular/wide */
data a1;
  set a1;
  if height eq 1 and width eq 1 then hw='1_BR';
  if height eq 1 and width eq 2 then hw='2_BW';
  if height eq 2 and width eq 1 then hw='3_MR';
  if height eq 2 and width eq 2 then hw='4_MW';
  if height eq 3 and width eq 1 then hw='5_TR';
  if height eq 3 and width eq 2 then hw='6_TW';
run;

Plot the data

symbol1 v=circle i=none;

proc gplot data=a1;
  plot sales*hw / frame;
run;

The plot

Put the means in a2

/* a1 is already sorted by height and width, as the BY statement requires */
proc means data=a1;
  var sales;
  by height width;
  output out=a2 mean=avsales;
run;

proc print data=a2;
run;

Output Data Set

Obs   height   width   _TYPE_   _FREQ_   avsales
  1        1       1        0        2        45
  2        1       2        0        2        43
  3        2       1        0        2        65
  4        2       2        0        2        69
  5        3       1        0        2        40
  6        3       2        0        2        44

Plot the means

symbol1 v=square i=join c=black;
symbol2 v=diamond i=join c=black;

proc gplot data=a2;
  plot avsales*height=width / frame;   /* one line per width */
run;

The interaction plot
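The same plot can be drawn with PROC SGPLOT instead of GPLOT. This is a swapped-in alternative, not the procedure used in these slides, and it assumes a2 holds avsales for each height and width combination as created by PROC MEANS.

```sas
/* Sketch: interaction plot with PROC SGPLOT, one line per width */
proc sgplot data=a2;
  series x=height y=avsales / group=width markers;
  xaxis values=(1 2 3);   /* show only the three shelf heights */
run;
```

From the cell means, the wide minus regular difference is -2 at height 1 and +4 at heights 2 and 3, so the two lines are not exactly parallel, but the gaps are small next to the differences between heights.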

Questions to consider

• Does the height of the display affect sales? If yes, compare top with middle, top with bottom, and middle with bottom

• Does the width of the display affect sales? If yes, compare regular and wide

But wait!!! Are these factor level comparisons meaningful?

• Does the effect of height on sales depend on the width?

• Does the effect of width on sales depend on the height?

• If yes, we have an interaction and we need to do some additional analysis

Factor effects model

• For the one-way ANOVA model, we wrote μi = μ + αi

• Here we use μij = μ + αi + βj + (αβ)ij

• Under “common” formulation

– μ (μ.. in KNNL) is the “overall mean”

– αi is the main effect of A

– βj is the main effect of B

– (αβ)ij is the interaction between A and B

Factor effects model

• μ = (Σij μij)/(ab)

• μi. = (Σj μij)/b and μ.j = (Σi μij)/a

• αi = μi. – μ and βj = μ.j - μ

• (αβ)ij is difference between μij and μ + αi + βj

• (αβ)ij = μij - (μ + (μi. - μ) + (μ.j - μ))

= μij – μi. – μ.j + μ

Interpretation

• μij = μ + αi + βj + (αβ)ij

• μ is the “overall” mean

• αi is an adjustment for level i of A

• βj is an adjustment for level j of B

• (αβ)ij is an additional adjustment that takes into account both i and j that cannot be explained by the previous adjustments

Constraints for this framework

• α. = Σi αi= 0

• β. = Σjβj = 0

• (αβ).j = Σi (αβ)ij = 0 for all j

• (αβ)i. = Σj (αβ)ij = 0 for all i

Estimates for factor effects model

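$$\hat{\mu} = \bar{Y}_{...} = \Big(\sum_{i,j,k} Y_{ijk}\Big) / (abn)$$

$$\hat{\mu}_{i.} = \bar{Y}_{i..} \quad \text{and} \quad \hat{\mu}_{.j} = \bar{Y}_{.j.}$$

$$\hat{\alpha}_i = \bar{Y}_{i..} - \bar{Y}_{...} \quad \text{and} \quad \hat{\beta}_j = \bar{Y}_{.j.} - \bar{Y}_{...}$$

$$(\widehat{\alpha\beta})_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$$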
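For the bread data, these estimates follow directly from the means reported by the MEANS statement (height means 44, 67, 42; width means 50, 52; cell means 45, 43, 65, 69, 40, 44) and the overall sales mean of 51. A worked version using only those values:

```latex
\begin{aligned}
\hat{\mu} &= 51 \\
\hat{\alpha}_1 &= 44 - 51 = -7, \quad \hat{\alpha}_2 = 67 - 51 = 16, \quad \hat{\alpha}_3 = 42 - 51 = -9 \\
\hat{\beta}_1 &= 50 - 51 = -1, \quad \hat{\beta}_2 = 52 - 51 = 1 \\
(\widehat{\alpha\beta})_{11} &= 45 - 44 - 50 + 51 = 2, \quad (\widehat{\alpha\beta})_{12} = 43 - 44 - 52 + 51 = -2 \\
(\widehat{\alpha\beta})_{21} &= 65 - 67 - 50 + 51 = -1, \quad (\widehat{\alpha\beta})_{22} = 69 - 67 - 52 + 51 = 1 \\
(\widehat{\alpha\beta})_{31} &= 40 - 42 - 50 + 51 = -1, \quad (\widehat{\alpha\beta})_{32} = 44 - 42 - 52 + 51 = 1
\end{aligned}
```

Each set of estimates sums to zero over its index, as the constraints require.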

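SS for ANOVA Table

$$SSA = \sum_{i,j,k} \hat{\alpha}_i^2 = \sum_{i,j,k} (\bar{Y}_{i..} - \bar{Y}_{...})^2$$

$$SSB = \sum_{i,j,k} \hat{\beta}_j^2$$

$$SSAB = \sum_{i,j,k} (\widehat{\alpha\beta})_{ij}^2$$

$$SSE = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{ij.})^2$$

$$SSTO = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{...})^2$$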
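With a = 3, b = 2 and n = 2, each estimated effect is repeated over every observation it applies to, so the sums over i, j, k collapse to multiples of sums of squared effects. A worked check for the bread data, using effect estimates computed from the reported means (main effects of height -7, 16, -9; main effects of width -1, 1; interaction effects 2, -2, -1, 1, -1, 1) and the six cell variances 8, 18, 18, 8, 2, 8 (each with n - 1 = 1 df):

```latex
\begin{aligned}
SSA &= bn \sum_i \hat{\alpha}_i^2 = 4\,(49 + 256 + 81) = 1544 \\
SSB &= an \sum_j \hat{\beta}_j^2 = 6\,(1 + 1) = 12 \\
SSAB &= n \sum_{i,j} (\widehat{\alpha\beta})_{ij}^2 = 2\,(4 + 4 + 1 + 1 + 1 + 1) = 24 \\
SSE &= \sum_{i,j} (n-1)\, s_{ij}^2 = 8 + 18 + 18 + 8 + 2 + 8 = 62 \\
SSTO &= 1544 + 12 + 24 + 62 = 1642
\end{aligned}
```

These match the Type III SS, the error SS and the corrected total in the PROC GLM output.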

df for ANOVA Table

• dfA = a-1

• dfB = b-1

• dfAB = (a-1)(b-1)

• dfE = ab(n-1)

• dfT = abn-1 = nT-1

MS for ANOVA Table

• MSA = SSA/dfA

• MSB = SSB/dfB

• MSAB = SSAB/dfAB

• MSE = SSE/dfE

• MST = SSTO/dfT

Hypotheses for two-way ANOVA

• H0A: αi = 0 for all i

• H1A: αi ≠ 0 for at least one i

• H0B: βj = 0 for all j

• H1B: βj ≠ 0 for at least one j

• H0AB: (αβ)ij = 0 for all (i,j)

• H1AB: (αβ)ij ≠ 0 for at least one (i,j)

F statistics

• H0A is tested by FA = MSA/MSE; df=dfA, dfE

• H0B is tested by FB = MSB/MSE; df=dfB, dfE

• H0AB is tested by FAB = MSAB/MSE; df=dfAB, dfE

ANOVA Table

Source   df           SS     MS     F
A        a-1          SSA    MSA    MSA/MSE
B        b-1          SSB    MSB    MSB/MSE
AB       (a-1)(b-1)   SSAB   MSAB   MSAB/MSE
Error    ab(n-1)      SSE    MSE
Total    abn-1        SSTO   MST
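For the bread example (a = 3, b = 2, n = 2), the table can be filled in by hand from the sums of squares that PROC GLM reports (SSA = 1544, SSB = 12, SSAB = 24, SSE = 62):

```latex
\begin{aligned}
MSE &= 62 / 6 \approx 10.333 \\
F_A &= \frac{1544 / 2}{10.333} = \frac{772}{10.333} \approx 74.71 \quad (df = 2,\ 6) \\
F_B &= \frac{12 / 1}{10.333} \approx 1.16 \quad (df = 1,\ 6) \\
F_{AB} &= \frac{24 / 2}{10.333} = \frac{12}{10.333} \approx 1.16 \quad (df = 2,\ 6)
\end{aligned}
```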

P-values

• P-values are calculated using the F(dfNumerator, dfDenominator) distributions

• If P ≤ 0.05 we conclude that the effect being tested is statistically significant
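The P-values can be reproduced from the F statistics with the PROBF function, which returns the cumulative F probability. The sketch below uses the rounded F values from the bread example; the data set name pvals and its variable names are illustrative, not part of the original program.

```sas
/* Sketch: upper-tail F probabilities for the three tests.
   pvals, effect, f, ndf, ddf and p are illustrative names. */
data pvals;
  input effect $ f ndf ddf;
  p = 1 - probf(f, ndf, ddf);   /* P(F > f) for an F(ndf, ddf) distribution */
  datalines;
height 74.71 2 6
width 1.16 1 6
h_x_w 1.16 2 6
;
run;

proc print data=pvals;
run;
```

Because the inputs are rounded to two decimals, the results should agree with the Pr > F column of PROC GLM only up to rounding.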

KNNL Example

• KNNL p 833

• Y is the number of cases of bread sold

• A is the height of the shelf display, a=3 levels: bottom, middle, top

• B is the width of the shelf display, b=2 levels: regular, wide

• n=2 stores for each of the 3x2=6 treatment combinations

PROC GLM

proc glm data=a1;
  class height width;
  model sales = height width height*width;
run;

Output

Note that there are 6 cells in this design, so there are (6-1) = 5 df for the model

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5        1580.0000    316.000000     30.58   0.0003
Error              6        62.000000     10.333333
Corrected Total   11        1642.0000

Output ANOVA

Note: Type I and Type III analyses are the same because nij is constant

Source         DF   Type III SS   Mean Square   F Value   Pr > F
height          2    1544.00000    772.000000     74.71   <.0001
width           1     12.000000     12.000000      1.16   0.3226
height*width    2     24.000000     12.000000      1.16   0.3747

Other output

R-Square   Coeff Var   Root MSE   sales Mean
0.962241    6.303040   3.214550     51.00000

R-square is not commonly considered when performing ANOVA; the interest is more in differences between levels than in the model's predictive ability

Results

• The main effect of height is statistically significant (F=74.71; df=2,6; P<0.0001)

• The main effect of width is not statistically significant (F=1.16; df=1,6; P=0.32)

• The interaction between height and width is not statistically significant (F=1.16; df=2,6; P=0.37)

Interpretation

• The height of the display affects sales of bread

• The width of the display has no apparent effect

• The effect of the height of the display is similar for both the regular and the wide widths

Plot of the means

Additional analyses

• We will need to do additional analyses to explain the height effect (factor A)

• There were three levels: bottom, middle and top

• We could rerun the data with a one-way ANOVA and use the methods we learned in the previous chapters

• Use the means statement with the lines option
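The one-way rerun mentioned above could look like the following sketch, which drops width from the model entirely and is not the analysis run in these slides.

```sas
/* Sketch: one-way ANOVA on height only, ignoring width */
proc glm data=a1;
  class height;
  model sales = height;
  means height / tukey lines;   /* Tukey comparisons of the three heights */
run;
```

In this version the width and interaction sums of squares are absorbed into the error term, giving SSE = 1642 - 1544 = 98 on 9 df, so the error mean square (about 10.89) differs slightly from the two-way value of 10.33.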

Run Proc GLM

proc glm data=a1;
  class height width;
  model sales = height width height*width;
  means height / tukey lines;        /* Tukey groupings shown with letters */
  lsmeans height / adjust=tukey;     /* Tukey-adjusted pairwise p-values */
run;

MEANS Output

Alpha                                     0.05
Error Degrees of Freedom                     6
Error Mean Square                     10.33333
Critical Value of Studentized Range    4.33920
Minimum Significant Difference          6.9743

Means with the same letter are not significantly different.

Tukey Grouping     Mean   N   height
A                67.000   4   2
B                44.000   4   1
B
B                42.000   4   3
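The minimum significant difference in this output can be checked by hand. With equal sample sizes it is the studentized range critical value times the standard error of a single height mean, where each height mean is based on 4 observations:

```latex
\begin{aligned}
MSD &= q \sqrt{\frac{MSE}{4}} = 4.33920 \times \sqrt{\frac{10.33333}{4}} \approx 4.33920 \times 1.6073 \approx 6.974
\end{aligned}
```

The middle shelf differs from the bottom shelf by 67 - 44 = 23, which exceeds 6.974, while bottom and top differ by only 44 - 42 = 2, which is why heights 1 and 3 share the letter B.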

LSMEANS Output

height   sales LSMEAN   LSMEAN Number
1          44.0000000   1
2          67.0000000   2
3          42.0000000   3

Least Squares Means for effect height
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sales

i/j        1        2        3
1               0.0001   0.6714
2     0.0001            <.0001
3     0.6714   <.0001

Last slide

• We went over Chapter 19

• We used program topic24.sas to generate the output for today.
