Topic 20: Single Factor Analysis of Variance

Outline

• Analysis of Variance

– One set of treatments (i.e., single factor)

• Cell means model

• Factor effects model

– Link to linear regression using indicator explanatory variables

One-Way ANOVA

• The response variable Y is continuous

• The explanatory variable is categorical

– We call it a factor

– The possible values are called levels

• This approach is a generalization of the independent two-sample pooled t-test

• In other words, it can be used when there are more than two treatments

Data for One-Way ANOVA

• Y is the response variable

• X is the factor (it is qualitative/discrete)

– r is the number of levels

– often refer to these levels as groups or treatments

• Yi,j is the jth observation in the ith group

Notation

• For Yi,j we use

– i to denote the level of the factor

– j to denote the jth observation at factor level i

• i = 1, . . . , r levels of factor X

• j = 1, . . . , ni observations for level i of factor X

– ni does not need to be the same in each group

KNNL Example (p 685)

• Y is the number of cases of cereal sold

• X is the design of the cereal package

– there are 4 levels for X because there are 4 different package designs

• i =1 to 4 levels

• j =1 to ni stores with design i (ni=5,5,4,5)

• We will write n in place of ni when ni is the same across groups

Data for one-way ANOVA

data a1;
  infile 'c:../data/ch16ta01.txt';
  input cases design store;

proc print data=a1;
run;

The data

Obs  cases  design  store
  1     11       1      1
  2     17       1      2
  3     16       1      3
  4     14       1      4
  5     15       1      5
  6     12       2      1
  7     10       2      2
  8     15       2      3

Plot the data

symbol1 v=circle i=none;
proc gplot data=a1;
  plot cases*design;
run;

The plot (figure: cases versus design, one circle per store)

Plot the means

proc means data=a1;
  var cases;
  by design;
  output out=a2 mean=avcases;
proc print data=a2;
symbol1 v=circle i=join;
proc gplot data=a2;
  plot avcases*design;
run;

New Data Set

Obs  design  _TYPE_  _FREQ_  avcases
  1       1       0       5     14.6
  2       2       0       5     13.4
  3       3       0       4     19.5
  4       4       0       5     27.2

Plot of the means (figure: avcases versus design, points joined by lines)

The Model

• We assume that the response variable is

– Normally distributed with a

1. mean that may depend on the level of the factor

2. constant variance

• All observations assumed independent

• NOTE: Same assumptions as linear regression except there is no assumed linear relationship between X and E(Y|X)

Cell Means Model

• A “cell” refers to a level of the factor

• Yij = μi + εij

– where μi is the theoretical mean or expected value of all observations at level (or cell) i

– the εij are iid N(0, σ²), which means

– Yij ~ N(μi, σ²) and independent

– This is called the cell means model

Parameters

• The parameters of the model are

– μ1, μ2, … , μr

– σ²

• Question (Version 1) – Does our explanatory variable help explain Y?

• Question (Version 2) – Do the μi vary?

H0: μ1= μ2= … = μr = μ (a constant)

Ha: not all μ’s are the same

Estimates

• Estimate μi by the mean of the observations at level i (the sample mean)

• $\hat{\mu}_i = \bar{Y}_{i.} = \sum_j Y_{ij} / n_i$

• For each level i, also get an estimate of the variance

• $s_i^2 = \sum_j (Y_{ij} - \bar{Y}_{i.})^2 / (n_i - 1)$ (sample variance)

• We combine these to get an overall estimate of σ²

• Same approach as pooled t-test

Pooled estimate of σ²

• If the ni were all the same we would average the $s_i^2$

– Do not average the $s_i$

• In general we pool the $s_i^2$, giving weights proportional to the df, ni − 1

• The pooled estimate is

$$s^2 = \frac{\sum_i (n_i - 1)\, s_i^2}{\sum_i (n_i - 1)} = \frac{\sum_i (n_i - 1)\, s_i^2}{n_T - r}$$
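As a rough check of the pooling rule, the sketch below recomputes the pooled estimate for the cereal example in Python (not SAS), using only the standard library. The group sizes and standard deviations are copied from the MEANS statement output in these slides; the variable names are illustrative assumptions, not part of topic20.sas.

```python
import math

# Group sizes and sample standard deviations for designs 1-4,
# copied from the MEANS statement output for the cereal example.
n = [5, 5, 4, 5]
sd = [2.30217289, 3.64691651, 2.64575131, 3.96232255]

df = [ni - 1 for ni in n]                      # within-group df, n_i - 1
sse = sum(d * s ** 2 for d, s in zip(df, sd))  # sum of (n_i - 1) * s_i^2
pooled_var = sse / sum(df)                     # divide by n_T - r = 15
root_mse = math.sqrt(pooled_var)

print(round(sse, 2), round(pooled_var, 4), round(root_mse, 4))
# prints: 158.2 10.5467 3.2476
```

These agree with the Error sum of squares, the Error mean square, and Root MSE reported by proc glm.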

Running proc glm

proc glm data=a1;
  class design;
  model cases=design;
  means design;
  lsmeans design;
run;

Difference 1: Need to specify factor variables

Difference 2: Ask for mean estimates

Output

Class Level Information

Class   Levels  Values
design       4  1 2 3 4

Number of Observations Read  19
Number of Observations Used  19

Important: always check these summaries!

SAS 9.3 default output for MEANS statement

MEANS statement output

Level of         ----------cases----------
design    N          Mean       Std Dev
1         5    14.6000000    2.30217289
2         5    13.4000000    3.64691651
3         4    19.5000000    2.64575131
4         5    27.2000000    3.96232255

Table of sample means and sample standard deviations

SAS 9.3 default output for LSMEANS statement

LSMEANS statement output

design  cases LSMEAN  Standard Error  Pr > |t|
1         14.6000000       1.4523544    <.0001
2         13.4000000       1.4523544    <.0001
3         19.5000000       1.6237816    <.0001
4         27.2000000       1.4523544    <.0001

Provides estimates based on model (i.e., constant variance)
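To see what "based on model" means here, the following Python sketch (standard library only) compares two standard errors for each group mean: one using the pooled MSE, which should match the LSMEANS output, and one using each group's own standard deviation. The numbers come from the SAS output in these slides; the variable names are assumptions made for illustration.

```python
import math

mse = 10.5466667                     # Error mean square from the ANOVA table
n = [5, 5, 4, 5]                     # stores per design
sd = [2.30217289, 3.64691651, 2.64575131, 3.96232255]

# Model-based standard error: every group shares the pooled variance estimate.
se_model = [math.sqrt(mse / ni) for ni in n]

# Separate-variance standard error: each group uses only its own s_i.
se_separate = [s / math.sqrt(ni) for s, ni in zip(sd, n)]

print([round(x, 4) for x in se_model])
# prints: [1.4524, 1.4524, 1.6238, 1.4524]
print([round(x, 4) for x in se_separate])
```

Under the constant-variance model the standard errors differ only through ni, which is why designs 1, 2, and 4 share the same value.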

Notation

• $\bar{Y}_{i.} = \sum_j Y_{ij} / n_i$ (treatment sample mean)

• $\bar{Y}_{..} = \sum_i \sum_j Y_{ij} / n_T$ (grand sample mean)

• $n_T = \sum_i n_i$ is the total number of observations

ANOVA Table

Source  df      SS                                               MS
Model   r−1     $\sum_{ij} (\bar{Y}_{i.} - \bar{Y}_{..})^2$      SSR/dfR
Error   nT−r    $\sum_{ij} (Y_{ij} - \bar{Y}_{i.})^2$            SSE/dfE
Total   nT−1    $\sum_{ij} (Y_{ij} - \bar{Y}_{..})^2$            SST/dfT

ANOVA SAS Output

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             3     588.2210526  196.0736842    18.59  <.0001
Error            15     158.2000000   10.5466667
Corrected Total  18     746.4210526

R-Square  Coeff Var  Root MSE  cases Mean
0.788055   17.43042  3.247563    18.63158
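The ANOVA table can be rebuilt from the group summaries alone. The Python sketch below (standard library only, not the course's SAS program) does so from the group sizes, means, and standard deviations reported by the MEANS statement; the variable names are illustrative assumptions. It stops at the F statistic because the standard library has no F distribution for the P-value.

```python
n = [5, 5, 4, 5]                     # stores per design
mean = [14.6, 13.4, 19.5, 27.2]      # group sample means
sd = [2.30217289, 3.64691651, 2.64575131, 3.96232255]

r, n_T = len(n), sum(n)
grand_mean = sum(ni * m for ni, m in zip(n, mean)) / n_T

# Model SS: squared deviation of each group mean from the grand mean,
# counted once per observation in the group.
ssr = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, mean))
# Error SS: within-group variation, recovered from the group variances.
sse = sum((ni - 1) * s ** 2 for ni, s in zip(n, sd))

msr, mse = ssr / (r - 1), sse / (n_T - r)
f_value = msr / mse
r_square = ssr / (ssr + sse)

print(round(ssr, 4), round(sse, 4), round(f_value, 2), round(r_square, 6))
# prints: 588.2211 158.2 18.59 0.788055
```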

Expected Mean Squares

• $E(\text{MSE}) = \sigma^2$

• $E(\text{MSR}) = \sigma^2 + \frac{\sum_i n_i (\mu_i - \mu_.)^2}{r - 1}$, where $\mu_. = \sum_i n_i \mu_i / n_T$

• E(MSR) > E(MSE) when the group means are different

• See KNNL p 694 – 698 for more details

• In more complicated models, these tell us how to construct the F test

F test

• F = MSR/MSE

• H0: μ1 = μ2 = … = μr

• Ha: not all of the μi are equal

• Under H0, F ~ F(r-1, nT-r)

• Reject H0 when F is large

• Report the P-value

Maximum Likelihood Approach

proc glimmix data=a1;

class design;

model cases=design / dist=normal;

lsmeans design;

run;

GLIMMIX Output

Model Information

Data Set WORK.A1

Response Variable cases

Response Distribution Gaussian

Link Function Identity

Variance Function Default

Variance Matrix Diagonal

Estimation Technique Restricted Maximum Likelihood

Degrees of Freedom Method Residual

GLIMMIX Output

Fit Statistics
-2 Res Log Likelihood       84.12
AIC (smaller is better)     94.12
AICC (smaller is better)   100.79
BIC (smaller is better)     97.66
CAIC (smaller is better)   102.66
HQIC (smaller is better)    94.08
Pearson Chi-Square         158.20
Pearson Chi-Square / DF     10.55

GLIMMIX Output

Type III Tests of Fixed Effects

Effect  Num DF  Den DF  F Value  Pr > F
design       3      15    18.59  <.0001

design Least Squares Means

design  Estimate  Standard Error  DF  t Value  Pr > |t|
1        14.6000          1.4524  15    10.05    <.0001
2        13.4000          1.4524  15     9.23    <.0001
3        19.5000          1.6238  15    12.01    <.0001
4        27.2000          1.4524  15    18.73    <.0001

Factor Effects Model

• A reparameterization of the cell means model

• A useful way of looking at more complicated models

• Null hypotheses are easier to state

• Yij = μ + τi + εij

– the εij are iid N(0, σ²)

Parameters

• The parameters of the model are

– μ, τ1, τ2, … , τr

– σ²

• The cell means model had r + 1 parameters

– r μ’s and σ²

• The factor effects model has r + 2 parameters

– μ, the r τ’s, and σ²

– Cannot uniquely estimate all parameters

An example

• Suppose r=3; μ1 = 10, μ2 = 20, μ3 = 30

• What is an equivalent set of parameters for the factor effects model?

• We need to have μ + τi = μi

• μ = 0, τ1 = 10, τ2 = 20, τ3 = 30

• μ = 20, τ1 = −10, τ2 = 0, τ3 = 10

• μ = 5000, τ1 = −4990, τ2 = −4980, τ3 = −4970
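The three parameter sets above can be checked mechanically. This short Python sketch simply confirms that each (μ, τ) choice reproduces the same cell means, which is why the data cannot distinguish between them; the variable names are illustrative.

```python
cell_means = [10, 20, 30]            # mu_1, mu_2, mu_3 from the example

# Candidate factor-effects parameterizations: (mu, [tau_1, tau_2, tau_3]).
candidates = [
    (0, [10, 20, 30]),
    (20, [-10, 0, 10]),
    (5000, [-4990, -4980, -4970]),
]

# Each candidate implies cell means mu + tau_i.
implied = [[mu + tau_i for tau_i in tau] for mu, tau in candidates]

for (mu, tau), means in zip(candidates, implied):
    print(mu, tau, "->", means, means == cell_means)
```

Every line prints True: adding any constant to μ and subtracting it from every τi leaves μ + τi unchanged.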

Problem with factor effects?

• These parameters are not estimable or not well defined (i.e., not unique)

• There are many solutions to the least squares problem

• The X΄X matrix for this parameterization does not have an inverse (perfect multicollinearity)

• The parameter estimators here are biased (SAS proc glm)

Factor effects solution

• Put a constraint on the τi

• Common to assume Σi τi = 0

• This effectively reduces the number of parameters by 1

• Numerous other constraints possible

Consequences

• Regardless of constraint, we always have μi = μ + τi

• The constraint Σi τi = 0 implies

– μ = (Σi μi)/r (unweighted grand mean)

– τi = μi – μ (group effect)

• The “unweighted” complicates things when the ni are not all equal; see KNNL p 702-708

Hypotheses

• H0: μ1 = μ2 = … = μr

• H1: not all of the μi are equal

are translated into

• H0: τ1 = τ2 = … = τr = 0

• H1: at least one τi is not 0

Estimates of parameters

• With the constraint Σi τi = 0

• $\hat{\mu} = \sum_i \bar{Y}_{i.} / r$ (which equals $\bar{Y}_{..}$ if $n_i = n$)

• $\hat{\tau}_i = \bar{Y}_{i.} - \hat{\mu}$
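For the cereal data the ni are not all equal, so the unweighted mean of the group means differs slightly from the grand sample mean. The Python sketch below applies the sum-to-zero estimates to the group means reported earlier. It illustrates the constraint and is not output from SAS; the variable names are assumptions.

```python
n = [5, 5, 4, 5]                     # stores per design
mean = [14.6, 13.4, 19.5, 27.2]      # group sample means

# Sum-to-zero constraint: mu_hat is the unweighted mean of the group means.
mu_hat = sum(mean) / len(mean)
tau_hat = [m - mu_hat for m in mean]

# Grand sample mean weights each group mean by its sample size.
grand_mean = sum(ni * m for ni, m in zip(n, mean)) / sum(n)

print(round(mu_hat, 4), [round(t, 3) for t in tau_hat])
# prints: 18.675 [-4.075, -5.275, 0.825, 8.525]
print(round(grand_mean, 4))
# prints: 18.6316
```

The estimated effects sum to zero by construction, and μ̂ = 18.675 is close to, but not the same as, the "cases Mean" of 18.63158 in the proc glm output.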

Solution used by SAS

• Recall, X΄X does not have an inverse

• We can use a generalized inverse in its place

• (X΄X)⁻ is the standard notation

• There are many generalized inverses, each corresponding to a different constraint

Solution used by SAS

• (X΄X)⁻ used in proc glm corresponds to the constraint τr = 0

• Recall that μ and the τi are not estimable

• But the linear combinations μ + τi are estimable

• These are estimated by the cell means

Cereal package example

• Y is the number of cases of cereal sold

• X is the design of the cereal package

• i =1 to 4 levels

• j =1 to ni stores with design i

SAS coding for X

• Class statement generates r explanatory variables

• The ith explanatory variable is equal to 1 if the observation is from the ith group, and 0 otherwise

• In other words, the rows of X are

1 1 0 0 0 for design=1
1 0 1 0 0 for design=2
1 0 0 1 0 for design=3
1 0 0 0 1 for design=4
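To make the consequence of this coding concrete, here is a small Python sketch (standard library only) that builds the rows of X for the cereal example from the group sizes 5, 5, 4, 5 and forms X΄X by hand. It only mimics the coding described above and is not how proc glm is implemented; the variable names are assumptions.

```python
n = [5, 5, 4, 5]                     # stores per design
r = len(n)

# One row per observation: intercept, then r indicator columns.
rows = []
for i, ni in enumerate(n):
    indicators = [1 if k == i else 0 for k in range(r)]
    rows.extend([[1] + indicators for _ in range(ni)])

# X'X: entry (a, b) is the sum over observations of column a times column b.
p = r + 1
xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)]
       for a in range(p)]

# The intercept column equals the sum of the indicator columns in every row,
# so the columns of X are linearly dependent and X'X is singular.
intercept_is_sum = all(row[0] == sum(row[1:]) for row in rows)

for line in xtx:
    print(line)
print(intercept_is_sum)              # prints: True
```

The first row of X΄X is the sum of the other four, which is the perfect multicollinearity that forces a constraint or a generalized inverse.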

Some options

proc glm data=a1;
  class design;
  model cases=design / xpx inverse solution;
run;

Output

The X'X Matrix

        Int   d1   d2   d3   d4  cases
Int      19    5    5    4    5    354
d1        5    5    0    0    0     73
d2        5    0    5    0    0     67
d3        4    0    0    4    0     78
d4        5    0    0    0    5    136
cases   354   73   67   78  136   7342

Also contains X΄Y (last column) and Y΄Y (lower right corner)

Output

X'X Generalized Inverse (g2)

         Int     d1     d2     d3   d4  cases
Int      0.2   -0.2   -0.2   -0.2    0   27.2
d1      -0.2    0.4    0.2    0.2    0  -12.6
d2      -0.2    0.2    0.4    0.2    0  -13.8
d3      -0.2    0.2    0.2   0.45    0   -7.7
d4         0      0      0      0    0      0
cases   27.2  -12.6  -13.8   -7.7    0  158.2

Output matrix

• Actually, this matrix is

$$\begin{pmatrix} (X'X)^{-} & (X'X)^{-}X'Y \\ Y'X(X'X)^{-} & Y'Y - Y'X(X'X)^{-}X'Y \end{pmatrix}$$

• Parameter estimates are in upper right corner, SSE is lower right corner (last column on previous page)

Parameter estimates

Par     Est     St Err      t       P
Int    27.2 B     1.45  18.73  <.0001
d1    -12.6 B     2.05  -6.13  <.0001
d2    -13.8 B     2.05  -6.72  <.0001
d3     -7.7 B     2.17  -3.53  0.0030
d4      0.0 B      .      .       .

Caution Message

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

Interpretation

• If τr = 0 (in our case, τ4 = 0), then the corresponding estimate should be zero

• the intercept μ is estimated by the mean of the observations in group 4

• since μ + τi is the mean of group i, the τi are the differences between the mean of group i and the mean of group 4

Recall the means output

Level of
design    N   Mean   Std Dev
1         5   14.6   2.3
2         5   13.4   3.6
3         4   19.5   2.6
4         5   27.2   3.9

Parameter estimates based on means

Level of
design   Mean
                $\hat{\mu} = 27.2$
1        14.6   $\hat{\tau}_1 = 14.6 - 27.2 = -12.6$
2        13.4   $\hat{\tau}_2 = 13.4 - 27.2 = -13.8$
3        19.5   $\hat{\tau}_3 = 19.5 - 27.2 = -7.7$
4        27.2   $\hat{\tau}_4 = 27.2 - 27.2 = 0$
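The same arithmetic, together with the standard errors in the parameter estimates table, can be reproduced with a short Python sketch (standard library only). It assumes the usual result that a difference of two independent group means has variance σ²(1/ni + 1/n4), with σ² estimated by the MSE. The variable names are illustrative and the code is not output from proc glm.

```python
import math

n = [5, 5, 4, 5]                     # stores per design
mean = [14.6, 13.4, 19.5, 27.2]      # group sample means
mse = 10.5466667                     # Error mean square from the ANOVA table

# Constraint tau_4 = 0: the intercept is the mean of the last group.
mu_hat = mean[-1]
tau_hat = [m - mu_hat for m in mean]

# Standard errors: intercept is a single group mean; each tau_i (i < 4)
# is a difference between group i and group 4.
se_mu = math.sqrt(mse / n[-1])
se_tau = [math.sqrt(mse * (1 / ni + 1 / n[-1])) for ni in n[:-1]]
t_tau = [t / se for t, se in zip(tau_hat, se_tau)]

print(round(mu_hat, 1), [round(t, 1) for t in tau_hat])
# prints: 27.2 [-12.6, -13.8, -7.7, 0.0]
print(round(se_mu, 4), [round(se, 4) for se in se_tau])
# prints: 1.4524 [2.0539, 2.0539, 2.1785]
print([round(t, 2) for t in t_tau])
# prints: [-6.13, -6.72, -3.53]
```

These agree with the estimates and t statistics in the parameter estimates table, up to the rounding used on the slide.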

Last slide

• Read KNNL Chapter 16 up to 16.10

• We used the program topic20.sas to generate the output for today

• Will focus more on the relationship between regression and one-way ANOVA in next topic