
Business Statistics

Chapter 11: Analysis of Variance

Chapter Goals

After completing this chapter, you should be able to:

  • Recognize situations in which to use analysis of variance
  • Understand different analysis of variance designs
  • Perform a single-factor hypothesis test and interpret results
  • Conduct and interpret post-analysis of variance pairwise comparisons procedures
  • Set up and perform randomized blocks analysis
  • Analyze two-factor analysis of variance test with replications results

Chapter Overview

Analysis of Variance (ANOVA)

  • One-Way ANOVA: F-test, Tukey-Kramer test
  • Randomized Complete Block ANOVA: F-test, Fisher’s Least Significant Difference test
  • Two-factor ANOVA with replication

General ANOVA Setting

  • Investigator controls one or more independent variables
      • Called factors (or treatment variables)
      • Each factor contains two or more levels (or categories/classifications)
  • Observe effects on dependent variable
      • Response to levels of independent variable
  • Experimental design: the plan used to test hypothesis

One-Way Analysis of Variance

  • Evaluate the difference among the means of three or more populations
      • Examples: accident rates for 1st, 2nd, and 3rd shift; expected mileage for five brands of tires
  • Assumptions
      • Populations are normally distributed
      • Populations have equal variances
      • Samples are randomly and independently drawn

Completely Randomized Design

  • Experimental units (subjects) are assigned randomly to treatments
  • Only one factor or independent variable, with two or more treatment levels
  • Analyzed by one-factor analysis of variance (one-way ANOVA)
  • Called a Balanced Design if all factor levels have equal sample size

Hypotheses of One-Way ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

  • All population means are equal
  • i.e., no treatment effect (no variation in means among groups)

$$H_A: \text{Not all of the population means are the same}$$

  • At least one population mean is different
  • i.e., there is a treatment effect
  • Does not mean that all population means are different (some pairs may be the same)

One-Factor ANOVA

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

$$H_A: \text{Not all } \mu_i \text{ are the same}$$

All means are the same: the null hypothesis is true (no treatment effect)

[Figure: three population distributions centered at the same mean, $\mu_1 = \mu_2 = \mu_3$]

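One-Factor ANOVA (continued)

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

$$H_A: \text{Not all } \mu_i \text{ are the same}$$

At least one mean is different: the null hypothesis is NOT true (treatment effect is present)

[Figure: two sets of three population distributions whose means are not all equal, for example $\mu_1 = \mu_2 \ne \mu_3$ or $\mu_1 \ne \mu_2 \ne \mu_3$]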

Partitioning the Variation

Total variation can be split into two parts:

SST = SSB + SSW

  • SST = Total Sum of Squares
  • SSB = Sum of Squares Between
  • SSW = Sum of Squares Within

Partitioning the Variation (continued)

SST = SSB + SSW

  • Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)
  • Within-Sample Variation = dispersion that exists among the data values within a particular factor level (SSW)
  • Between-Sample Variation = dispersion among the factor sample means (SSB)

Partition of Total Variation

Total Variation (SST) = Variation Due to Factor (SSB) + Variation Due to Random Sampling (SSW)

Variation due to factor (SSB) is commonly referred to as:
  • Sum of Squares Between
  • Sum of Squares Among
  • Sum of Squares Explained
  • Among Groups Variation

Variation due to random sampling (SSW) is commonly referred to as:
  • Sum of Squares Within
  • Sum of Squares Error
  • Sum of Squares Unexplained
  • Within Groups Variation

Total Sum of Squares

SST = SSB + SSW

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2$$

Where:

  • SST = Total sum of squares
  • $k$ = number of populations (levels or treatments)
  • $n_i$ = sample size from population i
  • $x_{ij}$ = jth measurement from population i
  • $\bar{\bar{x}}$ = grand mean (mean of all data values)

c1 = c(254, 263, 241, 237, 251)
c2 = c(234, 218, 235, 227, 216)
c3 = c(200, 222, 197, 206, 204)
y = as.vector(rbind(c1, c2, c3))

  • Use R to find the mean of c1, c2, c3, y
  • Find deviations from the means
  • Find sum of squared deviations
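One way to work through the three steps above is sketched below in base R. It reuses `c1`, `c2`, `c3` and `y` from the slide; the names `group_means`, `grand_mean`, `dev_total` and `SST` are illustrative choices, not part of the original exercise, and only deviations from the grand mean are shown.

```r
# Distances for the three golf clubs (data from the slide)
c1 <- c(254, 263, 241, 237, 251)
c2 <- c(234, 218, 235, 227, 216)
c3 <- c(200, 222, 197, 206, 204)
y  <- as.vector(rbind(c1, c2, c3))   # all 15 values, interleaved by club

# Step 1: sample mean of each club and the grand mean of all values
group_means <- c(mean(c1), mean(c2), mean(c3))
grand_mean  <- mean(y)

# Step 2: deviation of every observation from the grand mean
dev_total <- y - grand_mean

# Step 3: sum of squared deviations, i.e. the total sum of squares
SST <- sum(dev_total^2)

print(group_means)
print(grand_mean)
print(SST)
```

The same pattern with `c1 - mean(c1)` (and likewise for `c2`, `c3`) gives the within-group deviations.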

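Total Variation (continued)

$$SST = (x_{11} - \bar{\bar{x}})^2 + (x_{12} - \bar{\bar{x}})^2 + \cdots + (x_{k n_k} - \bar{\bar{x}})^2$$

[Figure: response X for Group 1, Group 2 and Group 3, with deviations measured from the grand mean $\bar{\bar{x}}$]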

Sum of Squares Between

SST = SSB + SSW

$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$$

Where:

  • SSB = Sum of squares between
  • $k$ = number of populations
  • $n_i$ = sample size from population i
  • $\bar{x}_i$ = sample mean from population i
  • $\bar{\bar{x}}$ = grand mean (mean of all data values)

Between-Group Variation

Variation due to differences among groups:

$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$$

Mean Square Between = SSB / degrees of freedom:

$$MSB = \frac{SSB}{k - 1}$$

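Between-Group Variation (continued)

$$SSB = n_1 (\bar{x}_1 - \bar{\bar{x}})^2 + n_2 (\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k (\bar{x}_k - \bar{\bar{x}})^2$$

[Figure: response X for Group 1, Group 2 and Group 3, showing the group means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ relative to the grand mean $\bar{\bar{x}}$]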

Sum of Squares Within

SST = SSB + SSW

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

Where:

  • SSW = Sum of squares within
  • $k$ = number of populations
  • $n_i$ = sample size from population i
  • $\bar{x}_i$ = sample mean from population i
  • $x_{ij}$ = jth measurement from population i

Within-Group Variation

Summing the variation within each group and then adding over all groups:

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

Mean Square Within = SSW / degrees of freedom:

$$MSW = \frac{SSW}{N - k}$$

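Within-Group Variation (continued)

$$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \cdots + (x_{k n_k} - \bar{x}_k)^2$$

[Figure: response X for Group 1, Group 2 and Group 3, with deviations measured from each group mean $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$]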

One-Way ANOVA Table

Source of Variation   SS                df      MS                    F ratio
Between Samples       SSB               k - 1   MSB = SSB / (k - 1)   F = MSB / MSW
Within Samples        SSW               N - k   MSW = SSW / (N - k)
Total                 SST = SSB + SSW   N - 1

k = number of populations
N = sum of the sample sizes from all populations
df = degrees of freedom

One-Factor ANOVA F Test Statistic

H0: μ1 = μ2 = … = μk

HA: At least two population means are different

Test statistic:

$$F = \frac{MSB}{MSW}$$

  • MSB is the mean square between (the between-groups estimate of variance)
  • MSW is the mean square within (the within-groups estimate of variance)

Degrees of freedom:

  • df1 = k – 1 (k = number of populations)
  • df2 = N – k (N = sum of sample sizes from all populations)

Interpreting One-Factor ANOVA F Statistic

  • The F statistic is the ratio of the between estimate of variance and the within estimate of variance
      • The ratio must always be positive
      • df1 = k – 1 will typically be small
      • df2 = N – k will typically be large
  • The ratio should be close to 1 if H0: μ1 = μ2 = … = μk is true
  • The ratio will tend to be larger than 1 if H0: μ1 = μ2 = … = μk is false

One-Factor ANOVA F Test Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

One-Factor ANOVA Example: Scatter Diagram

[Figure: scatter diagram of Distance (vertical axis, 190 to 270) against Club (1, 2, 3), marking the group means $\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$ and the grand mean $\bar{\bar{x}} = 227.0$]

One-Factor ANOVA Example Computations

$\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$, $\bar{\bar{x}} = 227.0$

n1 = 5, n2 = 5, n3 = 5, N = 15, k = 3

SSB = 5 [ (249.2 – 227)² + (226 – 227)² + (205.8 – 227)² ] = 4716.4

SSW = (254 – 249.2)² + (263 – 249.2)² + … + (204 – 205.8)² = 1119.6

MSB = 4716.4 / (3 – 1) = 2358.2

MSW = 1119.6 / (15 – 3) = 93.3

$$F = \frac{2358.2}{93.3} = 25.275$$
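The hand computation above can be checked with a short base R sketch that applies the SSB and SSW definitions directly. The variable names (`clubs`, `xbar_i`, `F_stat` and so on) are illustrative assumptions; only the data and the formulas come from the slides.

```r
# Golf club distances from the example, one vector per club
clubs <- list(
  c(254, 263, 241, 237, 251),
  c(234, 218, 235, 227, 216),
  c(200, 222, 197, 206, 204)
)

k      <- length(clubs)           # number of populations
n      <- sapply(clubs, length)   # sample size of each group
N      <- sum(n)                  # total number of observations
xbar_i <- sapply(clubs, mean)     # group means
xbar   <- mean(unlist(clubs))     # grand mean

# Between-groups and within-groups sums of squares
SSB <- sum(n * (xbar_i - xbar)^2)
SSW <- sum(sapply(clubs, function(x) sum((x - mean(x))^2)))

# Mean squares and the F ratio
MSB    <- SSB / (k - 1)
MSW    <- SSW / (N - k)
F_stat <- MSB / MSW

print(c(SSB = SSB, SSW = SSW, MSB = MSB, MSW = MSW, F = F_stat))
```

Writing the sums over a list of vectors keeps the sketch valid for unequal group sizes, which the formulas allow even though this example is balanced.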

One-Factor ANOVA Example Solution

H0: μ1 = μ2 = μ3

HA: μi not all equal

α = .05, df1 = 2, df2 = 12

Critical Value: F.05 = 3.885

Test Statistic:

$$F = \frac{MSB}{MSW} = \frac{2358.2}{93.3} = 25.275$$

Decision: Reject H0 at α = 0.05, since F = 25.275 > F.05 = 3.885

Conclusion: There is evidence that at least one μi differs from the rest

[Figure: F distribution with the "do not reject H0" region below F.05 = 3.885 and the rejection region (α = .05) above it; F = 25.275 falls in the rejection region]
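The critical value and the decision can also be obtained without a printed F table. The sketch below uses R's `qf()` and `pf()` with the degrees of freedom and test statistic quoted in the solution; the variable names are illustrative.

```r
alpha  <- 0.05
df1    <- 2        # k - 1
df2    <- 12       # N - k
F_stat <- 25.275   # test statistic reported in the solution

# Upper-tail critical value of the F distribution
F_crit <- qf(1 - alpha, df1, df2)

# p-value: probability of an F ratio at least this large when H0 is true
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

# Decision rule: reject H0 when the statistic exceeds the critical value
reject_H0 <- F_stat > F_crit

print(F_crit)
print(p_value)
print(reject_H0)
```

Comparing `p_value` with `alpha` leads to the same decision as comparing `F_stat` with `F_crit`.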

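R program

c1 = c(254, 263, 241, 237, 251)    # Club 1 distances
c2 = c(234, 218, 235, 227, 216)    # Club 2 distances
c3 = c(200, 222, 197, 206, 204)    # Club 3 distances
f = factor(rep(1:3, 5))            # club labels 1, 2, 3 repeated, matching the order of y
y = as.vector(rbind(c1, c2, c3))   # interleaves the clubs: c1[1], c2[1], c3[1], c1[2], ...
aov(y ~ f)                         # fits the one-way ANOVA
plot(f, y)                         # gives box plot
anova(lm(y ~ f))                   # prints F statistic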

ANOVA -- Single Factor: Excel Output

EXCEL: tools | data analysis | ANOVA: single factor

SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14

The Tukey-Kramer Procedure

  • Tells which population means are significantly different
      • e.g.: μ1 = μ2 ≠ μ3
  • Done after rejection of equal means in ANOVA
  • Allows pair-wise comparisons
      • Compare absolute mean differences with critical range

[Figure: three population distributions on the x axis with μ1 = μ2 ≠ μ3]

Tukey-Kramer Critical Range

$$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$

where:

  • $q_\alpha$ = value from standardized range table with k and N – k degrees of freedom for the desired level of α
  • MSW = Mean Square Within
  • $n_i$ and $n_j$ = sample sizes from populations (levels) i and j

The Tukey-Kramer Procedure: Example

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

1. Compute absolute mean differences:

$$|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$$

$$|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$$

$$|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$$

2. Find the q value from the table in appendix J with k and N – k degrees of freedom for the desired level of α:

$$q_\alpha = 3.77$$

The Tukey-Kramer Procedure: Example (continued)

3. Compute Critical Range:

$$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$$

4. Compare:

$$|\bar{x}_1 - \bar{x}_2| = 23.2, \quad |\bar{x}_1 - \bar{x}_3| = 43.4, \quad |\bar{x}_2 - \bar{x}_3| = 20.2$$

5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
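The five steps can be reproduced with the short base R sketch below. It takes the q value from R's `qtukey()` instead of the printed table, so the critical range differs slightly from the slide's 16.285, which uses q rounded to 3.77. Variable names are illustrative.

```r
k   <- 3      # number of populations
N   <- 15     # total sample size
MSW <- 93.3   # mean square within from the ANOVA table
n_i <- 5      # sample sizes of the two groups being compared
n_j <- 5

# Studentized range quantile for alpha = 0.05 with k and N - k degrees of freedom
q_alpha <- qtukey(0.95, nmeans = k, df = N - k)

# Tukey-Kramer critical range
crit_range <- q_alpha * sqrt((MSW / 2) * (1 / n_i + 1 / n_j))

# Absolute differences between the sample means from the example
abs_diff <- c(
  "1 vs 2" = abs(249.2 - 226.0),
  "1 vs 3" = abs(249.2 - 205.8),
  "2 vs 3" = abs(226.0 - 205.8)
)

# A pair differs significantly when its difference exceeds the critical range
significant <- abs_diff > crit_range

print(q_alpha)
print(crit_range)
print(significant)
```

Base R also provides `TukeyHSD()` for a fitted `aov` model, which reports the same pairwise comparisons as confidence intervals and adjusted p-values.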


Tukey-Kramer in PHStat


Randomized Complete Block ANOVA

Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...

...but we want to control for possible variation from a second factor (with two or more levels)

Used when more than one factor may influence the value of the dependent variable, but only one is of key interest

Levels of the secondary factor are called blocks

Partitioning the Variation

Total variation can now be split into three parts:

SST = SSB + SSBL + SSW

  • SST = Total sum of squares
  • SSB = Sum of squares between factor levels
  • SSBL = Sum of squares between blocks
  • SSW = Sum of squares within levels

Sum of Squares for Blocking

SST = SSB + SSBL + SSW

$$SSBL = \sum_{j=1}^{b} k (\bar{x}_j - \bar{\bar{x}})^2$$

Where:

  • $k$ = number of levels for this factor
  • $b$ = number of blocks
  • $\bar{x}_j$ = sample mean from the jth block
  • $\bar{\bar{x}}$ = grand mean (mean of all data values)

Partitioning the Variation

Total variation can now be split into three parts:

SST = SSB + SSBL + SSW

SST and SSB are computed as they were in One-Way ANOVA

SSW = SST – (SSB + SSBL)

Mean Squares

$$MSBL = \text{Mean square blocking} = \frac{SSBL}{b - 1}$$

$$MSB = \text{Mean square between} = \frac{SSB}{k - 1}$$

$$MSW = \text{Mean square within} = \frac{SSW}{(k - 1)(b - 1)}$$
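The three-way partition and the mean squares can be sketched in base R as follows. The data matrix is hypothetical (it does not come from the slides): rows are b = 4 blocks and columns are k = 3 factor levels, with one observation per cell. The result is cross-checked against `aov()`.

```r
# Hypothetical data: b = 4 blocks (rows) by k = 3 factor levels (columns)
x <- matrix(c(10, 12, 15,
              11, 14, 16,
               9, 11, 13,
              12, 15, 18), nrow = 4, byrow = TRUE)
b     <- nrow(x)
k     <- ncol(x)
grand <- mean(x)                            # grand mean

SST  <- sum((x - grand)^2)                  # total sum of squares
SSB  <- b * sum((colMeans(x) - grand)^2)    # between factor levels (n_i = b)
SSBL <- k * sum((rowMeans(x) - grand)^2)    # between blocks
SSW  <- SST - (SSB + SSBL)                  # within, by subtraction

MSB  <- SSB  / (k - 1)
MSBL <- SSBL / (b - 1)
MSW  <- SSW  / ((k - 1) * (b - 1))

F_main  <- MSB  / MSW                       # main factor test statistic
F_block <- MSBL / MSW                       # blocking test statistic

# Cross-check with aov(); as.vector(x) runs down each column in turn
d <- data.frame(y     = as.vector(x),
                block = factor(rep(1:b, times = k)),
                level = factor(rep(1:k, each = b)))
fit <- summary(aov(y ~ level + block, data = d))[[1]]
print(fit)
```

The rows of `fit` appear in the order level, block, residuals, so its sums of squares should match `SSB`, `SSBL` and `SSW`.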

Randomized Block ANOVA Table

Source of Variation   SS     df               MS     F ratio
Between Blocks        SSBL   b - 1            MSBL   MSBL / MSW
Between Samples       SSB    k - 1            MSB    MSB / MSW
Within Samples        SSW    (k - 1)(b - 1)   MSW
Total                 SST    N - 1

k = number of populations
b = number of blocks
N = sum of the sample sizes from all populations
df = degrees of freedom

Blocking Test

$$H_0: \mu_{b1} = \mu_{b2} = \mu_{b3} = \cdots$$

$$H_A: \text{Not all block means are equal}$$

$$F = \frac{MSBL}{MSW}$$

Blocking test: df1 = b – 1, df2 = (k – 1)(b – 1)

Reject H0 if F > $F_\alpha$

Main Factor Test

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

$$H_A: \text{Not all population means are equal}$$

$$F = \frac{MSB}{MSW}$$

Main Factor test: df1 = k – 1, df2 = (k – 1)(b – 1)

Reject H0 if F > $F_\alpha$

Fisher’s Least Significant Difference Test

  • To test which population means are significantly different
      • e.g.: μ1 = μ2 ≠ μ3
  • Done after rejection of equal means in randomized block ANOVA design
  • Allows pair-wise comparisons
      • Compare absolute mean differences with critical range

[Figure: three population distributions on the x axis with μ1 = μ2 ≠ μ3]

Fisher’s Least Significant Difference (LSD) Test

$$LSD = t_{\alpha/2} \sqrt{MSW \cdot \frac{2}{b}}$$

where:

  • $t_{\alpha/2}$ = upper-tailed value from Student’s t-distribution for α/2 and (k – 1)(b – 1) degrees of freedom
  • MSW = mean square within from ANOVA table
  • b = number of blocks
  • k = number of levels of the main factor

Fisher’s Least Significant Difference (LSD) Test (continued)

$$LSD = t_{\alpha/2} \sqrt{MSW \cdot \frac{2}{b}}$$

Compare: is $|\bar{x}_i - \bar{x}_j| > LSD$?

$$|\bar{x}_1 - \bar{x}_2|, \quad |\bar{x}_1 - \bar{x}_3|, \quad |\bar{x}_2 - \bar{x}_3|, \quad \ldots \text{ etc.}$$

If the absolute mean difference is greater than LSD then there is a significant difference between that pair of means at the chosen level of significance.
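The LSD comparison can be sketched as a small base R helper. The function name `fisher_lsd`, the factor-level means and the MSW value below are hypothetical illustrations; the degrees of freedom are taken as (k – 1)(b – 1), the within degrees of freedom of the randomized block ANOVA table.

```r
# LSD for a randomized block design with k factor levels and b blocks
fisher_lsd <- function(MSW, k, b, alpha = 0.05) {
  df     <- (k - 1) * (b - 1)          # within (error) degrees of freedom
  t_crit <- qt(1 - alpha / 2, df)      # upper-tail t value for alpha / 2
  t_crit * sqrt(MSW * 2 / b)
}

# Hypothetical inputs: three factor-level means, MSW = 0.25, four blocks
level_means <- c(10.5, 13.0, 15.5)
lsd <- fisher_lsd(MSW = 0.25, k = 3, b = 4)

# Matrix of absolute differences between every pair of means
abs_diff <- abs(outer(level_means, level_means, "-"))

# TRUE where a pair of means differs by more than the LSD
significant <- abs_diff > lsd

print(lsd)
print(significant)
```

The diagonal of `significant` is always `FALSE`, since each mean is compared with itself there.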

Two-Way ANOVA

Examines the effect of:

  • Two factors of interest on the dependent variable
      • e.g.: percent carbonation and line speed on soft drink bottling process
  • Interaction between the different levels of these two factors (only if replications exist)
      • e.g.: does the effect of one particular percentage of carbonation depend on which level the line speed is set?

Two-Way ANOVA (continued)

Assumptions

  • Populations are normally distributed
  • Populations have equal variances
  • Independent random samples are drawn

Two-Way ANOVA Sources of Variation

Two Factors of interest: A and B

  • a = number of levels of factor A
  • b = number of levels of factor B
  • N = total number of observations in all cells
  • n′ = number of replications in each cell

Two-Way ANOVA Sources of Variation (continued)

SST = SSA + SSB + SSAB + SSE

Source of variation                            SS     Degrees of freedom
Variation due to factor A                      SSA    a – 1
Variation due to factor B                      SSB    b – 1
Variation due to interaction between A and B   SSAB   (a – 1)(b – 1)
Inherent variation (Error)                     SSE    N – ab
Total Variation                                SST    N – 1

Two Factor ANOVA Equations

Total Sum of Squares:

$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{\bar{x}})^2$$

Sum of Squares Factor A:

$$SS_A = b\,n' \sum_{i=1}^{a} (\bar{x}_{i\cdot} - \bar{\bar{x}})^2$$

Sum of Squares Factor B:

$$SS_B = a\,n' \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{\bar{x}})^2$$

Two Factor ANOVA Equations (continued)

Sum of Squares Interaction Between A and B:

$$SS_{AB} = n' \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{\bar{x}})^2$$

Sum of Squares Error:

$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{x}_{ij})^2$$

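Two Factor ANOVA Equations (continued)

where:

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{a\,b\,n'} = \text{Grand mean}$$

$$\bar{x}_{i\cdot} = \frac{\sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{b\,n'} = \text{Mean of each level of factor A}$$

$$\bar{x}_{\cdot j} = \frac{\sum_{i=1}^{a} \sum_{k=1}^{n'} x_{ijk}}{a\,n'} = \text{Mean of each level of factor B}$$

$$\bar{x}_{ij} = \frac{\sum_{k=1}^{n'} x_{ijk}}{n'} = \text{Mean of each cell}$$

  • a = number of levels of factor A
  • b = number of levels of factor B
  • n′ = number of replications in each cell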

Mean Square Calculations

$$MS_A = \text{Mean square factor A} = \frac{SS_A}{a - 1}$$

$$MS_B = \text{Mean square factor B} = \frac{SS_B}{b - 1}$$

$$MS_{AB} = \text{Mean square interaction} = \frac{SS_{AB}}{(a - 1)(b - 1)}$$

$$MSE = \text{Mean square error} = \frac{SSE}{N - ab}$$

Two-Way ANOVA: The F Test Statistic

F Test for Factor A Main Effect

  H0: μA1 = μA2 = μA3 = • • •
  HA: Not all μAi are equal

$$F = \frac{MS_A}{MSE}$$

  Reject H0 if F > $F_\alpha$

F Test for Factor B Main Effect

  H0: μB1 = μB2 = μB3 = • • •
  HA: Not all μBi are equal

$$F = \frac{MS_B}{MSE}$$

  Reject H0 if F > $F_\alpha$

F Test for Interaction Effect

  H0: factors A and B do not interact to affect the mean response
  HA: factors A and B do interact

$$F = \frac{MS_{AB}}{MSE}$$

  Reject H0 if F > $F_\alpha$

Two-Way ANOVA Summary Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                       F Statistic
Factor A              SSA              a – 1                MSA = SSA / (a – 1)                MSA / MSE
Factor B              SSB              b – 1                MSB = SSB / (b – 1)                MSB / MSE
AB (Interaction)      SSAB             (a – 1)(b – 1)       MSAB = SSAB / [(a – 1)(b – 1)]     MSAB / MSE
Error                 SSE              N – ab               MSE = SSE / (N – ab)
Total                 SST              N – 1

Features of Two-Way ANOVA F Test

  • Degrees of freedom always add up
      N – 1 = (N – ab) + (a – 1) + (b – 1) + (a – 1)(b – 1)
      Total = error + factor A + factor B + interaction
  • The denominator of the F Test is always the same but the numerator is different
  • The sums of squares always add up
      SST = SSE + SSA + SSB + SSAB
      Total = error + factor A + factor B + interaction
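These additivity features can be checked numerically with a short base R sketch. The data set is hypothetical (a = 2 levels of A, b = 3 levels of B, n′ = 2 replications per cell), and `aov()` with the formula `y ~ A * B` is one way to fit the two-factor model with interaction.

```r
# Hypothetical balanced design: rep varies fastest, then B, then A
d <- expand.grid(rep = 1:2, B = factor(1:3), A = factor(1:2))
d$y <- c(20, 22, 25, 27, 30, 31,
         24, 23, 26, 28, 27, 25)

a <- nlevels(d$A)   # number of levels of factor A
b <- nlevels(d$B)   # number of levels of factor B
N <- nrow(d)        # total number of observations

# Two-way ANOVA with interaction; table rows are A, B, A:B, Residuals
fit <- summary(aov(y ~ A * B, data = d))[[1]]
df  <- fit[["Df"]]
ss  <- fit[["Sum Sq"]]

# Total sum of squares computed directly from the data
SST <- sum((d$y - mean(d$y))^2)

print(fit)
print(c(df_total = sum(df), N_minus_1 = N - 1))
print(c(ss_total = sum(ss), SST = SST))
```

In the printed table, each F value is the corresponding mean square divided by the residual mean square, which is the common denominator MSE.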

Examples: Interaction vs. No Interaction

[Figure: two panels, "No interaction" and "Interaction is present", each plotting Mean Response against Factor A Levels (1, 2), with one line for each of Factor B Level 1, Level 2 and Level 3]

Chapter Summary

  • Described one-way analysis of variance
      • The logic of ANOVA
      • ANOVA assumptions
      • F test for difference in k means
      • The Tukey-Kramer procedure for multiple comparisons
  • Described randomized complete block designs
      • F test
      • Fisher’s least significant difference test for multiple comparisons
  • Described two-way analysis of variance
      • Examined effects of multiple factors and interaction