Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 12

Analysis of Variance

Chapter Goals

When you have completed this chapter, you will be able to:

1. Discuss the general idea of analysis of variance.

2. List the characteristics of the F distribution.

3. Conduct a test of hypothesis to determine whether the variances of two populations are equal.

4. Organize data into a one-way and a two-way ANOVA table.

5. Define the terms treatments and blocks.

6. Conduct a test of hypothesis to determine whether three or more treatment means are equal.

7. Develop multiple tests for difference between each pair of treatment means.

Characteristics of the F-Distribution

• There is a “family” of F distributions. Each member of the family is determined by two parameters: the numerator degrees of freedom and the denominator degrees of freedom.

• F cannot be negative, and it is a continuous distribution.

• The F distribution is positively skewed.

• Its values range from 0 to ∞. As F → ∞, the curve approaches the X-axis.
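The characteristics listed above can be checked informally by simulation. The sketch below is an illustration rather than part of the original slides: it assumes an F value can be drawn as the ratio of two independent chi-square variables, each divided by its degrees of freedom, and the function name `simulate_f`, the seed, and the choice of 9 and 7 degrees of freedom are all arbitrary.

```python
import random
import statistics


def simulate_f(df_num, df_den, draws=20000, seed=12):
    """Draw F ratios as (chi2_num / df_num) / (chi2_den / df_den)."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        # A chi-square variable with df degrees of freedom is a
        # gamma variable with shape df/2 and scale 2.
        chi2_num = rng.gammavariate(df_num / 2, 2)
        chi2_den = rng.gammavariate(df_den / 2, 2)
        values.append((chi2_num / df_num) / (chi2_den / df_den))
    return values


f_values = simulate_f(9, 7)
f_min = min(f_values)
f_mean = statistics.mean(f_values)
f_median = statistics.median(f_values)

# Non-negative values, and a mean above the median (a sign of positive skew).
print(f_min >= 0, f_mean > f_median)
```

A mean that sits above the median is only a rough indicator of positive skew, but it is enough to make the long right tail visible without plotting.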

Test for Equal Variances

For the two-tailed test, the test statistic is given by:

$$F = \frac{s_1^2}{s_2^2}$$

where $s_1^2$ and $s_2^2$ are the sample variances for the two samples.

The null hypothesis is rejected if the computed value of the test statistic is greater than the critical value.

Colin, a stockbroker at Critical Securities, reported that the mean rate of return on a sample of 10 internet stocks was 12.6 percent with a standard deviation of 3.9 percent.

The mean rate of return on a sample of 8 utility stocks was 10.9 percent with a standard deviation of 3.5 percent.

At the .05 significance level, can Colin conclude that there is more variation in the internet stocks?

Hypothesis Testing

Step 1: State the null and alternate hypotheses

Step 2: Select the level of significance

Step 3: Identify the test statistic

Step 4: State the decision rule

Step 5: Compute the value of the test statistic and make a decision: either do not reject H0, or reject H0 and accept H1

Hypothesis Test

Step 1: State the null and alternate hypotheses

$$H_0: \sigma_I^2 \le \sigma_U^2$$

$$H_1: \sigma_I^2 > \sigma_U^2$$

Step 2: Select the level of significance

α = 0.05

Step 3: Identify the test statistic

The test statistic follows the F distribution

Step 4: State the decision rule

Reject H0 if F > 3.68. The df are 9 in the numerator and 7 in the denominator.

Step 5: Compute the test statistic and make a decision

$$F = \frac{s_1^2}{s_2^2} = \frac{(3.9)^2}{(3.5)^2} = 1.2416$$

Do not reject the null hypothesis; there is insufficient evidence to show more variation in the internet stocks.
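The arithmetic in the stock example can be reproduced with a few lines of Python. This is a minimal sketch, not the slides' own method: the helper name `variance_ratio` is hypothetical, and the critical value 3.68 is copied from the decision rule above rather than computed.

```python
def variance_ratio(s1, s2):
    """F statistic for comparing two sample standard deviations."""
    return s1 ** 2 / s2 ** 2


# Sample figures from the internet/utility stock example.
s_internet, n_internet = 3.9, 10
s_utility, n_utility = 3.5, 8
f_critical = 3.68  # taken from the slide, for 9 and 7 df at the .05 level

f_stat = variance_ratio(s_internet, s_utility)
df_num = n_internet - 1
df_den = n_utility - 1
reject_h0 = f_stat > f_critical

print(round(f_stat, 4), df_num, df_den, reject_h0)
```

Because the computed ratio falls well below the critical value, the sketch reaches the same decision as the slide: H0 is not rejected.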

ANOVA

The F distribution is also used for testing whether two or more sample means came from the same or equal populations.

This technique is called analysis of variance, or ANOVA.

ANOVA requires the following conditions:

• the sampled populations follow the normal distribution

• the populations have equal standard deviations

• the samples are randomly selected and are independent

ANOVA Procedure

The Null Hypothesis (H0) is that the population means are the same.

The Alternative Hypothesis (H1) is that at least one of the means is different.

The Test Statistic follows the F distribution.

The Decision Rule is to reject H0 if F(computed) is greater than F(table) with numerator and denominator df.

Terminology

Total Variation is the sum of the squared differences between each observation and the overall mean.

Treatment Variation is the sum of the squared differences between each treatment mean and the overall mean.

Random Variation is the sum of the squared differences between each observation and its treatment mean.

The test statistic is computed by:

$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$

If there are k populations being sampled, the numerator degrees of freedom is k - 1.

If there are a total of n observations, the denominator degrees of freedom is n - k.

• SS Total is the total sum of squares

$$SS\ Total = \sum X^2 - \frac{(\sum X)^2}{n}$$

• SST is the treatment sum of squares

$$SST = \sum\left(\frac{T_c^2}{n_c}\right) - \frac{(\sum X)^2}{n}$$

where $T_c$ is the column total, $n_c$ is the number of observations in each column, $\sum X$ is the sum of all the observations, and n is the total number of observations.

• SSE is the sum of squares error

$$SSE = SS\ Total - SST$$

Easy Meals Restaurants specialize in meals for senior citizens.

Katy Smith, President, recently developed a new meat loaf dinner. Before making it a part of the regular menu she decides to test it in several of her restaurants.

She would like to know if there is a difference in the mean number of dinners sold per day at the Aynor, Loris, and Lander restaurants.

Use the .05 significance level.

        Aynor   Loris   Lander
          13      10      18
          12      12      16
          14      13      17
          12      11      17
                          17
Tc        51      46      85
nc         4       4       5

• SS Total is the total sum of squares

$$SS\ Total = \sum X^2 - \frac{(\sum X)^2}{n} = 2634 - \frac{(182)^2}{13} = 86$$

• SST is the treatment sum of squares

$$SST = \sum\left(\frac{T_c^2}{n_c}\right) - \frac{(\sum X)^2}{n} = \frac{51^2}{4} + \frac{46^2}{4} + \frac{85^2}{5} - \frac{(182)^2}{13} = 76.25$$

• SSE is the sum of squares error

$$SSE = SS\ Total - SST = 86 - 76.25 = 9.75$$
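The three sums of squares for the Easy Meals data can be recomputed directly from the raw observations. The sketch below follows the shortcut formulas shown earlier; the function name `one_way_sums_of_squares` and the dictionary layout are illustrative assumptions, not something the slides prescribe.

```python
def one_way_sums_of_squares(samples):
    """Return (SS Total, SST, SSE) for a dict of treatment -> observations."""
    all_values = [x for column in samples.values() for x in column]
    n = len(all_values)
    # (sum of X)^2 / n, the term shared by SS Total and SST.
    correction = sum(all_values) ** 2 / n
    ss_total = sum(x ** 2 for x in all_values) - correction
    # Sum of Tc^2 / nc over the columns, minus the shared term.
    sst = sum(sum(col) ** 2 / len(col) for col in samples.values()) - correction
    sse = ss_total - sst
    return ss_total, sst, sse


meals = {
    "Aynor": [13, 12, 14, 12],
    "Loris": [10, 12, 13, 11],
    "Lander": [18, 16, 17, 17, 17],
}

ss_total, sst, sse = one_way_sums_of_squares(meals)
print(ss_total, sst, sse)
```

Keeping the data keyed by restaurant makes unequal sample sizes (4, 4, and 5 here) easy to handle, since each column carries its own length.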

Hypothesis Test

Step 1: State the null and alternate hypotheses

$$H_0: \mu_1 = \mu_2 = \mu_3$$

H1: Treatment means are not all equal

Step 2: Select the level of significance

α = 0.05

Step 3: Identify the test statistic

The test statistic follows the F distribution

Step 4: State the decision rule

Reject H0 if F > 4.10. The df are 2 in the numerator and 10 in the denominator.

Step 5: Compute the test statistic and make a decision

$$F = \frac{SST/(k-1)}{SSE/(n-k)} = \frac{76.25/2}{9.75/10} = 39.10$$

The decision is to reject the null hypothesis. The treatment means are not the same: the mean number of meals sold at the three locations is not the same.
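Step 5 for the Easy Meals example amounts to dividing two mean squares. A minimal sketch, assuming the sums of squares from the slides (76.25 and 9.75) and the tabled critical value 4.10; the function name `one_way_f` is hypothetical.

```python
def one_way_f(sst, sse, k, n):
    """F = [SST / (k - 1)] / [SSE / (n - k)] for k treatments, n observations."""
    mean_square_treatment = sst / (k - 1)
    mean_square_error = sse / (n - k)
    return mean_square_treatment / mean_square_error


f_critical = 4.10  # from the slide, for 2 and 10 df at the .05 level
f_stat = one_way_f(sst=76.25, sse=9.75, k=3, n=13)
reject_h0 = f_stat > f_critical

print(round(f_stat, 2), reject_h0)
```

The computed ratio far exceeds the critical value, which matches the slide's decision to reject H0.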

ANOVA Table, from the Minitab system

Analysis of Variance
Source   DF      SS      MS      F      P
Factor    2  76.250  38.125  39.10  0.000
Error    10   9.750   0.975
Total    12  86.000

Individual 95% CIs For Mean Based on Pooled St.Dev
Level    N    Mean  St.Dev  ---------+---------+---------+-------
Aynor    4  12.750   0.957  (---*---)
Loris    4  11.500   1.291  (---*---)
Lander   5  17.000   0.707  (---*---)
                            ---------+---------+---------+-------
Pooled St.Dev = 0.987 12.5 15.0 17.5

Analysis of Variance in Excel

Using Excel:

Click on Tools, then click on DATA ANALYSIS.

Highlight ANOVA: SINGLE FACTOR, then click OK.

Input the sample data in Columns A, B, C. The input range needed is A1:C6. Click on OK.

The Excel output reports SS Total, SST, SSE, and the F test.

Inferences About Treatment Means

When we reject the null hypothesis that the means are equal, we may want to know which treatment means differ.

One of the simplest procedures is through the use of confidence intervals.

Confidence Interval for the Difference Between Two Means

$$(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where t is obtained from the t table with degrees of freedom (n - k), and MSE = SSE/(n - k).

Develop a 95% confidence interval for the difference in the mean number of meat loaf dinners sold in Lander and Aynor.

Can Katy conclude that there is a difference between the two restaurants?

$$(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (17 - 12.75) \pm 2.228\sqrt{0.975\left(\frac{1}{4} + \frac{1}{5}\right)} = 4.25 \pm 1.48 = (2.77, 5.73)$$

Because zero is not in the interval, we conclude that this pair of means differs.

The mean number of meals sold in Aynor is different from Lander.
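The Lander versus Aynor interval can be verified numerically. The sketch below assumes the values given in the example (means 17 and 12.75, sample sizes 5 and 4, MSE 0.975, and t = 2.228 for 10 df); the function name `mean_difference_interval` is hypothetical, and the t value is copied from the slide rather than computed.

```python
import math


def mean_difference_interval(mean1, mean2, n1, n2, mse, t_value):
    """Interval (mean1 - mean2) +/- t * sqrt(MSE * (1/n1 + 1/n2))."""
    margin = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    difference = mean1 - mean2
    return difference - margin, difference + margin


low, high = mean_difference_interval(
    mean1=17.0, mean2=12.75, n1=5, n2=4, mse=0.975, t_value=2.228
)
contains_zero = low <= 0 <= high

print(round(low, 2), round(high, 2), contains_zero)
```

Since both endpoints are positive, zero lies outside the interval, which is the basis for concluding that this pair of means differs.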

Two-Factor ANOVA

For the two-factor ANOVA we test whether there is a significant difference between the treatment effects and whether there is a difference in the blocking effect.

Let $B_r$ be the block totals (r for rows), and let SSB represent the sum of squares for the blocks:

$$SSB = \sum\left(\frac{B_r^2}{k}\right) - \frac{(\sum X)^2}{n}$$

where k is the number of treatments.

The Bieber Manufacturing Co. operates 24 hours a day, five days a week. The workers rotate shifts each week. Todd Bieber, the owner, is interested in whether there is a difference in the number of units produced when the employees work on various shifts. A sample of five workers is selected and their output recorded on each shift.

At the .05 significance level, can we conclude there is a difference in the mean production by shift and in the mean production by employee?

Employee     Day Output   Evening Output   Night Output
McCartney        31             25              35
Neary            33             26              33
Schoen           28             24              30
Thompson         30             29              28
Wagner           28             26              27

Hypothesis Test

Difference between various shifts?

Step 1: State the null and alternate hypotheses

$$H_0: \mu_1 = \mu_2 = \mu_3$$

H1: Not all means are equal

Step 2: Select the level of significance

α = 0.05

Step 3: Identify the test statistic

The test statistic follows the F distribution

Step 4: State the decision rule

Reject H0 if F > 4.46. The df are 2 and 8.

Step 5: Compute the test statistic and make a decision

$$F = \frac{SST/(k-1)}{SSE/((k-1)(b-1))}$$

Compute the various sums of squares, using statistical software to get these results:

SS(total) = 139.73

SST = 62.53

SSB = 33.73

SSE = 43.47

df(block) = 4, df(treatment) = 2, df(error) = 8
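The sums of squares reported above can also be recomputed from the Bieber output table without a statistics package. This is a sketch under stated assumptions: the function name `two_way_sums_of_squares` is hypothetical, rows are employees (blocks), columns are shifts (treatments), and SSE is taken as SS(total) - SST - SSB, which is consistent with the figures on the slide.

```python
def two_way_sums_of_squares(rows):
    """Return (SS total, SST, SSB, SSE); rows are blocks, columns are treatments."""
    b = len(rows)      # number of blocks (employees)
    k = len(rows[0])   # number of treatments (shifts)
    n = b * k
    all_values = [x for row in rows for x in row]
    correction = sum(all_values) ** 2 / n
    ss_total = sum(x ** 2 for x in all_values) - correction
    # Each column total is squared and divided by its b observations.
    column_totals = [sum(row[j] for row in rows) for j in range(k)]
    sst = sum(t ** 2 for t in column_totals) / b - correction
    # Each block (row) total is squared and divided by its k observations.
    ssb = sum(sum(row) ** 2 for row in rows) / k - correction
    sse = ss_total - sst - ssb
    return ss_total, sst, ssb, sse


# Day, evening, and night output for the five sampled employees.
output = [
    [31, 25, 35],  # McCartney
    [33, 26, 33],  # Neary
    [28, 24, 30],  # Schoen
    [30, 29, 28],  # Thompson
    [28, 26, 27],  # Wagner
]

ss_total, sst, ssb, sse = two_way_sums_of_squares(output)
print(round(ss_total, 2), round(sst, 2), round(ssb, 2), round(sse, 2))
```

Removing the block sum of squares from the error term is what distinguishes this layout from the one-way case: variation between employees no longer inflates SSE.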

Step 5: Compute the test statistic and make a decision

$$F = \frac{SST/(k-1)}{SSE/((k-1)(b-1))} = \frac{62.53/(3-1)}{43.47/((3-1)(5-1))} = 5.754$$

Since 5.754 > 4.46, H0 is rejected.

There is a difference in the mean number of units produced on the different shifts.

Hypothesis Test

Difference between the employees?

Step 1: State the null and alternate hypotheses

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$$

H1: Not all means are equal

Step 2: Select the level of significance

α = 0.05

Step 3: Identify the test statistic

The test statistic follows the F distribution

Step 4: State the decision rule

Reject H0 if F > 3.84. The df are 4 and 8.

Step 5: Compute the test statistic and make a decision

$$F = \frac{SSB/(b-1)}{SSE/((k-1)(b-1))}$$

Step 5: Compute the test statistic and make a decision

$$F = \frac{SSB/(b-1)}{SSE/((k-1)(b-1))} = \frac{33.73/4}{43.47/((2)(4))} = 1.55$$

Since 1.55 < 3.84, H0 is not rejected.

There is no significant difference in the mean number of units produced by the various employees.
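Both F ratios in the Bieber example share the same error mean square, so they can be computed together. A minimal sketch, assuming the rounded sums of squares from the slides and the tabled critical values 4.46 and 3.84; the function name `two_way_f_ratios` is hypothetical.

```python
def two_way_f_ratios(sst, ssb, sse, k, b):
    """Return (F for treatments, F for blocks) with k treatments and b blocks."""
    mean_square_error = sse / ((k - 1) * (b - 1))
    f_treatment = (sst / (k - 1)) / mean_square_error
    f_block = (ssb / (b - 1)) / mean_square_error
    return f_treatment, f_block


f_shift, f_worker = two_way_f_ratios(sst=62.53, ssb=33.73, sse=43.47, k=3, b=5)

# Critical values copied from the slides (2 and 8 df; 4 and 8 df).
reject_shift = f_shift > 4.46
reject_worker = f_worker > 3.84

print(round(f_shift, 3), round(f_worker, 2), reject_shift, reject_worker)
```

The two decisions differ: the shift effect clears its critical value while the employee (block) effect does not, matching the conclusions on the slides.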

ANOVA, from the Minitab system

Units versus Worker, Shift

Analysis of Variance for Units
Source   DF      SS      MS     F      P
Worker    4   33.73    8.43  1.55  0.276
Shift     2   62.53   31.27  5.75  0.028
Error     8   43.47    5.43
Total    14  139.73

Using Excel:

Highlight ANOVA: TWO FACTOR WITHOUT REPLICATION, then click OK.

Select the input data.

The output reports SS Total, SST, SSE, SSB, the F test value, and the F critical value.

Since F(test) < F(critical), there is not sufficient evidence to reject H0.

There is no significant difference in the average number of units produced by the different employees.

Test your learning

Click on the Online Learning Centre at www.mcgrawhill.ca/college/lind for quizzes, extra content, data sets, a searchable glossary, access to Statistics Canada’s E-Stat data, and much more!

This completes Chapter 12