10 The Analysis of Variance

10 The Analysis of Variance. 10.1 Single-Factor ANOVA


10.1 Single-Factor ANOVA


Single-Factor ANOVA

Single-factor ANOVA focuses on a comparison of more than two population or treatment means. Let

I = the number of populations or treatments being compared

μ1 = the mean of population 1 or the true average response when treatment 1 is applied

...

μI = the mean of population I or the true average response when treatment I is applied


Single-Factor ANOVA

The relevant hypotheses are

H0: μ1 = μ2 = ··· = μI

versus

Ha: at least two of the μi’s are different

If I = 4, H0 is true only if all four μi’s are identical. Ha would be true, for example, if μ1 = μ2 ≠ μ3 = μ4, if μ1 = μ3 = μ4 ≠ μ2, or if all four μi’s differ from one another.


The Idea of ANOVA

The sample means for the three samples are the same for each set, so the variation among sample means for (a) is identical to that for (b). The variation among the individuals within the three samples is much less for (b).

CONCLUSION: the samples in (b) contain a larger amount of variation among the sample means relative to the amount of variation within the samples, so ANOVA will find more significant differences among the means in (b), assuming equal sample sizes for (a) and (b).

Note: for the same sample means and within-sample variation, larger samples make the differences among the means more significant.


Comparing Several Means

Do SUVs, trucks, and midsize cars have the same gas mileage?

Response variable: gas mileage (mpg)

Groups: vehicle classification

31 midsize cars

31 SUVs

14 standard-size pickup trucks

Data from the Environmental Protection Agency’s Model Year 2003 Fuel Economy Guide, www.fueleconomy.gov.


Comparing Several Means

Means:

Midsize: 27.903

SUV: 22.677

Pickup: 21.286

Mean gas mileage for SUVs and pickups appears less than for midsize cars.

Are these differences statistically significant?


Comparing Several Means

Null hypothesis: The true means (for gas mileage) are the same for all groups (the three vehicle classifications).

We could use separate t tests to compare each pair of means:

27.903 vs. 22.677 (H0: μ1 = μ2)

27.903 vs. 21.286 (H0: μ1 = μ3)

22.677 vs. 21.286 (H0: μ2 = μ3)

However, this gives rise to the problem of multiple comparisons: each additional test adds another chance of a false positive, so the overall error rate exceeds the level used for any single test.
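A rough illustration of the multiple-comparisons problem, as a sketch under simplifying assumptions: if the three pairwise tests were independent and each used α = 0.05, the chance of at least one false rejection when all means are equal would be about 14%. Neither the independence assumption nor the choice of α comes from the slides (pairwise t tests that share samples are not independent), and the function name is illustrative.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false rejection among independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Three vehicle classes give 3 pairwise comparisons.
num_groups = 3
num_pairs = num_groups * (num_groups - 1) // 2

# ASSUMPTION: each test is run at alpha = 0.05 and the tests are independent.
rate = familywise_error_rate(0.05, num_pairs)
print(f"{num_pairs} tests at alpha = 0.05 -> familywise rate of about {rate:.3f}")
```

The rate grows quickly with the number of groups, which is one motivation for a single overall test.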


The One-Way ANOVA Model

Random sampling always produces chance variations. Any “factor effect” would thus show up in our data as the factor-driven differences plus chance variations (“error”):

Data = fit + residual

The one-way ANOVA model analyzes situations where chance variations are normally distributed N(0, σ) such that:

$$X_{ij} = \mu_i + \varepsilon_{ij}$$

where the $\varepsilon_{ij}$ are independent N(0, σ) deviations.
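The "data = fit + residual" decomposition can be sketched on a small made-up data set. The group labels and values below are hypothetical, not from the slides. The fit for each observation is its group's sample mean, and the residual is whatever is left over.

```python
from statistics import mean

# Hypothetical responses for three treatment groups (not from the slides).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0, 8.0],
    "C": [1.0, 2.0, 3.0],
}

# Fit: the estimate of mu_i is the sample mean of group i.
fit = {name: mean(values) for name, values in groups.items()}

# Residual: each observation minus its group mean.
residuals = {
    name: [x - fit[name] for x in values] for name, values in groups.items()
}

for name, values in groups.items():
    for x, r in zip(values, residuals[name]):
        # Each observation is recovered exactly as fit + residual.
        assert abs((fit[name] + r) - x) < 1e-12
    print(name, "fit =", fit[name], "residuals =", residuals[name])
```

Within each group the residuals sum to zero, so the group mean carries all of the "fit" and the residuals carry only the chance variation.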


The ANOVA F Test

To determine statistical significance, we need a test statistic that we can calculate.

The ANOVA F Statistic

The analysis of variance F statistic for testing the equality of several means has this form:

$$F = \frac{\text{variation among the sample means}}{\text{variation among individuals in the same sample}}$$

Difference in means small relative to overall variability: F tends to be small.

Difference in means large relative to overall variability: F tends to be large.

Larger F-values typically yield more significant results. How large depends on the degrees of freedom (I − 1 and N − I).


The ANOVA F Test

The measures of variation in the numerator and denominator are mean squares:

Numerator: Mean Square for Treatments (MSTr)

Denominator: Mean Square for Error (MSE)

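$$\text{MSTr} = \frac{n_1(\bar{x}_{1\cdot} - \bar{x}_{\cdot\cdot})^2 + n_2(\bar{x}_{2\cdot} - \bar{x}_{\cdot\cdot})^2 + \cdots + n_I(\bar{x}_{I\cdot} - \bar{x}_{\cdot\cdot})^2}{I - 1}$$

$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_I - 1)s_I^2}{N - I}$$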
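Both mean squares can be computed from summary statistics alone. The sketch below applies the MSTr formula to the sample sizes and means of the gas mileage example. The slides do not report the sample variances, so the MSE function is only checked on a hypothetical case where every group has the same variance. The function names are illustrative.

```python
def mean_square_treatments(ns, means):
    """MSTr = sum of n_i * (xbar_i - grand mean)^2, divided by I - 1."""
    total_n = sum(ns)
    # The grand mean weights each sample mean by its sample size.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    return sstr / (len(ns) - 1)


def mean_square_error(ns, variances):
    """MSE = sum of (n_i - 1) * s_i^2, divided by N - I (a pooled variance)."""
    total_n = sum(ns)
    sse = sum((n - 1) * v for n, v in zip(ns, variances))
    return sse / (total_n - len(ns))


# Sample sizes and means from the gas mileage example (midsize, SUV, pickup).
ns = [31, 31, 14]
means = [27.903, 22.677, 21.286]
mstr = mean_square_treatments(ns, means)
print(f"MSTr for the gas mileage summary: {mstr:.2f}")

# ASSUMPTION: a hypothetical common variance of 4.0 in every group.
# Pooling equal variances must return that same variance.
mse_equal = mean_square_error(ns, [4.0, 4.0, 4.0])
print(f"MSE with equal variances of 4.0: {mse_equal}")
```

Because MSE is a weighted average of the sample variances, it always lies between the smallest and largest of them.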


Notation

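The individual sample means will be denoted by $\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{I\cdot}$. That is,

$$\bar{X}_{i\cdot} = \frac{\sum_{j=1}^{n_i} X_{ij}}{n_i} \quad \text{for } i = 1, \ldots, I$$

Similarly, the average of all N observations, called the grand mean, is

$$\bar{X}_{\cdot\cdot} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{n_i} X_{ij}}{N}$$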


Notation

Additionally, let $S_1^2, S_2^2, \ldots, S_I^2$ denote the sample variances:

$$S_i^2 = \frac{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2}{n_i - 1} \quad \text{for } i = 1, \ldots, I$$
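A minimal sketch of this notation in code, using two hypothetical samples of unequal size (the values are made up for illustration). It also shows that the grand mean is the average of all N observations, which differs from the unweighted average of the sample means when the sample sizes differ.

```python
from statistics import mean, variance

# Hypothetical samples of unequal size (not from the slides).
samples = [[1.0, 2.0, 3.0], [4.0, 6.0]]

sample_means = [mean(s) for s in samples]          # Xbar_i.
sample_variances = [variance(s) for s in samples]  # S_i^2, divisor n_i - 1

# Grand mean Xbar..: sum of all N observations divided by N.
all_observations = [x for s in samples for x in s]
grand_mean = sum(all_observations) / len(all_observations)

print("sample means:", sample_means)
print("sample variances:", sample_variances)
print("grand mean:", grand_mean)
print("unweighted mean of sample means:", mean(sample_means))
```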


The ANOVA Table

The computations are often summarized in a tabular format, called an ANOVA table, as shown in the table below.

Tables produced by statistical software customarily include a P-value column to the right of F.

An ANOVA Table

| Source of variation | Sum of squares | df | Mean square | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Treatments | SSTr | I − 1 | MSTr = SSTr/(I − 1) | MSTr/MSE | Tail area above F | Value of F for α |
| Error | SSE | N − I | MSE = SSE/(N − I) | | | |
| Total | SST = SSTr + SSE | N − 1 | | | | |
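The slides do not define the sums of squares in the table. The usual textbook definitions, written in the same notation as the mean squares, are sketched below. They are a summary of standard identities, not something taken from the slides.

```latex
\begin{aligned}
\text{SST}  &= \sum_{i=1}^{I}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{\cdot\cdot})^2 \\
\text{SSTr} &= \sum_{i=1}^{I} n_i (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 \\
\text{SSE}  &= \sum_{i=1}^{I}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i\cdot})^2
             = \sum_{i=1}^{I} (n_i - 1)\, s_i^2 \\
\text{SST}  &= \text{SSTr} + \text{SSE}
\end{aligned}
```

Dividing SSTr by I − 1 and SSE by N − I gives the mean squares MSTr and MSE.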


F Distributions and the F Test

An F distribution has two parameters: the numerator degrees of freedom ν1 and the denominator degrees of freedom ν2. Both ν1 and ν2 are positive integers. Figure 10.3 pictures an F density curve and the corresponding upper-tail critical value Fα,ν1,ν2. Appendix Table A.9 gives these critical values for α = .10, .05, .01, and .001.

Values of ν1 are identified with different columns of the table, and the rows are labeled with various values of ν2.

Figure 10.3: An F density curve and critical value
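When no table is at hand, an upper-tail critical value can be approximated by simulation. The sketch below is a Monte Carlo estimate, not a replacement for Table A.9. It relies on the standard fact that an F variable is the ratio of two independent chi-square variables, each divided by its degrees of freedom. The function name, seed, and number of draws are arbitrary choices. The degrees of freedom (3, 12) and α = 0.05 match the nematode example later in the slides, where the critical value is given as 3.49.

```python
import random


def simulate_f_critical(df1, df2, alpha, num_draws=100_000, seed=0):
    """Estimate the upper-tail critical value of F(df1, df2) by simulation."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        # A chi-square variable with k df is Gamma(shape=k/2, scale=2).
        chi2_num = rng.gammavariate(df1 / 2, 2.0)
        chi2_den = rng.gammavariate(df2 / 2, 2.0)
        # F is the ratio of the two chi-squares, each divided by its df.
        draws.append((chi2_num / df1) / (chi2_den / df2))
    draws.sort()
    # The critical value cuts off an upper-tail area of alpha.
    return draws[int((1 - alpha) * num_draws)]


estimate = simulate_f_critical(3, 12, 0.05)
print(f"simulated critical value, alpha = 0.05, df = (3, 12): {estimate:.2f}")
```

The estimate carries simulation error, so it should land near the tabulated value without matching it exactly.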


Nematodes and plant growth

Do nematodes affect plant growth? A botanist prepares 16 identical planting pots and adds different numbers of nematodes into the pots. Seedling growth (in mm) is recorded two weeks later.

| Nematodes | Seedling growth (mm) | x̄i |
|---|---|---|
| 0 | 10.8, 9.1, 13.5, 9.2 | 10.65 |
| 1,000 | 11.1, 11.1, 8.2, 11.3 | 10.43 |
| 5,000 | 5.4, 4.6, 7.4, 5.0 | 5.60 |
| 10,000 | 5.8, 5.3, 3.2, 7.5 | 5.45 |

Overall mean: 8.03

Hypotheses: all μi are the same (H0) versus not all μi are the same (Ha)


Output for the one-way ANOVA

Here, the calculated F-value (12.08) is larger than Fcritical (3.49) for α = 0.05.

Thus, the test is significant at the 5% level. Not all mean seedling lengths are the same; the number of nematodes is an influential factor.
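The F-value quoted above can be reproduced directly from the seedling data. The sketch below uses only the Python standard library, and the variable names are illustrative. The critical value 3.49 is copied from the slide, not computed.

```python
# Seedling growth (mm) by number of nematodes, as listed in the example.
growth = {
    0: [10.8, 9.1, 13.5, 9.2],
    1000: [11.1, 11.1, 8.2, 11.3],
    5000: [5.4, 4.6, 7.4, 5.0],
    10000: [5.8, 5.3, 3.2, 7.5],
}

num_groups = len(growth)                            # I
num_obs = sum(len(v) for v in growth.values())      # N
grand_mean = sum(sum(v) for v in growth.values()) / num_obs
group_means = {k: sum(v) / len(v) for k, v in growth.items()}

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (group_means[k] - grand_mean) ** 2 for k, v in growth.items())
sse = sum((x - group_means[k]) ** 2 for k, v in growth.items() for x in v)

mstr = sstr / (num_groups - 1)        # df = I - 1 = 3
mse = sse / (num_obs - num_groups)    # df = N - I = 12
f_stat = mstr / mse

F_CRIT_05 = 3.49  # critical value quoted on the slide for alpha = 0.05
print(f"SSTr = {sstr:.1f}, SSE = {sse:.1f}, F = {f_stat:.2f}")
print("reject H0 at the 5% level:", f_stat > F_CRIT_05)
```

The printed F rounds to 12.08, in agreement with the value stated in the example.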


Using the F-table

The F distribution is asymmetrical and has two distinct degrees of freedom. It was discovered by Fisher, hence the label “F.”

Once again, we calculate the value of F for our sample data and then look up the corresponding area under the curve in the F-table.


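Fcritical for 5% is 3.49

F = 12.08 > 10.80, the critical value for α = 0.001

Thus p < 0.001

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Treatments | 101 | 3 | 33.5 | 12.1 | 0.00062 | 3.4903 |
| Within Treatments | 33.3 | 12 | 2.78 | | | |
| Total | 134 | 15 | | | | |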
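The table lookup described above amounts to finding the smallest tabulated α whose critical value the observed F exceeds. A minimal sketch follows, using only the two critical values for (3, 12) degrees of freedom that appear on the slides. The function name is illustrative.

```python
def p_value_bound(f_stat, critical_values):
    """Return the smallest tabulated alpha whose critical value F exceeds.

    critical_values maps alpha to the upper-tail critical value for the
    relevant degrees of freedom. Returns None if F exceeds none of them.
    """
    exceeded = [alpha for alpha, crit in critical_values.items() if f_stat > crit]
    return min(exceeded) if exceeded else None


# Critical values for (3, 12) degrees of freedom quoted on the slides.
table_3_12 = {0.05: 3.49, 0.001: 10.80}

bound = p_value_bound(12.08, table_3_12)
print(f"p < {bound}")
```

A table can only bracket the P-value in this way. Software reports it exactly, as in the P-value column of the ANOVA output (0.00062).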