Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. Analysis of Variance Chapter 16


Analysis of Variance

Chapter 16


ANOVA

Analysis of variance: tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors.

If one categorical variable: one-way ANOVA

If two categorical variables: two-way ANOVA


16.1 Comparing Means with an ANOVA F-Test

F-statistic:

H0: μ1 = μ2 = … = μk

Ha: The means are not all equal.

$$F = \frac{\text{Variation among sample means}}{\text{Natural variation within groups}}$$


Variation among sample means is 0 if all k sample means are equal and gets larger the more spread out they are.

If F is large enough => evidence at least one population mean differs from others => reject null hypothesis.

p-value found using an F-distribution (more later)


Example 16.1 Seat Location and GPA

Q: Do the best students sit in the front of a classroom?

Data on seat location and GPA for n = 384 students; 88 sit in front, 218 in middle, 78 in back

Students sitting in the front generally have slightly higher GPAs than others.


Example 16.1 Seat Location and GPA (cont)

The F-statistic is 6.69 and the p-value is 0.001.

p-value so small => reject H0 and conclude there are differences among the means.

H0: 1 = 2 = 3

Ha: The means are not all equal.


Example 16.1 Seat Location and GPA (cont)

95% Confidence Intervals for 3 population means:

Interval for “front” does not overlap with the other two intervals => significant difference between mean GPA for front-row sitters and mean GPA for other students


Notation for Summary Statistics

k = number of groups

$\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and sample size for the ith sample group

N = total sample size (N = n1 + n2 + … + nk)

Example 16.1 Seat Location and GPA (cont)

Three seat locations => k = 3

n1 = 88, n2 = 218, n3 = 78; N = 88 + 218 + 78 = 384

$\bar{x}_1 = 3.2029$, $\bar{x}_2 = 2.9853$, $\bar{x}_3 = 2.9194$

$s_1 = 0.5491$, $s_2 = 0.5577$, $s_3 = 0.5105$
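The reported F-statistic for Example 16.1 can be checked from these summary statistics alone. Below is a minimal Python sketch, assuming the SS Groups and SS Error formulas given in Section 16.2. The helper names `f_from_summary` and `f_pvalue_df1_2` are hypothetical, and the closed-form p-value used here is valid only when the numerator df is 2, as it is for k = 3 groups.

```python
def f_from_summary(ns, means, sds):
    """One-way ANOVA F-statistic from group sizes, means, and standard deviations."""
    k = len(ns)
    big_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / big_n
    # SS Groups: size-weighted squared distances of group means from the overall mean
    ss_groups = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # SS Error: within-group variation, (n_i - 1) * s_i^2 summed over groups
    ss_error = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (big_n - k)
    return ms_groups / mse, k - 1, big_n - k


def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail p-value P(F > f_stat); this closed form holds ONLY for numerator df = 2."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# Example 16.1: front, middle, back
f_stat, df1, df2 = f_from_summary([88, 218, 78],
                                  [3.2029, 2.9853, 2.9194],
                                  [0.5491, 0.5577, 0.5105])
print(f"F = {f_stat:.2f} with df ({df1}, {df2})")
print(f"p-value = {f_pvalue_df1_2(f_stat, df2):.4f}")
```

With these inputs F comes out near 6.69 and the p-value near 0.0014, in line with the reported F = 6.69 and p-value of 0.001 after rounding.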


Assumptions for the F-Test

• Samples are independent random samples.

• Distribution of response variable is a normal curve within each population.

• Different populations may have different means.

• All populations have same standard deviation, σ.

Figure: how k = 3 populations might look.


Conditions for Using the F-Test

• F-statistic can be used if data are not extremely skewed, there are no extreme outliers, and group standard deviations are not markedly different.

• Tests based on F-statistic are valid for data with skewness or outliers if sample sizes are large.

• A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest of the sample standard deviations.
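The standard-deviation criterion is simple enough to check in code. This is a sketch with a hypothetical helper `sd_ratio_ok`; the factor of 2 is the rough rule stated above, not a formal test.

```python
def sd_ratio_ok(sds, max_ratio=2.0):
    """Rough check: the largest sample SD is no more than max_ratio times the smallest."""
    return max(sds) <= max_ratio * min(sds)


print(sd_ratio_ok([0.5491, 0.5577, 0.5105]))  # Example 16.1 GPA groups -> True
print(sd_ratio_ok([5.02, 2.92]))              # largest and smallest SDs in Example 16.4 -> True
```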


Example 16.1 Seat Location and GPA (cont)

• The boxplot showed two outliers in the group of students who typically sit in the middle of a classroom, but there are 218 students in that group so these outliers don’t have much influence on the results.

• The standard deviations for the three groups are nearly the same.

• Data do not appear to be skewed.

Necessary conditions for F-test seem satisfied.


The Family of F-Distributions

• Skewed distributions with minimum value of 0.

• Specific F-distribution indicated by two parameters called degrees of freedom: numerator degrees of freedom and denominator degrees of freedom.

• In one-way ANOVA, numerator df = k – 1, and denominator df = N – k.


Determining the p-Value

Statistical software reports the p-value in its output.

Table A.4 provides critical values for the 1% and 5% significance levels.

• If the F-statistic is larger than the 5% critical value, the p-value is < 0.05.

• If the F-statistic is larger than the 1% critical value, the p-value is < 0.01.

• If the F-statistic is between the 5% and 1% critical values, the p-value is between 0.01 and 0.05.


Example 16.2 Testosterone and Occupation

Study: Compare mean testosterone levels for k = 7 occupational groups

Reported F-statistic was F = 2.5 and p-value < 0.05

N = 66 men:

numerator df = k – 1 = 7 – 1 = 6

denominator df = N – k = 66 – 7 = 59

Table A.4 with df of (6, 60), used in place of (6, 59): the 5% critical value is 2.25, and the F-statistic was larger, so the p-value < 0.05.
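Table A.4 only brackets the p-value. The sketch below shows one way software could pin it down: integrate the F density numerically with Simpson's rule from 0 to the observed F and subtract the area from 1. The helpers `f_pdf` and `f_pvalue` are hypothetical, and the sketch assumes numerator df ≥ 2 so the density is finite at 0. In practice a library routine such as SciPy's `scipy.stats.f.sf(x, dfn, dfd)` would normally be used instead.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 (numerator) and d2 (denominator) df."""
    if x < 0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    if x == 0:
        # x**(d1/2 - 1) equals 1 when d1 == 2 and 0 when d1 > 2
        return math.exp(log_c) if d1 == 2 else 0.0
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))


def f_pvalue(f_stat, d1, d2, steps=2000):
    """P(F > f_stat) = 1 - (area under the density from 0 to f_stat), via Simpson's rule.
    Assumes d1 >= 2; steps must be even."""
    h = f_stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


# Example 16.2: F = 2.5 with df (6, 59)
print(f"p-value = {f_pvalue(2.5, 6, 59):.3f}")
```

For F = 2.5 with df (6, 59) this gives a p-value between 0.01 and 0.05, consistent with the conclusion drawn from Table A.4.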


Multiple Comparisons

Multiple comparisons: two or more comparisons are made to examine a specific pattern of differences among means.

Most common: all pairwise comparisons.

Ways to make inferences about each pair of means:

• Significance test to assess whether two means differ significantly.

• Confidence interval for the difference; if 0 is not in the interval, there is a statistically significant difference.


Multiple Comparisons

Many statistical tests done => increased risk of making at least one type I error (erroneously rejecting a null hypothesis). Several procedures control the overall family type I error rate or the overall family confidence level.

• Family error rate for a set of significance tests is the probability of making one or more type I errors when more than one significance test is done.

• Family confidence level for a procedure used to create a set of confidence intervals is the proportion of times all intervals in the set capture their true parameter values.
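To see how quickly the family error rate grows, the sketch below ASSUMES the m tests are independent. Pairwise comparisons among the same groups are not independent, so this illustrates the trend rather than giving an exact rate. The helper `family_error_rate` is hypothetical.

```python
from math import comb


def family_error_rate(alpha, m):
    """P(at least one type I error) across m tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** m


for k in (3, 5, 7):
    m = comb(k, 2)  # number of pairwise comparisons among k means
    print(f"k = {k}: {m} pairwise tests, family error rate about "
          f"{family_error_rate(0.05, m):.3f}")
```

With α = 0.05 for each test and the 3 pairwise tests needed for k = 3 groups, the rate is already about 0.14 rather than 0.05.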


Example 16.1 Seat Location and GPA (cont)

Pairwise Comparison Output:

• Tukey: family confidence level of 0.95

• Fisher: 0.95 confidence level for each individual interval

Here, both give the same conclusions: only 1 interval covers 0, Middle – Back.

It appears that population mean GPAs differ for front and middle students and for front and back students.
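Fisher's individual intervals for each difference have the form $\bar{x}_i - \bar{x}_j \pm t^* s_p \sqrt{1/n_i + 1/n_j}$, with $s_p = \sqrt{\text{MSE}}$. The sketch below makes several assumptions: it uses the Example 16.1 summary statistics, takes t* = 1.966 as an approximate t-table value for df = 381, and relies on a hypothetical helper `fisher_intervals`. Tukey's family intervals require the studentized range distribution and are not sketched here.

```python
import math
from itertools import combinations


def fisher_intervals(groups, t_star):
    """Individual (Fisher-style) CIs for every pairwise difference of means,
    using the pooled SD s_p = sqrt(MSE) computed from all groups."""
    big_n = sum(n for n, _, _ in groups.values())
    mse = sum((n - 1) * s ** 2 for n, _, s in groups.values()) / (big_n - len(groups))
    s_p = math.sqrt(mse)
    intervals = {}
    for (a, (na, ma, _)), (b, (nb, mb, _)) in combinations(groups.items(), 2):
        margin = t_star * s_p * math.sqrt(1 / na + 1 / nb)
        intervals[(a, b)] = (ma - mb - margin, ma - mb + margin)
    return intervals


# Example 16.1 summary statistics: (n, mean, sd)
groups = {
    "Front": (88, 3.2029, 0.5491),
    "Middle": (218, 2.9853, 0.5577),
    "Back": (78, 2.9194, 0.5105),
}
T_STAR = 1.966  # ASSUMPTION: approximate t* for 95% confidence with df = 381

for (a, b), (lo, hi) in fisher_intervals(groups, T_STAR).items():
    print(f"{a} - {b}: ({lo:.3f}, {hi:.3f}), covers 0: {lo <= 0 <= hi}")
```

Only the Middle – Back interval should cover 0, matching the conclusion from the pairwise comparison output.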


16.2 Details of One-Way Analysis of Variance

Fundamental concept: the variation among the data values in the overall sample can be separated into:

(1) differences between group means

(2) natural variation among observations within a group

Total variation = Variation between groups + Variation within groups

ANOVA Table displays this information.


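Measuring Variation Between Groups

Sum of squares for groups = SS Groups

$$\text{SS Groups} = \sum_{\text{groups}} n_i(\bar{x}_i - \bar{x})^2$$

where $\bar{x}$ is the mean of all N observations.

Numerator of F-statistic = mean square for groups

$$\text{MS Groups} = \frac{\text{SS Groups}}{k - 1}$$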


Measuring Variation within Groups

Sum of squared errors = SS Error

$$\text{SS Error} = \sum_{\text{groups}} (n_i - 1)s_i^2$$

Denominator of F-statistic = mean square error

$$\text{MSE} = \frac{\text{SS Error}}{N - k}$$

Pooled standard deviation: $s_p = \sqrt{\text{MSE}}$


Measuring Total Variation

Total sum of squares = SS Total = SSTO

$$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$$

SS Total = SS Groups + SS Error


General Format of a One-Way ANOVA Table
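A one-way ANOVA table commonly lists, for each source (Groups, Error, Total), its df, SS, MS, and F. The sketch below assumes that common layout and applies the formulas above to raw data. It uses the weight-loss values that appear in the SS Total calculation of Example 16.3; `one_way_anova` is a hypothetical helper.

```python
def one_way_anova(groups):
    """Rows of a one-way ANOVA table, (df, SS, MS, F), from raw data given as a list of lists."""
    values = [x for g in groups for x in g]
    big_n, k = len(values), len(groups)
    grand_mean = sum(values) / big_n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Squared deviations from each group's own mean; equals sum of (n_i - 1) * s_i^2
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (big_n - k)
    return {
        "Groups": (k - 1, ss_groups, ms_groups, ms_groups / mse),
        "Error": (big_n - k, ss_error, mse, None),
        "Total": (big_n - 1, ss_total, None, None),
    }


# Weight-loss data from Example 16.3 (programs 1, 2, 3)
table = one_way_anova([[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]])
print(f"{'Source':<8}{'df':>4}{'SS':>9}{'MS':>9}{'F':>8}")
for source, (df, ss, ms, f) in table.items():
    ms_txt = f"{ms:9.3f}" if ms is not None else " " * 9
    f_txt = f"{f:8.2f}" if f is not None else ""
    print(f"{source:<8}{df:>4}{ss:9.2f}{ms_txt}{f_txt}")
```

The Groups row should show SS = 114, MS = 57, and F ≈ 11.74; the Error row should show SS = 34 on 7 df; and SS Total = 148 = 114 + 34. These values match the hand calculations in Example 16.3.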


Example 16.3 Comparison of Weight Loss Programs

Program 3 appears to have the highest weight loss overall.

$\bar{x}_1 = 7$, $\bar{x}_2 = 9$, $\bar{x}_3 = 15$


Example 16.3 Comparison of Weight Loss Programs (cont)

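n1 = 4, n2 = 3, n3 = 3, and N = 10

$\bar{x}_1 = 7$, $\bar{x}_2 = 9$, $\bar{x}_3 = 15$, and $\bar{x} = 10$

$$\text{SS Groups} = \sum_{\text{groups}} n_i(\bar{x}_i - \bar{x})^2 = 4(7-10)^2 + 3(9-10)^2 + 3(15-10)^2 = 114$$

$$\text{MS Groups} = \frac{\text{SS Groups}}{k - 1} = \frac{114}{3 - 1} = 57$$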


Example 16.3 Comparison of Weight Loss Programs (cont)

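$$\begin{aligned}
\text{SS Total} &= \sum_{\text{values}} (x_{ij} - \bar{x})^2 \\
&= (7-10)^2 + (9-10)^2 + (5-10)^2 + (7-10)^2 \\
&\quad + (9-10)^2 + (11-10)^2 + (7-10)^2 \\
&\quad + (15-10)^2 + (12-10)^2 + (18-10)^2 \\
&= 148
\end{aligned}$$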


Example 16.3 Comparison of Weight Loss Programs (cont)

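$$\text{SS Error} = \text{SS Total} - \text{SS Groups} = 148 - 114 = 34$$

$$\text{MSE} = \frac{\text{SS Error}}{N - k} = \frac{34}{10 - 3} = 4.857$$

$$F = \frac{\text{MS Groups}}{\text{MSE}} = \frac{57}{4.857} = 11.74 \text{ with } 2 \text{ and } 7 \text{ df}$$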


Example 16.3 Comparison of Weight Loss Programs (cont)


“Factor” is used instead of “Groups” because the groups (weight-loss programs) form an explanatory factor for the response.

Note: Pooled StDev is $s_p = \sqrt{\text{MSE}} = \sqrt{4.86} = 2.204$.


Example 16.4 Top Speeds of Supercars

Data: top speeds for six runs on each of five supercars. Kitchens (1998, p. 783)


Example 16.4 Top Speeds (cont)


Example 16.4 Top Speeds (cont)

• F = 25.15 and p-value is 0.000 => reject null hypothesis that population mean speeds are same for all five cars.

• Conditions are satisfied. Data not skewed and no extreme outliers. Largest sample std dev (5.02 Viper) not more than twice as large as smallest std dev (2.92 Acura).

• MS Error = 14.5 is an estimate of the variance of top speed for the hypothetical distribution of all possible runs with one car. The estimated standard deviation for each car is √14.5 = 3.81.

• Based on sample means and CIs: Porsche and Ferrari seem to be significantly faster than other cars.


95% Confidence Intervals for the Population Means

In one-way analysis of variance, a confidence interval for a population mean $\mu_i$ is

$$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$$

where $s_p = \sqrt{\text{MSE}}$ and t* is such that the confidence level is the probability between –t* and t* in a t-distribution with df = N – k.
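The interval formula can be applied to the seat-location data. This sketch assumes the Example 16.1 summary statistics and t* = 1.966, an approximate t-table value for df = N – k = 381. `mean_cis` is a hypothetical helper.

```python
import math


def mean_cis(ns, means, sds, t_star):
    """CI for each group mean: x̄_i ± t* s_p / sqrt(n_i), with s_p = sqrt(MSE) pooled over groups."""
    big_n, k = sum(ns), len(ns)
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (big_n - k)
    s_p = math.sqrt(mse)
    return [(m - t_star * s_p / math.sqrt(n), m + t_star * s_p / math.sqrt(n))
            for n, m in zip(ns, means)]


# Example 16.1 (front, middle, back); t* = 1.966 is an ASSUMED t-table value for df = 381
cis = mean_cis([88, 218, 78], [3.2029, 2.9853, 2.9194], [0.5491, 0.5577, 0.5105], 1.966)
for name, (lo, hi) in zip(["Front", "Middle", "Back"], cis):
    print(f"{name:<7}({lo:.3f}, {hi:.3f})")
```

The front-row interval should lie entirely above the other two. This is the non-overlap noted in Example 16.1.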


16.3 Other Methods

When data are skewed or extreme outliers are present, it is better to analyze the median instead of the mean.

Two such tests are:

1. Kruskal-Wallis Test

2. Mood’s Median Test

Also called nonparametric tests.

H0: Population medians are equal.

Ha: Population medians are not all equal.
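The Kruskal-Wallis test ranks all N observations together and compares rank sums across groups through $H = \frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum of group i. The sketch below uses a hypothetical helper `kruskal_wallis_h` and applies it to the small weight-loss data set of Example 16.3, purely to show the mechanics. Tied values get average ranks. The tie correction that software commonly applies is omitted, and so is the p-value, which is usually taken from a chi-square approximation with k – 1 df.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H using average ranks for ties (no tie correction applied)."""
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    big_n = len(pooled)
    return (12 / (big_n * (big_n + 1))
            * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
            - 3 * (big_n + 1))


print(round(kruskal_wallis_h([[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]), 3))
```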


Example 16.5 Drinks and Seat Location

Data: Seat location and number of alcoholic drinks per week

Students sitting in the back report drinking more.

Data appear skewed, sample standard deviations differ.


Example 16.5 Drinks and Seat (cont)

P = 0.000 => strong evidence that the population medians for the number of drinks per week are not all equal.


Example 16.5 Drinks and Seat (cont)

P = 0.000 => the null hypothesis of equal population medians can be rejected.


16.4 Two-Way ANOVA (CD Topic S4)

Two-way analysis of variance: to examine how two categorical explanatory variables affect the mean of a quantitative response variable.

Main effect: overall effect of a single explanatory variable.

Interaction: effect on response variable of one explanatory variable depends upon the specific value or level for the other explanatory variable.
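Main effects and interaction can be read from a table of cell means. In a balanced design, the interaction term for a cell is the cell mean minus its row mean minus its column mean plus the grand mean. All such terms are 0 exactly when the profile lines are parallel. The sketch below uses a hypothetical helper `two_way_effects` and hypothetical cell means, which are not data from the text.

```python
def two_way_effects(cell_means):
    """Row effects, column effects, and interaction terms from a rows x columns table of
    cell means. Every cell is weighted equally, i.e. a balanced design is ASSUMED."""
    r, c = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (r * c)
    row_means = [sum(row) / c for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(r)) / r for j in range(c)]
    row_effects = [m - grand for m in row_means]
    col_effects = [m - grand for m in col_means]
    # Interaction: what remains after removing the grand mean and both main effects
    interaction = [[cell_means[i][j] - row_means[i] - col_means[j] + grand
                    for j in range(c)] for i in range(r)]
    return row_effects, col_effects, interaction


additive = [[10, 12], [14, 16]]  # HYPOTHETICAL cell means: parallel profile lines
crossing = [[10, 12], [14, 20]]  # HYPOTHETICAL cell means: non-parallel profile lines
for name, table in (("additive", additive), ("crossing", crossing)):
    rows, cols, inter = two_way_effects(table)
    print(name, "row effects:", rows, "interaction:", inter)
```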


Example 16.6 Happy Faces and Tips

Q: Does drawing a happy face on the restaurant bill increase the average tip to the server?

The effect of drawing a happy face depended on gender. The researchers speculated that customers felt a happy face was not gender-appropriate for male servers.


Example 16.7 You’ve Got to Have Heart

Response: weight gain in infants

Explanatory variables: heartbeat status (yes or no) and initial weight (low, med, high)

Weight gain generally greater for heartbeat group.

There is a main effect for the heartbeat status.

Approximately parallel lines => little/no interaction


Example 16.6 Faces and Tips (cont)

Two-way ANOVA: three F-statistics are computed, one for each main effect and one for the interaction.

Since the interaction effect is significant => the main effects are difficult to interpret.