Anova by Hazilah Mohd Amin

Preview:

DESCRIPTION

Nota kuliah Pensampelan dan analisis data

Citation preview

Analysis of Variance (ANOVA)

Hazilah Mohd Amin

Goals

After completing, you should be able to:

• Recognize situations in which to use analysis of variance (ANOVA)

• Perform a single-factor hypothesis test for Comparing More Than Two Means and interpret results

The F - Distribution

• Analysis-of-variance procedures rely on F-distribution.

• There are infinitely many F-distributions, and we identify an F-distribution (and F-curve) by its number of degrees of freedom. • F-distribution has two numbers of degrees of freedom.

Key Fact F distribuition curve:

Find Critical Value: Example

• Find the F value for 8 df for numerator, 14 df for denominator, and 0.05 area in the right tail of the F distribuition curve.

Critical value: F, df numerator,df denominator = F, 8,14 = ?

Table 12.1 (p. 534) Critical value: F, 8,14 = 2.70

Hypotheses of One-Way ANOVA

• – All population means are equal – i.e., no treatment effect (no variation in means among groups)

• – At least one population mean is different – i.e., there is a treatment effect – Does not mean that all population means are different (some pairs

may be the same)

k3210 μμμμ:H

same the are means population the of all Not:HA

The analysis of variance is a procedure that tests to determine whether differences exits between two or more population means.

One-Factor ANOVA

All Means are the same:The Null Hypothesis is True

(No Treatment Effect)

k3210 μμμμ:H

same the are μ all Not:H iA

321 μμμ

One-Factor ANOVA

At least one mean is different:The Null Hypothesis is NOT true

(Treatment Effect is present)

k3210 μμμμ:H

same the are μ all Not:H iA

321 μμμ 321 μμμ

or

One-Way Analysis of Variance

One-Factor ANOVA F Test: Example 1

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1 Club 2 Club 3254 234 200263 218 222241 235 197237 227 206251 216 204

Solution of Example 1– The data are interval– The problem objective is to compare mean

distances in three type of golf club.– We hypothesize that the three population

means are equal

One Way Analysis of Variance

H0: 1 = 2= 3

H1: At least two means differ

• Solution

Defining the Hypotheses

Independent samples are drawn from k populations (treatments).

X11

x21

.

.

.Xn1,1

1

1x

n

X12

x22

.

.

.Xn2,2

2

2x

n

X1k

x2k

.

.

.Xnk,k

k

kx

n

Sample sizeSample mean

X is the “response variable”.The variables’ value are called “responses”.

Notation

Terminology• In the context of this problem…

Response variable – distance

Experimental unit – golf club when we record distance figures.

Factor or treatment – the criterion by which we classify the populations (the treatments). In this problems the factor is the type of golf clubs.

The rationale of the name of Analysis of Variance (ANOVA)

• We are testing the different between means but why ANOVA?

• Two types of variability are employed when testing for the equality of the population means: Within Samples and Between Samples

Graphical demonstration:Employing two types of variability: Within Samples and Between Samples

One Way Analysis of Variance

20

25

30

1

7

Treatment 1 Treatment 2 Treatment 3

10

12

19

9

Treatment 1Treatment 2Treatment 3

20

161514

1110

9

10x1

15x2

20x3

10x1

15x2

20x3

The sample means are the same as before,but the larger within-sample variability makes it harder to draw a conclusionabout the population means.

A small variability withinthe samples makes it easierto draw a conclusion about the population means.

••••

One-Factor ANOVA Example: Scatter Diagram

270

260

250

240

230

220

210

200

190

•••••

•••••

Distance

227.0 x

205.8 x 226.0x 249.2x 321

Club 1 Club 2 Club 3254 234 200263 218 222241 235 197237 227 206251 216 204

Club1 2 3

3x

1x

2x x

From scatter diagram, we can clearly see sample means

difference because of small within-sample variability

Test Statistics (F), Critical Value & Rejection Criterion

• Test statistic:

where MSB is mean squares between varianceswhere MSW is mean squares within variances

• Rejection Region: F > F, k-1,n-k

• Degrees of freedom– df1 = k – 1 (k = levels or treatments)

– df2 = n – k (n = sum of sample sizes from all populations)

MSW

MSBF

H0: μ1= μ2 = … = μ k

HA: At least two population means are different

The hypothesis test:

One-Factor ANOVA Example Computations

Club 1 Club 2 Club 3254 234 200263 218 222241 235 197237 227 206251 216 204

x1 = 249.2

x2 = 226.0

x3 = 205.8

x = 227.0

n1 = 5

n2 = 5

n3 = 5

n = 15

k = 3

MSB = 4716.4 / (3-1) = 2358.2

MSW = 1119.6 / (15-3) = 93.325.275

93.3

2358.2F

kn

SSWMSW

T

1

k

SSBMSB

MSW

MSBF

SSB = 4716.4

SSW = 1119.6

n

x

n

T

n

T

n

T2

3

23

2

22

1

21SSB

3

23

2

22

1

212 )(SSW

n

T

n

T

n

Tx

F = 25.275

One-Factor ANOVA Example Solution

H0: μ1 = μ2 = μ3

HA: μi not all equal = .05df1= k-1 =3-1 =2 df2 = n-k =15-3 =12

Test Statistic:

Decision: Test statistic F is greater than critical value

Conclusion:Reject H0 at = 0.05

There is evidence that at least one μi differs from the rest

0

= .05

F.05 = 3.885Reject H0Do not

reject H0

25.27593.3

2358.2

MSW

MSBF

Critical Value: F, k-1,n-k = F, 2,12 = 3.885

SUMMARY

Groups Count Sum Average Variance

Club 1 5 1246 249.2 108.2

Club 2 5 1130 226 77.5

Club 3 5 1029 205.8 94.2

ANOVA

Source of Variation

SS df MS F P-value F crit

Between Groups

4716.4 2 2358.2 25.275 4.99E-05 3.885

Within Groups

1119.6 12 93.3

Total 5836.0 14        

ANOVA Single Factor: Excel OutputEXCEL: tools | data analysis | ANOVA: single factor

25.27593.3

2358.2

MSW

MSBF F, k-1,n-k = F, 2,12 = 3.885

Rationale 1: Variability Between Sample

• If H0: μ1= μ2 = … = μk is true, we would expect all the sample means to be close to one another.

• If the alternative hypothesis is true, at least some of the sample means would differ.

• Thus, we measure variability between sample means (and hence MSB or MSTr).

• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

• Therefore, even though sample means may markedly differ from one another, we have to consider the “within samples variability” (and hence MSW or MSE).

Rationale II: Variability Within

Interpreting One-Factor ANOVA F Statistic

• The F statistic is the ratio of the between estimate of variance and the within estimate of variance– The ratio must always be positive– df1 = k -1 will typically be small– df2 = n - k will typically be large

The test statistic F ratio should be close to 1 (SSB small due to small sample means difference) if

H0: μ1= μ2 = … = μk is true

The ratio will be larger than 1 (SSB large due to large sample means difference) if

H0: μ1= μ2 = … = μk is false

MSW

MSBF

Example 2

A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used. The data in the table on the next screen represents the drying time (in minutes) for 3 different applicators when the paint was applied to standard wallboard.

Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?

Notation: The type of applicator is the treatment, factor or level .

hence k = 3

Notation Used in ANOVAFactor Levels

Sample from Sample from Sample from Sample from

Replication Level 1 Level 2 Level 3 Level k

n = 1 x 1,1 x 2,1 x 3,1 x k ,1

n = 2 x 1,2 x 2,2 x 3,2 x k ,2

n = 3 x 1,3 x 2,3 x 3,3 x k ,3

Column T 1 T 2 T 3 T k T

Totals T = grand total = sum of all x 's = x = T i

. . .

. . .

. . .

Applicator (Level)Brush Roller Pad

(i = 1 ) (i = 2) (i = 3)39.1 31.6 32.739.4 33.4 33.231.1 30.2 28.733.7 41.8 29.230.5 33.9 25.834.6 31.4

26.729.5

Sum 208.4 170.9 237.2

Mean 34.73 34.18 29.65

1T 2T 3T

1x 2x 3x

Sample Results

Solution• Assumptions:

– The data (samples) was randomly collected and all observations are independent.

– The populations are (approximately) normally distributed.

– Populations have equal variances.

• The null and the alternative hypothesis:

Ho: 1 = 2 = 3

The mean drying time is the same for each applicator

Ha: At least one mean is different Not all drying time means are equal

Partition of Total Variation

Variation Due to Factor/Treatment (SSB)

Variation Due to Random Sampling (SSW)

Sum of Squares Total (SST)

Commonly referred to as: Sum of Squares Within (SSW) Sum of Squares Error (SSE) Sum of Squares Unexplained Within Groups Variation

Commonly referred to as: Sum of Squares Between (SSB) Sum of Squares Treatment (SSTr) Sum of Squares Factor Sum of Squares Among Sum of Squares Explained Among Groups Variation

= +

Total variation SST can be split into two parts: SST = SSB + SSW

n

x

n

T

n

T

n

T2

3

23

2

22

1

21SSB

3

23

2

22

1

212 )(SSW

n

T

n

T

n

Tx

x and x2 Calculator: Enter xi data, retrieve x and x2

• Enter Statictics SD: Mode Mode 1• Clear old data: Shift Clr 1 =

• Enter xi data: 39.1 DT 39.4 DT 31.1 DT 33.7 DT 30.5 DT 34.6 DT …29.5 DT

• Find (x): Shift S-SUM 2 = 616.5• Find (x2): Shift S-SUM 1 = 20,316.69

35

89.31280.2000369.2031619

5.61669.20316)(SST

22

2

n

xx

97.1088.2000377.20112

19

)5.616(

8

2.237

5

9.170

6

4.208

SSB

2222

2

3

23

2

22

1

21

n

x

n

T

n

T

n

T

92.20397.10889.312

SSBSSTSSW

Variation Sums of Squares

93.203

20112.77-20316.69

)(SSW3

23

2

22

1

212

n

T

n

T

n

Tx

Mean SquareThe mean square for the factor being tested and for the error is obtained by dividing the sum-of-square value by the corresponding number of degrees of freedom

75.1216

92.203df(error)SS(error)

MS(error)

49.542

97.108df(factor)SS(factor)

MS(factor)

Calculations:

Numerator degrees of freedom = df(factor) = k 1 = 3 1 = 2

df(total) = n 1 = 19 1 = 18

Denominator degrees of freedom = df(error) = n k = 19 3 = 16

One-Way ANOVA Table

Source of Variation

dfSS MS

Between Samples

SSB MSB =

Within Samples n - kSSW MSW =

Total n - 1SST =SSB+SSW

k - 1 MSBMSW

F ratio

SSBk - 1SSWn - k

F =

The sums of squares and the degrees of freedom must check SS(factor) + SS(error) = SS(total) or SSB + SSW = SST df(factor) + df(error) = df(total) or df(between) + df(within) = df(total)

An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations. Format for the ANOVA Table:

Source df SS MS

Factor 2 108.97 54.59

Error 16 203.92 12.75

Total 18 312.89

The Completed ANOVA Table

The Complete ANOVA Table:

27.475.1249.54

MS(error)MS(factor)

* FThe Test Statistic:

Solution Continued

The Results

a. Decision: Reject Ho at = 0.05b. Conclusion: There is evidence to suggest the three population means are not all the same. The type of applicator has a significant effect on the paint drying time at the 0.05 level of significance.

Critical Value: F, k-1,n-k = F, 2,16 = 3.63

The Test Statistic F = 4.27 is in the rejection region.

Reject H0

F.05 = 3.63

Do not reject H0

= .05

One-Way ANOVA F-Test: Exercise 1•You’re a trainer for Microsoft Corp. Is there any evidence to suggest the type of training method has a significant effect on the learning time at the 0.05 level?•The data in the table represents the learning times (in hours) of 12 people using 4 different training methods.

M1 M2 M3 M410 11 13 18

9 16 8 235 9 9 25

© 1984-1994 T/Maker Co.

Answer: Critical Value = 4.07. Test statistic = 11.6

Hey! Lets get our hand dirty… Using SPSS….

One Way Analysis of Variance Using SPSS

• Suppose we want to know whether students who have to work many hours outside school to support themselves find their grade suffering.

• We examine this question by comparing the GPAs of students who work various hours outside school.

• Let’s examine this question using data in Student file. File>Open> Student

One Way Analysis of Variance Using SPSS

• First examine the average GPA for each of the three work categories (0 hrs, 1-19hrs, >20hrs) - WorkCat

• Graph>Boxplot then choose Simple and click Define. Select GPA as the variable and WorkCat for the Category Axis. Click

• Option

After Clicking Options…, click off Display groups defined by missing value, and click

Continue then OK.

• You’ll get this

What is the Box-plot telling us?

• Some variation across the groups• See median GPAs (dark line in the middle

of box) differ slightly between groups. • So, should we attribute the observed

difference to sampling error or they genuinely differ?

• Neither box-plot nor the median offer decisive evidence. Hence we need ANOVA.

One Way Analysis of Variance Using SPSS

• We are testing: H0 : 1 = 2 = 3

H1: At least two means differ

• Before attempting ANOVA, need to review the ANOVA assumptions. (i)Independent samples (ii) Normality (iii) Variances equality. We can test both (ii) & (iii).

• Analyze>Descriptive Statistics>Explore

Analyze>Descriptive Statistics>Explore

• In the Explore dialog box, select GPAs as the dependent List variable, WorkCat as the Factor List variable and Plot as the Display. Next, click Plot…

• We are interested in a normality test, select

Select this & deselect this only. Click Continue

and OK. See next slide…

The Output has several parts, let focus on the tests of normality

• The Kolmogorov-Smirnov test assesses whether there is significant departure from normality in the population distribution of the 3 groups. H0: Distributions are normal.

• Look at the p-values, all are > 0.05. Do not reject H0. Hence no evidence of non-normality.

One Way Analysis of Variance Using SPSS

• We still need to validate the homogeneity of variance assumption. We do this within ANOVA.

• Analyze>Compare Means>One-Way ANOVA• Dependent List variable is GPA and Factor variable is WorkCat.Click Option,

One Way Analysis of Variance Using SPSS

• under Statistics, select Descriptive and Homogeneity of variance test. Click Continue & OK

• H0: Variances are equal. One-Way ANOVA output consists many parts. Look at the p-value > 0.05.

• Hence do not reject H0.

Normality & Homogeneity of variances assumptions met … hence

• Let find out whether students who work various hours outside school differ in their GPAs.

• The P-value of .000 is very small, hence we reject Ho and conclude that

the means GPAs are not all the same. Where are the differences? Hence Post-Hoc test…

End of ANOVA

See U Later…

One-Way ANOVA F-Test: Exercise 1 Solution

•You’re a trainer for Microsoft Corp. Is there any evidence to suggest the type of training method has a significant effect on the learning time at the 0.05 level?•The data in the table represents the learning times (in hours) of 12 people using 4 different training methods.

M1 M2 M3 M410 11 13 18

9 16 8 235 9 9 25

© 1984-1994 T/Maker Co.

Summary Table Solution*

Source ofSource ofVariationVariation

DegreesDegrees ofofFreedomFreedom

Sum ofSum ofSquaresSquares

MeanMeanSquareSquare

(Variance)(Variance)

FF

TreatmentTreatment((MethodsMethods))

4 - 1 = 34 - 1 = 3 348348 116116 11.611.6

ErrorError 12 - 4 = 812 - 4 = 8 8080 1010

TotalTotal 12 - 1 = 1112 - 1 = 11 428428

FF00 4.074.07

One-Way ANOVA F-Test Solution*

•H0: 1 = 2 = 3 = 4

•Ha: Not All Equal• = .05•1 = 3 2 = 8 •Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:Reject at Reject at = .05 = .05

There Is Evidence Pop. There Is Evidence Pop. Means Are DifferentMeans Are Different

= .05= .05

FFMSBMSB

MSEMSE

116116

10101111 66..

Recommended