Comparing k Populations Means – One way Analysis of Variance (ANOVA)


Page 1: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Comparing k Population Means – One-Way Analysis of Variance (ANOVA)

Page 2: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The F test – for comparing k means

Situation

• We have k normal populations

• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i.

• i = 1, 2, 3, …, k.

• Note: we assume that the standard deviation for each population is the same.

$$\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$$

Page 3: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

We want to test

$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$

against

$$H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j$$

Page 4: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

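The data

• Assume we have collected data from each of the k populations

• Let $x_{i1}, x_{i2}, x_{i3}, \dots, x_{in_i}$ denote the $n_i$ observations from population i.

• i = 1, 2, 3, …, k.

Let

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad s_i = \sqrt{\frac{\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2}{n_i - 1}}$$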

Page 5: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

One possible solution (incorrect)

• Choose the populations two at a time

• then perform a two sample t test of $H_0: \mu_i = \mu_j$ vs $H_A: \mu_i \ne \mu_j$, using

$$t = \frac{\bar{x} - \bar{y}}{s_{pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

• Repeat this for every possible pair of populations

Page 6: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

• The flaw with this procedure is that you are performing a collection of tests rather than a single test

• If each test is performed with $\alpha = 0.05$, then the probability that each test makes a type I error is 5%, but the probability that the group of tests makes at least one type I error could be considerably higher than 5%.

• i.e. suppose there is no difference in the means of the populations. The chance that this procedure declares a significant difference could be considerably higher than 5%.

Page 7: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The Bonferroni inequality

If N tests are performed, each with significance level $\alpha$, then

P[group of N tests makes a type I error] $\le 1 - (1 - \alpha)^N$

(with equality when the N tests are independent).

Example: Suppose $\alpha = 0.05$, N = 10; then

P[group of N tests makes a type I error]

$\le 1 - (0.95)^{10} = 0.40$
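The arithmetic in this example can be checked with a short Python sketch. The function name `familywise_error_rate` is illustrative, and the calculation assumes the N tests are independent, which is when the bound is attained.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error among n_tests independent
    tests, each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The slide's example: alpha = 0.05 and N = 10 tests.
rate = familywise_error_rate(0.05, 10)
print(f"{rate:.4f}")

# The rate grows quickly with the number of tests.
for n in (1, 5, 10, 45):
    print(n, round(familywise_error_rate(0.05, n), 3))
```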

Page 8: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

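For this reason we are going to consider a single test for testing:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$

against

$$H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j$$

Note: If k = 10, the number of pairs of means (and hence the number of tests that would have to be performed) is:

$$\binom{10}{2} = {}_{10}C_2 = \frac{10!}{2!\,8!} = \frac{10 \cdot 9}{2 \cdot 1} = 45$$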
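As a quick check of this count, the pairs can be enumerated directly. This is a minimal sketch; numbering the populations 1 to 10 is only for illustration.

```python
from itertools import combinations
from math import comb

k = 10  # number of populations, as in the note above

# Every unordered pair (i, j) with i < j needs its own two sample t test.
pairs = list(combinations(range(1, k + 1), 2))

print(len(pairs))   # number of pairs found by enumeration
print(comb(k, 2))   # the same count from the binomial coefficient
print(pairs[:3])    # the first few pairs
```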

Page 9: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The F test

Page 10: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

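To test $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

against $H_A: \mu_i \ne \mu_j$ for at least one pair $i, j$

use the test statistic

$$F = \frac{s^2_{Between}}{s^2_{Pooled}} = \frac{\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)}$$

where $\bar{x}_i$ = mean for the $i$th sample,

$s_i$ = standard deviation for the $i$th sample,

$$\bar{x} = \frac{n_1\bar{x}_1 + \dots + n_k\bar{x}_k}{n_1 + \dots + n_k} = \text{overall mean}$$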

Page 11: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

the statistic

$$\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2$$

is called the Between Sum of Squares and is denoted by $SS_{Between}$

It measures the variability between samples

k – 1 is known as the Between degrees of freedom and

$$\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)$$

is called the Between Mean Square and is denoted by $MS_{Between}$

Page 12: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

the statistic

$$\sum_{i=1}^{k} (n_i - 1)s_i^2$$

is called the Within Sum of Squares and is denoted by $SS_{Within}$

$$\sum_{i=1}^{k} n_i - k = N - k$$

is known as the Within degrees of freedom and

$$\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)$$

is called the Within Mean Square and is denoted by $MS_{Within}$

Page 13: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

then

$$F = \frac{MS_{Between}}{MS_{Within}}$$
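The definition of F in terms of the Between and Within mean squares can be sketched in Python as follows. The function name `one_way_f` and the three small `toy` samples are hypothetical and chosen only so the result is easy to verify by hand.

```python
from statistics import mean, variance


def one_way_f(groups):
    """F = MS_Between / MS_Within for a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [mean(g) for g in groups]
    # Overall mean: sample means weighted by sample size.
    overall = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_between = sum(n * (m - overall) ** 2 for n, m in zip(sizes, means))
    # variance() is the sample variance s_i^2 (divisor n_i - 1).
    ss_within = sum((n - 1) * variance(g) for n, g in zip(sizes, groups))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical data: SS_Between = 26, SS_Within = 6, so F = (26/2)/(6/6) = 13.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_f(toy)
print(round(f_value, 6))
```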

Page 14: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

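The Computing formula for F:

Compute

1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i

2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total

3) $N = \sum_{i=1}^{k} n_i$ = Total sample size

4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$

5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$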

Page 15: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

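Then

1) $SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$

2) $SS_{Between} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$

3) $F = \dfrac{SS_{Between} / (k - 1)}{SS_{Within} / (N - k)}$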

Page 16: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The critical region for the F test

We reject $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ if $F \ge F_\alpha$

where $F_\alpha$ is the critical point under the F distribution with $\nu_1 = k - 1$ degrees of freedom in the numerator and $\nu_2 = N - k$ degrees of freedom in the denominator

Page 17: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Example

In the following example we are comparing weight gains resulting from the following six diets

1. Diet 1 - High Protein , Beef

2. Diet 2 - High Protein , Cereal

3. Diet 3 - High Protein , Pork

4. Diet 4 - Low protein , Beef

5. Diet 5 - Low protein , Cereal

6. Diet 6 - Low protein , Pork

Page 18: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

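Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

| Diet | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| | 73 | 98 | 94 | 90 | 107 | 49 |
| | 102 | 74 | 79 | 76 | 95 | 82 |
| | 118 | 56 | 96 | 90 | 97 | 73 |
| | 104 | 111 | 98 | 64 | 80 | 86 |
| | 81 | 95 | 102 | 86 | 98 | 81 |
| | 107 | 88 | 102 | 51 | 74 | 97 |
| | 100 | 82 | 108 | 72 | 74 | 106 |
| | 87 | 77 | 91 | 90 | 67 | 70 |
| | 117 | 86 | 120 | 95 | 89 | 61 |
| | 111 | 92 | 105 | 78 | 58 | 82 |
| Mean | 100.0 | 85.9 | 99.5 | 79.2 | 83.9 | 78.7 |
| Std. Dev. | 15.14 | 15.02 | 10.92 | 13.89 | 15.71 | 16.55 |
| $\sum x$ | 1000 | 859 | 995 | 792 | 839 | 787 |
| $\sum x^2$ | 102062 | 75819 | 100075 | 64462 | 72613 | 64401 |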

Page 19: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

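Hence

| i | 1 | 2 | 3 | 4 | 5 | 6 | Total (G) |
|---|---|---|---|---|---|---|---|
| $T_i$ | 1000 | 859 | 995 | 792 | 839 | 787 | 5272 |

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 = 479432$$

$$N = \sum_{i=1}^{k} n_i = 60 = \text{Total sample size}$$

$$\sum_{i=1}^{k} \frac{T_i^2}{n_i} = 467846$$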

Page 20: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

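Thus

$$SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586$$

$$SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933$$

$$F = \frac{SS_{Between} / (k - 1)}{SS_{Within} / (N - k)} = \frac{4612.933 / 5}{11586 / 54} = \frac{922.6}{214.56} = 4.3$$

$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$

Thus since F > 2.386 we reject $H_0$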
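The computing formula can be traced on the diet data with a short Python sketch. The layout of `rows` (one row per line of the data table, one column per diet) and the variable names are illustrative choices. The critical value 2.386 is taken from the slide rather than computed.

```python
# Weight gains: each inner list is one row of the data table,
# with columns in the order Diet 1 to Diet 6.
rows = [
    [73, 98, 94, 90, 107, 49],
    [102, 74, 79, 76, 95, 82],
    [118, 56, 96, 90, 97, 73],
    [104, 111, 98, 64, 80, 86],
    [81, 95, 102, 86, 98, 81],
    [107, 88, 102, 51, 74, 97],
    [100, 82, 108, 72, 74, 106],
    [87, 77, 91, 90, 67, 70],
    [117, 86, 120, 95, 89, 61],
    [111, 92, 105, 78, 58, 82],
]
diets = [list(column) for column in zip(*rows)]  # one sample per diet

k = len(diets)
totals = [sum(d) for d in diets]                      # 1) T_i
grand_total = sum(totals)                             # 2) G
n_total = sum(len(d) for d in diets)                  # 3) N
sum_sq = sum(x * x for d in diets for x in d)         # 4) sum of x_ij^2
sum_t2_over_n = sum(t * t / len(d) for t, d in zip(totals, diets))  # 5)

ss_within = sum_sq - sum_t2_over_n
ss_between = sum_t2_over_n - grand_total ** 2 / n_total
f_value = (ss_between / (k - 1)) / (ss_within / (n_total - k))

F_CRITICAL = 2.386  # F_0.05 with 5 and 54 d.f., as quoted on the slide
print(totals, grand_total, n_total)
print(round(ss_within, 3), round(ss_between, 3), round(f_value, 3))
print("reject H0" if f_value > F_CRITICAL else "do not reject H0")
```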

Page 21: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The ANOVA Table

A convenient method for displaying the calculations for the F-test

Page 22: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

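Anova Table

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | k - 1 | $SS_{Between}$ | $MS_{Between}$ | $MS_B / MS_W$ |
| Within | N - k | $SS_{Within}$ | $MS_{Within}$ | |
| Total | N - 1 | $SS_{Total}$ | | |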

Page 23: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The Diet Example

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | 5 | 4612.933 | 922.587 | 4.3 (p = 0.0023) |
| Within | 54 | 11586.000 | 214.556 | |
| Total | 59 | 16198.933 | | |

Page 24: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

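Equivalence of the F-test and the t-test when k = 2

the t-test

$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$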

Page 25: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

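the F-test (with k = 2)

$$F = \frac{s^2_{Between}}{s^2_{Pooled}} = \frac{\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2 \Big/ (k - 1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)} = \frac{n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2}{\left[(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right] \big/ (n_1 + n_2 - 2)}$$

numerator $= n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2$

denominator $= s^2_{pooled}$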

Page 26: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

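Since $\bar{x} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$,

$$n_1\left(\bar{x}_1 - \bar{x}\right)^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2$$

$$n_2\left(\bar{x}_2 - \bar{x}\right)^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2$$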

Page 27: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

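$$n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1 + n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2 = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{x}_1 - \bar{x}_2\right)^2 = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$$

Hence

$$F = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{s^2_{Pooled}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = t^2$$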
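The identity $F = t^2$ for k = 2 can also be checked numerically. The sketch below uses the Diet 1 and Diet 4 columns of the weight-gain data as the two samples; this pairing is an arbitrary choice for illustration, and any two samples would do.

```python
from math import sqrt
from statistics import mean, variance

# Two samples: Diet 1 and Diet 4 weight gains from the example data.
x = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]
y = [90, 76, 90, 64, 86, 51, 72, 90, 95, 78]
n, m = len(x), len(y)

# Pooled variance from the two sample variances.
s2_pooled = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)

# Two sample t statistic.
t = (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n + 1 / m))

# One-way ANOVA F statistic with k = 2 groups.
overall = (sum(x) + sum(y)) / (n + m)
ss_between = n * (mean(x) - overall) ** 2 + m * (mean(y) - overall) ** 2
f_value = (ss_between / (2 - 1)) / s2_pooled

print(round(t, 4), round(t ** 2, 4), round(f_value, 4))
```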

Page 28: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Using SPSS

Note: The use of another statistical package such as Minitab is similar to using SPSS

Page 29: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Assume the data is contained in an Excel file

Page 30: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Each variable is in a column

1. Weight gain (wtgn)

2. diet

3. Source of protein (Source)

4. Level of Protein (Level)

Page 31: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

After starting the SPSS program the following dialogue box appears:

Page 32: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

If you select "Opening an existing file" and press OK, the following dialogue box appears

Page 33: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The following dialogue box appears:

Page 34: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range:

Once you click "OK", two windows will appear

Page 35: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

One that will contain the output:

Page 36: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The other containing the data:

Page 37: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

To perform ANOVA select Analyze -> General Linear Model -> Univariate

Page 38: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The following dialog box appears

Page 39: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Select the dependent variable and the fixed factors

Press OK to perform the Analysis

Page 40: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The Output

Tests of Between-Subjects Effects

Dependent Variable: wtgn

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 4612.933(a) | 5 | 922.587 | 4.300 | .002 |
| Intercept | 463233.067 | 1 | 463233.067 | 2159.036 | .000 |
| diet | 4612.933 | 5 | 922.587 | 4.300 | .002 |
| Error | 11586.000 | 54 | 214.556 | | |
| Total | 479432.000 | 60 | | | |
| Corrected Total | 16198.933 | 59 | | | |

a R Squared = .285 (Adjusted R Squared = .219)

Page 41: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Comments

• The F-test tests $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ against $H_A$: at least one pair of means is different

• If $H_0$ is not rejected we conclude that the means are not significantly different (this does not prove that they are equal)

• If $H_0$ is rejected we conclude that at least one pair of means is significantly different.

• The F-test gives no information about which pairs of means are different.

• One can now use two sample t tests to determine which pairs of means are significantly different

Page 42: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Fisher's LSD (least significant difference) procedure:

1. Test $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ against $H_A$: at least one pair of means is different, using the ANOVA F-test

2. If $H_0$ is not rejected we conclude that the means are not significantly different, and stop.

3. If $H_0$ is rejected we conclude that at least one pair of means is significantly different, then follow this by using two sample t tests to determine which pairs of means are significantly different

Page 43: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Example

In the following example we are comparing weight gains resulting from the following six diets

1. Diet 1 - High Protein , Beef

2. Diet 2 - High Protein , Cereal

3. Diet 3 - High Protein , Pork

4. Diet 4 - Low protein , Beef

5. Diet 5 - Low protein , Cereal

6. Diet 6 - Low protein , Pork


Page 47: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The ANOVA Table

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | 5 | 4612.933 | 922.587 | 4.3 (p = 0.0023) |
| Within | 54 | 11586.000 | 214.556 | |
| Total | 59 | 16198.933 | | |

$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$

Thus since F > 2.386 we reject $H_0$

Conclusion: There are significant differences amongst the k = 6 means

Page 48: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Now we want to perform t tests to compare the k = 6 means

$$t = \frac{\bar{x}_i - \bar{x}_j}{s_{pooled}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}, \qquad s_{pooled} = \sqrt{MS_{within}}$$

with $t_{0.025} = 2.005$ for 54 d.f.

Page 49: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

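Table of means

| Level | High | High | High | Low | Low | Low |
|---|---|---|---|---|---|---|
| Source | Beef | Cereal | Pork | Beef | Cereal | Pork |
| Diet | 1 | 2 | 3 | 4 | 5 | 6 |
| Mean | 100.0 | 85.9 | 99.5 | 79.2 | 83.9 | 78.7 |

t test results

| i | 1 vs i | 2 vs i | 3 vs i | 4 vs i | 5 vs i |
|---|---|---|---|---|---|
| 2 | 2.152* | | | | |
| 3 | 0.076 | -2.076* | | | |
| 4 | 3.175* | 1.023 | 3.099* | | |
| 5 | 2.458* | 0.305 | 2.381* | -0.717 | |
| 6 | 3.252* | 1.099 | 3.175* | 0.076 | 0.794 |

Critical value $t_{0.025} = 2.005$ for 54 d.f.

t values that are significant (|t| > 2.005) are marked with *.

The value tabled is

$$t = \frac{\bar{x}_i - \bar{x}_j}{s_{pooled}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}} \quad \text{where } s_{pooled} = \sqrt{MS_{within}}$$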
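The tabled t values can be reproduced from the diet means and $MS_{within}$ with a short Python sketch. The variable names are illustrative. The critical value 2.005 and $MS_{within}$ = 214.556 are taken from the slides rather than recomputed, and each diet has 10 observations.

```python
from itertools import combinations
from math import sqrt

means = {1: 100.0, 2: 85.9, 3: 99.5, 4: 79.2, 5: 83.9, 6: 78.7}
n_per_diet = 10      # observations per diet
ms_within = 214.556  # Within Mean Square from the ANOVA table
t_crit = 2.005       # t_0.025 for 54 d.f., as quoted on the slide

s_pooled = sqrt(ms_within)
std_error = s_pooled * sqrt(1 / n_per_diet + 1 / n_per_diet)

# t statistic for every pair (i, j) with i < j.
t_values = {
    (i, j): (means[i] - means[j]) / std_error
    for i, j in combinations(sorted(means), 2)
}
significant = sorted(pair for pair, t in t_values.items() if abs(t) > t_crit)

for pair, t in sorted(t_values.items()):
    flag = "*" if pair in significant else ""
    print(pair, f"{t:.3f}{flag}")
```

Under these inputs, the significant pairs are exactly those that compare diet 1 or diet 3 with one of diets 2, 4, 5 and 6.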

Page 50: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Conclusions:

1. There is no significant difference between diet 1 (high protein, beef) and diet 3 (high protein, pork).

2. There are no significant differences amongst diets 2, 4, 5 and 6 (i.e. high protein, cereal (diet 2) and the low protein diets (diets 4, 5 and 6)).

3. There are significant differences between diets 1 and 3 (high protein, meat) and the other diets (2, 4, 5, and 6).

Major conclusion: High protein diets result in a higher weight gain but only if the source of protein is a meat source.


Page 51: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

These are similar conclusions to those made using exploratory techniques, such as examining box plots

Page 52: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: Box Plots: Weight Gains for Six Diets. Horizontal axis: Diet (1 to 6), grouped into High Protein and Low Protein, each with sources Beef, Cereal and Pork. Vertical axis: Weight Gain (40 to 130). Legend: Non-Outlier Max, Non-Outlier Min; 75%, 25%; Median.]

Page 53: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Conclusions

• Weight gain is higher for the high protein meat diets

• Increasing the level of protein increases weight gain, but only if the source of protein is a meat source

Carrying out the F-test and Fisher's LSD establishes the statistical significance of these conclusions. Differences observed using exploratory methods alone could have occurred by chance.

Page 54: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Comparing k Population Proportions

The $\chi^2$ test for independence

Page 55: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

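The two sample test for proportions

The data can be displayed in the following table:

| | population 1 | population 2 | Total |
|---|---|---|---|
| Success | $x_1$ | $x_2$ | $x_1 + x_2$ |
| Failure | $n_1 - x_1$ | $n_2 - x_2$ | $n_1 + n_2 - (x_1 + x_2)$ |
| Total | $n_1$ | $n_2$ | $n_1 + n_2$ |

test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\left(1 - \hat{p}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$\hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2} \quad \text{and} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$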
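The two sample z statistic can be sketched in Python as below. The function name `two_proportion_z` and the counts in the example call are hypothetical; no data for this test appears in the slides.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for comparing two proportions using the pooled estimate."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Hypothetical counts: 30 successes out of 100 versus 20 out of 100.
z = two_proportion_z(30, 100, 20, 100)
print(round(z, 3))
```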

Page 56: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

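This problem can be extended in two ways:

1. Increasing the populations (columns) from 2 to k (or c)

2. Increasing the number of categories (rows) from 2 to r.

| | 1 | 2 | … | c | Total |
|---|---|---|---|---|---|
| 1 | $x_{11}$ | $x_{12}$ | … | $x_{1c}$ | $R_1$ |
| 2 | $x_{21}$ | $x_{22}$ | … | $x_{2c}$ | $R_2$ |
| ⋮ | | | | | ⋮ |
| r | $x_{r1}$ | $x_{r2}$ | … | $x_{rc}$ | $R_r$ |
| Total | $C_1$ | $C_2$ | … | $C_c$ | N |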

Page 57: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The $\chi^2$ test for independence

Page 58: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Situation

• We have two categorical variables R and C.

• The number of categories of R is r.

• The number of categories of C is c.

• We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j.

• R = rows, C = columns

Page 59: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Example

Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects.

The categories for Blood Pressure are:

<127, 127-146, 147-166, 167+

The categories for Cholesterol are:

<200, 200-219, 220-259, 260+

Page 60: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

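Table: two-way frequency

| Serum Cholesterol | Systolic Blood Pressure <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |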

Page 61: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

The $\chi^2$ test for independence

Define

$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row Total}$$

$$C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column Total}$$

$$E_{ij} = \frac{R_i C_j}{n}$$

= Expected frequency in the (i,j)th cell in the case of independence.

Page 62: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

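Justification for $E_{ij} = (R_i C_j)/n$ in the case of independence

Let $\pi_{ij} = P[R = i, C = j] = P[R = i]\,P[C = j] = \rho_i\gamma_j$ in the case of independence

$$E_{ij} = n\pi_{ij} = n\rho_i\gamma_j \approx n\hat{\rho}_i\hat{\gamma}_j = n\left(\frac{R_i}{n}\right)\left(\frac{C_j}{n}\right) = \frac{R_i C_j}{n}$$

= Expected frequency in the (i,j)th cell in the case of independence.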
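The expected frequencies can be computed from the row and column totals with a few lines of Python. This sketch applies the formula to the blood pressure and cholesterol table; the variable names are illustrative.

```python
# Observed counts: rows are cholesterol categories (<200, 200-219,
# 220-259, 260+), columns are blood pressure categories
# (<127, 127-146, 147-166, 167+).
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]

row_totals = [sum(row) for row in observed]        # R_i
col_totals = [sum(col) for col in zip(*observed)]  # C_j
n = sum(row_totals)

# E_ij = R_i * C_j / n under independence.
expected = [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]

print(row_totals, col_totals, n)
for row in expected:
    print([round(e, 2) for e in row])
```

A useful sanity check is that the expected frequencies in each row and column add up to the observed row and column totals.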

Page 63: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Then to test

$H_0$: R and C are independent

against

$H_A$: R and C are not independent

Use test statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}}$$

$x_{ij}$ = observed frequency in the (i,j)th cell

$E_{ij} = \dfrac{R_i C_j}{n}$ = Expected frequency in the (i,j)th cell in the case of independence.

Page 64: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

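Sampling distribution of test statistic when $H_0$ is true

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}}$$

has a $\chi^2$ distribution with degrees of freedom $\nu = (r - 1)(c - 1)$

Critical and Acceptance Region

Reject $H_0$ if: $\chi^2 \ge \chi^2_\alpha$

Accept $H_0$ if: $\chi^2 < \chi^2_\alpha$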

Page 65: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Table: Expected frequencies, (Observed frequencies), Standardized Residuals

Each cell shows: expected frequency, (observed frequency), standardized residual.

| Serum Cholesterol | Systolic Blood Pressure <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 96.29 (117) 2.11 | 130.79 (121) -0.86 | 50.63 (47) -0.51 | 29.29 (22) -1.35 | 307 |
| 200-219 | 77.16 (85) 0.89 | 104.80 (98) -0.66 | 40.57 (43) 0.38 | 23.47 (20) -0.72 | 246 |
| 220-259 | 137.70 (119) -1.59 | 187.03 (209) 1.61 | 72.40 (68) -0.52 | 41.88 (43) 0.17 | 439 |
| 260+ | 76.85 (67) -1.12 | 104.38 (99) -0.53 | 40.40 (46) 0.88 | 23.37 (33) 1.99 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |

$\chi^2 = 20.85$

Page 66: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

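Standardized residuals

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

Test statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 20.85$$

degrees of freedom = (r - 1)(c - 1) = 9

$\chi^2_{0.05} = 16.919$

Reject $H_0$ using $\alpha = 0.05$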
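The test statistic and standardized residuals for the blood pressure and cholesterol table can be reproduced with the sketch below. The function name `chi_square_independence` is illustrative. The critical value 16.919 is the one quoted above and is not computed here.

```python
from math import sqrt


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, residual table)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for row, r_i in zip(observed, row_totals):
        res_row = []
        for x_ij, c_j in zip(row, col_totals):
            e_ij = r_i * c_j / n               # expected frequency
            res_row.append((x_ij - e_ij) / sqrt(e_ij))
        residuals.append(res_row)
    chi2 = sum(r * r for res_row in residuals for r in res_row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
chi2, df, residuals = chi_square_independence(observed)

CRITICAL = 16.919  # chi-square critical value at 0.05 with 9 d.f. (from the slide)
print(round(chi2, 2), df)
print("reject H0" if chi2 >= CRITICAL else "do not reject H0")
```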

Page 67: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Another Example

This data comes from a Globe and Mail study examining the attitudes of the baby boomers. Data was collected on various age groups.

| Age group | Total |
|---|---|
| Echo (Age 20 – 29) | 398 |
| Gen X (Age 30 – 39) | 342 |
| Younger Boomers (Age 40 – 49) | 378 |
| Older Boomers (Age 50 – 59) | 286 |
| Pre Boomers (Age 60+) | 445 |
| Total | 1849 |

Page 68: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

One question with responses

In an average week, how many times would you drink alcohol?

| Age group | never | once | twice | three or four times | five or more times | Total |
|---|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 115 | 135 | 64 | 48 | 36 | 398 |
| Gen X (Age 30 – 39) | 130 | 123 | 38 | 31 | 20 | 342 |
| Younger Boomers (Age 40 – 49) | 136 | 87 | 64 | 57 | 34 | 378 |
| Older Boomers (Age 50 – 59) | 109 | 74 | 40 | 43 | 20 | 286 |
| Pre Boomers (Age 60+) | 218 | 80 | 45 | 40 | 62 | 445 |
| Total | 708 | 499 | 251 | 219 | 172 | 1849 |

Are there differences in weekly consumption of alcohol related to age?

Page 69: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Table: Expected frequencies

| Age group | never | once | twice | three or four times | five or more times | Total |
|---|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 152.40 | 107.41 | 54.03 | 47.14 | 37.02 | 398 |
| Gen X (Age 30 – 39) | 130.96 | 92.30 | 46.43 | 40.51 | 31.81 | 342 |
| Younger Boomers (Age 40 – 49) | 144.74 | 102.01 | 51.31 | 44.77 | 35.16 | 378 |
| Older Boomers (Age 50 – 59) | 109.51 | 77.18 | 38.82 | 33.87 | 26.60 | 286 |
| Pre Boomers (Age 60+) | 170.39 | 120.09 | 60.41 | 52.71 | 41.40 | 445 |
| Total | 708 | 499 | 251 | 219 | 172 | 1849 |

Page 70: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Table: Residuals

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

| Age group | never | once | twice | three or four times | five or more times |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | -3.029 | 2.662 | 1.357 | 0.125 | -0.168 |
| Gen X (Age 30 – 39) | -0.083 | 3.196 | -1.237 | -1.494 | -2.095 |
| Younger Boomers (Age 40 – 49) | -0.726 | -1.486 | 1.771 | 1.828 | -0.196 |
| Older Boomers (Age 50 – 59) | -0.049 | -0.362 | 0.189 | 1.568 | -1.280 |
| Pre Boomers (Age 60+) | 3.647 | -3.659 | -1.982 | -1.750 | 3.203 |

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 93.97$$

$\chi^2_{0.05} = 26.296$ for $4 \times 4 = 16$ d.f.

Conclusion: There is a significant relationship between age group and weekly alcohol use

Page 71: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Examining the Residuals allows one to identify the cells that indicate a departure from independence

• Large positive residuals indicate cells where the observed frequencies were larger than expected if independent

• Large negative residuals indicate cells where the observed frequencies were smaller than expected if independent

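One way to pick out the cells that depart from independence is to flag residuals that are large in absolute value. The sketch below does this for the alcohol table. The cutoff of 2 is an assumed rule of thumb, not something stated in the slides, and the variable names are illustrative.

```python
from math import sqrt

age_groups = ["Echo", "Gen X", "Younger Boomers", "Older Boomers", "Pre Boomers"]
answers = ["never", "once", "twice", "three or four times", "five or more times"]
observed = [
    [115, 135, 64, 48, 36],
    [130, 123, 38, 31, 20],
    [136, 87, 64, 57, 34],
    [109, 74, 40, 43, 20],
    [218, 80, 45, 40, 62],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Standardized residual r_ij = (x_ij - E_ij) / sqrt(E_ij), E_ij = R_i C_j / n.
residuals = [
    [(x - r_i * c_j / n) / sqrt(r_i * c_j / n) for x, c_j in zip(row, col_totals)]
    for row, r_i in zip(observed, row_totals)
]
chi2 = sum(r * r for row in residuals for r in row)

THRESHOLD = 2.0  # assumed cutoff for a "large" residual
flagged = [
    (age_groups[i], answers[j], round(r, 3))
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) >= THRESHOLD
]

print(round(chi2, 2))
for cell in flagged:
    print(cell)
```

With this cutoff the flagged cells sit mainly in the youngest and oldest age groups, where the residual table shows its largest entries.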

Page 72: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Another question with responses

In an average week, how many times would you surf the internet?

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times | Total |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 48 | 72 | 100 | 178 | 398 |
| Gen X (Age 30 – 39) | 51 | 82 | 92 | 117 | 342 |
| Younger Boomers (Age 40 – 49) | 79 | 128 | 76 | 95 | 378 |
| Older Boomers (Age 50 – 59) | 92 | 63 | 57 | 74 | 286 |
| Pre Boomers (Age 60+) | 276 | 71 | 67 | 31 | 445 |
| Total | 546 | 416 | 392 | 495 | 1849 |

Are there differences in weekly internet use related to age?

Page 73: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Table: Expected frequencies

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times | Total |
|---|---|---|---|---|---|
| Echo (Age 20 – 29) | 117.53 | 89.54 | 84.38 | 106.55 | 398 |
| Gen X (Age 30 – 39) | 100.99 | 76.95 | 72.51 | 91.56 | 342 |
| Younger Boomers (Age 40 – 49) | 111.62 | 85.04 | 80.14 | 101.20 | 378 |
| Older Boomers (Age 50 – 59) | 84.45 | 64.35 | 60.63 | 76.57 | 286 |
| Pre Boomers (Age 60+) | 131.41 | 100.12 | 94.34 | 119.13 | 445 |
| Total | 546 | 416 | 392 | 495 | 1849 |

Page 74: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Table: Residuals

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

| Age group | never | 1 to 4 times | 5 to 9 times | 10 or more times |
|---|---|---|---|---|
| Echo (Age 20 – 29) | -6.41 | -1.85 | 1.70 | 6.92 |
| Gen X (Age 30 – 39) | -4.97 | 0.58 | 2.29 | 2.66 |
| Younger Boomers (Age 40 – 49) | -3.09 | 4.66 | -0.46 | -0.62 |
| Older Boomers (Age 50 – 59) | 0.82 | -0.17 | -0.47 | -0.29 |
| Pre Boomers (Age 60+) | 12.61 | -2.91 | -2.82 | -8.07 |

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(x_{ij} - E_{ij}\right)^2}{E_{ij}} = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = 406.29$$

$\chi^2_{0.05} = 21.03$ for $4 \times 3 = 12$ d.f.

Conclusion: There is a significant relationship between age group and weekly internet use

Page 75: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: bar chart for Echo (Age 20 – 29) over the categories never, 1 to 4 times, 5 to 9 times, 10 or more times; vertical axis from 0.0 to 70.0]

Page 76: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: bar chart for Gen X (Age 30 – 39) over the categories never, 1 to 4 times, 5 to 9 times, 10 or more times; vertical axis from 0.0 to 70.0]

Page 77: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: bar chart for Younger Boomers (Age 40 – 49) over the categories never, 1 to 4 times, 5 to 9 times, 10 or more times; vertical axis from 0.0 to 70.0]

Page 78: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: bar chart for Older Boomers (Age 50 – 59) over the categories never, 1 to 4 times, 5 to 9 times, 10 or more times; vertical axis from 0.0 to 70.0]

Page 79: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

[Figure: bar chart for Pre Boomers (Age 60+) over the categories never, 1 to 4 times, 5 to 9 times, 10 or more times; vertical axis from 0.0 to 70.0]

Page 80: Comparing k Populations Means – One way Analysis of Variance (ANOVA)

Regressions and Correlation

Estimation by confidence intervals, Hypothesis Testing