Chapter 17

Statistical Inference For Frequency Data

I Three Applications of Pearson’s χ²

Testing goodness of fit

Testing independence

Testing equality of proportions


A. Testing Goodness of Fit

1. Statistical hypotheses

H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }k} = E_{\text{Pop }k}$

H1: $O_{\text{Pop }j} \neq E_{\text{Pop }j}$ for some j

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of membership in one of k mutually exclusive categories


B. Testing Independence

1. Statistical hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of two variables, denoted by A and B, where each variable has two or more categories.


C. Testing Equality of Proportions

1. Statistical hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \neq p_{j'}$ for some j and j′

2. Randomization Plan

c random samples, where c ≥ 2

For each sample, elements are classified in terms of membership in one of r = 2 mutually exclusive categories


II Testing Goodness of Fit

A. Chi-Square Distribution

[Figure: chi-square density f(χ²) plotted against χ² for df = 1, df = 2, df = 6, and df = 10]
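The shape of these curves can be explored numerically from the standard chi-square density, f(x) = x^(ν/2 − 1) e^(−x/2) / (2^(ν/2) Γ(ν/2)) for x > 0. What follows is a minimal sketch using only the Python standard library; the function name `chi_square_pdf` and the evaluation points are illustrative choices, not part of the chapter.

```python
import math

def chi_square_pdf(x, df):
    """Chi-square density with df degrees of freedom, evaluated at x > 0."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

# Evaluate each curve from the figure at a few illustrative points.
for df in (1, 2, 6, 10):
    values = [round(chi_square_pdf(x, df), 4) for x in (1, 4, 8)]
    print(f"df = {df:2d}: f(1), f(4), f(8) = {values}")
```

As df grows, the mass of the density moves to the right and the curve becomes less skewed, which is the pattern the figure illustrates.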


B. Pearson’s chi-square statistic

1. Pearson’s chi-square statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

where $O_j$ and $E_j$ denote, respectively, observed and expected frequencies. k denotes the number of categories.

2. The critical value of chi square is $\chi^2_{\alpha,\,\nu}$ with ν = k – 1 degrees of freedom.


C. Grade-Distribution Example

1. Is the distribution of grades for summer-school students in a statistics class different from that for the fall and spring semesters?

Grade    Fall and Spring    Summer
         Proportion         Obs. frequency
A        .12                15
B        .23                21
C        .47                30
D        .13                 6
F        .05                 0
Total    1.00               72


2. The statistical hypotheses are

H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }5} = E_{\text{Pop }5}$

H1: $O_{\text{Pop }j} \neq E_{\text{Pop }j}$ for some j

3. Pearson’s chi-square statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

4. Critical value of chi square for α = .05, k = 5 categories, and ν = 5 – 1 = 4 degrees of freedom is $\chi^2_{.05,\,4} = 9.488$.


Table 1. Computation of Pearson’s Chi-Square for n = 72 Summer-School Students

(1)      (2)    (3)    (4)                (5)        (6)
Grade    Oj     pj     npj = Ej           Oj – Ej    (Oj – Ej)²/Ej

A        15     .12    72(.12) =  8.6      6.4       4.763
B        21     .23    72(.23) = 16.6      4.4       1.166
C        30     .47    72(.47) = 33.8     –3.8       0.427
D         6     .13    72(.13) =  9.4     –3.4       1.230
F         0     .05    72(.05) =  3.6     –3.6       3.600

         72    1.00              72.0      0         χ² = 11.186*

*p < .025
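The computation in Table 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the chapter's own procedure: the function name `pearson_chi_square` is hypothetical, and the critical value 9.488 is taken from the text rather than computed. Because the code keeps unrounded expected frequencies, it gives a statistic of about 11.11 rather than the 11.186 obtained in the table, where each expected frequency is rounded to one decimal place.

```python
def pearson_chi_square(observed, expected_props):
    """Return (chi-square, expected frequencies) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_props]            # E_j = n * p_j
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected

# Grade data from Table 1, in the order A, B, C, D, F.
observed = [15, 21, 30, 6, 0]
props = [0.12, 0.23, 0.47, 0.13, 0.05]

chi2, expected = pearson_chi_square(observed, props)
df = len(observed) - 1        # nu = k - 1
critical = 9.488              # chi-square critical value for alpha = .05, nu = 4 (from the text)

print("expected frequencies:", [round(e, 2) for e in expected])
print("chi-square:", round(chi2, 3), "with df =", df)
print("reject H0 at alpha = .05:", chi2 > critical)
```

Either way the statistic exceeds 9.488, so the decision at α = .05 is the same.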


5. When e parameters of a theoretical distribution must be estimated, the degrees of freedom are ν = k – 1 – e.

D. Practical Significance

1. Cohen’s w

$$w = \sqrt{\sum_{j=1}^{k} \frac{(\hat{p}_j - p_j)^2}{p_j}}$$

where $\hat{p}_j$ and $p_j$ denote, respectively, the observed and expected proportions in the jth category.


2. Simpler equivalent formula for Cohen’s w

$$w = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{11.186}{72}} = 0.394$$

3. Cohen’s guidelines for interpreting w

0.1 is a small effect

0.3 is a medium effect

0.5 is a large effect
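A short sketch can check that the definition of w in terms of proportions and the shortcut based on χ²/n agree for the grade data. The variable names and the `effect_label` helper are hypothetical; in particular, treating Cohen's guideline values as lower bounds of each band is an assumption made here for illustration. The code uses unrounded expected frequencies throughout.

```python
import math

observed = [15, 21, 30, 6, 0]                     # summer-school grade counts (A, B, C, D, F)
expected_props = [0.12, 0.23, 0.47, 0.13, 0.05]   # fall and spring proportions
n = sum(observed)

# Definition: w = sqrt( sum over categories of (p_hat_j - p_j)^2 / p_j )
observed_props = [o / n for o in observed]
w_definition = math.sqrt(
    sum((p_hat - p) ** 2 / p for p_hat, p in zip(observed_props, expected_props))
)

# Shortcut: w = sqrt(chi-square / n)
chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))
w_shortcut = math.sqrt(chi2 / n)

def effect_label(w):
    """Hypothetical helper: treats Cohen's guideline values as lower bounds."""
    if w >= 0.5:
        return "large"
    if w >= 0.3:
        return "medium"
    if w >= 0.1:
        return "small"
    return "below small"

print(round(w_definition, 3), round(w_shortcut, 3), effect_label(w_shortcut))
```

The two routes give the same value (about 0.39), which falls between the medium and large guidelines.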


E. Yates’ Correction

1. When ν = 1, Yates’ correction can be applied to make the sampling distribution of the test statistic for Oj – Ej, which is discrete, better approximate the chi-square distribution.

$$\chi^2 = \sum_{j=1}^{k} \frac{(|O_j - E_j| - 0.5)^2}{E_j}$$
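To see the size of the adjustment, the corrected and uncorrected statistics can be compared on a small case with ν = 1. The counts below are hypothetical and chosen only for easy arithmetic; the function names are likewise illustrative.

```python
def chi_square_uncorrected(observed, expected):
    """Pearson's chi-square without a continuity correction."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_yates(observed, expected):
    """Pearson's chi-square with Yates' correction: shrink each |O - E| by 0.5."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: k = 2 categories, so nu = k - 1 = 1.
observed = [60, 40]
expected = [50.0, 50.0]

plain = chi_square_uncorrected(observed, expected)
corrected = chi_square_yates(observed, expected)
print("uncorrected:", round(plain, 2), "Yates-corrected:", round(corrected, 2))
```

In this sketch the correction lowers the statistic from 4.00 to 3.61, which makes the test more conservative.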


F. Assumptions of the Goodness-of-Fit Test

1. Every observation is assigned to one and only one category.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.


III Testing Independence

A. Statistical Hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r Rows and j = 1, . . . , c Columns

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$


C. Computational Example: Is Success on an Employment-Test Item Independent of Gender?

                 Observed                    Expected
                 b1       b2                 b1       b2
                 Fail     Pass     Total     Fail     Pass
a1  Men          84       18       102       88.9     13.1
a2  Women        93        8       101       88.1     12.9
    Total        177      26       203

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.299^{*}$$

$$\chi^2_{.05,\,1} = 3.841$$


D. Computation of expected frequencies

1. A and B are statistically independent if

p(ai and bj) = p(ai)p(bj)

2. Expected frequency, $E_{a_i \text{ and } b_j}$, for the cell in row i and column j

$$E_{a_i \text{ and } b_j} = n\,p(a_i)\,p(b_j) = n\left(\frac{n_{a_i}}{n}\right)\left(\frac{n_{b_j}}{n}\right) = \frac{n_{a_i} n_{b_j}}{n}$$


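$$
\begin{aligned}
E_{a_1 \text{ and } b_1} &= (n_{a_1} n_{b_1})/n = (102)(177)/203 = 88.9 \\
E_{a_1 \text{ and } b_2} &= (n_{a_1} n_{b_2})/n = (102)(26)/203 = 13.1 \\
E_{a_2 \text{ and } b_1} &= (n_{a_2} n_{b_1})/n = (101)(177)/203 = 88.1 \\
E_{a_2 \text{ and } b_2} &= (n_{a_2} n_{b_2})/n = (101)(26)/203 = 12.9
\end{aligned}
$$

          Observed                  Expected
          b1      b2      Total     b1       b2
a1        84      18      102       88.9     13.1
a2        93       8      101       88.1     12.9
Total     177     26      203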
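The expected frequencies and the resulting chi-square statistic for the employment-test example can be reproduced with a short sketch. The function names `expected_frequencies` and `chi_square_independence` are hypothetical; the only inputs are the four observed counts from the table.

```python
def expected_frequencies(table):
    """E_ij = (row total * column total) / n for every cell of an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Pearson's chi-square summed over all cells of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )

# Rows: a1 (men), a2 (women); columns: b1 (fail), b2 (pass).
observed = [[84, 18],
            [93, 8]]

expected = expected_frequencies(observed)
chi2 = chi_square_independence(observed)
print("expected:", [[round(e, 1) for e in row] for row in expected])
print("chi-square:", round(chi2, 3))
```

Rounded to one decimal place, the expected frequencies match the table, and the statistic agrees with the reported 4.299.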


E. Degrees of Freedom for an r × c Contingency Table

df = k – 1 – e

= rc – 1 – [(r – 1) + (c – 1)]

= rc – 1 – r + 1 – c + 1

= rc – r – c + 1

= (r – 1)(c – 1)

= (2 – 1)(2 – 1) = 1


F. Strength of Association and Practical Significance

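1. Cramér’s V

$$V = \frac{\phi_{\text{observed}}}{\phi_{\text{maximum}}} = \frac{\sqrt{\chi^2 / n}}{\sqrt{s-1}} = \sqrt{\frac{\chi^2}{n(s-1)}}$$

where s is the smaller of the number of rows and columns.

$$V = \sqrt{\frac{\chi^2}{n(s-1)}} = \sqrt{\frac{4.299}{203(2-1)}} = 0.146$$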


2. Practical significance, Cohen’s w

$$w = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}} = \sqrt{\frac{\chi^2}{n}} = 0.146$$

3. For a contingency table, an alternative formula for w is

$$w = V\sqrt{s-1} = 0.146\sqrt{2-1} = 0.146$$
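The relationship between Cramér's V and Cohen's w can be sketched directly from the reported statistic. The helper names below are hypothetical, and the values χ² = 4.299 and n = 203 are taken from the example rather than recomputed.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (s - 1))), with s the smaller table dimension."""
    s = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (s - 1)))

def cohens_w_from_v(v, n_rows, n_cols):
    """Cohen's w recovered from V: w = V * sqrt(s - 1)."""
    s = min(n_rows, n_cols)
    return v * math.sqrt(s - 1)

chi2, n = 4.299, 203          # values reported for the 2 x 2 employment-test example
v = cramers_v(chi2, n, 2, 2)
w = cohens_w_from_v(v, 2, 2)
print("V =", round(v, 3), " w =", round(w, 3))
```

For a 2 × 2 table, s – 1 = 1, so V and w coincide; they differ only when both dimensions exceed 2.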


G. Three-By-Three Contingency Table

1. Motivation and education of conscientious objectors during WWII

                  College    High School    Grade School    Total
Coward              12          25             35             72
Partly Coward       19          23             30             72
Not Coward          71          56             24            151
Total              102         104             89            295

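$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 36.681^{*}$$

$$\chi^2_{.05,\,4} = 9.488$$

ν = (r – 1)(c – 1) = (3 – 1)(3 – 1) = 4

2. Strength of Association, Cramér’s V

$$V = \sqrt{\frac{\chi^2}{n(s-1)}} = \sqrt{\frac{36.681}{295(3-1)}} = 0.249$$

3. Practical significance

$$w = V\sqrt{s-1} = 0.249\sqrt{3-1} = 0.352$$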
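The full chain for the three-by-three table (expected frequencies, χ², degrees of freedom, V, and w) can be sketched end to end as follows. The variable names are illustrative. Small differences in the third decimal of w relative to the hand calculation are possible because the code does not round V before multiplying.

```python
import math

# Rows: Coward, Partly Coward, Not Coward; columns: College, High School, Grade School.
observed = [[12, 25, 35],
            [19, 23, 30],
            [71, 56, 24]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

r, c = len(row_totals), len(col_totals)
df = (r - 1) * (c - 1)
s = min(r, c)
v = math.sqrt(chi2 / (n * (s - 1)))   # Cramer's V
w = v * math.sqrt(s - 1)              # Cohen's w

print("chi-square:", round(chi2, 3), "df:", df)
print("V:", round(v, 3), "w:", round(w, 3))
```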


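H. Assumptions of the Independence Test

1. Every observation is assigned to one and only one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.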


IV Testing Equality of c ≥ 2 Proportions

A. Statistical Hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \neq p_{j'}$ for some j and j′

1. Computational example: three samples of n = 100 residents of nursing homes were surveyed. Variable A was resident satisfaction; variable B was age heterogeneity in the home.


Table 2. Nursing Home Data

                        Age Heterogeneity
                    Low b1       Medium b2    High b3
Satisfied a1        O = 56       O = 58       O = 38
                    E = 50.67    E = 50.67    E = 50.67
Not Satisfied a2    O = 44       O = 42       O = 62
                    E = 49.33    E = 49.33    E = 49.33

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 9.708^{*}$$

$$\chi^2_{.05,\,2} = 5.991$$

ν = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2
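The same statistic can be reached by working from the pooled proportion of satisfied residents, which makes the link to the hypothesis p1 = p2 = p3 explicit. This is a sketch under one stated assumption: the not-satisfied count in each sample is taken as 100 minus the satisfied count, based on the stated sample size of n = 100 per home type. Variable names are illustrative.

```python
satisfied = [56, 58, 38]            # low, medium, high age heterogeneity
sample_sizes = [100, 100, 100]      # three samples of n = 100 (from the text)

# Under H0 every sample shares one proportion, estimated by pooling the samples.
pooled = sum(satisfied) / sum(sample_sizes)

chi2 = 0.0
for x, n in zip(satisfied, sample_sizes):
    # Two cells per sample: satisfied, and not satisfied (assumed to be n - x).
    for obs, p in ((x, pooled), (n - x, 1 - pooled)):
        e = n * p
        chi2 += (obs - e) ** 2 / e

df = (2 - 1) * (len(satisfied) - 1)
sample_props = [x / n for x, n in zip(satisfied, sample_sizes)]

print("sample proportions:", sample_props)
print("pooled proportion:", round(pooled, 4))
print("chi-square:", round(chi2, 3), "df:", df)
```

The expected frequencies implied by the pooled proportion are 50.67 and 49.33 per sample, and the statistic matches the reported 9.708.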

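B. Assumptions of the Equality of Proportions Test

1. Every observation is assigned to one and only one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.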

C. Test of Homogeneity of Proportions

1. Extension of the test of equality of proportions when variable A has r > 2 rows



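$$
H_0:
\begin{bmatrix}
p_{a_1|b_1} = p_{a_1|b_2} = \cdots = p_{a_1|b_c} \\
p_{a_2|b_1} = p_{a_2|b_2} = \cdots = p_{a_2|b_c} \\
\vdots \\
p_{a_r|b_1} = p_{a_r|b_2} = \cdots = p_{a_r|b_c}
\end{bmatrix}
$$

H1: $p_{a_i|b_j} \neq p_{a_i|b_{j'}}$ for columns j and j′ in at least one row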
