Chapter 17

Statistical Inference For Frequency Data

I Three Applications of Pearson’s χ²

Testing goodness of fit

Testing independence

Testing equality of proportions

A. Testing Goodness of Fit

1. Statistical hypotheses

H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }k} = E_{\text{Pop }k}$

H1: $O_{\text{Pop }j} \neq E_{\text{Pop }j}$ for some category j

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of membership in one of k mutually exclusive categories

B. Testing Independence

1. Statistical hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

2. Randomization Plan

One random sample of n elements

Each element is classified in terms of two variables, denoted by A and B, where each variable has two or more categories.

C. Testing Equality of Proportions

1. Statistical hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \neq p_{j'}$ for some j and j′

2. Randomization Plan

c random samples, where c ≥ 2

For each sample, elements are classified in terms of membership in one of r = 2 mutually exclusive categories

II Testing Goodness of Fit

A. Chi-Square Distribution

[Figure: chi-square density curves, f(χ²) plotted against χ², for df = 1, df = 2, df = 6, and df = 10]

B. Pearson’s chi-square statistic

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

1. $O_j$ and $E_j$ denote, respectively, observed and expected frequencies. k denotes the number of categories.

2. Critical value of chi square is $\chi^2_{\alpha,\nu}$ with ν = k – 1 degrees of freedom.

C. Grade-Distribution Example

1. Is the distribution of grades for summer-school students in a statistics class different from that for the fall and spring semesters?

Grade    Fall and Spring proportion    Summer obs. frequency
A        .12                           15
B        .23                           21
C        .47                           30
D        .13                            6
F        .05                            0
Total    1.00                          72

2. The statistical hypotheses are

H0: $O_{\text{Pop }1} = E_{\text{Pop }1}, \ldots, O_{\text{Pop }5} = E_{\text{Pop }5}$

H1: $O_{\text{Pop }j} \neq E_{\text{Pop }j}$ for some category j

3. Pearson’s chi-square statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

4. Critical value of chi square for α = .05, k = 5 categories, and ν = 5 – 1 = 4 degrees of freedom is $\chi^2_{.05,\,4} = 9.488$.

Table 1. Computation of Pearson’s Chi-Square for n = 72 Summer-School Students

(1)      (2)    (3)     (4)                 (5)          (6)
Grade    O_j    p_j     np_j = E_j          O_j – E_j    (O_j – E_j)²/E_j
A        15     .12     72(.12) =  8.6        6.4        4.763
B        21     .23     72(.23) = 16.6        4.4        1.166
C        30     .47     72(.47) = 33.8       –3.8        0.427
D         6     .13     72(.13) =  9.4       –3.4        1.230
F         0     .05     72(.05) =  3.6       –3.6        3.600
Total    72    1.00               72.0        0          χ² = 11.186*

*p < .025
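The arithmetic in Table 1 can be checked with a short script. The following is a minimal sketch in Python (the function and variable names are illustrative, not part of the slides). It computes the statistic twice: once with the expected frequencies rounded to one decimal place, as the table does, and once with unrounded expected frequencies.

```python
# Grade data from Table 1: observed summer counts and fall/spring proportions.
observed = {"A": 15, "B": 21, "C": 30, "D": 6, "F": 0}
proportions = {"A": 0.12, "B": 0.23, "C": 0.47, "D": 0.13, "F": 0.05}


def pearson_chi_square(observed, expected):
    """Return the sum over categories of (O_j - E_j)^2 / E_j."""
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)


n = sum(observed.values())  # 72 students

# E_j = n * p_j, unrounded.
expected_exact = {g: n * p for g, p in proportions.items()}
# Table 1 rounds each E_j to one decimal place before computing column (6).
expected_rounded = {g: round(e, 1) for g, e in expected_exact.items()}

chi2_exact = pearson_chi_square(observed, expected_exact)
chi2_table = pearson_chi_square(observed, expected_rounded)

critical_value = 9.488  # chi-square critical value for alpha = .05, df = 4 (from the slides)
reject_h0 = chi2_table > critical_value

print(f"chi-square with rounded E_j:   {chi2_table:.3f}")
print(f"chi-square with unrounded E_j: {chi2_exact:.3f}")
print(f"reject H0 at alpha = .05: {reject_h0}")
```

With rounded expected frequencies the script reproduces the table's 11.186; with unrounded expected frequencies the statistic is about 11.11. Both values exceed the critical value of 9.488.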

5. When e parameters of a theoretical distribution must be estimated, the degrees of freedom are k – 1 – e.

D. Practical Significance

1. Cohen’s w

$$w = \sqrt{\sum_{j=1}^{k} \frac{(\hat{p}_j - p_j)^2}{p_j}}$$

where $\hat{p}_j$ and $p_j$ denote, respectively, the observed and expected proportions in the jth category.

2. Simpler equivalent formula for Cohen’s w

$$w = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{11.186}{72}} = 0.394$$

3. Cohen’s guidelines for interpreting w

w = 0.1 is a small effect

w = 0.3 is a medium effect

w = 0.5 is a large effect
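The two formulas for Cohen's w can be compared numerically on the grade data. This is a minimal sketch in Python; the names are illustrative, and the value 11.186 is the chi-square statistic reported in Table 1.

```python
import math

# Grade data from Table 1.
observed = [15, 21, 30, 6, 0]
expected_props = [0.12, 0.23, 0.47, 0.13, 0.05]
n = sum(observed)  # 72


def cohens_w(observed_props, expected_props):
    """w = sqrt( sum_j (p_hat_j - p_j)^2 / p_j )."""
    return math.sqrt(
        sum((po - pe) ** 2 / pe for po, pe in zip(observed_props, expected_props))
    )


observed_props = [o / n for o in observed]

# Definition in terms of observed and expected proportions.
w_from_props = cohens_w(observed_props, expected_props)
# Shortcut formula w = sqrt(chi-square / n), using the statistic from Table 1.
w_from_chi2 = math.sqrt(11.186 / n)

print(f"w from proportions: {w_from_props:.3f}")
print(f"w from chi-square:  {w_from_chi2:.3f}")
```

The two results differ only in the third decimal place, because Table 1 rounds the expected frequencies before computing chi square. Either way w is about 0.39, which falls between Cohen's medium and large guidelines.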

E. Yates’ Correction

1. When ν = 1, Yates’ correction can be applied to make the sampling distribution of the test statistic for $O_j - E_j$, which is discrete, better approximate the chi-square distribution.

$$\chi^2 = \sum_{j=1}^{k} \frac{(|O_j - E_j| - 0.5)^2}{E_j}$$
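To see what the correction does, consider a minimal Python sketch with hypothetical data that do not come from the slides: 100 observations in k = 2 categories, 60 in the first and 40 in the second, with 50 expected in each (so ν = 1). The function name and the `yates` flag are illustrative.

```python
def chi_square(observed, expected, yates=False):
    """Pearson's chi-square; with yates=True, subtract 0.5 from each |O_j - E_j|."""
    correction = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected)
    )


# Hypothetical data (not from the slides): k = 2 categories, nu = 1.
observed = [60, 40]
expected = [50.0, 50.0]

uncorrected = chi_square(observed, expected)            # (10^2 / 50) * 2 = 4.00
corrected = chi_square(observed, expected, yates=True)  # (9.5^2 / 50) * 2 = 3.61

print(f"uncorrected: {uncorrected:.2f}, with Yates' correction: {corrected:.2f}")
```

In this hypothetical case the uncorrected statistic exceeds the ν = 1 critical value of 3.841 while the corrected statistic does not, which illustrates that the correction makes the test more conservative.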

F. Assumptions of the Goodness-of-Fit Test

1. Every observation is assigned to one and only one category.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.
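The third assumption is easy to check mechanically. The sketch below is one possible Python helper (its name is hypothetical); it lists the categories whose expected frequency falls below the guideline, using the expected frequencies from Table 1 as input.

```python
def small_expected_categories(expected, df):
    """Return labels whose expected frequency is below the guideline:
    at least 10 when df = 1, at least 5 when df > 1."""
    threshold = 10 if df == 1 else 5
    return [label for label, e in expected.items() if e < threshold]


# Expected frequencies from column (4) of Table 1; df = k - 1 = 4.
table1_expected = {"A": 8.6, "B": 16.6, "C": 33.8, "D": 9.4, "F": 3.6}

flagged = small_expected_categories(table1_expected, df=4)
print(flagged)
```

For the grade data the helper flags category F, whose expected frequency of 3.6 is below 5.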

III Testing Independence

A. Statistical Hypotheses

H0: p(A and B) = p(A)p(B)

H1: p(A and B) ≠ p(A)p(B)

B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r Rows and j = 1, . . . , c Columns

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

C. Computational Example: Is Success on an Employment-Test Item Independent of Gender?

              Observed                       Expected
              b1 Fail   b2 Pass   Total      b1 Fail   b2 Pass
a1 Men          84        18       102        88.9      13.1
a2 Women        93         8       101        88.1      12.9
Total          177        26       203

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.299^{*}, \qquad \chi^2_{.05,\,1} = 3.841$$

D. Computation of expected frequencies

1. A and B are statistically independent if

$p(a_i \text{ and } b_j) = p(a_i)\,p(b_j)$

2. Expected frequency, $E_{a_i \text{ and } b_j}$, for the cell in row i and column j

$$E_{a_i \text{ and } b_j} = n\,p(a_i)\,p(b_j) = n(n_{a_i}/n)(n_{b_j}/n) = (n_{a_i} n_{b_j})/n$$

$$E_{a_1 \text{ and } b_1} = (n_{a_1} n_{b_1})/n = (102)(177)/203 = 88.9$$

$$E_{a_1 \text{ and } b_2} = (n_{a_1} n_{b_2})/n = (102)(26)/203 = 13.1$$

$$E_{a_2 \text{ and } b_1} = (n_{a_2} n_{b_1})/n = (101)(177)/203 = 88.1$$

$$E_{a_2 \text{ and } b_2} = (n_{a_2} n_{b_2})/n = (101)(26)/203 = 12.9$$

         Observed                 Expected
         b1      b2     Total     b1       b2
a1        84     18      102      88.9     13.1
a2        93      8      101      88.1     12.9
Total    177     26      203
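The expected frequencies and the chi-square statistic for the employment-test example can be reproduced from the observed counts alone. The following Python sketch is illustrative; the function names are not from the slides.

```python
# Observed counts: rows are a1 (men) and a2 (women); columns are b1 (fail) and b2 (pass).
observed = [
    [84, 18],
    [93, 8],
]


def expected_frequencies(table):
    """E_ij = (row total i)(column total j) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (O_ij - E_ij)^2 / E_ij over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )


expected = expected_frequencies(observed)
chi2 = pearson_chi_square(observed)

for row in expected:
    print([round(e, 1) for e in row])
print(f"chi-square = {chi2:.3f}")
```

The rounded expected frequencies match the table (88.9, 13.1, 88.1, 12.9), and the statistic agrees with the reported 4.299.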

E. Degrees of Freedom for an r × c Contingency Table

df = k – 1 – e

= rc – 1 – [(r – 1) + (c – 1)]

= rc – 1 – r + 1 – c + 1

= rc – r – c + 1

= (r – 1)(c – 1)

= (2 – 1)(2 – 1) = 1

F. Strength of Association and Practical Significance

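1. Cramér’s V

$$V = \frac{\phi_{\text{observed}}}{\phi_{\text{maximum}}} = \frac{\sqrt{\chi^2 / n}}{\sqrt{s-1}} = \sqrt{\frac{\chi^2}{n(s-1)}}$$

where s is the smaller of the number of rows and columns.

$$V = \sqrt{\frac{\chi^2}{n(s-1)}} = \sqrt{\frac{4.299}{203(2-1)}} = 0.146$$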

2. Practical significance, Cohen’s w

$$w = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}} = \sqrt{\frac{\chi^2}{n}} = 0.146$$

3. For a contingency table, an alternative formula for w is

$$w = V\sqrt{s-1} = 0.146\sqrt{2-1} = 0.146$$
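Both effect-size measures follow directly from the chi-square statistic, the sample size, and the table dimensions. A minimal Python sketch, with illustrative function names, applied to the 2 × 2 employment-test example:

```python
import math


def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (n (s - 1))), where s is the smaller of rows and cols."""
    s = min(rows, cols)
    return math.sqrt(chi2 / (n * (s - 1)))


def cohens_w_from_v(v, rows, cols):
    """w = V sqrt(s - 1), the contingency-table form of Cohen's w."""
    s = min(rows, cols)
    return v * math.sqrt(s - 1)


# Employment-test example: chi-square = 4.299, n = 203, 2 x 2 table.
v = cramers_v(4.299, 203, rows=2, cols=2)
w = cohens_w_from_v(v, rows=2, cols=2)

print(f"V = {v:.3f}, w = {w:.3f}")
```

For a table in which the smaller dimension is 2, s – 1 = 1, so V and w coincide. They differ only when both variables have more than two categories.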

G. Three-By-Three Contingency Table

1. Motivation and education of conscientious objectors during WWII

                 College   High School   Grade School   Total
Coward              12         25             35           72
Partly Coward       19         23             30           72
Not Coward          71         56             24          151
Total              102        104             89          295

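$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 36.681^{*}, \qquad \chi^2_{.05,\,4} = 9.488$$

$$\nu = (r-1)(c-1) = (3-1)(3-1) = 4$$

2. Strength of Association, Cramér’s V

$$V = \sqrt{\frac{\chi^2}{n(s-1)}} = \sqrt{\frac{36.681}{295(3-1)}} = 0.249$$

3. Practical significance

$$w = V\sqrt{s-1} = 0.249\sqrt{3-1} = 0.352$$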
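The full analysis of the three-by-three table (statistic, degrees of freedom, and effect sizes) can be sketched in a few lines of Python. The function name is illustrative, and the critical value is the one quoted in the slides.

```python
import math

# Rows: Coward, Partly Coward, Not Coward.
# Columns: College, High School, Grade School.
table = [
    [12, 25, 35],
    [19, 23, 30],
    [71, 56, 24],
]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom, n) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n


chi2, df, n = chi_square_independence(table)
s = min(len(table), len(table[0]))     # smaller of rows and columns
v = math.sqrt(chi2 / (n * (s - 1)))    # Cramér's V
w = v * math.sqrt(s - 1)               # Cohen's w

print(f"chi-square = {chi2:.3f}, df = {df}, n = {n}")
print(f"V = {v:.3f}, w = {w:.3f}")
print(f"exceeds critical value 9.488: {chi2 > 9.488}")
```

The script reproduces chi square = 36.681 and V = 0.249. It gives w of about 0.353 rather than 0.352 because it does not round V before multiplying by the square root of s – 1.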

H. Assumptions of the Independence Test

1. Every observation is assigned to one and only one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.

IV Testing Equality of c ≥ 2 Proportions

A. Statistical Hypotheses

H0: $p_1 = p_2 = \cdots = p_c$

H1: $p_j \neq p_{j'}$ for some j and j′

1. Computational example: three samples of n = 100 residents of nursing homes were surveyed. Variable A was resident satisfaction; variable B was age heterogeneity in the home.

Table 2. Nursing Home Data

                      Age Heterogeneity
                      Low b1       Medium b2    High b3
Satisfied a1          O = 56       O = 58       O = 38
                      E = 50.67    E = 50.67    E = 50.67
Not Satisfied a2      O = 44       O = 42       O = 62
                      E = 49.33    E = 49.33    E = 49.33

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 9.708^{*}, \qquad \chi^2_{.05,\,2} = 5.991$$

$$\nu = (r-1)(c-1) = (2-1)(3-1) = 2$$
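The nursing-home result can be checked from the three sample proportions. The Python sketch below takes the satisfied counts and the sample size of 100 per home type from the example, derives each not-satisfied count as 100 minus the satisfied count, and computes the statistic two ways: cell by cell, and through an algebraically equivalent expression in the sample proportions for a 2 × c table (this second form is not shown in the slides). Variable names are illustrative.

```python
# Satisfied residents in the low, medium, and high age-heterogeneity samples.
satisfied = [56, 58, 38]
sample_sizes = [100, 100, 100]

p_hat = [s / n for s, n in zip(satisfied, sample_sizes)]  # sample proportions
p_pooled = sum(satisfied) / sum(sample_sizes)             # 152 / 300

# Cell-by-cell Pearson chi-square over the 2 x 3 table.
chi2_cells = 0.0
for s, n in zip(satisfied, sample_sizes):
    for observed, p in ((s, p_pooled), (n - s, 1 - p_pooled)):
        expected = n * p  # 50.67 for satisfied, 49.33 for not satisfied
        chi2_cells += (observed - expected) ** 2 / expected

# Equivalent form: sum_j n_j (p_hat_j - p_pooled)^2 / (p_pooled (1 - p_pooled)).
chi2_props = sum(
    n * (p - p_pooled) ** 2 for n, p in zip(sample_sizes, p_hat)
) / (p_pooled * (1 - p_pooled))

df = (2 - 1) * (len(satisfied) - 1)

print(f"sample proportions: {p_hat}")
print(f"chi-square = {chi2_cells:.3f} (cells), {chi2_props:.3f} (proportions), df = {df}")
```

Both computations give 9.708 with 2 degrees of freedom, which exceeds the critical value of 5.991.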

B. Assumptions of the Equality of Proportions Test

1. Every observation is assigned to one and only one cell of the contingency table.

2. The observations are independent.

3. If ν = 1, every expected frequency should be at least 10. If ν > 1, every expected frequency should be at least 5.

C. Test of Homogeneity of Proportions

1. Extension of the test of equality of proportions when variable A has r > 2 rows

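2. Statistical hypotheses

$$H_0: \begin{bmatrix} p_{a_1|b_1} = p_{a_1|b_2} = \cdots = p_{a_1|b_c} \\ p_{a_2|b_1} = p_{a_2|b_2} = \cdots = p_{a_2|b_c} \\ \vdots \\ p_{a_r|b_1} = p_{a_r|b_2} = \cdots = p_{a_r|b_c} \end{bmatrix}$$

$H_1: p_{a_i|b_j} \neq p_{a_i|b_{j'}}$ for columns j and j′ in at least one row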