© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12

Chapter 12: Inference on Categorical Data

Section 12.1: Goodness-of-Fit Test

Objective

• Perform a goodness-of-fit test

Characteristics of the Chi-Square Distribution

1. It is not symmetric.
2. The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.
4. The values of $\chi^2$ are nonnegative, i.e., the values of $\chi^2$ are greater than or equal to 0.
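The third characteristic can be checked numerically. The sketch below relies on a standard result that the slides do not state: a chi-square distribution with df degrees of freedom has skewness sqrt(8/df). The function name `chi_square_skewness` is chosen here for illustration only.

```python
import math

def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with df degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8 / df)

# The skewness is always positive (right-skewed) and shrinks toward 0 as df grows.
for df in (1, 3, 10, 30, 100):
    print(f"df = {df:3d}  skewness = {chi_square_skewness(df):.3f}")
```

A skewness near 0 corresponds to a nearly symmetric curve, which matches the statement that the distribution becomes more nearly symmetric as the degrees of freedom increase.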

A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a specific distribution.

Expected Counts

Suppose that there are n independent trials of an experiment with k ≥ 3 mutually exclusive possible outcomes. Let $p_1$ represent the probability of observing the first outcome and $E_1$ represent the expected count of the first outcome; $p_2$ represent the probability of observing the second outcome and $E_2$ represent the expected count of the second outcome; and so on. The expected counts for each possible outcome are given by

$$E_i = \mu_i = n p_i \quad \text{for } i = 1, 2, \ldots, k$$

Parallel Example 1: Finding Expected Counts

A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000.

Solution

Step 1: The probabilities are the relative frequencies from the 2000 distribution:

$p_{<1\text{ yr}} = 0.228$    $p_{1\text{-}2\text{ yr}} = 0.239$
$p_{3\text{-}4\text{ yr}} = 0.176$    $p_{\geq 5\text{ yr}} = 0.357$

Step 2: There are n = 1,000 trials of the experiment, so the expected counts are:

$E_{<1\text{ yr}} = n p_{<1\text{ yr}} = 1000(0.228) = 228$
$E_{1\text{-}2\text{ yr}} = n p_{1\text{-}2\text{ yr}} = 1000(0.239) = 239$
$E_{3\text{-}4\text{ yr}} = n p_{3\text{-}4\text{ yr}} = 1000(0.176) = 176$
$E_{\geq 5\text{ yr}} = n p_{\geq 5\text{ yr}} = 1000(0.357) = 357$
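The two steps of this solution can be sketched in a few lines of Python. The function name `expected_counts` and the category labels are illustrative choices, not part of the slides; the probabilities and sample size are the ones given in the example.

```python
def expected_counts(n, probabilities):
    """Return E_i = n * p_i for each category, given the hypothesized probabilities."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return {category: n * p for category, p in probabilities.items()}

# Relative frequencies from the 2000 distribution (Step 1).
p_2000 = {"<1": 0.228, "1-2": 0.239, "3-4": 0.176, ">=5": 0.357}

# n = 1000 trials (Step 2).
counts = expected_counts(1000, p_2000)
for category, e in counts.items():
    print(f"{category:>4}: {e:.0f}")
```

The check that the probabilities sum to 1 is a small safeguard added here; it catches a category that was left out of the hypothesized distribution.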

Test Statistic for Goodness-of-Fit Tests

Let $O_i$ represent the observed counts of category i, $E_i$ represent the expected counts of category i, k represent the number of categories, and n represent the number of independent trials of an experiment. Then the formula

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad i = 1, 2, \ldots, k$$

approximately follows the chi-square distribution with k-1 degrees of freedom, provided that
• all expected frequencies are greater than or equal to 1 (all $E_i \geq 1$), and
• no more than 20% of the expected frequencies are less than 5.
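As a rough illustration, the statistic and its two requirements can be written as small Python functions. The names `requirements_met` and `chi_square_statistic` are hypothetical; the counts are the care-giving grandparents data (expected counts from Parallel Example 1, observed counts from Parallel Example 2).

```python
def requirements_met(expected):
    """Check both conditions: all E_i >= 1, and at most 20% of the E_i below 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [252, 255, 162, 331]   # observed counts, Parallel Example 2
expected = [228, 239, 176, 357]   # expected counts, Parallel Example 1

if requirements_met(expected):
    stat = chi_square_statistic(observed, expected)
    print(f"chi-square statistic = {stat:.3f}, df = {len(observed) - 1}")
```

Checking the requirements before computing the statistic mirrors the order of the procedure: the chi-square approximation is only trusted when the expected counts are large enough.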


CAUTION!

Goodness-of-fit tests are used to test hypotheses regarding the distribution of a variable based on a single population. If you wish to compare two or more populations, you must use the tests for homogeneity presented in Section 12.2.

The Goodness-of-Fit Test

To test hypotheses regarding a distribution, we use the steps that follow.

Step 1: Determine the null and alternative hypotheses.
H0: The random variable follows a certain distribution.
H1: The random variable does not follow a certain distribution.

Step 2: Decide on a level of significance, α, depending on the seriousness of making a Type I error.

Step 3:
a) Calculate the expected counts for each of the k categories. The expected counts are $E_i = n p_i$ for i = 1, 2, …, k, where n is the number of trials and $p_i$ is the probability of the ith category, assuming that the null hypothesis is true.
b) Verify that the requirements for the goodness-of-fit test are satisfied.
   1. All expected counts are greater than or equal to 1 (all $E_i \geq 1$).
   2. No more than 20% of the expected counts are less than 5.
c) Compute the test statistic:

$$\chi_0^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Note: $O_i$ is the observed count for the ith category.


CAUTION!

If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low-frequency categories into a single category.

Classical Approach

Step 4: Determine the critical value. All goodness-of-fit tests are right-tailed tests, so the critical value is $\chi_\alpha^2$ with k-1 degrees of freedom.

Step 5: Compare the critical value to the test statistic. If $\chi_0^2 > \chi_\alpha^2$, reject the null hypothesis.

P-Value Approach

Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with k-1 degrees of freedom to the right of the test statistic.

Step 5: If the P-value < α, reject the null hypothesis.


Step 6: State the conclusion.

Parallel Example 2: Conducting a Goodness-of-Fit Test

A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000.

According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. The sociologist randomly selects 1,000 care-giving grandparents and obtains the following data.

Number of Years   Observed Counts
<1                252
1-2               255
3-4               162
≥5                331

Test the claim that the distribution is different today than it was in 2000 at the α = 0.05 level of significance.

Solution

Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then:

H0: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000.

H1: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000.

Step 2: The level of significance is α = 0.05.

Step 3:

(a) The expected counts were computed in Example 1.

Number of Years   Observed Counts   Expected Counts
<1                252               228
1-2               255               239
3-4               162               176
≥5                331               357

(b) Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied.

(c) The test statistic is

$$\chi_0^2 = \frac{(252-228)^2}{228} + \frac{(255-239)^2}{239} + \frac{(162-176)^2}{176} + \frac{(331-357)^2}{357} \approx 6.605$$

Solution: Classical Approach

Step 4: There are k = 4 categories, so we find the critical value using 4 - 1 = 3 degrees of freedom. The critical value is $\chi_{0.05}^2 = 7.815$.

Step 5: Since the test statistic, $\chi_0^2 = 6.605$, is less than the critical value, $\chi_{0.05}^2 = 7.815$, we fail to reject the null hypothesis.

Solution: P-Value Approach

Step 4: There are k = 4 categories. The P-value is the area under the chi-square distribution with 4 - 1 = 3 degrees of freedom to the right of $\chi_0^2 = 6.605$. Thus, P-value ≈ 0.09.

Step 5: Since the P-value ≈ 0.09 is greater than the level of significance α = 0.05, we fail to reject the null hypothesis.

Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the α = 0.05 level of significance.
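The slides read the P-value from Table VII. As an alternative sketch, the right-tail area can be computed directly with Python's standard library. The helper `chi_square_right_tail` is a hypothetical name; it uses the closed-form expressions for the chi-square tail with integer degrees of freedom (a finite sum for even df, plus a complementary error function term for odd df), which is an assumption of this sketch rather than the method used in the slides.

```python
import math

def chi_square_right_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

alpha = 0.05
p_value = chi_square_right_tail(6.605, df=3)   # test statistic from Step 3(c)
print(f"P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Applying the same function to the critical value 7.815 with 3 degrees of freedom should return an area very close to 0.05, which ties the P-value approach to the classical approach.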

Section 12.2: Tests for Independence and the Homogeneity of Proportions

Objectives

1. Perform a test for independence
2. Perform a test for homogeneity of proportions


Objective 1

• Perform a Test for Independence


The chi-square test for independence is used to determine whether there is an association between a row variable and column variable in a contingency table constructed from sample data. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.

“In Other Words”

In a chi-square independence test, the null hypothesis is always

H0: The variables are independent

The alternative hypothesis is always

H1: The variables are not independent


The idea behind testing these types of claims is to compare actual counts to the counts we would expect if the null hypothesis were true (if the variables are independent). If a significant difference between the actual counts and expected counts exists, we would take this as evidence against the null hypothesis.

If two events A and B are independent, then P(A and B) = P(A)P(B).

We can use the Multiplication Rule for Independent Events to obtain the expected proportion of observations within each cell under the assumption of independence. Multiplying this proportion by n, the sample size, gives the expected count within each cell.

Parallel Example 1: Determining the Expected Counts in a Test for Independence

In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Determine the expected counts within each cell assuming that gender and response are independent.

Source: Based on a Fox News Poll conducted in January, 1999

Solution

Step 1: We first compute the row and column totals:

                Money   Health   Love   Row totals
Men             82      446      355    883
Women           46      574      273    893
Column totals   128     1020     628    1776

Step 2: Next compute the relative marginal frequencies for the row variable and column variable:

                     Money               Health               Love                Relative frequency
Men                  82                  446                  355                 883/1776 ≈ 0.4972
Women                46                  574                  273                 893/1776 ≈ 0.5028
Relative frequency   128/1776 ≈ 0.0721   1020/1776 ≈ 0.5743   628/1776 ≈ 0.3536   1

Step 3: Assuming gender and response are independent, we use the Multiplication Rule for Independent Events to compute the proportion of observations we would expect in each cell.

        Money    Health   Love
Men     0.0358   0.2855   0.1758
Women   0.0362   0.2888   0.1778

Step 4: We multiply the expected proportions from Step 3 by 1776, the sample size, to obtain the expected counts under the assumption of independence.

        Money                     Health                     Love
Men     1776(0.0358) ≈ 63.5808    1776(0.2855) ≈ 507.048     1776(0.1758) ≈ 312.2208
Women   1776(0.0362) ≈ 64.2912    1776(0.2888) ≈ 512.9088    1776(0.1778) ≈ 315.7728

Expected Frequencies in a Chi-Square Test for Independence

To find the expected frequencies in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,

$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
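A minimal sketch of this formula in Python, applied to the poll data from Parallel Example 1. The function name `expected_frequencies` is an illustrative choice. Because this computes the formula without intermediate rounding, its results differ slightly from the slide values, which were built from proportions rounded to four decimal places (for example, about 63.64 here versus 63.5808 on the slide).

```python
def expected_frequencies(table):
    """Expected cell counts under independence: (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]

# Rows: men, women. Columns: money, health, love.
observed = [[82, 446, 355],
            [46, 574, 273]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

A useful property to check is that each row of expected counts adds up to the observed row total, since the expected table keeps the same margins as the observed table.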

Test Statistic for the Test of Independence

Let $O_i$ represent the observed number of counts in the ith cell and $E_i$ represent the expected number of counts in the ith cell. Then

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table, provided that (1) all expected frequencies are greater than or equal to 1 and (2) no more than 20% of the expected frequencies are less than 5.

Chi-Square Test for Independence

To test the association between (or independence of) two variables in a contingency table:

Step 1: Determine the null and alternative hypotheses.
H0: The row variable and column variable are independent.
H1: The row variable and column variable are dependent.

Step 2: Choose a level of significance, α, depending on the seriousness of making a Type I error.

Step 3:
a) Calculate the expected frequencies (counts) for each cell in the contingency table.
b) Verify that the requirements for the chi-square test for independence are satisfied:
   1. All expected frequencies are greater than or equal to 1 (all $E_i \geq 1$).
   2. No more than 20% of the expected frequencies are less than 5.
c) Compute the test statistic:

$$\chi_0^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Note: $O_i$ is the observed count for the ith cell.

Classical Approach

Step 4: Determine the critical value. All chi-square tests for independence are right-tailed tests, so the critical value is $\chi_\alpha^2$ with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

Step 5: Compare the critical value to the test statistic. If $\chi_0^2 > \chi_\alpha^2$, reject the null hypothesis.

P-Value Approach

Step 4: Use Table VII to determine an approximate P-value by determining the area under the chi-square distribution with (r-1)(c-1) degrees of freedom to the right of the test statistic.

Step 5: If the P-value < α, reject the null hypothesis.


Step 6: State the conclusion.

Parallel Example 2: Performing a Chi-Square Test for Independence

In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Test the claim that gender and response are independent at the α = 0.05 level of significance.

Source: Based on a Fox News Poll conducted in January, 1999

Solution

Step 1: We want to know whether gender and response are dependent or independent, so the hypotheses are:

H0: Gender and response are independent.

H1: Gender and response are dependent.

Step 2: The level of significance is α = 0.05.

Step 3:

(a) The expected frequencies were computed in Example 1 and are given in parentheses in the table below, along with the observed frequencies.

        Money           Health           Love
Men     82 (63.5808)    446 (507.048)    355 (312.2208)
Women   46 (64.2912)    574 (512.9088)   273 (315.7728)

(b) Since none of the expected frequencies are less than 5, the requirements for the chi-square test for independence are satisfied.

(c) The test statistic is

$$\chi_0^2 = \frac{(82-63.5808)^2}{63.5808} + \frac{(446-507.048)^2}{507.048} + \cdots + \frac{(273-315.7728)^2}{315.7728} \approx 36.82$$

Solution: Classical Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2-1)(3-1) = 2 degrees of freedom. The critical value is $\chi_{0.05}^2 = 5.99$.

Step 5: Since the test statistic, $\chi_0^2 = 36.82$, is greater than the critical value, $\chi_{0.05}^2 = 5.99$, we reject the null hypothesis.

Solution: P-Value Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2-1)(3-1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi_0^2 = 36.82$, which is approximately 0.

Step 5: Since the P-value is less than the level of significance α = 0.05, we reject the null hypothesis.

Step 6: There is sufficient evidence to conclude that gender and response are dependent at the α = 0.05 level of significance.
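The whole test for this example can be sketched end to end in Python. Two points are assumptions of this sketch rather than part of the slides: expected counts are computed without intermediate rounding, so the statistic comes out at about 36.84 instead of the slide's 36.82, and the P-value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

observed = [[82, 446, 355],   # men: money, health, love
            [46, 574, 273]]   # women: money, health, love

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = (row total)(column total) / n.
statistic = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, column_totals)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area under the chi-square curve; this shortcut is valid for df = 2 only.
p_value = math.exp(-statistic / 2)

alpha = 0.05
print(f"statistic = {statistic:.2f}, df = {df}, P-value = {p_value:.2e}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The small gap between the two statistics does not change the decision: either value is far above the critical value of 5.99.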


To see the relation between response and gender, we draw bar graphs of the conditional distributions of response by gender. Recall that a conditional distribution lists the relative frequency of each category of a variable, given a specific value of the other variable in a contingency table.

Parallel Example 3: Constructing a Conditional Distribution and Bar Graph

Find the conditional distribution of response by gender for the data from the previous example, reproduced below.

Source: Based on a Fox News Poll conducted in January, 1999

Solution

We first compute the conditional distribution of response by gender.

        Money               Health               Love
Men     82/883 ≈ 0.0929     446/883 ≈ 0.5051     355/883 ≈ 0.4020
Women   46/893 ≈ 0.0515     574/893 ≈ 0.6428     273/893 ≈ 0.3057
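The conditional distribution above divides each cell by its row total. A short Python sketch, with the hypothetical helper name `conditional_distribution`:

```python
def conditional_distribution(table):
    """Divide each cell by its row total, giving relative frequencies within each row."""
    return {label: [count / sum(counts) for count in counts]
            for label, counts in table.items()}

responses = ["Money", "Health", "Love"]
observed = {"Men": [82, 446, 355], "Women": [46, 574, 273]}

distribution = conditional_distribution(observed)
for gender, shares in distribution.items():
    print(gender, [f"{name}: {share:.4f}" for name, share in zip(responses, shares)])
```

Each row of the result sums to 1, which is what makes the two genders directly comparable even though the group sizes (883 and 893) differ.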

[Figure: bar graph of the conditional distribution of response by gender]

Objective 2

• Perform a Test for Homogeneity of Proportions


In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic.


The procedures for performing a test of homogeneity are identical to those for a test of independence.

Parallel Example 5: A Test for Homogeneity of Proportions

The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:

      1992   2002   2008
Yes   418    479    525
No    602    541    485

Source: The Harris Poll

Test the claim that the proportion of individuals who feel being a teacher is an occupation of very great prestige is the same for each year at the α = 0.01 level of significance.

Solution

Step 1: The null hypothesis is a statement of “no difference,” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows:

H0: $p_1 = p_2 = p_3$

H1: At least one of the proportions is different from the others.

Step 2: The level of significance is α = 0.01.

Step 3:

(a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies.

      1992            2002            2008
Yes   418 (475.554)   479 (475.554)   525 (470.892)
No    602 (544.446)   541 (544.446)   485 (539.108)

(b) Since none of the expected frequencies are less than 5, the requirements are satisfied.

(c) The test statistic is

$$\chi_0^2 = \frac{(418-475.554)^2}{475.554} + \frac{(479-475.554)^2}{475.554} + \cdots + \frac{(485-539.108)^2}{539.108} \approx 24.74$$

Solution: Classical Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2-1)(3-1) = 2 degrees of freedom. The critical value is $\chi_{0.01}^2 = 9.210$.

Step 5: Since the test statistic, $\chi_0^2 = 24.74$, is greater than the critical value, $\chi_{0.01}^2 = 9.210$, we reject the null hypothesis.

Solution: P-Value Approach

Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2-1)(3-1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi_0^2 = 24.74$, which is approximately 0.

Step 5: Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis.

Step 6: There is sufficient evidence to reject the null hypothesis at the α = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
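One way to read the homogeneity test is through the pooled proportion: under H0 every year shares the same "Yes" proportion, so each year's expected "Yes" count is its sample size times that pooled value. The Python sketch below follows that reading, which gives the same expected counts as the row-total-times-column-total formula. The critical value uses the shortcut -2 ln(α), which holds only for 2 degrees of freedom and is an assumption of this sketch rather than the table lookup used in the slides.

```python
import math

years = ["1992", "2002", "2008"]
yes = [418, 479, 525]
no = [602, 541, 485]

sizes = [y + n for y, n in zip(yes, no)]
pooled_yes = sum(yes) / sum(sizes)   # common proportion of "Yes" under H0

statistic = 0.0
for year, y, n_no, size in zip(years, yes, no, sizes):
    e_yes = size * pooled_yes            # expected "Yes" count for this year
    e_no = size * (1 - pooled_yes)       # expected "No" count for this year
    statistic += (y - e_yes) ** 2 / e_yes + (n_no - e_no) ** 2 / e_no
    print(f"{year}: sample proportion = {y / size:.4f}, expected Yes = {e_yes:.3f}")

alpha = 0.01
critical_value = -2 * math.log(alpha)    # chi-square critical value, df = 2 only

print(f"statistic = {statistic:.2f}, critical value = {critical_value:.3f}")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```

Printing the per-year sample proportions alongside the pooled value makes it easier to see which years drive the large statistic.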