
JK 11 Chi-Square Applications


Page 1: JK 11 Chi-Square Applications

8/8/2019 JK 11 Chi-Square Applications

http://slidepdf.com/reader/full/jk-11-chi-square-applications 1/20

Multinomial Experiments

Goodness of Fit Tests

We have just seen an example of comparing two proportions. For that analysis, we used the normal distribution as the sampling distribution.

We will now look at problems where we compare more than two proportions. We will not be able to use the normal distribution, but will use a different distribution called the Chi-Square or $\chi^2$ distribution.

Consider the problem of testing a die to see if it is fair. The die has six numbers, all equally likely. If the die is fair, then each number should have a probability of 1/6. In the long run, each number will come up in 1/6 of the rolls.

Suppose I want to test a die to see if it is fair. I take a sample of 60 rolls. Theoretically, each number should come up 1/6 * 60 = 10 times. If the counts are not all 10, either the die is not fair, or the die is fair and the departures from 10 are explained by sampling variation.

To sort this out, I need a hypothesis test, a sampling distribution, and a p-value.

Section 11.2, Page 241


Goodness of Fit Test

Fair Die Example

Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair?

The hypotheses are as follows:

Ho: The die is fair (each proportion equals 1/6).
Ha: The die is not fair.

Clearly, the observed frequencies are not all equal to the theoretical frequencies of 10. We need a way to measure how big the miss is, to see if it is likely to be due to sampling variation or if it is too large to be explained by sampling variation.

Section 11.2, Page 243


Chi Square Statistic

Fair Die Example

We calculate the miss, called the chi-square statistic, similarly to the way we calculate the variance. For each cell we compute (observed - expected)^2 / expected, and then add up the results.

Note that each expected frequency equals the total number of observations times the Ho true proportion for that cell. Also note that the total of the expected frequencies always equals the total of the observed frequencies.

The $\chi^2$ statistic is the total, 2.2. Also note that the minimum value of $\chi^2$ is zero. If we took another sample, we would likely get a different value for the chi-square statistic.

Section 11.1, Page 241
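As a rough illustration of this calculation, the following Python sketch adds up (observed - expected)^2 / expected over the six cells. The observed counts are hypothetical: the table from the slide is not reproduced in this text, so they were chosen only so that the total matches the quoted statistic of 2.2. The function name `chi_square_statistic` is also just an assumption made for the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6 from 60 rolls (illustrative only).
observed = [7, 12, 10, 12, 8, 11]

# Under Ho each face has probability 1/6, so each expected count is n/6.
n = sum(observed)
expected = [n / 6] * 6

chi2_stat = chi_square_statistic(observed, expected)
print(f"total observed = {n}, total expected = {sum(expected):.0f}")
print(f"chi-square statistic = {chi2_stat:.1f}")
```

If every observed count were exactly 10, each term would be zero, which is why the statistic can never fall below zero.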


Chi-Square Distribution

Fair Die Example

Now we need a sampling distribution for the $\chi^2$ statistic = 2.2, so we can calculate the probability of getting a $\chi^2 \ge 2.2$ when the true proportions are all equal to 1/6.

$\chi^2$ Distribution for 5 df

This is the distribution of all possible $\chi^2$ statistics calculated from all possible samples of 60 observations when there are 6 proportions or cells. Note that the degrees of freedom equal the number of proportions minus 1.

Finding the p-value on the TI-83, given the $\chi^2$ Stat and df:
PRGM CHI2DIST
LOWER BOUND: 2.2
UPPER BOUND: 2ND E99
df: 5

Output: P-VALUE = 0.8208
The null hypothesis cannot be rejected.

Section 11.2, Page 242
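For readers without the calculator program, here is a minimal Python sketch of the same upper-tail area. It is not the CHI2DIST program itself; it uses the standard closed-form series for the chi-square upper tail with integer degrees of freedom, and the function name `chi2_upper_tail` is an assumption made for this example.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x), using the closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2 * x / math.pi) * math.exp(-half) * total)


# Fair die example: statistic 2.2 with 6 - 1 = 5 degrees of freedom.
p_value = chi2_upper_tail(2.2, 5)
print(f"p-value = {p_value:.4f}")
```

The lower bound of 2.2 and the very large upper bound on the calculator correspond to this upper-tail area.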


Chi-Square Distribution

Conditions

The sample is random, and the observed data represents counts of individuals in individual categories of a categorical variable.

Each expected count is 5 or greater.

Section 11.1, Page 240
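The expected-count condition is easy to check before running a test. The sketch below is only an illustration; the helper name `expected_counts_ok` and the 24-roll sample are hypothetical.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Fair die with 60 rolls: each expected count is 60/6 = 10.
die_60_rolls = [60 / 6] * 6

# Hypothetical smaller sample of 24 rolls: each expected count is only 4.
die_24_rolls = [24 / 6] * 6

print(expected_counts_ok(die_60_rolls))
print(expected_counts_ok(die_24_rolls))
```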


Goodness of Fit Test

Fair Die Example TI-83 Add-In

Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair?

The hypotheses are as follows:

Ho: The die is fair (each proportion equals 1/6).
Ha: The die is not fair.

Each expected cell = 1/6 * 60 = 10.

STAT-EDIT
L1: Enter the observed frequency numbers
L2: Enter the expected values, 10 in each of 6 cells.
PRGM GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .8208, Chi-Square Stat = 2.2
Since p-value > .05, Ho cannot be rejected.

Section 11.2, Page 243


Goodness of Fit Test

Mendelian Theory Problem

Mendel's genetic theory of inheritance claims that the frequencies of round and yellow, wrinkled and yellow, round and green, and wrinkled and green peas will occur in the ratio of 9:3:3:1. In testing the theory, Mendel obtained frequencies of 315, 101, 108, and 32, respectively. Does the data contradict the theory? Perform a hypothesis test.

Ho: The data fits the theory.
Ha: The data does not fit the theory.

Calculation of Expected Values

Observed      Expected Proportion   Expected Count
315           9/16                  9/16 * 556 = 312.75
101           3/16                  3/16 * 556 = 104.25
108           3/16                  3/16 * 556 = 104.25
32            1/16                  1/16 * 556 = 34.75
Total = 556   Total = 1             Total = 556

Section 11.2, Page 245
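The expected counts in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are assumptions, and `fractions.Fraction` is used so that the proportions stay exact.

```python
from fractions import Fraction

# Observed counts in the order: round/yellow, wrinkled/yellow,
# round/green, wrinkled/green.
observed = [315, 101, 108, 32]
ratio = [9, 3, 3, 1]

n = sum(observed)                                       # total peas
proportions = [Fraction(r, sum(ratio)) for r in ratio]  # 9/16, 3/16, 3/16, 1/16
expected = [float(p * n) for p in proportions]

for obs, p, exp_count in zip(observed, proportions, expected):
    print(f"{obs:>4}  {str(p):>5}  {exp_count:>7.2f}")
print(f"totals: {n}, {sum(proportions)}, {sum(expected):.0f}")
```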


Goodness of Fit Test

Mendelian Theory Problem

Observed      Expected Proportion   Expected Count
315           9/16                  9/16 * 556 = 312.75
101           3/16                  3/16 * 556 = 104.25
108           3/16                  3/16 * 556 = 104.25
32            1/16                  1/16 * 556 = 34.75
Total = 556   Total = 1             Total = 556

STAT EDIT: Enter observed data in L1 and expected in L2

PRGM GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .9254, Chi-Square Stat = .47
The null hypothesis cannot be rejected. The observed data does not contradict the theory.

Section 11.2, Page 245
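The calculator answer can be cross-checked by hand in Python. The sketch below is not the GOODFIT program. It recomputes the statistic from the table and then uses the closed-form upper-tail area that holds specifically for 3 degrees of freedom (4 categories minus 1).

```python
import math

observed = [315, 101, 108, 32]
expected = [312.75, 104.25, 104.25, 34.75]

# Chi-square statistic: sum of (O - E)^2 / E over the four categories.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1  # 4 categories - 1 = 3

# Upper-tail area for df = 3 only:
# erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(chi2_stat / 2))
           + math.sqrt(2 * chi2_stat / math.pi) * math.exp(-chi2_stat / 2))

print(f"chi-square statistic = {chi2_stat:.2f}, df = {df}")
print(f"p-value = {p_value:.4f}")
```

A p-value this close to 1 means the observed counts sit very near the 9:3:3:1 expectation.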


Problems

a. Perform a hypothesis test to see if the preferences are not all the same. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?

Problems, Page 252


Problems

a. Perform a hypothesis test to see if the observed data is consistent with the stated ratios. State the appropriate hypotheses.
b. Find the expected counts for each color.
c. What are the necessary conditions for the sampling distribution?
d. What is the name of the model used for the sampling distribution?
e. Find the p-value and state your conclusion.

Problems, Page 252


Test for Independence

Following is a two-way table. In this case, two categorical variables are measured on one group of college students. For each student, Gender and Favorite Subject Area are recorded.

Independence of Two Variables

Consider the Social Science category. 113/300 or 38% of all students chose Social Science. However, 41/122 or 34% of males chose the category and 72/178 or 40% of females chose the category.

Considering this a probability distribution, if I pick a person at random, there is a 38% chance the person chose Social Science. However, if you tell me the person is a female, then the probability is 40% that she chose the category.

This is an indication that the two variables are not independent, but related.

Two variables are independent if knowing the outcome of one variable does not change the probability of the outcome of the other variable.

Section 11.3, Page 246
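The comparison of overall and conditional proportions can be written out directly. This is a small Python sketch using only the counts quoted above; the variable names are assumptions.

```python
# Counts quoted in the text for the Social Science category.
total_students, chose_social = 300, 113
males, males_social = 122, 41
females, females_social = 178, 72

p_overall = chose_social / total_students   # P(Social Science)
p_given_male = males_social / males         # P(Social Science | male)
p_given_female = females_social / females   # P(Social Science | female)

print(f"overall:      {p_overall:.0%}")
print(f"given male:   {p_given_male:.0%}")
print(f"given female: {p_given_female:.0%}")

# Under independence all three proportions would be equal
# (up to sampling variation, which is what the test assesses).
```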


Tests for Independence

To test for independence, we will use Chi-Square methods. The appropriate hypotheses are:

Ho: The variables are independent
Ha: The variables are not independent

Next, we need to calculate the expected values for each cell of the data matrix under the assumption that the variables are independent. For example, if the variables are independent, then the overall proportion of students in the Social Science category, 113/300 = .3767, has to be the same for both males and females.

The expected value for Males is 0.3767 * 122 = 45.95 and the expected value for Females is 0.3767 * 178 = 67.05.

Section 11.3, Page 248
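Written as a rule, each expected value is (row total * column total) / grand total. The Python sketch below applies that rule to the Social Science column; the helper name `expected_count` is an assumption for this example.

```python
def expected_count(row_total, column_total, grand_total):
    """Expected cell count when the row and column variables are independent."""
    return row_total * column_total / grand_total


# Social Science column total is 113 out of 300 students.
expected_males = expected_count(122, 113, 300)
expected_females = expected_count(178, 113, 300)

print(f"males:   {expected_males:.2f}")
print(f"females: {expected_females:.2f}")
print(f"sum:     {expected_males + expected_females:.0f}")
```

The two expected values add back up to the column total of 113, just as the expected frequencies in a goodness of fit test add up to the number of observations.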


Test for Independence

Shown above in the parentheses are all the expected values. Next we need to calculate the $\chi^2$ statistic for each data cell. For example, for the first cell: $(37 - 29.28)^2 / 29.28 = 2.0355$.

Adding up the cell calculations for the 6 cells gives a total $\chi^2$ statistic of 4.604. The formula for df = (#rows - 1) * (#columns - 1) = (2 - 1) * (3 - 1) = 2.

The area under the curve to the right of 4.604 = .1001 > .05. The null hypothesis cannot be rejected.

Section 11.3, Page 248
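The whole calculation can be sketched in Python. The two-way table is not reproduced in this text, so the counts below are inferred from the figures that are quoted: 300 students, 122 males and 178 females, 41 and 72 choosing Social Science, and a first cell of 37 with expected value 29.28 (which implies a first-column total of 72). Treat them as an assumption, not as the original table. The labels of the first and third columns are unknown.

```python
import math

# Rows: males, females. Columns: first area, Social Science, third area.
# Counts inferred from the totals quoted in the text (an assumption).
observed = [
    [37, 41, 44],
    [35, 72, 71],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Per-cell contributions (O - E)^2 / E and their total.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi2_stat = sum(sum(row) for row in contributions)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the upper-tail area has the simple form exp(-x/2).
p_value = math.exp(-chi2_stat / 2)

print(f"first cell: expected {expected[0][0]:.2f}, "
      f"contribution {contributions[0][0]:.4f}")
print(f"chi-square = {chi2_stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Carried at full precision the total comes out slightly above the hand-rounded 4.604, so the p-value lands just under .10 instead of just over it. The conclusion at the .05 level does not change.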


Test for Independence

Black Box Program

Ho: The variables are independent
Ha: The variables are not independent

2ND MATRIX EDIT
2 ENTER 3 (The data table is 2 rows and 3 columns. Ignore the total row and total column.)
Enter the data in matrix [A], left to right
STAT-TESTS-C: $\chi^2$-TEST
Observed: [A]
Expected: [B]
Calculate
Answer: p-value = .0999, $\chi^2$-Stat = 4.6063
2ND MATRIX EDIT [B] ENTER
Displays the Expected Values Matrix
All cells $\ge$ 5; conditions satisfied

Section 11.3, Page 248


Problems

a. Test the hypothesis that the size of community reared in is independent of the size of community residing in. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied?

Section 11.3, Page 254


Problems

a. Test the hypothesis that years of employment and knowing what the supervisor expects are independent. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied?

Section 11.3, Page 254


Tests for Homogeneity

Another application of Chi-Square procedures is the test for homogeneity: essentially, a test of whether different groups have the same distribution for a given variable.

Consider the table below that gives voters' opinions on a proposal, broken down by separate locations.

In the case of a test for independence, we had one group of individuals and measured two categorical variables in that group.

In the case of a test for homogeneity, we have one categorical variable, Opinion on Proposal, and three separately located groups of voters. The hypotheses are:

Ho: The distributions are homogeneous
Ha: The distributions are not homogeneous

Section 11.3, Page 250


Tests for Homogeneity

The mechanics for a test of homogeneity are exactly the same as for a test of independence. We calculate the expected values under the assumption that Ho is true.

The proportions in favor are all assumed to be 254/500 = .5080. The expected value for urban is .5080 * 200 = 101.6. The $\chi^2$ stat for cell 1 = $(143 - 101.6)^2 / 101.6 = 16.8697$. The total $\chi^2$ statistic for all cells is 91.72.

The df = 2 and the p-value = 1.21E-20 $\approx$ 0.

Ho is rejected; the distributions are not the same.

Section 11.3, Page 251
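The quoted figures can be checked with a short Python sketch. Only the numbers given in the text are used (the full table of opinions by location is not reproduced here), so the total statistic of 91.72 is taken as given, not recomputed. The variable names are assumptions.

```python
import math

# Figures quoted in the text.
favor_total, grand_total = 254, 500
urban_total, urban_favor = 200, 143

# Under Ho every location shares the same pooled proportion in favor.
pooled_proportion = favor_total / grand_total
expected_urban_favor = pooled_proportion * urban_total

# Contribution of the first cell (urban, in favor) to the statistic.
cell_1 = (urban_favor - expected_urban_favor) ** 2 / expected_urban_favor

# Total statistic as quoted; with df = 2 the upper-tail area is exp(-x/2).
chi2_total = 91.72
p_value = math.exp(-chi2_total / 2)

print(f"pooled proportion = {pooled_proportion:.4f}")
print(f"expected urban in favor = {expected_urban_favor:.1f}")
print(f"cell 1 contribution = {cell_1:.4f}")
print(f"p-value = {p_value:.2e}")
```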


Problems

a. State the hypotheses.
b. Find the p-value and state your conclusion.

Section 11.3, Page 255


Problems

a. State the hypotheses.
b. Find the p-value and state your conclusion.

Section 11.3, Page 255