8/8/2019 JK 11 Chi-Square Applications
http://slidepdf.com/reader/full/jk-11-chi-square-applications 1/20
Multinomial Experiments
Goodness of Fit Tests
We have just seen an example of comparing two proportions. For that analysis, we used the normal distribution as a sampling distribution.
We will now look at problems where we compare more than two proportions. We will not be able to use the normal distribution, but will instead use a different distribution called the Chi-Square, or χ², distribution.
Consider the problem of testing a die to see if it is fair. A die has six numbers; if the die is fair, all six are equally likely, so each number has a probability of 1/6. In the long run, each number will come up on 1/6 of the rolls.
Suppose I want to test a die to see if it is fair. I take a sample of 60 rolls. Theoretically, each number should come up 1/6 × 60 = 10 times. If the counts are not all 10, either the die is not fair, or the die is fair and the departures from 10 are explained by sampling variation.
To sort this out, I need a hypothesis test, a sampling distribution, and a p-value.
Section 11.2, Page 241
Goodness of Fit Test
Fair Die Example
Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair?
The hypotheses are as follows:
Ho: Each number has probability 1/6 (p1 = p2 = ... = p6 = 1/6)
Ha: At least one number does not have probability 1/6
Clearly, the observed frequencies are not all equal to the theoretical frequency of 10. We need a way to measure how big the miss is, to see whether it is likely to be due to sampling variation or so large that sampling variation cannot explain it.
Section 11.2, Page 243
Chi-Square Statistic
Fair Die Example
We calculate the miss, called the chi-square statistic, in a way similar to how we calculate the variance: square each cell's deviation from its expected count, divide by the expected count, and add over all cells.
Note that each expected frequency equals the total number of observations × the proportion for that cell under Ho. Also note that the expected frequencies always total the same as the observed frequencies.
The χ² statistic is the total, 2.2. Also note that the minimum value of χ² is zero, reached only when every observed count equals its expected count. If we took another sample, we would likely get a different value for the chi-square statistic.
Section 11.1, Page 241
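One standard way to write the statistic described above is shown below. The symbol names are ours: O_i and E_i are the observed and expected counts in cell i, k is the number of cells, n is the total number of observations, and p_{i,0} is the proportion for cell i under Ho.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = n \, p_{i,0},
\qquad
\sum_{i=1}^{k} E_i = n \sum_{i=1}^{k} p_{i,0} = n .
```

For the fair die, k = 6, n = 60 and every p_{i,0} = 1/6, so each E_i = 10 and the statistic is the sum of (O_i - 10)^2 / 10 over the six numbers.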
Chi-Square Distribution
Fair Die Example
Now we need a sampling distribution for the χ² statistic = 2.2, so we can calculate the probability of getting χ² ≥ 2.2 when the true proportions are all equal to 1/6.
χ² Distribution for 5 df
This is the distribution of all possible χ² statistics calculated from all possible samples of 60 observations when there are 6 proportions or cells. Note that the degrees of freedom equal the number of proportions minus 1.
Finding the p-value on the TI-83, given the χ² statistic and df:
PRGM CHI2DIST
LOWER BOUND: 2.2
UPPER BOUND: 2ND E99
df: 5
Output: P-VALUE = 0.8208
Since 0.8208 > .05, the null hypothesis cannot be rejected.
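Without a TI-83, the same right-tail area can be computed directly. The sketch below is a minimal pure-Python version that avoids external libraries by using the standard closed forms of the chi-square survival function for integer degrees of freedom; the helper name `chi2_sf` is our own and is not part of the CHI2DIST program.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Area to the right of x under a chi-square curve with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Fair-die example: chi-square statistic 2.2 with 6 - 1 = 5 degrees of freedom
print(round(chi2_sf(2.2, 5), 4))  # 0.8208
```

If SciPy is available, `scipy.stats.chi2.sf(2.2, 5)` computes the same quantity.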
Section 11.2, Page 242
Chi-Square Distribution
Conditions
The sample is random, and the observed data represent counts of individuals in the categories of a categorical variable.
Each expected count is 5 or greater.
Section 11.1, Page 240
Goodness of Fit Test
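Fair Die Example: TI-83 Add-In
Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair?
The hypotheses are as follows:
Ho: Each number has probability 1/6 (p1 = p2 = ... = p6 = 1/6)
Ha: At least one number does not have probability 1/6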
Section 11.2, Page 243
Each expected cell = 1/6 × 60 = 10.
STAT-EDIT L1: Enter the observed frequencies.
L2: Enter the expected values, 10 in each of the 6 cells.
PRGM GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .8208, Chi-Square Stat = 2.2
Since p-value > .05, Ho cannot be rejected.
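GOODFIT is an add-in program whose internals are not shown on the slide. As a hedged sketch, assuming it computes Σ(O - E)²/E over the cells with df = number of cells - 1 (which matches the outputs quoted on these slides), the statistic and the "each expected count is 5 or greater" condition can be checked in Python as follows. The function name is hypothetical, and the counts are made up for illustration; they are not the slide's observed frequencies.

```python
def goodness_of_fit_stat(observed, expected):
    """Hypothetical helper: returns (statistic, df, all_expected_at_least_5)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of cells")
    if abs(sum(observed) - sum(expected)) > 1e-9 * sum(expected):
        raise ValueError("expected counts must total the number of observations")
    # Sum over cells of squared miss divided by expected count
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # Condition for using the chi-square model
    conditions_ok = all(e >= 5 for e in expected)
    return statistic, df, conditions_ok


# Made-up counts from 60 rolls, for illustration only
observed = [12, 8, 11, 9, 7, 13]
expected = [60 / 6] * 6  # each cell: 1/6 * 60 = 10 under Ho
statistic, df, ok = goodness_of_fit_stat(observed, expected)
print(round(statistic, 2), df, ok)  # 2.8 5 True
```

The resulting statistic and df are what you would then enter as LOWER BOUND and df in CHI2DIST to obtain the p-value.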
Goodness of Fit Test
Mendelian Theory Problem
Mendel's genetic theory of inheritance claims that the frequencies of round and yellow, wrinkled and yellow, round and green, and wrinkled and green peas will occur in the ratio 9:3:3:1. In testing the theory, Mendel obtained frequencies of 315, 101, 108, and 32, respectively. Does the data contradict the theory? Do a hypothesis test.
Ho: The data fits the theory
Ha: The data does not fit the theory.
Calculation of Expected Values
Observed      Expected Proportion    Expected Count
315           9/16                   9/16 × 556 = 312.75
101           3/16                   3/16 × 556 = 104.25
108           3/16                   3/16 × 556 = 104.25
32            1/16                   1/16 × 556 = 34.75
Total = 556   Total = 1              Total = 556
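The expected-count column can be reproduced by converting the 9:3:3:1 ratio into proportions and multiplying by the total of 556. This sketch uses `fractions.Fraction` so the sixteenths stay exact; the variable names are illustrative.

```python
from fractions import Fraction

observed = [315, 101, 108, 32]  # round-yellow, wrinkled-yellow, round-green, wrinkled-green
ratio = [9, 3, 3, 1]            # Mendel's 9:3:3:1
n = sum(observed)               # 556 peas in total

proportions = [Fraction(r, sum(ratio)) for r in ratio]  # 9/16, 3/16, 3/16, 1/16
expected = [float(p * n) for p in proportions]
print(expected)  # [312.75, 104.25, 104.25, 34.75]
```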
Section 11.2, Page 245
Goodness of Fit Test
Mendelian Theory Problem
Observed      Expected Proportion    Expected Count
315           9/16                   9/16 × 556 = 312.75
101           3/16                   3/16 × 556 = 104.25
108           3/16                   3/16 × 556 = 104.25
32            1/16                   1/16 × 556 = 34.75
Total = 556   Total = 1              Total = 556
STAT EDIT: Enter the observed data in L1 and the expected counts in L2.
PRGM GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .9254, Chi-Square Stat = .47
Since the p-value > .05, the null hypothesis cannot be rejected. The observed data does not contradict the theory.
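As a cross-check on the GOODFIT output, the sketch below recomputes the statistic from the table above and the p-value from the closed-form chi-square right tail for df = 3, erfc(√(x/2)) + √(2x/π)·e^(-x/2). That closed form is specific to 3 degrees of freedom, so this is not a general-purpose implementation.

```python
import math

observed = [315, 101, 108, 32]  # round-yellow, wrinkled-yellow, round-green, wrinkled-green
expected = [312.75, 104.25, 104.25, 34.75]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 4 cells -> 3 degrees of freedom

# Closed-form chi-square right tail, valid for df = 3 only:
# P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(statistic / 2))
           + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))

print(round(statistic, 2), df, round(p_value, 4))  # 0.47 3 0.9254
```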
Section 11.2, Page 245
Problems
a. Perform a hypothesis test to see if the preferences are not all the same. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
Problems, Page 252
Problems
a. Perform a hypothesis test to see if the observed data is consistent with the stated ratios. State the appropriate hypotheses.
b. Find the expected counts for each color.
c. What are the necessary conditions for the sampling distribution?
d. What is the name of the model used for the sampling distribution?
e. Find the p-value and state your conclusion.
Problems, Page 252
Test for Independence
Following is a two-way table. In this case, two categorical variables are measured on one group of college students. For each student, their gender and favorite subject area are recorded.
Independence of Two Variables
Consider the Social Science category. 113/300, or 38%, of all students chose Social Science. However, 41/122, or 34%, of males chose the category, and 72/178, or 40%, of females chose it.
Treating this as a probability distribution, if I pick a person at random, there is a 38% chance the person chose Social Science. However, if you tell me the person is female, then the probability is 40% that she chose the category.
This suggests that the two variables might not be independent, but related.
Two variables are independent if knowing the outcome of one variable does not change the probability of the outcome of the other variable.
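The comparison above amounts to checking whether the conditional proportions match the overall proportion. A minimal Python sketch using only the counts quoted on this slide (variable names are illustrative):

```python
# Counts quoted on the slide for the Social Science category
chose_social_science = {"Male": 41, "Female": 72}
group_size = {"Male": 122, "Female": 178}

# P(Social Science) ignores gender: 113 / 300
overall = sum(chose_social_science.values()) / sum(group_size.values())
# P(Social Science | gender) uses only that gender's students
conditional = {g: chose_social_science[g] / group_size[g] for g in group_size}

print(f"P(Social Science) = {overall:.2f}")  # 0.38
for g, p in conditional.items():
    print(f"P(Social Science | {g}) = {p:.2f}")  # Male 0.34, Female 0.40
```

Under independence all three printed proportions would be equal; the chi-square test for independence asks whether the observed differences are larger than sampling variation would explain.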
Section 11.3, Page 246
Tests for Independence
To test for independence, we will use chi-square methods. The appropriate hypotheses are:
Ho: The variables are independent
Ha: The variables are not independent
Next, we need to calculate the expected value for each cell of the data matrix under the assumption that the variables are independent. For example, if the variables are independent, then the overall proportion of students in the Social Science category is 113/300 = .3767.
Under independence, the proportion choosing the category must be the same for both genders. The expected value for males is (113/300) × 122 = 45.95, and the expected value for females is (113/300) × 178 = 67.05. In general, each expected count is (row total × column total) / grand total.
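The expected counts for the Social Science column follow from the rule expected = (row total × column total) / grand total. A short sketch with the totals quoted on the slide (variable names are illustrative):

```python
grand_total = 300
social_science_total = 113
row_totals = {"Male": 122, "Female": 178}

# Under independence every gender shares the overall Social Science proportion
p_social = social_science_total / grand_total  # about .3767
expected = {g: p_social * n for g, n in row_totals.items()}

for g, e in expected.items():
    print(f"{g}: {e:.2f}")
# Male: 45.95
# Female: 67.05
```

Note that the two expected counts add back up to the column total of 113, as they must.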
Section 11.3, Page 248
Test for Independence
Shown above in parentheses are all the expected values. Next we need to calculate the χ² contribution for each data cell. For example, for the first cell: (37 - 29.28)²/29.28 = 2.0355.
Adding up the contributions of the 6 cells gives a total χ² statistic of 4.604. The degrees of freedom are df = (#rows - 1) × (#columns - 1) = (2 - 1) × (3 - 1) = 2.
The area under the curve to the right of 4.604 is .1001 > .05, so the null hypothesis cannot be rejected.
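The arithmetic for the first cell, the degrees of freedom, and the p-value can be reproduced in a few lines of Python. For df = 2 the chi-square right-tail area has the exact closed form e^(-x/2), which is what this sketch uses; that shortcut does not apply to other df.

```python
import math

# First cell: observed 37, expected 29.28
observed, expected = 37, 29.28
cell_contribution = (observed - expected) ** 2 / expected

rows, cols = 2, 3
df = (rows - 1) * (cols - 1)

# For df = 2 the chi-square right-tail area is exactly exp(-x / 2)
chi_square_stat = 4.604
p_value = math.exp(-chi_square_stat / 2)

print(round(cell_contribution, 4), df, round(p_value, 4))  # 2.0355 2 0.1001
```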
Section 11.3, Page 248
Test for Independence
Black Box Program
Ho: The variables are independent
Ha: The variables are not independent
2nd MATRIX EDIT, 2 ENTER 3 (the data table is 2 rows and 3 columns; ignore the total row and total column)
Enter the data in matrix [A] left to right.
STAT-TESTS-C: χ²-Test
Observed: [A]
Expected: [B]
Calculate
Answer: p-value = .0999, χ²-Stat = 4.6063
2nd MATRIX EDIT [B] ENTER displays the expected values matrix.
All expected counts are ≥ 5; conditions satisfied.
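The calculator's χ²-Test can be sketched in Python for a whole table. Because the two-way table itself is not listed in this text, the counts below were reconstructed by us from figures quoted on the previous slides: row totals 122 (males) and 178 (females), Social Science counts 41 and 72, and a first cell of 37 whose expected value 29.28 = 122 × 72 / 300 implies a first-column total of 72. Placing Social Science in column 2 is our assumption (column order does not change χ²), and the function name is hypothetical. The p-value step uses the df = 2 closed form e^(-x/2), so the sketch rejects other table shapes.

```python
import math


def chi_square_two_way(table):
    """Hypothetical helper: returns (statistic, df, p_value, expected).

    The p-value uses the exact df = 2 formula exp(-x / 2), so only
    2x3 or 3x2 tables are accepted.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under Ho: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df != 2:
        raise ValueError("this sketch's p-value formula is only valid for df = 2")
    p_value = math.exp(-statistic / 2)
    return statistic, df, p_value, expected


# Reconstructed counts (rows: males, females); column 2 is Social Science,
# the other two subject names are not given in the text
table = [[37, 41, 44],
         [35, 72, 71]]
statistic, df, p_value, expected = chi_square_two_way(table)
print(round(statistic, 4), df, round(p_value, 4))  # 4.6063 2 0.0999
print(all(e >= 5 for row in expected for e in row))  # True
```

The calculator's 4.6063 and .0999 differ slightly from the hand values 4.604 and .1001, presumably because of rounding in the hand calculation.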
Section 11.3, Page 248
Problems
a. Test the hypothesis that the size of community reared in is independent of the size of community residing in. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied?
Section 11.3, Page 254
Problems
a. Test the hypothesis that years of employment and knowing what the supervisor expects are independent. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied?
Section 11.3, Page 254
Tests for Homogeneity
Another application of chi-square procedures is the test for homogeneity, essentially a test of whether different groups have the same distribution for a given variable.
Consider the table below, which gives voters' opinions on a proposal broken down by separate locations.
In a test for independence, we had one group of individuals and measured two categorical variables in that group.
In a test for homogeneity, we have one categorical variable, Opinion on Proposal, and three separately located groups of voters. The hypotheses are:
Ho: The distributions are homogeneous
Ha: The distributions are not homogeneous
Section 11.3, Page 250
Tests for Homogeneity
The mechanics for a test of homogeneity are exactly the same as for a test of independence. We calculate the expected values under the assumption that Ho is true.
Under Ho, the proportion in favor is assumed to be the same at every location: 254/500 = .5080. The expected value for urban is .5080 × 200 = 101.6. The χ² contribution for cell 1 = (143 - 101.6)²/101.6 = 16.8697. The total χ² statistic for all cells is 91.72.
The df = 2 and the p-value = 1.21E-20 ≈ 0.
Ho is rejected; the distributions are not the same.
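The homogeneity arithmetic can be checked the same way. This sketch recomputes the pooled proportion in favor, the urban expected count, the first cell's contribution, and the p-value for the quoted total of 91.72 with df = 2, using the df = 2 closed form e^(-x/2). Only numbers quoted on the slide are used; the full voter table is not.

```python
import math

p_favor = 254 / 500              # pooled proportion in favor under Ho
expected_urban = p_favor * 200   # urban group has 200 voters
observed_urban = 143
cell_1 = (observed_urban - expected_urban) ** 2 / expected_urban

# Right-tail area for the quoted total statistic with df = 2: exp(-x / 2)
p_value = math.exp(-91.72 / 2)

print(round(expected_urban, 1), round(cell_1, 4), f"{p_value:.2e}")
# 101.6 16.8697 1.21e-20
```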
Section 11.3, Page 251
Problems
a. State the hypotheses.
b. Find the p-value and state your conclusion.
Section 11.3, Page 255
Problems
a. State the hypotheses.
b. Find the p-value and state your conclusion.
Section 11.3, Page 255