The Practice of Statistics, Third Edition
Chapter (13.1) 14.1: Chi-square Test for Goodness of Fit
Copyright © 2008 by W. H. Freeman & Company
Daniel S. Yates
When to Use the Chi-square (χ²) Procedure
• Use it when the dependent variable is categorical or ranked data.
• Use it when the usual assumptions about the population are not reasonable, for example when the population is not normally distributed.
This Chapter Will Cover Three Tests Based on the Chi-square Distributions
• Test whether observed counts for a categorical variable could come from a certain hypothesized distribution (Goodness of Fit).
• Test whether a single categorical variable has the same distribution in two or more distinct populations (Inference for Two-Way Tables, Tests for Homogeneity of Populations).
• Test whether two categorical variables are associated or independent (Inference for Two-Way Tables, Tests of Association/Independence).
Required Conditions For Goodness of Fit Procedure
• The data come from an SRS.
• The observations must be independent, and each observation must fit into one and only one cell or category.
• All individual expected counts are at least 1, and no more than 20% of the expected counts are less than 5.
• Please note: we are working with counts, not proportions.
• There is no mention of normality. Chi-square procedures do not require the population from which the sample is selected to be normally distributed.
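The expected-count condition above can be checked mechanically. The following is a minimal sketch, not part of the textbook procedure; the function name `expected_counts_ok` is a hypothetical helper introduced only for illustration.

```python
def expected_counts_ok(expected):
    """Check the expected-count condition for the goodness of fit test.

    Returns True when every expected count is at least 1 and no more
    than 20% of the expected counts are less than 5.
    """
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Expected counts well above 5 satisfy the condition.
print(expected_counts_ok([180, 100, 350, 370]))
# One expected count below 1 violates it.
print(expected_counts_ok([0.5, 10, 10]))
```

The SRS and independence conditions depend on how the data were collected, so they cannot be checked from the counts alone.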
Hypotheses for Goodness of Fit Test
• H0: the actual population proportions are equal to the hypothesized proportions.
• Ha: the actual population proportions are different from the hypothesized proportions (at least one differs).
Chi-square Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
Degrees of freedom = k – 1, where k is the number of categories.
Use the chi-square distribution with those degrees of freedom to find the critical value of χ² at the chosen α level.
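As a rough sketch, the test statistic and its degrees of freedom can be computed as follows. The function names are hypothetical, and the counts in the example are made up purely for illustration; they do not come from the text.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(observed):
    """k - 1, where k is the number of categories."""
    return len(observed) - 1


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(statistic, df)
```

The statistic is then compared with the critical value read from a chi-square table for the same degrees of freedom.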
Properties of Chi-square Distributions
• Total area under the curve is one.
• For df > 2, each chi-square density curve starts at the origin, increases to a peak, and then approaches the x-axis asymptotically from above. (For df = 1 and df = 2 the curve is highest at the left edge and decreases throughout.)
• Each distribution is skewed to the right. As the number of degrees of freedom increases, the distribution becomes more symmetrical and looks more like a normal curve.
Example 1
• Consider the problem of determining whether the distribution of car sales in the Eastern United States in the current year for Nissans, Mazdas, Toyotas, and Hondas is the same as the known distribution of the previous year, given in the table below:
Nissan 18%
Mazda 10%
Toyota 35%
Honda 37%
From the Motor Vehicle Bureau records, we select a random sample of 1,000 new-car purchases of these four makes of foreign cars in the current year. The information is displayed below:
Frequency
Nissan 150
Mazda 65
Toyota 385
Honda 400
Is the current year's sales distribution the same as last year's sales distribution?
Example 1 Continued
• Step 1: We want to determine whether the sales distribution is different from last year's sales distribution.
– Population: this year's sales of Nissans, Mazdas, Toyotas, and Hondas.
– Parameter: the proportion of sales for each make of car.
– H0: The current year's sales distribution is the same as the previous year's distribution (Nissan: 18%, Mazda: 10%, Toyota: 35%, and Honda: 37%).
– Ha: The current year's sales distribution is not the same as the previous year's.
Example 1 Continued
• Step 2: Conditions
– SRS: A random sample was taken from the Motor Vehicle Bureau records. We do not know whether the sample was taken from all state motor vehicle bureaus in the eastern United States. We will assume we have an SRS.
– Expected counts:
Nissan: 0.18 x 1000 = 180
Mazda: 0.10 x 1000 = 100
Toyota: 0.35 x 1000 = 350
Honda: 0.37 x 1000 = 370
All expected counts are at least 5.
– Independence: the observations are independent.
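The expected counts in Step 2 are the hypothesized proportions multiplied by the sample size. A small sketch of that arithmetic, using the percentages and sample size from Example 1 (the variable names are illustrative):

```python
sample_size = 1000

# Previous year's distribution from Example 1, in percent.
previous_year_percent = {"Nissan": 18, "Mazda": 10, "Toyota": 35, "Honda": 37}

# Expected count = sample size x hypothesized proportion.
expected = {
    make: sample_size * percent / 100
    for make, percent in previous_year_percent.items()
}

for make, count in expected.items():
    print(f"{make}: {count:g}")

smallest = min(expected.values())
print("Smallest expected count:", smallest)
```

Because the smallest expected count is far above 5, the expected-count condition is met.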
Example 1 Continued
• Step 3: Calculations

          Observed    Expected
          Count (O)   Count (E)   (O – E)²/E
Nissan    150         180          5.00
Mazda      65         100         12.25
Toyota    385         350          3.50
Honda     400         370          2.43
                                  Sum = 23.18

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 23.18$$

From Table D, using df = 3 and α = 0.05, the critical value is χ²* = 7.81.
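The Step 3 arithmetic can be reproduced with a short script. This is only a sketch of the calculation; the critical value 7.81 is taken from Table D as quoted in the text rather than computed.

```python
observed = {"Nissan": 150, "Mazda": 65, "Toyota": 385, "Honda": 400}
expected = {"Nissan": 180, "Mazda": 100, "Toyota": 350, "Honda": 370}

# One (O - E)^2 / E term per make of car.
contributions = {
    make: (observed[make] - expected[make]) ** 2 / expected[make]
    for make in observed
}

chi_square = sum(contributions.values())
df = len(observed) - 1
critical_value = 7.81  # Table D value for df = 3, alpha = 0.05 (from the text)

for make, term in contributions.items():
    print(f"{make}: {term:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("Exceeds critical value:", chi_square > critical_value)
```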
Example 1 Continued
• Step 4: Interpretation
Since χ² = 23.18 lies to the right of the critical value χ²* = 7.81, the P-value is smaller than α = 0.05. The result is statistically significant, so we reject H0. There is evidence that the current sales distribution is not the same as last year's sales distribution.
• The test only tells you that there is a change. Additional analysis may be required. We need to look at the (O – E)²/E column to find the major contributor to the chi-square statistic. In this problem, the largest contribution comes from Mazda: fewer Mazdas were sold in the current year than expected.
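The text compares χ² with a table value instead of computing a P-value. As a hedged sketch, the upper-tail area for df = 3 can be computed with the standard library alone, using the closed form erfc(√(x/2)) + √(2x/π)·e^(–x/2), which holds only for df = 3. The function name `upper_tail_df3` is a hypothetical helper.

```python
import math


def upper_tail_df3(x):
    """P(X >= x) for a chi-square distribution with exactly 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


observed = {"Nissan": 150, "Mazda": 65, "Toyota": 385, "Honda": 400}
expected = {"Nissan": 180, "Mazda": 100, "Toyota": 350, "Honda": 370}
contributions = {m: (observed[m] - expected[m]) ** 2 / expected[m] for m in observed}

chi_square = sum(contributions.values())
p_value = upper_tail_df3(chi_square)
largest = max(contributions, key=contributions.get)

print(f"P-value = {p_value:.6f}")
print("Reject H0 at alpha = 0.05:", p_value < 0.05)
print("Largest contributor:", largest)
```

The P-value is far below 0.05, which agrees with the table-based conclusion, and the largest term belongs to Mazda.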
Example 2
• Are you more likely to have a motor vehicle collision when using a cell phone? A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question. These drivers made 26,798 cell phone calls during a 14-month study period. Each of the 699 collisions was classified in various ways. Here are the counts for each day of the week:
Day:     Sun  Mon  Tue  Wed  Thu  Fri  Sat  Total
Number:   20  133  126  159  136  113   12    699
Are the accidents equally likely to occur on any day of the week?
Example 2 Continued
• Step 1
– Population?
– Parameter?
– H0?
– Ha?
• Step 1
– Population: all accidents involving cell phones.
– Parameter: the proportion of accidents for each day of the week.
– H0: Motor vehicle accidents involving cell phone use are equally likely to occur on each day of the week.
– Ha: The probabilities of a motor vehicle accident involving cell phone use vary from day to day (not all the same).
Example 2 continued
• Step 2: Conditions
– SRS?
– Expected counts?
– Independent?
• Step 2
– SRS: Assume an SRS.
– Expected counts are:
Sun 699 x (1/7) = 99.857
Mon 699 x (1/7) = 99.857
Tue 699 x (1/7) = 99.857
Wed 699 x (1/7) = 99.857
Thu 699 x (1/7) = 99.857
Fri 699 x (1/7) = 99.857
Sat 699 x (1/7) = 99.857
All expected counts are greater than 5.
– The observations are independent.
Example 2 Continued
• Step 3 Calculations
Use a calculator.
L1 = observed counts
L2 = expected counts
L3 = (O – E)²/E = (L1 – L2)²/L2
sum(L3) = χ²
P-value: 2nd DISTR χ²cdf(lower bound, upper bound, df), with lower bound = χ², a very large upper bound, and df = 7 – 1 = 6.
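For readers without the calculator, the same list operations can be sketched in Python. The function `chi2_cdf` below is a hypothetical stand-in for the calculator's χ²cdf command, written with the standard library only and valid for whole-number df. Using `math.inf` as the upper bound is an assumption that replaces the very large number typed on the calculator.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square distribution with whole-number df."""
    if x <= 0:
        return 1.0
    if math.isinf(x):
        return 0.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    # evaluated in log form to avoid overflow for large statistics.
    while a < df / 2.0:
        tail += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def chi2_cdf(lower, upper, df):
    """Area under the chi-square curve between lower and upper."""
    return chi2_upper_tail(lower, df) - chi2_upper_tail(upper, df)


observed = [20, 133, 126, 159, 136, 113, 12]   # L1: Sun through Sat
expected = [sum(observed) / 7] * 7             # L2: 699 x (1/7) each day
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]  # L3
chi_square = sum(terms)                        # sum(L3)
df = len(observed) - 1
p_value = chi2_cdf(chi_square, math.inf, df)

print(f"chi-square = {chi_square:.2f}, df = {df}, P-value = {p_value:.3g}")
```

The statistic comes out at about 208.85 with df = 6, and the P-value is vanishingly small.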
Example 2 Continued
• Step 4: Interpretation
– The P-value is extremely small. At α = 0.05 we reject H0. There is evidence that accidents involving cell phones are not evenly distributed over the days of the week.
– Additional analysis: Saturday and Sunday made the biggest contributions to the χ² statistic. There were fewer accidents involving cell phones on the weekend.
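The follow-up analysis can be sketched by listing each day's (O – E)²/E term from largest to smallest, together with the direction of the difference. The variable names are illustrative.

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
observed = [20, 133, 126, 159, 136, 113, 12]
expected_per_day = sum(observed) / len(days)  # 699 x (1/7)

# Contribution of each day to the chi-square statistic.
contributions = {
    day: (count - expected_per_day) ** 2 / expected_per_day
    for day, count in zip(days, observed)
}

# Days ordered from largest to smallest contribution.
ranked = sorted(contributions, key=contributions.get, reverse=True)

for day in ranked:
    count = observed[days.index(day)]
    direction = "below" if count < expected_per_day else "above"
    print(f"{day}: {contributions[day]:.2f} ({direction} expected)")
```

Saturday and Sunday head the list, and both observed counts fall below the expected count, which matches the interpretation above.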