vineet-sharma
8/11/2019 Unit 6 Chi-Square Distribution SLM
Course: Statistics
Unit 6
Chi-Square Distribution
Table of Contents
6.1. Learning Objectives
6.2. Introduction
6.3. Chi-Square Distribution
    6.3.1. Properties of χ² Distribution
    6.3.2. Characteristics of χ² Test
    6.3.3. Degrees of Freedom
    6.3.4. Restrictions and Conditions in Applying χ² Test
    6.3.5. Levels of Significance
    6.3.6. Steps in Solving χ² Problems
    6.3.7. Interpretation
6.4. Uses of χ² Test
6.5. Application of χ² Test
    6.5.1. Tests for Independence of Attributes
    6.5.2. Test of Goodness of Fit
    6.5.3. Test for Specified Variance
6.6. Summary
6.7. Reference
    6.7.1. Recommended Textbooks
    6.7.2. Web References
6.1. Learning Objectives
By the end of this unit, you should be able to:
Recognise the importance of Chi-Square test
Recall Chi-Square distribution and its properties
List the conditions under which the test can be applied
Apply Chi-square as a test of Independence
Apply Chi-square as a test of goodness of fit
Apply Chi-square as a test of specified variance
Case-1:
The ABC soap manufacturing company produces four varieties of soaps with
different ingredients and flavours. The Company's Marketing General Manager
wants to know the age-wise preference of the consumers with respect to the
varieties. He consults a Statistician and collects the following data as per his
instruction:
Table 6.1
                 Age (yrs)
Product    20-30    30-40    40 & above
S1           70      120         70
S2          130      200        130
S3          120      190         20
The G.M. is also interested in knowing the distribution of complaints received in a week
by the firm. The Statistician collects the following information:
Table 6.2
Complaints            0     1     2     3     4
Numbers Received    250    90    40    15    10
(Cont. in topic Degrees of Freedom)
6.2. Introduction
In the previous units, we learned how to test hypotheses using data from either one or two
samples. We used one-sample tests to determine whether a mean or a proportion was significantly
different from a hypothesized value. In the two-sample tests, we examined the difference between
either two means or two proportions, and we tried to learn whether this difference was significant.
Suppose we have proportions from five populations instead of only two. In this case, the
two-sample methods for comparing proportions described earlier do not apply; we
must use the chi-square (χ²) test. χ² tests enable us to test whether more than two population
proportions can be considered equal.
Actually, chi-square (χ²) tests allow us to do a lot more than just test for the equality of several
proportions. If we classify a population into several categories with respect to two attributes (such
as age and job performance), we can then use a chi-square (χ²) test to determine whether the two
attributes are independent of each other.
6.3. Chi-Square Distribution
The square of a standard normal variate is called a chi-square variate with 1 degree
of freedom. That is, if the variable X is normally distributed with mean μ and
standard deviation σ, then ((X - μ)/σ)² is a χ² variate with df = 1.
If X1, X2, ..., Xn are n independent random variables following normal
distributions with means μ1, μ2, ..., μn and SDs σ1, σ2, ..., σn respectively, then the χ² variate is given by:
\[ \chi^2 = \sum_{i=1}^{n} \left( \frac{X_i - \mu_i}{\sigma_i} \right)^2 = \chi_1^2 + \chi_2^2 + \cdots + \chi_n^2 \]
where each term \chi_i^2 is a χ² variate with 1 degree of freedom.
Chi-square is thus the sum of the squares of n independent standard normal variates,
following the χ² distribution with n degrees of freedom.
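The definition above can be checked numerically. The following is a minimal Python sketch, not part of the original unit: it draws many sums of n squared standard normal variates and compares the sample mean and variance with the theoretical values n and 2n. The seed, the sample size and the choice n = 4 are illustrative assumptions.

```python
import random
import statistics

def chi_square_draw(n, rng):
    """Return one chi-square draw: the sum of n squared standard normal variates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(42)   # fixed seed so the run is repeatable (arbitrary choice)
n = 4                     # degrees of freedom (illustrative choice)
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)

print(f"sample mean     = {sample_mean:.2f} (theory: {n})")
print(f"sample variance = {sample_var:.2f} (theory: {2 * n})")
```

Every draw is a sum of squares, so no value can be negative, and the simulated mean should settle close to the degrees of freedom.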
6.3.1. Properties of χ² Distribution
1. Mean of χ² distribution = degrees of freedom = ν.
2. S.D. of χ² distribution = √(2ν).
3. Median of χ² distribution divides the area of the curve into two equal parts, each part
being 0.5.
4. Mode of χ² distribution is equal to degrees of freedom less 2, that is, ν - 2 (for ν ≥ 2).
5. The χ² distribution is always positively skewed.
6. χ² values increase with the increase in the degrees of freedom; there is a new χ² distribution with
every increase in the number of degrees of freedom.
7. The lowest value of χ² is zero and the highest is infinity, i.e. 0 ≤ χ² < ∞.
8. When two chi-squares χ1² and χ2² are independent, following χ² distributions with n1
and n2 degrees of freedom, their sum χ1² + χ2² will follow a χ² distribution with n1 + n2
degrees of freedom.
9. When ν > 30, √(2χ²) - √(2ν - 1) approximately follows the standard normal distribution.
6.3.2. Characteristics of χ² Test
The χ² test is based on frequencies and not on parameters.
It is a non-parametric test: no rigid assumptions regarding population
parameters are required.
The additive property is also found in the χ² test.
The χ² test is useful for testing hypotheses about the independence of attributes.
The χ² test can be used in complex contingency tables.
The χ² test is very widely used for research purposes in behavioral and social sciences,
including business research.
It is defined as χ² = Σ (O - E)² / E.
6.3.3. Degrees of Freedom
If a χ² is defined as the sum of the squares of n independent standardized normal variates and
k linear relations are imposed upon them (such as the
estimation of some population parameter value), then the effect of these constraints is that n is
replaced by n - k. If the sum of squares is taken about the sample mean instead of the
population mean, then n is replaced by n - 1 = ν, since one linear constraint has been imposed.
The number of degrees of freedom for n observations is n - k and is usually
denoted by ν, where k is the number of independent linear constraints imposed
upon them. Suppose we are asked to write any four numbers; then all four
numbers are of our choice. If a restriction is imposed on the choice, namely that
the sum of these numbers should be 50, then the freedom of choice is
reduced to three numbers only, and so the degrees of freedom would now be 3.
(Cont. from topic Introduction)
In the Case Study the degrees of freedom is given by (3-1) (3-1) = 4. At 5% level of
significance the tabulated value is 9.488.
(Cont. in topic Tests for Independence of Attributes)
6.3.4. Restrictions and Conditions in Applying χ² Test
Restrictions
The sample observations should be independently and normally distributed. For this, either the
parent population should be very large (say, greater than 50) or sampling should be done
with replacement.
Constraints imposed upon the observations must be linear in character, for example a
requirement that the observations sum to a fixed total.
The χ² distribution is essentially a continuous distribution, but its character of continuity is
maintained only when the individual frequencies of the variate values remain greater than or
equal to 5. So in applying the χ² test to the testing of goodness of fit or to a contingency
table, the cell frequency should not be less than 5. In practical problems we can combine a
few values of small frequencies into one to get a pooled frequency of at least 5.
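The pooling rule can be sketched in Python as follows. This is an illustrative helper, not a procedure given in the unit: the function name `pool_small_classes`, the choice to merge a small class into the class before it, and the sample frequencies are all assumptions.

```python
def pool_small_classes(observed, expected, min_expected=5.0):
    """Merge classes whose expected frequency is below min_expected
    into the adjacent (previous) class, working from the last class back."""
    obs, exp = list(observed), list(expected)
    i = len(exp) - 1
    while i > 0:
        if exp[i] < min_expected:
            # Fold class i into class i - 1, then re-check the merged class.
            obs[i - 1] += obs[i]
            exp[i - 1] += exp[i]
            del obs[i], exp[i]
        i -= 1
    # The first class has no previous neighbour, so merge it forward if needed.
    if len(exp) > 1 and exp[0] < min_expected:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    return obs, exp

# Hypothetical frequencies: the last class has an expected frequency below 5.
observed = [210, 90, 40, 15, 10]
expected = [181.3, 126.9, 44.4, 10.4, 1.8]
pooled_obs, pooled_exp = pool_small_classes(observed, expected)
print(pooled_obs)
print([round(e, 1) for e in pooled_exp])
```

Pooling reduces the number of classes, so the degrees of freedom must be counted from the pooled table rather than the original one.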
Conditions:
1) The frequencies used in the chi-square test must be absolute, not relative.
2) The total number of observations collected for this test must be large.
3) Each of the observations making up the sample for this test must be independent of
the others.
4) As the χ² test is based wholly on sample data, no assumption is made concerning the
population distribution. In other words, it is a non-parametric test.
5) The χ² test is wholly dependent on degrees of freedom.
6) The expected frequency of any item or cell must not be less than 5; if it is, the frequencies of
adjacent items or cells should be pooled together to bring it to 5 or more.
7) The data should be expressed in original units for convenience of comparison, and the
given distribution should not be replaced by relative frequencies or proportions.
This test is used only for drawing inferences through tests of hypotheses, so it cannot be
used for estimating parameter values.
6.3.5. Levels of Significance
Tables have been prepared for the values of P, the probability of getting a value of χ²
greater than or equal to χ0², where χ0² is an observed value. From these
tables, we can find the value of P corresponding to an observed value of χ² and
then proceed to test whether the difference between observed and theoretical
frequencies is significant or not. The smaller the value of P, the greater the divergence
between fact and theory, so that small values lead us to suspect the hypothesis. Not
only do small values of P lead us to suspect the hypothesis, but a value of P very near
to unity may also lead to a similar result. Thus if P = 1, χ² = 0, showing that there
is perfect agreement between fact and theory, which is a very improbable event.
The two conventional levels of significance are:
If P is less than 0.05, we say that the observed value of χ² is significant at the 5 percent level
of significance. Similarly, if P is less than 0.01, the value is significant at the 1% level.
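The probability P can also be computed directly instead of being read from a table. The Python sketch below is an illustration rather than the unit's method: it uses the known closed form of the χ² upper-tail probability that holds when the degrees of freedom are even, and the function name `chi_square_tail_even_df` is hypothetical. For odd degrees of freedom a table or a statistics library would be needed.

```python
import math

def chi_square_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x); valid for even df only.

    Uses exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! step by step
        total += term
    return math.exp(-half) * total

# Tabulated 5% values quoted in this unit: 9.488 for 4 df and 5.99 for 2 df.
p_4df = chi_square_tail_even_df(9.488, 4)
p_2df = chi_square_tail_even_df(5.99, 2)
print(f"P(chi2 >= 9.488 | df=4) = {p_4df:.4f}")
print(f"P(chi2 >= 5.99  | df=2) = {p_2df:.4f}")
```

Both probabilities come out close to 0.05, which is what a 5% tabulated value means: an observed χ² beyond it has P below 0.05.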
The formula for calculating χ² is given by:
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
where f_o is the observed frequency and f_e is the expected frequency.
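The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `chi_square_statistic` and the sample frequencies are hypothetical and do not come from the unit.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all classes or cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical frequencies for three classes, each with expected frequency 20.
stat = chi_square_statistic([18, 22, 20], [20, 20, 20])
print(stat)   # (4 + 4 + 0) / 20
```

When observed and expected frequencies agree exactly the statistic is 0, and it grows as they diverge.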
6.3.6. Steps in Solving χ² Problems
Figure 6.1
1) Calculate the expected frequencies. In general the expected frequency for any cell can
be calculated from the following expression:
\[ E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]
2) Take the difference between observed and expected frequencies and obtain the squares
of these differences, (O - E)².
3) Divide the values obtained in step 2 by the respective expected frequencies and add all
the values to get the value according to the formula:
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
6.3.7. Interpretation
After ascertaining the χ² value, consult the χ² table. The table comprises columns headed with symbols
χ²0.05 for 5% level of significance, χ²0.01 for 1% level of significance and so on. The left
hand side indicates the degrees of freedom. If the calculated value of χ² falls in the
acceptance region, the null hypothesis H0 is accepted, and vice versa.
6.4. Uses of χ² Test
The χ² test is used broadly to:
Test goodness of fit for one-way classification or for one variable only
Test independence or interaction for more than one row or column in the form of a
contingency table concerning several attributes
Test population variance σ² through confidence intervals suggested by the χ² test
6.5. Application of χ² Test
6.5.1. Tests for Independence of Attributes
The number of degrees of freedom is given by:
DOF = (r - 1)(c - 1), where r is the number of rows and c is the number of columns
The expected value is given by:
E = (row total x column total) / grand total
Example 6.1:
The following table gives the sales of a product by 3 salesmen and 3
territories. Test at 5% level of significance whether salesmen and territories are
independent.
Table 6.5
                    Salesman
Territories     1     2     3    Total
I               5    15    20      40
II             10    20    20      50
III            15    25    20      60
Total          30    60    60     150
Solution:
Table 6.6
Observed Value (O)    Expected Value (E)      (O-E)²    (O-E)²/E
 5                    40 x 30/150 = 8            9       1.1250
10                    50 x 30/150 = 10           0       0.0000
15                    60 x 30/150 = 12           9       0.7500
15                    40 x 60/150 = 16           1       0.0625
20                    50 x 60/150 = 20           0       0.0000
25                    60 x 60/150 = 24           1       0.0417
20                    40 x 60/150 = 16          16       1.0000
20                    50 x 60/150 = 20           0       0.0000
20                    60 x 60/150 = 24          16       0.6667
                                              χ² =       3.6459
1. Null hypothesis H0: The salesmen and territories are independent
Alternate hypothesis HA: They are dependent
2. Level of Significance 5% and D.O.F (3-1) (3-1) = 4, so χ² tab = 9.49
3. Test Statistic
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
4. Test χ² cal = 3.6459
5. Conclusion: Since χ² cal (3.6459) < χ² tab (9.49), the calculated value falls in the
acceptance region and H0 is accepted.
The salesmen and territories are independent.
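The arithmetic of Example 6.1 can be reproduced with a short Python sketch. It takes the observed counts from Table 6.5, derives each expected frequency as row total x column total / grand total (as in Table 6.6), and compares the statistic with the 5% tabulated value for 4 degrees of freedom quoted earlier in the unit (9.488). The variable names are illustrative.

```python
# Observed sales from Table 6.5 (rows: territories I-III, columns: salesmen 1-3).
table = [
    [5, 15, 20],
    [10, 20, 20],
    [15, 25, 20],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(table) - 1) * (len(table[0]) - 1)
critical_5pct = 9.488   # tabulated value for 4 df at the 5% level

print(f"chi-square = {chi_sq:.4f}, dof = {dof}")
print("reject H0" if chi_sq > critical_5pct else "do not reject H0")
```

With these counts the statistic is about 3.65, well below 9.488, so the data give no evidence against independence at the 5% level.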
6.5.2. Test of Goodness of Fit
Degrees of freedom is n-1
Expected value = Average of the observed values.
(Cont. from topic Tests for Independence of Attributes)
From the nature of the data, the Statistician observes that it is likely to be
close to a Poisson distribution. Therefore he fits a Poisson distribution to the
observed data.
Table 6.9
No. of complaints    No. of times received
X                    f          f x X
0                    210          0
1                     90         90
2                     40         80
3                     15         45
4                     10         40
Total                365        255
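\[ m = \frac{255}{365} = 0.6986 \approx 0.7 \]
\[ P(0) = \frac{e^{-m} m^{0}}{0!} = e^{-0.7} = 0.49658 \]
\[ P(1) = 0.49658 \times \frac{0.7}{1} = 0.3476 \]
\[ P(2) = 0.3476 \times \frac{0.7}{2} = 0.1217 \]
\[ P(3) = 0.1217 \times \frac{0.7}{3} = 0.0284 \]
\[ P(4) = 0.0284 \times \frac{0.7}{4} = 0.0050 \]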
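The Poisson fit above can be reproduced in Python. The sketch below computes the mean from Table 6.9, rounds it to 0.7 as the unit does, and builds the probabilities with the same recurrence P(k) = P(k-1) x m / k. The function name `poisson_probabilities` is hypothetical.

```python
import math

def poisson_probabilities(m, k_max):
    """Return [P(0), ..., P(k_max)] using P(k) = P(k-1) * m / k."""
    probs = [math.exp(-m)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * m / k)
    return probs

# Frequencies from Table 6.9: number of complaints -> number of times received.
freqs = {0: 210, 1: 90, 2: 40, 3: 15, 4: 10}
total = sum(freqs.values())
mean = sum(x * f for x, f in freqs.items()) / total

m = round(mean, 1)   # the unit rounds 0.6986 to 0.7
probs = poisson_probabilities(m, 4)

print(f"total = {total}, mean = {mean:.4f}, m = {m}")
for k, p in enumerate(probs):
    print(f"P({k}) = {p:.4f}")
```

The recurrence avoids recomputing factorials and matches the step-by-step multiplication used in the worked solution.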
Table 6.10
Observed Value (O)    Expected Value (E)           (O-E)²/E
210                   0.49658 x 365 = 181.3          4.543
 90                   0.3476 x 365 = 126.9          10.72
 40                   0.1217 x 365 = 44.4            0.44
10 + 15 = 25          0.0341 x 365 = 12.46          12.61
                      χ² calculated =               28.33
Note: Since the expected frequency for the last class (4 complaints) is less than 5, it is combined
with the previous class, namely 3 complaints, as per one of the conditions
for applying the χ² test. The pooled probability 0.0341 is 1 - (0.49658 + 0.3476 + 0.1217),
the probability of 3 or more complaints.
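The goodness-of-fit statistic for the complaints data can be checked with the following Python sketch. It assumes m = 0.7 and pools the classes for 3 and 4 complaints into a single "3 or more" class, whose probability is taken as one minus the sum of the first three Poisson probabilities; the variable names are illustrative.

```python
import math

m = 0.7                                  # fitted Poisson mean used in the unit
observed = [210, 90, 40, 15 + 10]        # last two classes pooled

# Poisson probabilities for 0, 1 and 2 complaints, then the tail P(X >= 3).
probs = [math.exp(-m) * m ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))

n = sum(observed)
expected = [n * p for p in probs]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 4 classes, minus 1 for the fixed total, minus 1 for the estimated mean m.
dof = len(observed) - 1 - 1
critical_5pct = 5.99                     # tabulated value for 2 df at the 5% level

print(f"chi-square = {chi_sq:.2f}, dof = {dof}")
print("reject H0" if chi_sq > critical_5pct else "do not reject H0")
```

Carrying full precision through the calculation gives a statistic of about 28.33; the individual table entries differ slightly in the last digit because they use rounded expected values.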
1. Null hypothesis H0: It is a good fit
Alternate hypothesis HA: It is not a good fit
2. Level of Significance 5% and D.O.F (4-1-1) = 2, so χ² tab = 5.99
3. Test Statistic
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
4. Test χ² cal = 28.33
5. Conclusion: Since χ² cal (28.33)