Unit 6 Chi-Square Distribution SLM

  • 8/11/2019 Unit 6 Chi-Square Distribution SLM


    Course: Statistics

    Unit 6

    Chi-Square Distribution


    Table of Contents

    6.1. Learning Objectives
    6.2. Introduction
    6.3. Chi-Square Distribution
    6.3.1. Properties of χ² Distribution
    6.3.2. Characteristics of χ² Test
    6.3.3. Degrees of Freedom
    6.3.4. Restrictions and Conditions in Applying χ² Test
    6.3.5. Levels of Significance
    6.3.6. Steps in Solving χ² Problems
    6.3.7. Interpretation
    6.4. Uses of χ² Test
    6.5. Application of χ² Test
    6.5.1. Tests for Independence of Attributes
    6.5.2. Test of Goodness of Fit
    6.5.3. Test for Specified Variance
    6.6. Summary
    6.7. Reference
    6.7.1. Recommended Textbooks
    6.7.2. Web References


    6.1. Learning Objectives

    By the end of this unit, you should be able to:

    Recognise the importance of Chi-Square test

    Recall Chi-Square distribution and its properties

    List the conditions under which the test can be applied

    Apply Chi-square as a test of independence

    Apply Chi-square as a test of goodness of fit

    Apply Chi-square as a test of specified variance

    Case-1:

    The ABC soap manufacturing company produces four varieties of soaps with different ingredients and flavours. The Company's Marketing General Manager wants to know the age-wise preference of consumers with respect to the varieties. He consults a Statistician and collects the following data as per his instruction:

    Table 6.1

                 Age (yrs)
    Product      20-30    30-40    40 & above
    S1             70      120         70
    S2            130      200        130
    S3            120      190         20

    The G.M. is also interested in knowing the distribution of complaints received in a week by the firm. The Statistician collects the following information:

    Table 6.2

    Complaints           0      1      2      3      4
    Numbers Received   210     90     40     15     10

    (Cont. in topic Degrees of Freedom)


    6.2. Introduction

    In the previous units, we learned how to test hypotheses using data from either one or two samples. We used one-sample tests to determine whether a mean or a proportion was significantly different from a hypothesized value. In the two-sample tests, we examined the difference between either two means or two proportions, and we tried to learn whether this difference was significant.

    Suppose we have proportions from five populations instead of only two. In this case, the methods for comparing proportions described for two-sample hypothesis tests do not apply; we must use the chi-square (χ²) test. χ² tests enable us to test whether more than two population proportions can be considered equal.

    Actually, chi-square (χ²) tests allow us to do a lot more than just test for the equality of several proportions. If we classify a population into several categories with respect to two attributes (such as age and job performance), we can then use a χ² test to determine whether the two attributes are independent of each other.

    6.3. Chi-Square Distribution

    The square of a standard normal variate is called a chi-square variate with 1 degree of freedom. That is, if X is normally distributed with mean μ and standard deviation σ, then ((X − μ)/σ)² is a χ² variate with df = 1.

    If X1, X2, ..., Xn are n independent random variables following the normal distribution with means μ1, μ2, ..., μn and standard deviations σ1, σ2, ..., σn respectively, then the χ² variate is given by:

    \[ \chi^2 = \sum_{i=1}^{n} \left( \frac{X_i - \mu_i}{\sigma_i} \right)^2 \]

    Chi-square is thus the sum of the squares of n independent standard normal variates, and it follows the χ² distribution with n degrees of freedom.
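The definition above can be checked numerically. The following is a minimal sketch, not part of the original unit: it builds χ² variates as sums of squared standard normal draws using Python's standard `random` module. The function names, the seed and the number of draws are illustrative assumptions.

```python
import random


def chi_square_draw(n, rng):
    """One chi-square variate with n degrees of freedom, built as the
    sum of squares of n independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


def simulate(n, draws=20000, seed=1):
    """Return the sample mean and the smallest value of `draws` simulated variates."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = [chi_square_draw(n, rng) for _ in range(draws)]
    return sum(values) / draws, min(values)


sample_mean, smallest = simulate(n=5)
print(round(sample_mean, 2), smallest >= 0)
```

With n = 5 the sample mean should come out close to 5, and no simulated value can be negative because every term is a square.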


    6.3.1. Properties of χ² Distribution

    1. Mean of χ² distribution = degrees of freedom = ν.

    2. S.D. of χ² distribution = √(2ν).

    3. Median of χ² distribution divides the area of the curve into two equal parts, each part being 0.5.

    4. Mode of χ² distribution is equal to degrees of freedom less 2, that is, ν − 2.

    5. The χ² distribution is always positively skewed.

    6. χ² values increase with the increase in the degrees of freedom; there is a new χ² distribution with every increase in the number of degrees of freedom.

    7. The lowest value of χ² is zero and the highest is infinity, i.e. 0 ≤ χ² < ∞.

    8. When two chi-squares χ₁² and χ₂² are independent, following χ² distributions with n₁ and n₂ degrees of freedom, their sum χ₁² + χ₂² follows a χ² distribution with n₁ + n₂ degrees of freedom.

    9. When ν > 30, √(2χ²) − √(2ν − 1) approximately follows the standard normal distribution.

    6.3.2. Characteristics of χ² Test

    The χ² test is based on frequencies and not on parameters.

    It is a non-parametric test: no rigid assumptions regarding population parameters are required.

    The additive property is also found in the χ² test.

    The χ² test is useful for testing hypotheses about the independence of attributes.

    The χ² test can be used in complex contingency tables.

    The χ² test is very widely used for research purposes in the behavioral and social sciences, including business research.

    It is defined as χ² = Σ (O − E)²/E.
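As an illustration of the properties listed above, the sketch below collects the mean, standard deviation and mode for a given number of degrees of freedom, and uses the normal approximation for large degrees of freedom to estimate an upper 5% point. The function names are hypothetical, and z = 1.645 (the upper 5% point of the standard normal distribution) is an assumption brought in from normal tables, not from this unit.

```python
import math


def chi_square_summary(v):
    """Mean, standard deviation and mode of a chi-square
    distribution with v degrees of freedom (v >= 2)."""
    return {"mean": v, "sd": math.sqrt(2 * v), "mode": v - 2}


def approx_critical_value(v, z):
    """Solve sqrt(2 * chi2) - sqrt(2 * v - 1) = z for chi2.
    Intended for v > 30, where the approximation is reasonable."""
    return (z + math.sqrt(2 * v - 1)) ** 2 / 2


summary = chi_square_summary(10)
approx = approx_critical_value(40, 1.645)  # approximate upper 5% point for v = 40
print(summary, round(approx, 2))
```

For 40 degrees of freedom the approximation lands within about half a unit of the tabulated 5% value (roughly 55.76).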


    6.3.3. Degrees of Freedom

    If a χ² is defined as the sum of the squares of n independent standardized normal variates, and k linear relations are imposed upon them (such as the estimation of some population parameter), then the effect of these k constraints is that n is replaced by n − k. If the sum of squares is taken about the sample mean instead of the population mean, then n is replaced by n − 1 = ν, since one linear constraint has been imposed.
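The effect of a single linear constraint can be seen in a small sketch. The sample values below are hypothetical; the point is only that deviations taken about the sample mean always sum to zero, so the last deviation is fixed once the other n − 1 are known.

```python
def deviations_from_mean(sample):
    """Deviations of each observation from the sample mean."""
    mean = sum(sample) / len(sample)
    return [x - mean for x in sample]


sample = [12, 15, 9, 14]  # hypothetical data, n = 4
devs = deviations_from_mean(sample)

# One linear constraint: the deviations sum to zero, so the last
# deviation can be recovered from the first n - 1.
last_from_others = -sum(devs[:-1])
degrees_of_freedom = len(sample) - 1

print(devs, last_from_others, degrees_of_freedom)
```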

    The number of degrees of freedom for n observations is n − k and is usually denoted by ν, where k is the number of independent linear constraints imposed upon them. Suppose we are asked to write any four numbers; then all four numbers are of our choice. If a restriction is imposed that the sum of these numbers should be 50, then the freedom of choice is reduced to three numbers only, and so the degrees of freedom are now 3.

    (Cont. from topic Introduction)

    In the Case Study, the degrees of freedom are given by (3 − 1)(3 − 1) = 4. At the 5% level of significance the tabulated value is 9.488.

    (Cont. in topic Tests for Independence of Attributes)

    6.3.4. Restrictions and Conditions in Applying χ² Test

    Restrictions

    The sample observations should be independently and normally distributed. For this, either the parent population should be sufficiently large (say, greater than 50) or sampling should be done with replacement.

    Constraints imposed upon the observations must be of a linear character. For example,

    The χ² distribution is essentially a continuous distribution, but its character of continuity is maintained only when the individual frequencies of the variate values remain greater than or equal to 5. So in applying the χ² test for goodness of fit or in a contingency table, no cell frequency should be less than 5. In practical problems we can combine a few classes with small frequencies into one so that the pooled frequency is at least 5.
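The pooling rule can be sketched in code. This is one possible way to merge adjacent cells, not a procedure prescribed by the unit: cells are accumulated from left to right until the pooled expected frequency reaches the minimum, and any leftover cells at the end join the last pooled cell. The function name and the example frequencies are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells until every pooled expected frequency
    is at least `minimum`. Returns (pooled_observed, pooled_expected)."""
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0.0, 0.0
    if run_exp > 0:  # leftover small cells at the end
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the last cell has an expected frequency below 5.
pooled_obs, pooled_exp = pool_small_cells([20, 12, 6, 2], [18.0, 13.0, 6.5, 2.5])
print(pooled_obs, pooled_exp)
```

Pooling reduces the number of classes, so the degrees of freedom must be counted from the pooled classes.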


    Conditions:

    1) The frequencies used in the chi-square test must be absolute, not relative.

    2) The total number of observations collected for this test must be large.

    3) The observations that make up the sample for this test must be independent of each other.

    4) As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution. In other words, it is a non-parametric test.

    5) The χ² test is wholly dependent on degrees of freedom.

    6) The expected frequency of any item or cell must not be less than 5; otherwise the frequencies of adjacent items or cells should be pooled together to bring the total to 5 or more.

    7) The data should be expressed in original units for convenience of comparison, and the given distribution should not be replaced by relative frequencies or proportions.

    This test is used only for drawing inferences by testing hypotheses, so it cannot be used to estimate parameter values.

    6.3.5. Levels of Significance

    Tables have been prepared for the values of P, the probability of getting a value of χ² greater than or equal to χ0², where χ0² is an observed value. From these tables, we can find the value of P corresponding to an observed value of χ² and then proceed to test whether the difference between observed and theoretical frequencies is significant or not. The smaller the value of P, the greater the divergence between fact and theory, so small values lead us to suspect the hypothesis. Not only do small values of P lead us to suspect the hypothesis; a value of P very near to unity may also lead to a similar result. Thus if P = 1, χ² = 0, showing that there is perfect agreement between fact and theory, which is a very improbable event.

    The two conventional levels of significance are:

    If P is less than 0.05, we say that the observed value of χ² is significant at the 5 percent level of significance. Similarly, if P is less than 0.01, the value is significant at the 1% level.

    The formula for calculating χ² is given by:

    \[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

    where f_o is the observed frequency and f_e is the expected frequency.
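The formula translates directly into a few lines of code. The sketch below is illustrative only; the function name and the observed and expected frequencies are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [18, 22, 20]  # hypothetical observed frequencies
expected = [20, 20, 20]  # hypothetical expected frequencies
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```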


    6.3.6. Steps in Solving χ² Problems

    1) Calculate the expected frequencies. In general, the expected frequency for any cell can be calculated from the following expression:

    \[ E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]

    2) Take the difference between observed and expected frequencies and obtain the squares of these differences, (O − E)².

    3) Divide the values obtained in step 2 by the respective expected frequencies and add all the values to get the value of χ² according to the formula:

    \[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]

    6.3.7. Interpretation

    Figure 6.1

    After computing the χ² value, consult the χ² table. Its columns are headed with symbols such as χ²0.05 for the 5% level of significance, χ²0.01 for the 1% level of significance, and so on. The left-hand column indicates the degrees of freedom. If the calculated value of χ² falls in the acceptance region (that is, it is less than the tabulated value), the null hypothesis H0 is accepted; otherwise it is rejected.
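The table lookup and decision rule can be sketched as follows. The critical values are the usual tabulated upper-tail values for 1 to 4 degrees of freedom, entered by hand as an assumption rather than taken from this unit (which quotes only 9.488 and 5.99); the function and dictionary names are hypothetical.

```python
# Upper-tail critical values of chi-square, keyed by level of
# significance and then by degrees of freedom (standard table values).
CRITICAL_VALUES = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277},
}


def decide(chi_square_calculated, df, alpha=0.05):
    """Compare the calculated value with the tabulated value."""
    critical = CRITICAL_VALUES[alpha][df]
    if chi_square_calculated < critical:
        return "accept H0"
    return "reject H0"


print(decide(3.6459, df=4))  # calculated value below the tabulated 9.488
print(decide(28.33, df=2))   # calculated value above the tabulated 5.991
```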


    6.4. Uses of χ² Test

    The χ² test is used broadly to:

    Test goodness of fit for a one-way classification or for one variable only

    Test independence or interaction for more than one row or column in the form of a contingency table concerning several attributes

    Test the population variance σ² through confidence intervals suggested by the χ² test

    6.5. Application of χ² Test

    6.5.1. Tests for Independence of Attributes

    The number of degrees of freedom is given by:

    \[ \text{DOF} = (r - 1)(c - 1) \]

    where r is the number of rows and c is the number of columns of the contingency table.

    The expected value is given by:

    \[ E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]


    Example 6.1:

    The following table gives the sales of a product by 3 salesmen in 3 territories. Test at the 5% level of significance whether salesmen and territories are independent.

    Table 6.5

    Salesman     Territories
                 1      2      3      Total
    I            5     15     20       40
    II          10     20     20       50
    III         15     25     20       60
    Total       30     60     60      150

    Solution:

    Table 6.6

    Observed Value (O)   Expected Value (E)      (O − E)²   (O − E)²/E
     5                   40 x 30/150 = 8            9        1.1250
    10                   50 x 30/150 = 10           0        0.0000
    15                   60 x 30/150 = 12           9        0.7500
    15                   40 x 60/150 = 16           1        0.0625
    20                   50 x 60/150 = 20           0        0.0000
    25                   60 x 60/150 = 24           1        0.0417
    20                   40 x 60/150 = 16          16        1.0000
    20                   50 x 60/150 = 20           0        0.0000
    20                   60 x 60/150 = 24          16        0.6667
    Total                                                    χ² = 3.6459

    1. Null hypothesis H0: The salesmen and territories are independent
       Alternate hypothesis HA: They are dependent

    2. Level of significance 5% and D.O.F. (3 − 1)(3 − 1) = 4; χ²tab = 9.49

    3. Test statistic: χ² = Σ (O − E)²/E

    4. Test: χ²cal = 3.6459

    5. Conclusion: Since χ²cal (3.6459) < χ²tab (9.49), H0 is accepted. The salesmen and territories are independent.
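The arithmetic of Example 6.1 can be reproduced with a short sketch. The observed counts are those of Table 6.5; the function name is hypothetical, and the critical value 9.488 is the tabulated 5% value for 4 degrees of freedom quoted earlier in the unit.

```python
def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


sales = [
    [5, 15, 20],
    [10, 20, 20],
    [15, 25, 20],
]
statistic, df = independence_test(sales)
reject_h0 = statistic > 9.488  # 5% critical value for 4 degrees of freedom
print(round(statistic, 4), df, reject_h0)
```

The statistic agrees with Table 6.6 up to rounding and lies well below 9.488, so the data give no evidence against independence at the 5% level.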


    6.5.2. Test of Goodness of Fit

    Degrees of freedom is n-1

    Expected value = Average of the observed values.

    (Cont. from topic Tests for Independence of Attributes)

    From the nature of the data, the Statistician observes that the distribution of complaints is likely to be close to a Poisson distribution. Therefore he fits a Poisson distribution to the observed data.

    Table 6.9

    No. of complaints (X)   No. of times received (f)   f x X
    0                       210                           0
    1                        90                          90
    2                        40                          80
    3                        15                          45
    4                        10                          40
    Total                   365                         255

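    \[ m = \frac{\sum fX}{\sum f} = \frac{255}{365} = 0.6986 \approx 0.7 \]

    \[ P(0) = \frac{e^{-m} m^{0}}{0!} = e^{-0.7} = 0.49658 \]

    \[ P(1) = \frac{0.7}{1} \times 0.49658 = 0.3476 \]

    \[ P(2) = \frac{0.7}{2} \times 0.3476 = 0.1217 \]

    \[ P(3) = \frac{0.7}{3} \times 0.1217 = 0.0284 \]

    \[ P(4) = \frac{0.7}{4} \times 0.0284 = 0.0050 \]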

    Table 6.10

    Observed Value (O)   Expected Value (E)        (O − E)²/E
    210                  0.49658 x 365 = 181.3       4.543
    90                   0.3476 x 365 = 126.9       10.72
    40                   0.1217 x 365 = 44.4         0.44
    15 + 10 = 25         0.0341 x 365 = 12.46       12.61
    Total                                           χ² calculated = 28.33

    Note: Since the expected frequency for the last class (4 complaints) is less than 5, it is combined with the previous class, namely 3 complaints, as per one of the conditions for applying the χ² test.
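The Poisson fit can be reproduced with a short sketch. It follows the worked figures in rounding the mean to 0.7 and in treating the pooled last class as "3 or more complaints", whose probability is the remainder after the first three classes; the variable names are illustrative assumptions.

```python
import math

complaints = [0, 1, 2, 3, 4]
frequency = [210, 90, 40, 15, 10]

total = sum(frequency)  # 365
# Mean number of complaints, rounded to one decimal as in the worked figures.
m = round(sum(x * f for x, f in zip(complaints, frequency)) / total, 1)

# Poisson probabilities for 0, 1 and 2 complaints; the remainder
# covers the pooled class "3 or more complaints".
probabilities = [math.exp(-m) * m ** x / math.factorial(x) for x in range(3)]
probabilities.append(1 - sum(probabilities))

observed = frequency[:3] + [sum(frequency[3:])]  # 210, 90, 40, 25
expected = [p * total for p in probabilities]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Classes after pooling, minus 1, minus 1 for the estimated mean.
df = len(observed) - 1 - 1
print(round(statistic, 2), df)
```

Small differences from the per-row figures in Table 6.10 come from rounding the expected frequencies before squaring.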


    1. Null hypothesis H0: It is a good fit
       Alternate hypothesis HA: It is not a good fit

    2. Level of significance 5% and D.O.F. (4 − 1 − 1) = 2; χ²tab = 5.99

    3. Test statistic: χ² = Σ (O − E)²/E

    4. Test: χ²cal = 28.33

    5. Conclusion: Since χ²cal (28.33)