The Chi Square Test.ppt


  • 7/28/2019 The Chi Square Test.ppt

    1/57

    The Chi Square Test

    χ² By SDK, AIM

    Chi sq: The test of the goodness of fit

    The Chi Square Test

    The Chi Square Test (χ²) is used to determine how well theoretical distributions (Normal, Binomial, Poisson, etc.) fit empirical distributions (those obtained from samples).

    Pearson developed this test in 1900 to check the goodness of fit of distributions.

    Consider a particular sample: a set of possible events $E_1, E_2, \dots, E_k$ that are observed to occur with frequencies $o_1, o_2, \dots, o_k$ (called observed frequencies). As per the rules of probability, these events are expected to occur with frequencies $e_1, e_2, \dots, e_k$ (called expected frequencies).

    Event                 $E_1$   $E_2$   ...   $E_k$
    Observed frequency    $o_1$   $o_2$   ...   $o_k$
    Expected frequency    $e_1$   $e_2$   ...   $e_k$

    Example

    If we toss a fair coin 100 times, we may expect 50 heads and 50 tails. However, the observed results may not match these expected frequencies exactly.

    The χ² Variable

    The χ² variable gives a measure of the disparity existing between the observed and the expected frequencies:

    $$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

    where $N$ = total frequency = $\sum_{i=1}^{k} o_i = \sum_{i=1}^{k} e_i$

    Thus

    $$\chi^2 = \sum_{i=1}^{k} \frac{o_i^2 - 2 o_i e_i + e_i^2}{e_i} = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - 2N + N = \sum_{i=1}^{k} \frac{o_i^2}{e_i} - N$$

    Example

    I assume that the marks of a class are distributed normally. However, when the examination takes place I realize that the class has had a better performance.

    Marks     Observed frequency   Expected frequency
    0-20      2                    5
    21-40     5                    10
    41-60     18                   30
    61-80     23                   10
    81-100    12                   5
    Total     60                   60

    Marks     Observed frequency   Expected frequency   $(o_i - e_i)^2 / e_i$
    0-20      2                    5                    1.8
    21-40     5                    10                   2.5
    41-60     18                   30                   4.8
    61-80     23                   10                   16.9
    81-100    12                   5                    9.8
    Total     60                   60                   35.8 = χ²
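The marks table can be checked with a few lines of Python. This is only a sketch: the function names `chi_square` and `chi_square_shortcut` are illustrative and not from the slides. It computes χ² from the definition and from the equivalent form (sum of o²/e, minus N), which holds only when the observed and expected totals are equal.

```python
# Observed and expected frequencies from the marks table.
observed = [2, 5, 18, 23, 12]
expected = [5, 10, 30, 10, 5]

def chi_square(obs, exp):
    """Sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi_square_shortcut(obs, exp):
    """Equivalent form sum(o^2 / e) - N, valid when sum(obs) == sum(exp)."""
    n = sum(obs)
    return sum(o ** 2 / e for o, e in zip(obs, exp)) - n

stat = chi_square(observed, expected)
shortcut = chi_square_shortcut(observed, expected)
print(round(stat, 1), round(shortcut, 1))  # prints 35.8 35.8
```

Both forms give the 35.8 shown in the table total.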

    Also, χ² ≥ 0, as it is a sum of squares, and the larger the χ², the greater the difference between the two distributions

    The probability function of χ²

    Calculation of degrees of freedom (df)

    Population parameters are not known and have to be estimated from sample statistics: df = k - 1 - m, where m = number of population parameters estimated

    Population parameters are known: m = 0, so df = k - 1
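The two cases above reduce to one rule, sketched here as a small helper. The name `degrees_of_freedom` and the second call, with m = 2, are illustrative assumptions and not taken from the slides.

```python
def degrees_of_freedom(k, m=0):
    """Degrees of freedom for a goodness-of-fit test: k - 1 - m.

    k: number of classes (events) being compared.
    m: number of population parameters estimated from the sample
       (0 when the parameters are known in advance).
    """
    if k - 1 - m < 1:
        raise ValueError("need k - 1 - m >= 1 for a usable test")
    return k - 1 - m

# Five classes, parameters known: 5 - 1 = 4.
print(degrees_of_freedom(5))        # prints 4
# Hypothetical case: five classes, two parameters estimated from the sample.
print(degrees_of_freedom(5, m=2))   # prints 2
```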

    The χ² Curve

    [Figure: the χ² curve drawn on X and Y axes from the origin O. The table value of χ² separates the acceptance area from the rejection area.]

    Example at α = 0.05 and df = 5 - 1 = 4

    Table value of χ²
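When a printed table is not at hand, the table value can be approximated numerically. The sketch below is one way to do it with only the Python standard library, and is not the method used in the slides: `chi2_sf` builds the upper-tail probability from its closed forms for df = 1 and df = 2 using a standard recurrence, and `chi2_critical` finds the table value by bisection. Both function names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: start from df = 2, where the tail is exp(-x/2).
        q, nu = math.exp(-half), 2
    else:
        # Odd df: start from df = 1, where the tail is erfc(sqrt(x/2)).
        q, nu = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; nu+2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

def chi2_critical(alpha, df, hi=1000.0):
    """Table value: the x with chi2_sf(x, df) == alpha, found by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail still too large, move right
        else:
            hi = mid   # tail small enough, move left
    return (lo + hi) / 2.0

print(round(chi2_critical(0.05, 4), 3))  # prints 9.488
```

For α = 0.05 and df = 4 this gives 9.488, which matches the usual printed χ² tables.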

    Steps of the Chi Square Test

    Define H0 and H1

    List the observed frequencies

    Calculate the expected frequencies if the data follows a theoretical distribution

    Compute χ²

    Accept H0 if the computed χ² is less than the table value of χ²

    H0: Data follow the Binomial distribution

    Chi sq tabulated > Chi sq computed, thus accept H0
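The steps of the test can be strung together in a short sketch. The observed counts below (58 heads and 42 tails) are hypothetical numbers chosen to illustrate the fair-coin example. The function name `goodness_of_fit` is an assumption, and the small lookup of 0.05-level table values stands in for a printed χ² table.

```python
# Table values of chi-square at the 0.05 level of significance, by df.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def goodness_of_fit(observed, expected, estimated_params=0):
    """Run the listed steps and return (chi_sq, df, accept_h0)."""
    # Compute chi-square from the observed and expected frequencies.
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: classes - 1 - parameters estimated from the sample.
    df = len(observed) - 1 - estimated_params
    # Accept H0 when the computed value is below the table value.
    accept_h0 = chi_sq < CRITICAL_0_05[df]
    return chi_sq, df, accept_h0

# H0: the coin is fair. Hypothetical result of 100 tosses: 58 heads, 42 tails.
observed = [58, 42]
expected = [50, 50]
chi_sq, df, accept_h0 = goodness_of_fit(observed, expected)
print(round(chi_sq, 2), df, accept_h0)  # prints 2.56 1 True
```

With these made-up counts the computed value of 2.56 is below the table value of 3.841, so H0 (a fair coin) would be accepted at the 0.05 level.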

    Chi sq as a test of Independence

    Note that:

    The tests of significance are all based on the assumption that the population is normally distributed. However, it is not always possible to assume the underlying distribution pattern for the sampling done.

    If we classify a population into several categories with respect to two attributes (e.g. age, job preference), we can use the Chi Sq Test to determine if the two attributes are independent of each other

    Example

    In 4 regions, National Health Company samples its employees' attitudes towards job performance reviews. Respondents are given a choice between the present method of 2 reviews a year and the proposed method of quarterly reviews.

    Also,

    1. $p_N$ is the proportion of employees from the north who prefer the present plan

    2. $p_E$ is the proportion of employees from the east who prefer the present plan

    3. $p_S$ is the proportion of employees from the south who prefer the present plan

    4. $p_W$ is the proportion of employees from the west who prefer the present plan

    H0: $p_N = p_E = p_S = p_W$

    Contingency table

                                        North   South   East   West   Total
    Number who prefer present method    68      75      57     79     279
    Number who prefer new method        32      45      33     31     141
    Total                               100     120     90     110    420

    Combined proportion of employees preferring the present method = 279/420 = 0.6643

    Thus combined proportion of employees preferring the new method = 1 - 0.6643 = 0.3357

    Thus,

    1. 0.6643 = Estimate of the population proportion who prefer the current method

    2. 0.3357 = Estimate of the population proportion who prefer the new method

    Multiply each estimate by the total number of employees sampled in each region to get the expected number

    Contingency table: Observed values

                                        North   South   East   West   Total
    Number who prefer present method    68      75      57     79     279
    Number who prefer new method        32      45      33     31     141
    Total                               100     120     90     110    420

    Contingency table: Expected values

                                        North   South   East   West   Total
    Number who prefer present method    66      80      60     73     279
    Number who prefer new method        34      40      30     37     141
    Total                               100     120     90     110    420

    Degrees of freedom = (4 - 1)(2 - 1) = 3

    Thus H0 is accepted
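The regional example can be reproduced from the pooled proportion as sketched below. Two assumptions are worth flagging: the slides do not state the level of significance for this example, so the 0.05 level used elsewhere in the deck is assumed, and the expected counts are left unrounded when computing χ², whereas the expected-values table shows them rounded to whole numbers.

```python
regions = ["North", "South", "East", "West"]
present = [68, 75, 57, 79]   # prefer the present method
new = [32, 45, 33, 31]       # prefer the new method

sampled = [p + q for p, q in zip(present, new)]   # employees sampled per region
p_present = sum(present) / sum(sampled)           # pooled proportion, 279/420
p_new = 1 - p_present

# Expected count = pooled proportion x number sampled in the region.
exp_present = [p_present * n for n in sampled]
exp_new = [p_new * n for n in sampled]

chi_sq = sum((o - e) ** 2 / e
             for o, e in zip(present + new, exp_present + exp_new))
df = (len(regions) - 1) * (2 - 1)
table_value = 7.815   # chi-square table value for df = 3 at the assumed 0.05 level

print(round(p_present, 4), [round(e) for e in exp_present])
print(round(chi_sq, 2), df, chi_sq < table_value)
```

Under these assumptions the computed χ² is about 2.76, well below the table value, which agrees with the conclusion that H0 is accepted.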

    Consider a contingency table

    Criteria   $C_1$      $C_2$      $C_3$      Total
    $R_1$      $O_{11}$   $O_{12}$   $O_{13}$   $R_1$
    $R_2$      $O_{21}$   $O_{22}$   $O_{23}$   $R_2$
    Total      $C_1$      $C_2$      $C_3$      $n$

    H0: $R_i$ is independent of $C_j$, or $P(R_i \cap C_j) = P(R_i)\,P(C_j)$

    But $P(R_i \cap C_j) = E_{ij} / n$

    Also $P(R_i) = R_i / n$ and $P(C_j) = C_j / n$

    Thus $E_{ij} / n = P(R_i \cap C_j) = P(R_i)\,P(C_j) = (R_i / n)(C_j / n)$

    Thus $E_{ij} = R_i C_j / n$
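The expected-frequency rule (row total times column total, divided by n) can be applied mechanically to any contingency table. The helper below is a minimal sketch, and the name `expected_frequencies` is illustrative. It is applied to the regional review-method counts from earlier in the slides to show that it gives the same expected values as the pooled-proportion method.

```python
def expected_frequencies(table):
    """Expected counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts from the regional review-method table earlier in the slides.
observed = [[68, 75, 57, 79],
            [32, 45, 33, 31]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The expected table always keeps the row and column totals of the observed table, which is a quick sanity check on the arithmetic.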

    Example

    In order to study the profits and losses of firms by industry, a random sample of 100 firms is selected, and for each firm in the sample, we record whether the company made money or lost money, and whether the firm is a service company. The data are summarized in a 2x2 contingency table. Is the possibility of making a profit independent of the type of industry?

    An insurance company's data regarding claims, gathered by studying three different age groups with a sample size of 100 each, is given below

                  Age group
                  25 and under   Over 25 and under 50   50 and over
    Claim         40             35                     60
    No claim      60             65                     40

    H0: Claim is not related to age

    Contingency Table (Observed Values)

                  Age group
                  25 and under   Over 25 and under 50   50 and over   Total
    Claim         40             35                     60            135
    No claim      60             65                     40            165
    Total         100            100                    100           300

    Contingency Table (Expected Values)

                  Age group
                  25 and under   Over 25 and under 50   50 and over   Total
    Claim         45             45                     45            135
    No claim      55             55                     55            165
    Total         100            100                    100           300

    $f_o$   $f_e$   $(f_o - f_e)^2 / f_e$
    40      45      0.56
    35      45      2.22
    60      45      5.00
    60      55      0.45
    65      55      1.82
    40      55      4.09

    Chi sq computed          14.14
    df                       2
    Level of significance    0.05
    Chi sq tabulated         5.99

    Thus reject H0
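The insurance example can be checked end to end with the sketch below. It computes each expected frequency as row total times column total divided by n. For the table value it uses the fact that with df = 2 the upper-tail probability of χ² is exp(-x/2), so the value at level α is -2 ln α. That shortcut applies only to df = 2 and is not taken from the slides.

```python
import math

observed = [[40, 35, 60],    # claim
            [60, 65, 40]]    # no claim

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
# For df = 2 the upper-tail probability is exp(-x/2), so the table value
# at level alpha is -2 * ln(alpha).
table_value = -2 * math.log(0.05)

print(round(chi_sq, 2), df, round(table_value, 2))  # prints 14.14 2 5.99
print("reject H0" if chi_sq > table_value else "accept H0")  # prints reject H0
```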


    THE MEDIAN TEST

    Example

    An economist wants to test the null hypothesis that median family incomes in three rural areas are approximately equal. For simplicity, an equal sample size of 10 in each population was chosen. The family incomes are shown below.

    Family incomes ($1000 per year)

    Region A   Region B   Region C
    22         31         28
    29         37         42
    36         26         21
    40         25         47
    35         20         18
    50         43         23
    38         27         51
    25         41         16
    62         57         30
    16         32         48

    Median of the combined sample of 30 family incomes = 31.5

                                          Region A   Region B   Region C
    No. of incomes less than median       4          5          6
    No. of incomes greater than median    6          5          4

    Contingency Table (Observed Values)

                                          Region A   Region B   Region C   Total
    No. of incomes less than median       4          5          6          15
    No. of incomes greater than median    6          5          4          15
    Total                                 10         10         10         30

    Contingency Table (Expected Values)

                                          Region A   Region B   Region C   Total
    No. of incomes less than median       5          5          5          15
    No. of incomes greater than median    5          5          5          15
    Total                                 10         10         10         30

    Expected value = (10 x 15)/30 = 5

    $f_o$   $f_e$   $(f_o - f_e)^2 / f_e$
    4       5       0.2
    5       5       0
    6       5       0.2
    6       5       0.2
    5       5       0
    4       5       0.2

    Chi sq computed          0.8
    df                       2
    Level of significance    0.05
    Chi sq tabulated         5.99

    Thus accept H0
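The whole median test can be sketched in Python from the raw incomes: pool the three samples, find the overall median, count the incomes below and above it in each region, and run the χ² test on the resulting 2x3 table. The variable names are illustrative, and the table value of 5.99 is the one quoted in the slides.

```python
from statistics import median

incomes = {
    "A": [22, 29, 36, 40, 35, 50, 38, 25, 62, 16],
    "B": [31, 37, 26, 25, 20, 43, 27, 41, 57, 32],
    "C": [28, 42, 21, 47, 18, 23, 51, 16, 30, 48],
}

# Median of the pooled sample of all 30 incomes.
grand_median = median(x for xs in incomes.values() for x in xs)

# Counts below and above the pooled median, one entry per region.
below = [sum(x < grand_median for x in xs) for xs in incomes.values()]
above = [sum(x > grand_median for x in xs) for xs in incomes.values()]

observed = [below, above]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i in range(2):
    for j in range(3):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency, 5 here
        chi_sq += (observed[i][j] - fe) ** 2 / fe

df = (2 - 1) * (3 - 1)
table_value = 5.99   # chi-square table value for df = 2 at the 0.05 level

print(grand_median, below, above)                  # prints 31.5 [4, 5, 6] [6, 5, 4]
print(round(chi_sq, 1), df, chi_sq < table_value)  # prints 0.8 2 True
```

No income equals the pooled median in this data set, so every observation falls cleanly into one of the two rows. With ties, a rule for assigning them would have to be chosen first.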