claud-thomas
Topic 4 - Statistical applications
4.4 Chi-square test for independence
IB Mathematical Studies SL
Hypothesis Testing
Hypothesis testing involves making a conjecture (assumption) about some aspect of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether the sample data support the conjecture.
A conjecture must be stated in two parts:
• The null hypothesis (H0) states that there is no relationship between the two variables being tested (they are "not related to" each other, i.e. independent).
• The alternative hypothesis (H1) states that there is a relationship (they are "related" in some way, i.e. dependent).
The only hypothesis test covered by the Studies SL course is the Chi Squared test.
Chi-square (χ²) test by GDC
The chi-square test itself is quite straightforward. Your GDC can do it in two steps, but you must also know the formula and be able to do it by hand.
The hypothesis test which uses chi-square determines whether or not two variables are related. It follows a general pattern:
(1) Make a conjecture.
(2) Write the null hypothesis using "is not related to" or "independent", and write the alternative hypothesis using "is related to" or "dependent".
(3) Calculate the chi-square statistic.
(4) Determine the reference value (the critical value).
(5) Compare the two and either accept or reject the null hypothesis.
Using the GDC
You can find chi-squared on your GDC by using the statistics mode:
• Press Menu, then 6: Statistics
• Press 7: Stat Tests
• Select 8: χ² 2-way Test
• Enter the name of the observed matrix
Note: you must first enter the data as a matrix and name the matrix!
Example on page 334 in your textbook
The table shows the results of a sample of 400 randomly selected adults classified according to gender and regular exercise. The observed table, called a 2 by 2 contingency table, is given as:
Gender | Regular exercise | No regular exercise | Sum
Male | 110 | 106 | 216
Female | 98 | 86 | 184
Sum | 208 | 192 | 400
The test is used when we have two categorical variables and wish to determine whether they are dependent (for example, females may tend to exercise more regularly) or independent (there is no evidence that a person's gender has an effect on whether they exercise regularly).
Using TI-Nspire: enter the observed table as a matrix and save it as a. Then use the chi-squared test; after you press enter, the results will show.
We can get the expected frequencies from the calculator by pressing var and selecting stat.ExpMatrix.
How can we obtain the expected frequency table by hand?
Hand calculation of the expected frequency table: for each cell, we multiply the row sum by the column sum and divide by the total.
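The row-sum times column-sum rule can be sketched in Python as follows. This is a minimal illustration, assuming the observed table is stored as a list of rows without the Sum row and column; `expected_frequencies` is a hypothetical helper name, not a GDC command. It reproduces the expected table for the exercise example.

```python
def expected_frequencies(observed):
    """Return the expected table: (row sum x column sum) / total for each cell."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# Rows: Male, Female; columns: Regular exercise, No regular exercise
observed = [[110, 106],
            [98, 86]]

for row in expected_frequencies(observed):
    print([round(value, 2) for value in row])
# [112.32, 103.68]
# [95.68, 88.32]
```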
Expected frequency = (row sum × column sum) / total:
Gender | Regular exercise | No regular exercise | Sum
Male | 216 × 208 / 400 | 216 × 192 / 400 | 216
Female | 184 × 208 / 400 | 184 × 192 / 400 | 184
Sum | 208 | 192 | 400
Expected frequency table becomes:
Gender | Regular exercise | No regular exercise | Sum
Male | 112.32 | 103.68 | 216
Female | 95.68 | 88.32 | 184
Sum | 208 | 192 | 400
How do we calculate the chi squared?
The chi-squared statistic measures the difference between the observed values from the original sample and the expected values we have calculated. This value will be obtained from your GDC. In this case, χ² = 0.217.
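To see where 0.217 comes from, here is a short Python sketch that adds up (f_o − f_e)²/f_e over every cell. It assumes the same list-of-rows layout as the observed table (data cells only); `chi_square_statistic` is a hypothetical name, and the GDC's own method is not being reproduced here, only the formula.

```python
def chi_square_statistic(observed):
    """Sum of (fo - fe)^2 / fe over every cell of a contingency table."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_sums[i] * col_sums[j] / total  # expected frequency
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic


observed = [[110, 106],   # Male
            [98, 86]]     # Female
print(round(chi_square_statistic(observed), 3))  # 0.217
```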
Critical value of χ²
The critical value of chi squared depends on the significance level and degrees of freedom (size of the table).
For a contingency table which has r rows and c columns, the degrees of freedom are:
$$df = (r-1)(c-1)$$
In the example we considered, the table has r = 2 rows (Male, Female) and c = 2 columns (Regular exercise, No regular exercise), so:
$$df = (r-1)(c-1) = (2-1)(2-1) = 1$$
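The degrees-of-freedom formula translates directly into code. A minimal sketch, assuming the observed table is a list of rows containing only the data cells (the Sum row and column are not categories, so they are left out); `degrees_of_freedom` is a hypothetical helper name.

```python
def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1) for a table with r rows and c columns of data."""
    r = len(observed)       # number of categories of the first variable
    c = len(observed[0])    # number of categories of the second variable
    return (r - 1) * (c - 1)


print(degrees_of_freedom([[110, 106], [98, 86]]))               # 1
print(degrees_of_freedom([[51, 22, 33, 24], [45, 36, 22, 27]]))  # 3
```

Including the Sum row and column by mistake would give (3 − 1)(3 − 1) = 4 for the exercise table, which is why only the data cells are passed in.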
To find the critical value on GDC:
At the 5% significance level with df = 1, the critical value is 3.841. As 0.217 is less than 3.841, we accept the null hypothesis and conclude that gender and regular exercise are independent.
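The GDC (or a printed table) supplies the critical value; the sketch below shows one way to reproduce it using only the Python standard library. It is an illustration under stated assumptions: the χ² cumulative distribution is computed with a series expansion of the regularized lower incomplete gamma function, the critical value is found by bisection on an assumed search interval [0, 100] (ample for the small tables in this course), and `chi2_cdf` and `critical_value` are hypothetical names.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-squared distribution, via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    s, y = df / 2, x / 2
    term = total = 1.0 / s
    n = 0
    while term > 1e-15 * total:  # add terms y^n / (s(s+1)...(s+n)) until negligible
        n += 1
        term *= y / (s + n)
        total += term
    return math.exp(s * math.log(y) - y - math.lgamma(s)) * total


def critical_value(alpha, df):
    """Value x with upper-tail area P(X > x) = alpha, found by bisection on [0, 100]."""
    low, high = 0.0, 100.0
    for _ in range(200):
        mid = (low + high) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            low = mid    # tail area still too large: the critical value is further right
        else:
            high = mid
    return (low + high) / 2


cv = critical_value(0.05, 1)
print(round(cv, 3))                               # 3.841
print("accept H0" if 0.217 < cv else "reject H0")  # accept H0
```

With df = 3, `critical_value(0.05, 3)` gives approximately 7.815, the value used in the Practice question.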
Practice: favourite colour and gender
Use a critical chi-squared value of 7.815 (5% significance level, df = 3).
Step 1: Add the rows and columns:
Gender | Black | White | Red | Blue | Total
Male | 51 | 22 | 33 | 24 | 130
Female | 45 | 36 | 22 | 27 | 130
Total | 96 | 58 | 55 | 51 | 260
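Step 2: Calculate expected frequencies:
Gender | Black | White | Red | Blue | Total
Male | 130 × 96 / 260 = 48 | 130 × 58 / 260 = 29 | 130 × 55 / 260 = 27.5 | 130 × 51 / 260 = 25.5 | 130
Female | 130 × 96 / 260 = 48 | 130 × 58 / 260 = 29 | 130 × 55 / 260 = 27.5 | 130 × 51 / 260 = 25.5 | 130
Total | 96 | 58 | 55 | 51 | 260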
Step 3: Use GDC
Step 4: Compare the chi-squared value from the GDC with the critical chi-squared value.
As 6.13 is less than 7.815, we accept the null hypothesis.
Conclusion: The favourite colour is independent of gender.
Calculate degrees of freedom: number of rows r = 2, number of columns c = 4, so
$$df = (2-1)(4-1) = 3$$
Hand calculation of χ²
We use the formula, as given in your formula booklet:
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
To calculate chi squared by hand, we construct the table one cell at a time:
f_o | f_e | f_o − f_e | (f_o − f_e)² | (f_o − f_e)²/f_e
51 | 48 | 3 | 9 | 0.1875
22 | 29 | −7 | 49 | 1.6897
33 | 27.5 | 5.5 | 30.25 | 1.1000
24 | 25.5 | −1.5 | 2.25 | 0.0882
45 | 48 | −3 | 9 | 0.1875
36 | 29 | 7 | 49 | 1.6897
22 | 27.5 | −5.5 | 30.25 | 1.1000
27 | 25.5 | 1.5 | 2.25 | 0.0882
Total | | | | 6.1308 (6.13 to 3 s.f.)
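The same table can be generated row by row in Python, which is a handy way to check a hand calculation. A minimal sketch, assuming the favourite-colour data are entered as a list of rows (Male, Female) with the columns Black, White, Red, Blue; the column layout of the printout is an arbitrary choice.

```python
# Observed frequencies (fo): rows Male, Female; columns Black, White, Red, Blue
observed = [[51, 22, 33, 24],
            [45, 36, 22, 27]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

print("fo    fe     fo-fe   (fo-fe)^2   (fo-fe)^2/fe")
chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_sums[i] * col_sums[j] / total   # expected frequency for this cell
        diff = f_o - f_e
        contribution = diff ** 2 / f_e           # this cell's share of chi squared
        chi_square += contribution
        print(f"{f_o:<5} {f_e:<6} {diff:<7} {diff ** 2:<11} {contribution:.4f}")

print(f"Total {chi_square:.4f}")  # Total 6.1308
print("accept H0" if chi_square < 7.815 else "reject H0")  # accept H0
```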
Example 2 - Question
From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown.
Determine whether there is enough evidence to accept or reject the null hypothesis:
a) for α = 0.01
b) for α = 0.05
c) for α = 0.10
Hours exercised per week (three categories)
Male | 5 | 10 | 12
Female | 9 | 8 | 4
Example 2 - Solution
Write the null and alternative hypotheses:
• H0: The number of hours exercised each week is independent of gender.
• H1: The number of hours exercised each week is dependent on gender.
Calculate chi-square and the p-value using the χ² 2-way Test:
χ² = 4.69 (3 s.f.), p = 0.0959 (3 s.f.), df = 2
• Compare the p-value to each significance level:
a) 0.0959 > 0.01, hence accept the null hypothesis
b) 0.0959 > 0.05, hence accept the null hypothesis
c) 0.0959 < 0.10, hence reject the null hypothesis
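Example 2 can be checked in a few lines because, for df = 2 only, the χ² upper-tail area has the closed form p = e^(−χ²/2). This sketch assumes the three unlabelled hour categories are the table's columns in the order shown; for other degrees of freedom the closed form does not apply and the GDC (or a general χ² distribution function) is needed.

```python
import math

observed = [[5, 10, 12],   # Male
            [9, 8, 4]]     # Female

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)
expected = [[r * c / total for c in col_sums] for r in row_sums]

chi_square = sum(
    (f_o - f_e) ** 2 / f_e
    for obs_row, exp_row in zip(observed, expected)
    for f_o, f_e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2 - 1)(3 - 1) = 2
p_value = math.exp(-chi_square / 2)                 # valid only because df = 2

print(f"chi-square = {chi_square:.3g}, p = {p_value:.3g}, df = {df}")
# chi-square = 4.69, p = 0.0959, df = 2
for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "accept H0"
    print(f"alpha = {alpha}: {decision}")
```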
Whilst it is not technically correct to say "accept H0" (strictly, we "fail to reject H0"), this wording is accepted in the IB.
The chi-square test formula
This formula is on the IB formula sheet:
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
• f_o are the observed frequencies (i.e. the raw data)
• f_e are the expected frequencies
It is easiest to perform this sum one step at a time using a table.
Understanding the final comparison method
If you are comparing the p-value with the α-level:
• p > α: accept the null hypothesis
• p < α: reject the null hypothesis
If you are comparing χ² with the critical value (CV):
• χ² < CV: accept the null hypothesis
• χ² > CV: reject the null hypothesis
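The two comparison rules can be written as a pair of small functions and checked against the three examples in these notes. The notes do not say what to do when p = α or χ² = CV exactly; this sketch assumes the boundary case counts as "accept H0". Both function names are hypothetical.

```python
def decide_by_p_value(p, alpha):
    """Reject H0 when the p-value is below the significance level (p = alpha: accept, by assumption)."""
    return "reject H0" if p < alpha else "accept H0"


def decide_by_critical_value(chi_square, critical):
    """Reject H0 when the chi-squared statistic exceeds the critical value (equal: accept, by assumption)."""
    return "reject H0" if chi_square > critical else "accept H0"


print(decide_by_critical_value(0.217, 3.841))  # accept H0 (gender and exercise)
print(decide_by_critical_value(6.13, 7.815))   # accept H0 (favourite colour)
print(decide_by_p_value(0.0959, 0.05))         # accept H0 (Example 2, part b)
print(decide_by_p_value(0.0959, 0.10))         # reject H0 (Example 2, part c)
```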