Topic 4 - Statistical applications 4.4 Chi-square (χ²) test for independence IB Mathematical Studies SL

ContentFurther guidance Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample,




Syllabus reference

Content Further guidance


Hypothesis Testing

Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether or not the conjecture is true.

A conjecture must be stated in two parts:

› The null hypothesis (H0) states that there is no significant difference between the two parameters being tested (they are “not related to” each other, i.e. independent).

› The alternative hypothesis (H1) states that there is a significant difference (they are “related” in some way, i.e. dependent).

The only hypothesis test covered by the Studies SL course is the Chi Squared test.


Chi-square (χ²) test by GDC

The chi-square test itself is quite straightforward. Your GDC can do it in two steps, but you must also know the formula and be able to do it by hand.

The hypothesis test which uses chi-square determines whether or not two variables are related. It follows a general pattern:

(1) Make a conjecture.
(2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent”.
(3) Calculate the chi-square test statistic.
(4) Determine reference values.
(5) Compare the two and either accept or reject the null hypothesis.


Using the GDC

You can find chi-squared on your GDC by using the statistics mode:

Press Menu, then 6: Statistics
Press 7: Stat Tests
Select 8: χ² 2-way Test
Enter the name of the observed matrix

Note: You must first enter the data in Matrix mode and name the matrix!


Example on page 334 in your textbook

The table shows the results of a sample of 400 randomly selected adults classified according to gender and regular exercise. The observed table, called a 2 by 2 contingency table, is given as:

          Regular exercise   No regular exercise   sum
Male      110                106                   216
Female    98                 86                    184
sum       208                192                   400

The test is used when we deal with two categorical variables and we wish to determine whether the variables are dependent (for example, females may tend to exercise more regularly) or independent (there is no evidence that a person's gender has an effect on whether they exercise regularly).


Using TI Nspire: Enter the observed table as a matrix and save it as a. Then use the chi-squared test. After you press enter, the results will show.


We can get the expected frequencies from the calculator by pressing var and stat ExpMatrix

How can we obtain the expected frequency table by hand?


Hand calculations of expected frequency table. For each cell, we multiply the row sum by the column sum and divide by the total:

expected frequency = (row sum × column sum) / total

Observed table:

          Regular exercise   No regular exercise   sum
Male      110                106                   216
Female    98                 86                    184
sum       208                192                   400

Expected table:

          Regular exercise   No regular exercise   sum
Male      216 × 208 / 400    216 × 192 / 400       216
Female    184 × 208 / 400    184 × 192 / 400       184
sum       208                192                   400


Expected frequency table becomes:

          Regular exercise   No regular exercise   sum
Male      112.32             103.68                216
Female    95.68              88.32                 184
sum       208                192                   400
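The row sum × column sum / total rule can also be checked with a few lines of Python. This is only a sketch for verifying hand work: the function name `expected_frequencies` is made up for illustration, and the input is the observed table without its sum row and column.

```python
def expected_frequencies(observed):
    """Return the expected-frequency table for a contingency table.

    `observed` is a list of rows of counts (totals NOT included).
    Each expected cell is (row sum * column sum) / grand total.
    """
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# Observed counts from the gender / regular exercise example.
observed = [
    [110, 106],  # Male
    [98, 86],    # Female
]

expected = expected_frequencies(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(value, 2) for value in row])
# Male [112.32, 103.68]
# Female [95.68, 88.32]
```

The printed values match the expected frequency table, so the same function can be reused to check any table size.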

How do we calculate the chi squared?

The chi-squared test examines the difference between the observed values we obtained from the original sample and the expected values we have calculated. This value will be obtained from your GDC. In this case χ² = 0.217.
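As a check on the GDC value, the statistic can be reproduced by summing (observed − expected)² / expected over the four cells. The sketch below assumes the expected frequencies from the table above are typed in directly; `chi_square_statistic` is a hypothetical helper name, not a GDC or library function.

```python
def chi_square_statistic(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for f_o, f_e in zip(obs_row, exp_row):
            total += (f_o - f_e) ** 2 / f_e
    return total


observed = [[110, 106], [98, 86]]                # Male, Female
expected = [[112.32, 103.68], [95.68, 88.32]]    # from the expected table

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 3))  # 0.217
```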


Critical value of χ²

The critical value of chi squared depends on the significance level and degrees of freedom (size of the table).

For a contingency table which has r rows and c columns, degrees of freedom df are:

$$df = (r - 1)(c - 1)$$


In the example we considered:

          Regular exercise   No regular exercise   sum
Male      110                106                   216
Female    98                 86                    184
sum       208                192                   400

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
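The degrees-of-freedom rule is simple enough to encode directly. A minimal sketch, assuming the row and column counts exclude the "sum" row and column; `degrees_of_freedom` is an illustrative name only.

```python
def degrees_of_freedom(rows, columns):
    """df = (r - 1)(c - 1); the totals row and column are not counted."""
    if rows < 2 or columns < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (columns - 1)


print(degrees_of_freedom(2, 2))  # 1  (gender by regular exercise)
print(degrees_of_freedom(2, 4))  # 3  (a table with 2 rows and 4 columns)
```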


To find the critical value on GDC:

At the 5% significance level with df = 1, the critical value is 3.841. As 0.217 is less than 3.841, we accept the null hypothesis and conclude that gender and regular exercise are independent.
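For readers without a GDC to hand, the link between the critical value 3.841 and a significance level can be checked numerically. The sketch below relies on the fact that, for one degree of freedom only, the probability of exceeding x equals erfc(√(x/2)); the function name `upper_tail_df1` is made up here, and the formula must not be used for other values of df.

```python
import math


def upper_tail_df1(x):
    """P(chi-square with 1 degree of freedom > x). Valid for df = 1 only."""
    return math.erfc(math.sqrt(x / 2))


# The critical value 3.841 cuts off about 5% of the distribution.
print(round(upper_tail_df1(3.841), 3))  # 0.05

# The observed statistic 0.217 has a much larger tail probability,
# which is why the null hypothesis is not rejected.
print(upper_tail_df1(0.217) > 0.05)  # True
```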


Practice.

Use critical chi squared value of 7.815


Step 1: Add the rows and columns:

         BLACK   WHITE   RED   BLUE   Total
Male     51      22      33    24     130
Female   45      36      22    27     130
Total    96      58      55    51     260


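Step 2: Calculate expected frequencies:

         BLACK            WHITE            RED              BLUE             Total
Male     130 × 96 / 260   130 × 58 / 260   130 × 55 / 260   130 × 51 / 260   130
Female   130 × 96 / 260   130 × 58 / 260   130 × 55 / 260   130 × 51 / 260   130
Total    96               58               55               51               260

Expected frequency matrix:

         BLACK   WHITE   RED    BLUE
Male     48      29      27.5   25.5
Female   48      29      27.5   25.5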


Step 3: Use GDC


Step 4: Compare chi squared on GDC to the critical chi squared

Calculate degrees of freedom:
Number of rows r = 2
Number of columns c = 4
df = (2 - 1)(4 - 1) = 3

As 6.13 is less than 7.815, we accept the null hypothesis.

Conclusion: The favourite colour is independent of gender.


Hand calculations of χ²

We use the formula, as given in your formula booklet:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$


Hand calculations of chi squared. We need to construct the table:

f_o   f_e    f_o - f_e   (f_o - f_e)^2   (f_o - f_e)^2 / f_e
51    48      3           9              0.1875
22    29     -7          49              1.68966
33    27.5    5.5        30.25           1.1
24    25.5   -1.5         2.25           0.08824
45    48     -3           9              0.1875
36    29      7          49              1.68966
22    27.5   -5.5        30.25           1.1
27    25.5    1.5         2.25           0.08824
                         Total           6.1308
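The same column-by-column table can be generated in Python as a way of checking the hand working. This is a sketch only: the observed and expected values are copied from the favourite colour example (Male row first, then Female), and the variable names are illustrative.

```python
# Observed and expected frequencies, Male row followed by Female row.
observed = [51, 22, 33, 24, 45, 36, 22, 27]
expected = [48, 29, 27.5, 25.5, 48, 29, 27.5, 25.5]

total = 0.0
print("f_o   f_e   f_o-f_e  (f_o-f_e)^2  (f_o-f_e)^2/f_e")
for f_o, f_e in zip(observed, expected):
    diff = f_o - f_e                 # observed minus expected
    contribution = diff ** 2 / f_e   # this cell's share of chi squared
    total += contribution
    print(f"{f_o:<5} {f_e:<5} {diff:<8} {diff ** 2:<12} {contribution:.5f}")

print(f"Total {total:.4f}")
```

Rounded to 3 significant figures the total is 6.13, the value compared with the critical value 7.815 in Step 4.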


Example 2 - Question

From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown.

Determine whether there is enough evidence to accept or reject the null hypothesis:
› a) for α = 0.01
› b) for α = 0.05
› c) for α = 0.10

Hours exercised per week

Male     5   10   12
Female   9   8    4


Example 2 - Solution

Write the null and alternative hypotheses:
› H0 – The number of hours exercised each week is independent of gender
› H1 – The number of hours exercised each week is dependent on gender

Calculate chi-square and the p-value (χ² Test on the GDC):

χ² = 4.69 (3sf)
p = 0.0959 (3sf)
df = 2

• Compare the p-value to each significance level

a) 0.0959 > 0.01, hence accept the null hypothesis

b) 0.0959 > 0.05, hence accept the null hypothesis

c) 0.0959 < 0.10, hence we reject the null hypothesis

Whilst it is not technically correct to say “accept H0”, it is still accepted in the IB.
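The whole of Example 2 can be reproduced in a short script, which may help when checking GDC output. This is a sketch under stated assumptions: `chi_square_test` is a hypothetical helper, and the p-value line uses the closed form exp(−χ²/2), which holds only when df = 2 (as it does for this 2 by 3 table).

```python
import math


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    statistic = 0.0
    for row, r in zip(observed, row_sums):
        for f_o, c in zip(row, col_sums):
            f_e = r * c / total              # expected frequency for this cell
            statistic += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Lauren's data: hours exercised per week by gender.
observed = [
    [5, 10, 12],  # Male
    [9, 8, 4],    # Female
]

statistic, df = chi_square_test(observed)
assert df == 2                        # the formula below is valid for df = 2 only
p_value = math.exp(-statistic / 2)

print(round(statistic, 2), round(p_value, 4), df)
for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "accept H0"
    print(alpha, decision)
```

The script follows the slide's wording and prints "accept H0" whenever the p-value is not below α.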


The chi-square test formula

This formula is on the IB formula sheet:

$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$

› f_o is the observed frequencies (i.e. the raw data)

› f_e is the expected frequencies

It is easiest to perform this sum calculation using a table one step at a time.


Understanding the final comparison method

If you are comparing the p-value with the α-level (significance level), then:
› p > α: accept the null hypothesis
› p < α: reject the null hypothesis

If you are comparing χ² with the critical value (CV), then:
› χ² < CV: accept the null hypothesis
› χ² > CV: reject the null hypothesis
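Both comparison methods can be written as one-line rules. The sketch below uses made-up function names, and it assumes that a tie (p equal to α, or χ² equal to CV) counts as "accept", a case the rules above do not cover.

```python
def decide_by_p_value(p_value, alpha):
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "accept H0"


def decide_by_critical_value(chi_sq, critical_value):
    """Reject H0 when the statistic exceeds the critical value."""
    return "reject H0" if chi_sq > critical_value else "accept H0"


# Values taken from the worked examples.
print(decide_by_p_value(0.0959, 0.05))         # accept H0
print(decide_by_p_value(0.0959, 0.10))         # reject H0
print(decide_by_critical_value(0.217, 3.841))  # accept H0
print(decide_by_critical_value(6.13, 7.815))   # accept H0
```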