
Chi square test


Inferential Statistics: Details of t-test & chi-square test


INTRODUCTION

T-TEST: numerical data; compares the means of two groups

CHI-SQUARE: categorical data

BOTH TESTS ARE USED FOR UNPAIRED OBSERVATIONS


T - TEST

Also referred to as Student's t-test.

Suppose we have taken a sample of 100 women and found a mean Hb of 11.4 gm. The question is whether this sample comes from a population whose mean Hb is 12 gm and whose standard deviation is 2 gm.

What should we be looking at? Is there a difference between the population mean and the sample mean?

In this case there is a difference. The next question is: is this difference significant, or is it due to chance alone?


Of course, the difference should be large enough to be called significant.

The difference can be written as: Sample mean − Population mean

What other factor can affect the importance of the difference?

If there is great variation in the population (a large standard deviation) and the sample is small, the difference is of less importance.

If there is little variation (a small standard deviation) and the sample is large, even a slight difference can become important.


So we should include this factor as well when deciding on the importance of the difference.

The formula becomes

$$t = \frac{\text{Sample mean} - \text{Population mean}}{\text{Standard deviation}/\sqrt{N}}$$

and this is given the name of t test.


Hence the t test is

$$t = \frac{\text{Sample mean} - \text{Population mean}}{\text{Standard deviation}/\sqrt{N}}$$

Putting the values in the formula:

$$t = \frac{11.4 - 12}{2/\sqrt{100}} = \frac{-0.6}{2/10} = \frac{-0.6}{0.2} = -3.0$$

Calculated value of t = −3.0

Critical value (cut off) = ± 2.00

What should be the conclusion: reject Ho or not?

As the calculated value lies beyond the critical value (|−3.0| > 2.00), we reject the chance explanation and conclude that the sample is not from a population whose mean Hb level is 12 gm.
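The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the slide's formula; the function name `one_sample_statistic` is made up for illustration, and the cut-off of 2.00 is simply the value quoted on the slide.

```python
import math

def one_sample_statistic(sample_mean, population_mean, sd, n):
    """(sample mean - population mean) / (standard deviation / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Values from the Hb example: 100 women, mean 11.4 gm, population mean 12 gm, SD 2 gm
t_value = one_sample_statistic(sample_mean=11.4, population_mean=12.0, sd=2.0, n=100)
critical_value = 2.00  # cut-off quoted on the slide

print(f"t = {t_value:.1f}")
# The sign only shows the direction of the difference, so compare the absolute value
print("reject Ho" if abs(t_value) > critical_value else "fail to reject Ho")
```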


Example 2: In a certain province the proportion of women who are delivered by Caesarean section (CS) is very high. A study is therefore conducted to discover why this is the case. As small height is known to be one of the risk factors related to difficult deliveries, the researcher may want to find out whether there is a difference between the mean height of women in this province who had normal deliveries and of those who had Caesarean sections.

                   No. of women in study   Mean height (cm)   Standard deviation (cm)
Normal delivery    60                      156                3.1
CS                 52                      154                2.8

1. Calculating the t-value
2. Using a t-table
3. Interpreting the result


USING THE T-TABLE

The t-table is used to determine whether the null hypothesis is rejected or not. Two inputs are needed:

Significance level (p-value): 0.05

Degrees of freedom (d.f.):

- a measure derived from the sample size

- for Student's t-test, the number of degrees of freedom is calculated as the sum of the two sample sizes minus 2

d.f. = 60 + 52 − 2 = 110

The t-value belonging to the p-value and the degrees of freedom is located in the table. Printed tables typically jump from d.f. = 60 to d.f. = 120, so we read the row for d.f. = 120 at p = 0.05 and find 1.98.


INTERPRETING THE RESULT

Decision rule:

If the calculated t-value is larger than the value derived from the table, p is smaller than the value indicated at the top of the column. We then reject the null hypothesis and conclude that there is a statistically significant difference between the two means.

If the calculated t-value is smaller than the table value, p is larger than the value indicated. We fail to reject the null hypothesis and conclude that the observed difference is not statistically significant.


INTERPRETING THE RESULT

The t-value calculated in step 1 (3.6) is larger than the t-value derived in step 2 (1.98).

p is smaller than 0.05, so we reject the null hypothesis.

The observed difference of 2 cm between the mean heights of women with normal deliveries and those with CS is statistically significant.
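The slides quote a t-value of 3.6 for step 1 without showing the arithmetic. One way to arrive at it, sketched here under the assumption that a pooled variance is used (which matches d.f. = n1 + n2 − 2), is the following; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, using a pooled variance (assumption)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Normal delivery: n = 60, mean 156 cm, SD 3.1; CS: n = 52, mean 154 cm, SD 2.8
t_value, df = pooled_t(156, 3.1, 60, 154, 2.8, 52)
critical_value = 1.98  # table value quoted on the slide for p = 0.05

print(f"t = {t_value:.1f}, d.f. = {df}")
print("reject Ho" if abs(t_value) > critical_value else "fail to reject Ho")
```

Using separate (unpooled) variances for the standard error gives a value that also rounds to 3.6, so the conclusion does not depend on this choice here.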


CHI-SQUARE (χ²) TEST

Suppose you are presented with two vaccines, A and B, for the prevention of measles. Both manufacturers claim that their vaccine is better. How would you resolve this issue?

To judge the vaccines objectively, you would ask both manufacturers to provide some vaccine for a clinical trial. 100 randomly selected children were given vaccine A and the same number of children were given vaccine B. Of those vaccinated with A, 15 developed infection; of those vaccinated with B, 20 developed infection.

Which vaccine is better?


If we say vaccine A, the producer of vaccine B may claim that this difference is due to chance.

However, the producer of vaccine A will call this difference significant.

The dispute will not be resolved, even if the difference is large, until a cut-off value is decided by some neutral party, and that neutral umpire is the test of significance.

By putting in the information from the data, the following 2x2 table can be filled step by step: first the row totals (100 children per vaccine, 200 overall), then the observed infections (15 for A, 20 for B), then the children without infection (100 − 15 = 85 and 100 − 20 = 80), and finally the column totals (35 and 165).

Type of vaccine   Infection   No infection   Total
A                 15          85             100
B                 20          80             100
Total             35          165            200


According to the null hypothesis, if both vaccines had the same effect, we should expect the same number of infections in both groups. The following table shows what we should expect if the null hypothesis is true.


If both vaccines are the same, then half of the infections will be in the vaccine A group and half in the vaccine B group, and the table will look like this (inner cells not yet filled):

Type of vaccine   Infection   No infection   Total
A                                            100
B                                            100
Total             35          165            200


Alternatively, we can determine the expected value in cell a by the unitary method, i.e.

Out of 200, the number of infections is 35

Out of 1, the number of infections will be 35/200

Out of 100, the number of infections will be 35 × 100 / 200 = 17.5

Type of vaccine   Infection   No infection   Total
A                 17.5                       100
B                                            100
Total             35          165            200


If you look carefully, you will observe that in the first step we divided the column total of cell a by the grand total and then multiplied the result by the row total of cell a. This two-step procedure for the expected value can be written as a single step:

$$\text{Expected value (EV) of cell a} = \frac{\text{Column total} \times \text{Row total}}{\text{Grand total}}$$

We can use this formula to calculate the expected value of any cell.

Now we fill the table with all expected values.


According to the null hypothesis, we should not find any difference between the observed values in cells a, b, c and d and their expected values. Here we find some difference. This difference may be small (insignificant) or large (significant). Hence the next logical step is to subtract the expected value from the observed value in each cell to find the difference. The expected values are:

Type of vaccine   Infection   No infection   Total
A                 17.5        82.5           100
B                 17.5        82.5           100
Total             35          165            200
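The expected-value rule (row total × column total / grand total) can be sketched in Python as follows. The function name `expected_values` and the list-of-rows layout are illustrative choices, not part of the original slides.

```python
def expected_values(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

vaccine_observed = [[15, 85],   # vaccine A: infection, no infection
                    [20, 80]]   # vaccine B: infection, no infection
vaccine_expected = expected_values(vaccine_observed)
print(vaccine_expected)
```

The same function works for any table size, because it only relies on the row and column totals.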

The calculation is built up column by column for each cell: the observed value O, the expected value E, the difference O − E, its square (O − E)², and finally (O − E)²/E. The last column is then added up.

Cell    Observed value (O)   Expected value (E)   O − E   (O − E)²   (O − E)²/E   Total
a       15                   17.5                 −2.5    6.25       6.25/17.5    0.36
b       85                   82.5                 2.5     6.25       6.25/82.5    0.07
c       20                   17.5                 2.5     6.25       6.25/17.5    0.36
d       80                   82.5                 −2.5    6.25       6.25/82.5    0.07
Total                                                                             0.86

Prof. Dr. M. I. Siddiqui – Clinision Workshop
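The cell-by-cell working above can be reproduced with a short loop. This is a sketch only; the dictionary layout is an illustrative choice. Without intermediate rounding the sum comes to about 0.866, while the slide's 0.86 results from adding cell values already cut to two decimals.

```python
# Observed and expected counts for cells a-d of the vaccine table
observed = {"a": 15, "b": 85, "c": 20, "d": 80}
expected = {"a": 17.5, "b": 82.5, "c": 17.5, "d": 82.5}

chi_square = 0.0
for cell in observed:
    diff = observed[cell] - expected[cell]        # O - E
    contribution = diff**2 / expected[cell]       # (O - E)^2 / E
    chi_square += contribution
    print(f"{cell}: O-E = {diff:+.1f}, (O-E)^2 = {diff**2:.2f}, (O-E)^2/E = {contribution:.3f}")

print(f"chi-square = {chi_square:.3f}")
```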


Table 4.12

D.F.    0.50    0.10    0.05    0.02    0.01    0.005   0.001
1       0.45    2.71    3.84    5.41    6.64    7.88    10.83
2       1.39    4.61    5.99    7.82    9.21    10.60   13.82
3       2.37    6.25    7.82    9.84    11.34   12.84   16.27
4       3.36    7.78    9.49    11.67   13.28   14.86   18.47
5       4.35    9.24    11.07   13.39   15.09   16.75   20.52
6       5.35    10.65   12.59   15.03   16.81   18.55   22.46
7       6.35    12.02   14.07   16.62   18.48   20.28   24.32
8       7.34    13.36   15.51   18.17   20.09   21.96   26.13
9       8.34    14.68   16.92   19.68   21.67   23.59   27.88
10      9.34    15.99   18.31   21.16   23.21   25.19   29.59

(Source: J.E. Park, K. Park, Text Book of Preventive and Social Medicine)


Calculated value of χ² = 0.86

Critical value = 3.84, from the table of chi-square at the 5% level of significance (d.f. = 1)

Decision?

As the calculated value is less than the minimum value needed to be called significant, we conclude that there is no significant difference between the two vaccines, and we fail to reject the null hypothesis.

In simple English, the trial gives no evidence that the two vaccines differ in efficacy: the observed difference can be explained by chance, and we can decide to buy the one which is cheaper.


CHI-SQUARE (χ²) TEST

Example:

Suppose that in a study of the factors affecting the utilization of antenatal clinics you found that 51 women out of 80 who lived within 10 km of the clinic came for antenatal care, compared to only 35 out of 75 of those who lived more than 10 km away. This suggests that antenatal care (ANC) is used more often by women who live close to the clinics.

Distance from ANC   Used ANC    Did not use ANC   Total
Less than 10 km     51 (64%)    29 (36%)          80 (100%)
10 km or more       35 (47%)    40 (53%)          75 (100%)
Total               86          69                155

1) Calculate the χ² value
2) Use a χ² table
3) Interpret the result

Starting from the marginal totals alone (row totals 80 and 75, column totals 86 and 69, grand total 155), we put in the expected value of each cell from our calculation:

Distance from ANC   Used ANC   Did not use ANC   Total
Less than 10 km     44.4       35.6              80 (100%)
10 km or more       41.6       33.4              75 (100%)
Total               86         69                155


If there are 155 women, then ANC was used by 86

If there was 1 woman, then ANC would be used by 86/155

If there were 80 women, then ANC would be used by 80 × 86/155 = 44.4 (equation 1)

We can write this expression in general terms, where

80 = row total (RT)
86 = column total (CT)
155 = grand total (GT)

So we can write equation 1 as follows:

$$EV = \frac{RT \times CT}{GT}$$

Hence we can calculate the expected value of any cell with this formula.


CALCULATE χ² VALUE

1. Calculate the expected frequency (E) for each cell: E = row total × column total / grand (overall) total

2. For each cell, subtract the expected frequency from the observed frequency (O): O − E

3. For each cell, square the result of (O − E) and divide by the expected frequency E.

4. Add the results of the above step for all the cells.


CALCULATE χ² VALUE

Formula for calculating the chi-square value:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O is the observed frequency (indicated in the table)

E is the expected frequency (to be calculated)

Σ (the sum of) directs you to add together the values of (O − E)²/E for all the cells of the table

For a two-by-two table (which contains 4 cells) the formula is

$$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}$$


USING χ² TABLE

Decide on a p-value, for example 0.05

Degrees of freedom: d.f. = (r − 1) × (c − 1), where r is the number of rows and c the number of columns

For a 2 by 2 table the number of d.f. is 1 (i.e. d.f. = (2 − 1) × (2 − 1) = 1)


INTERPRETING THE RESULT

Step 1 (a)

Expected frequency for each cell, e.g. E1 = 86 × 80 / 155 = 44.4

Distance from ANC   Used ANC              Did not use ANC       Total
Less than 10 km     O1 = 51, E1 = 44.4    O2 = 29, E2 = 35.6    80
10 km or more       O3 = 35, E3 = 41.6    O4 = 40, E4 = 33.4    75
Total               86                    69                    155

The null hypothesis is rejected if p < 0.05, which is the case if the calculated χ² is larger than the theoretical χ² in the table.


INTERPRETING THE RESULT

Step 1 (b) to (d)

χ² = (51 − 44.4)² / 44.4 + …… + …… = 0.98 + 1.22 + 1.05 + 1.30 = 4.55

Step 2

- d.f. is 1

- the p-value decided on for the chi-square table is 0.05

- in the row for d.f. = 1 we look in the column where p = 0.05. This gives a value of 3.84. Our value of 4.55 is > 3.84, which means that the p-value is < 0.05
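Steps 1 and 2 for the ANC example can be sketched in Python as below. The function name `chi_square_statistic` is hypothetical, and the critical value 3.84 is taken from the chi-square table quoted in the slides. Because the code does not round the expected values, it gives about 4.57 rather than the 4.55 obtained by hand with rounded expected frequencies; the conclusion is the same.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o - e) ** 2 / e
    return statistic

anc_observed = [[51, 29],   # less than 10 km: used ANC, did not use ANC
                [35, 40]]   # 10 km or more:   used ANC, did not use ANC
anc_chi_square = chi_square_statistic(anc_observed)
critical_value = 3.84  # d.f. = 1, p = 0.05, from the chi-square table

print(f"chi-square = {anc_chi_square:.2f}")
print("significant" if anc_chi_square > critical_value else "not significant")
```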


INTERPRETING THE RESULT

Step 3

We can now conclude that the women living within the distance of 10 km from the clinic used antenatal care significantly more often than women living more than 10 km away.


INTERPRETATION OF EXAMPLE

64% of the women living within a distance of 10 km from the clinic used antenatal care during pregnancy, compared to only 47% of women living 10 km or further away from the nearest clinic. This difference is statistically significant (χ² = 4.55; p < 0.05)


NOTE

• The χ² test is applied only if the sample is large enough

• As a general rule, the total should be at least 40 and the expected frequency in each of the cells should be at least 5

• The χ² test can be used to compare more than two groups

• A table with 3 or more rows or columns would then be designed, rather than a two-by-two table

• In the above example, data for < 5 km, 5 - 10 km and > 10 km would be put in a 3 x 2 table

• d.f. would be (3 - 1) x (2 - 1) = 2
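The rule of thumb and the d.f. formula in the notes above can be checked mechanically. The following is a minimal sketch; the function name `check_chi_square_conditions` is made up for illustration, and the thresholds (40 and 5) are the ones stated in the note.

```python
def check_chi_square_conditions(observed):
    """Return (total_ok, cells_ok, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Smallest expected frequency over all cells
    smallest_expected = min(r * c / grand_total for r in row_totals for c in col_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return grand_total >= 40, smallest_expected >= 5, df

# ANC example: total 155, smallest expected frequency about 33.4, d.f. = 1
print(check_chi_square_conditions([[51, 29], [35, 40]]))
```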


QUICK FORMULA

For two-by-two tables there is a quick method for calculating the chi-square value, which can replace step 1 described above.

If the various numbers in the cross table are represented by the following letters:

              Condition
Exposure      +      −      Total
Yes           a      b      e
No            c      d      f
Total         g      h      n

the quick formula for calculating the chi-square value is

$$\chi^2 = \frac{n\,(ad - bc)^2}{e\,f\,g\,h}$$
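As a quick check, the formula can be applied to both worked examples of this deck. This is a sketch with a hypothetical function name, `quick_chi_square`; the letters follow the table above (a, b, c, d are the cells, e and f the row totals, g and h the column totals, n the grand total).

```python
def quick_chi_square(a, b, c, d):
    """Quick 2x2 formula: n * (ad - bc)^2 / (e * f * g * h)."""
    e, f = a + b, c + d          # row totals
    g, h = a + c, b + d          # column totals
    n = e + f                    # grand total
    return n * (a * d - b * c) ** 2 / (e * f * g * h)

print(f"vaccines: {quick_chi_square(15, 85, 20, 80):.2f}")
print(f"ANC:      {quick_chi_square(51, 29, 35, 40):.2f}")
```

The results agree with the cell-by-cell method up to rounding of the expected frequencies, since the quick formula never rounds intermediate values.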


Objectives

By the end of the session the participant will be able to:

Define test of significance

Name the test of significance according to the type of data

Calculate the t test

Compute χ²

Interpret the result

Interpret statistical results in simple English


Tutorial

Exercise for test of significance

A surgeon wants to compare two surgical procedures for hysterectomy, abdominal hysterectomy (AH) and vaginal hysterectomy (VH), with a research question about the infection rate of the two procedures. She operates on 400 patients with the VH approach, of whom 40 develop infection. Out of 100 AH cases, 20 develop infection.

Answer the following questions:

a) What should be the title of the research for publication in a journal?

b) Frame the null hypothesis in this case.

c) What level of significance will you set?

d) What test of significance will you apply, and why?

e) Calculate the test statistic.

f) Interpret your result in statistical language.

g) Describe the result in simple English.


Home Assignment

Do exercises 12.1, 12.2, 12.3, 12.10, 12.13, 12.19