χ² Tests: How to know when to use them. Which one should I use?


Page 1: How to know when to use them. Which one should I use?

χ² Tests

How to know when to use them.

Which one should I use?

Page 2: How to know when to use them. Which one should I use?

Chi Square Tests

How do I know to use a chi-square test?

Only use a chi-square test if all of the data in question are categorical (remember your assumptions and conditions):

• Randomness

• Independence

• 10% Rule

• Counted Data – The values in each cell must be counts for the categories of a categorical (qualitative) variable.

• Expected Cell Frequency – Every cell should have an expected count of at least 5.

Page 3: How to know when to use them. Which one should I use?

Chi Square Goodness of Fit Test

This test allows us to compare a collection of categorical data with some theoretical expected distribution.

Sometimes called a One-Sample Test.

Degrees of Freedom = categories – one.

Example: Birth month of Baseball players
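
For reference, the statistic behind this test is usually written as below. The symbols O_i (observed count), E_i (expected count) and k (number of categories) are notation assumed here; the slides do not define them.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \text{df} = k - 1
```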

Page 4: How to know when to use them. Which one should I use?

Chi Square Goodness of Fit Test

After getting trounced by your little brother in a children’s game, you suspect the die he gave you to roll may be unfair. To check, you roll it 60 times, recording the number of times each face appears. Do these results cast doubt on the die’s fairness?

• If the die is fair, how many times would you expect each face to show?

• To see if these results are unusual, what type of test will you perform?

• State your hypotheses

Face   Count
1      11
2      7
3      9
4      15
5      12
6      6

Page 5: How to know when to use them. Which one should I use?

Chi Square Goodness of Fit Test

After getting trounced by your little brother in a children’s game, you suspect the die he gave you to roll may be unfair. To check, you roll it 60 times, recording the number of times each face appears. Do these results cast doubt on the die’s fairness?

• Check the conditions

• How many degrees of freedom are there?

• Find χ² and the P-value

• State your conclusion

Face   Count
1      11
2      7
3      9
4      15
5      12
6      6
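
A minimal sketch of how the die questions could be worked through in Python, using only the standard library. The helper name `chi2_sf` and the variable names are illustrative assumptions, not part of the original slides; in practice a calculator or statistics package would normally supply the P-value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term              # z^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2 * pdf * total

# Counts for faces 1 through 6, taken from the table on the slide
observed = [11, 7, 9, 15, 12, 6]
n_rolls = sum(observed)                                  # 60 rolls
expected = [n_rolls / len(observed)] * len(observed)     # fair die: 10 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                                   # categories - 1
p_value = chi2_sf(chi2, df)

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}")
```

Under these assumptions the statistic comes out to 5.6 on 5 degrees of freedom, with a P-value of roughly 0.35, so the rolls give little reason to doubt the die.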

Page 6: How to know when to use them. Which one should I use?

Chi Square Test of Homogeneity

A test comparing the distribution of counts for two or more groups on the same categorical variable.

Finds the expected counts based on the overall frequencies, adjusted for the totals in each group, under the (null hypothesis) assumption that the distributions are the same for each group.

Degrees of Freedom = (rows – 1)(columns – 1), where rows gives the number of categories and columns gives the number of independent groups.

Example: Future plans of Graduates based on college of study
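
Written out, the expected count for the cell in row i and column j is commonly computed as below. The symbols E_{ij}, r and c are notation assumed here rather than taken from the slides.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}, \qquad \text{df} = (r - 1)(c - 1)
```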

Page 7: How to know when to use them. Which one should I use?

Chi Square Test of Homogeneity

Does your doctor know? A survey of articles from the New England Journal of Medicine (NEJM) classified them according to the principal statistics methods used. The articles recorded were all non-editorial articles appearing during the indicated years. Has there been a change in the use of Statistics?

• What kind of test would be appropriate?

• State the hypotheses.

• How many degrees of freedom are there?

• The smallest expected count will be in the 1989/No Stats cell. What is it?

Publication Year
            1978-79   1989   2004-05   Total
No Stats       90      14       40      144
Stats         242     101      271      614
Total         332     115      311      758
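
One way to check the degrees of freedom and the expected counts for this table is sketched below. It assumes the usual row total × column total / grand total rule; the variable names are illustrative only.

```python
# Observed counts from the NEJM table
# (rows: use of statistics, columns: publication years)
years = ["1978-79", "1989", "2004-05"]
observed = {
    "No Stats": [90, 14, 40],
    "Stats": [242, 101, 271],
}

row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Expected count = row total * column total / grand total
expected = {
    row: [row_totals[row] * col_total / grand_total for col_total in col_totals]
    for row in observed
}

df = (len(observed) - 1) * (len(years) - 1)
expected_1989_no = expected["No Stats"][years.index("1989")]

print(f"df = {df}")
print(f"expected count for 1989/No Stats = {expected_1989_no:.2f}")
```

With these counts the 1989/No Stats cell has an expected count of about 21.85, which is the smallest in the table and still well above 5.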

Page 8: How to know when to use them. Which one should I use?

Chi Square Test of Homogeneity

Does your doctor know? A survey of articles from the New England Journal of Medicine (NEJM) classified them according to the principal statistics methods used. The articles recorded were all non-editorial articles appearing during the indicated years. Has there been a change in the use of Statistics?

• Check the assumptions and conditions for inference.

• Calculate the component of chi-square for the 1989/No cell.

• For this test, χ² = 25.28. What’s the P-value?

• State your conclusion.

Publication Year
            1978-79   1989   2004-05   Total
No Stats       90      14       40      144
Stats         242     101      271      614
Total         332     115      311      758
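
A short sketch of the two calculations asked for on this slide, using only the standard library. It relies on the fact that for 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2); the variable names are assumptions for illustration.

```python
import math

# 1989/No Stats cell: observed count and expected count under the null hypothesis
observed_1989_no = 14
expected_1989_no = 144 * 115 / 758     # row total * column total / grand total

# This cell's contribution to the chi-square statistic
component = (observed_1989_no - expected_1989_no) ** 2 / expected_1989_no

chi2 = 25.28                           # value given on the slide
df = (2 - 1) * (3 - 1)                 # 2 rows, 3 columns
p_value = math.exp(-chi2 / 2)          # closed form, valid only when df = 2

print(f"component = {component:.2f}")
print(f"P-value = {p_value:.2e}")
```

The component is about 2.82 and the P-value is on the order of 0.000003, which points strongly toward a change in the use of statistics across the three periods.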

Page 9: How to know when to use them. Which one should I use?

Chi Square Test of Homogeneity

Does your doctor know? A survey of articles from the New England Journal of Medicine (NEJM) classified them according to the principal statistics methods used. The articles recorded were all non-editorial articles appearing during the indicated years. Has there been a change in the use of Statistics?

• Show how the residual for the 1989/No cell was calculated.

• What can you conclude from the patterns in the standardized residuals?

Publication Year
            1978-79   1989   2004-05
No Stats      3.39   -1.68    -2.48
Stats        -1.64    0.81     1.20
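
The residuals in the table can be reproduced as (observed − expected) / √expected. The sketch below recomputes all six from the counts on the earlier slides; the function name `standardized_residuals` is a hypothetical helper, not something from the slides.

```python
import math

def standardized_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    result = []
    for row, row_total in zip(table, row_totals):
        out_row = []
        for obs, col_total in zip(row, col_totals):
            exp = row_total * col_total / grand_total
            out_row.append((obs - exp) / math.sqrt(exp))
        result.append(out_row)
    return result

# Rows: No Stats, Stats; columns: 1978-79, 1989, 2004-05
observed = [[90, 14, 40], [242, 101, 271]]
residuals = standardized_residuals(observed)

for label, row in zip(["No Stats", "Stats"], residuals):
    print(label, [round(r, 2) for r in row])
```

For the 1989/No Stats cell this is (14 − 21.85) / √21.85, or about −1.68. The largest residual, 3.39 for 1978-79/No Stats, shows that articles without statistics were far more common in the earliest period than the null hypothesis would predict.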

Page 10: How to know when to use them. Which one should I use?

Chi Square Test of Independence

A test of whether two categorical variables are independent. Usually displayed in a contingency table.

Contingency table – A two-way table that classifies individuals according to two categorical variables.

Examines the distribution of counts for one group of individuals classified according to both variables.

Degrees of Freedom = (rows – 1)(columns – 1), where rows gives the number of categories in one variable and columns gives the number of categories in the other.

Page 11: How to know when to use them. Which one should I use?

Chi Square Test of Independence

There is some concern that if a woman has an epidural to reduce pain during childbirth, the drug can get into the baby’s bloodstream, making the baby sleepier and less willing to breastfeed. Researchers followed up on 1178 births, noting whether the mother had an epidural and whether the baby was still nursing after 6 months.

• What kind of test would be appropriate?

• State the null and alternative hypotheses.

• How many degrees of freedom are there?

                               Epidural?
Breastfeeding at 6 months?     Yes     No    Total
Yes                            206    498     704
No                             190    284     474
Total                          396    782    1178

Page 12: How to know when to use them. Which one should I use?

Chi Square Test of Independence

There is some concern that if a woman has an epidural to reduce pain during childbirth, the drug can get into the baby’s bloodstream, making the baby sleepier and less willing to breastfeed. Researchers followed up on 1178 births, noting whether the mother had an epidural and whether the baby was still nursing after 6 months.

• The smallest expected count will be in the epidural/no breastfeeding cell. What is it?

• Check the assumptions and conditions for inference.

• Calculate the component of chi-square for the epidural/no breastfeeding cell.

                               Epidural?
Breastfeeding at 6 months?     Yes     No    Total
Yes                            206    498     704
No                             190    284     474
Total                          396    782    1178
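
A minimal sketch of the expected count and chi-square component for the epidural/no breastfeeding cell, assuming the table is read with breastfeeding in the rows and epidural in the columns. Variable names are illustrative.

```python
# Rows: breastfeeding at 6 months (Yes, No); columns: epidural (Yes, No)
observed = [[206, 498], [190, 284]]

row_totals = [sum(row) for row in observed]           # 704, 474
col_totals = [sum(col) for col in zip(*observed)]     # 396, 782
grand_total = sum(row_totals)                         # 1178

# Epidural = Yes (column 0), breastfeeding = No (row 1)
obs_cell = observed[1][0]
exp_cell = row_totals[1] * col_totals[0] / grand_total
component = (obs_cell - exp_cell) ** 2 / exp_cell

# Smallest expected count anywhere in the table, for the condition check
smallest_expected = min(
    rt * ct / grand_total for rt in row_totals for ct in col_totals
)

print(f"expected = {exp_cell:.2f}, component = {component:.2f}")
```

The expected count is about 159.34, comfortably above 5, and this one cell contributes about 5.90 to the chi-square statistic.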

Page 13: How to know when to use them. Which one should I use?

Chi Square Test of Independence

There is some concern that if a woman has an epidural to reduce pain during childbirth, the drug can get into the baby’s bloodstream, making the baby sleepier and less willing to breastfeed. Researchers followed up on 1178 births, noting whether the mother had an epidural and whether the baby was still nursing after 6 months.

• For this test, χ² = 14.87. What’s the P-value?

• State your conclusion.

• Show how the residual for the epidural/no breastfeeding cell was calculated.

• What can you conclude from the standardized residuals?

                               Epidural?
Breastfeeding at 6 months?     Yes      No
Yes                           -1.99    1.42
No                             2.43   -1.73
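
The P-value and residuals for this table can be recomputed from the counts on the previous slides, as sketched below with the standard library only. For 1 degree of freedom the chi-square upper tail equals erfc(√(x/2)); the variable names are assumptions for illustration.

```python
import math

# Rows: breastfeeding at 6 months (Yes, No); columns: epidural (Yes, No)
observed = [[206, 498], [190, 284]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Standardized residual for each cell: (observed - expected) / sqrt(expected)
residuals = []
for row, row_total in zip(observed, row_totals):
    res_row = []
    for obs, col_total in zip(row, col_totals):
        exp = row_total * col_total / grand_total
        res_row.append((obs - exp) / math.sqrt(exp))
    residuals.append(res_row)

# The chi-square statistic is the sum of the squared residuals
chi2 = sum(r ** 2 for row in residuals for r in row)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))   # closed form, valid only when df = 1

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.5f}")
for row in residuals:
    print([round(r, 2) for r in row])
```

Recomputed this way, the statistic is about 14.87 with a P-value near 0.0001, and the residuals are roughly −1.99 and 1.42 in the breastfeeding row and 2.43 and −1.73 in the not-breastfeeding row. The epidural/no breastfeeding residual is (190 − 159.34) / √159.34 ≈ 2.43: more babies of mothers who had an epidural had stopped nursing than independence would predict.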

Page 14: How to know when to use them. Which one should I use?

Chi Square Test of Independence

There is some concern that if a woman has an epidural to reduce pain during childbirth, the drug can get into the baby’s bloodstream, making the baby sleepier and less willing to breastfeed. Researchers followed up on 1178 births, noting whether the mother had an epidural and whether the baby was still nursing after 6 months.

Suppose a broader study included several additional issues, including whether a mother drank alcohol, whether this was a first child, and whether the parents occasionally supplemented breastfeeding with bottled formula. Why would it not be appropriate to use chi-square methods on the 2 × 8 table with yes/no columns for each potential factor?