The Chi-square goodness of fit test



Chi-square goodness of fit

• Core issue in statistics: When are you viewing just random noise, and when is there a real trend? – Example: To see whether squash shape and color are linked genes, do a test cross.

Test cross: GgLl × ggll

Expected offspring ratio 1 : 1 : 1 : 1?
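To see where the 1 : 1 : 1 : 1 expectation comes from, here is a minimal Python sketch that enumerates every gamete pairing in the test cross GgLl × ggll. It assumes the two genes assort independently (unlinked) and that every gamete is equally likely; the helper names `gametes` and `cross` are hypothetical.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List the gametes of a two-gene genotype such as "GgLl".

    Assumes the genes are unlinked, so each allele of gene 1 pairs
    with each allele of gene 2 with equal probability.
    """
    gene1, gene2 = genotype[:2], genotype[2:]
    return [a + b for a, b in product(gene1, gene2)]


def cross(parent1, parent2):
    """Count offspring genotypes over all equally likely gamete pairings."""
    offspring = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # sorted() puts the uppercase (dominant) allele first, e.g. "gG" -> "Gg".
        gene1 = "".join(sorted(g1[0] + g2[0]))
        gene2 = "".join(sorted(g1[1] + g2[1]))
        offspring[gene1 + gene2] += 1
    return offspring


counts = cross("GgLl", "ggll")
print(counts)  # four genotypes with 4 pairings each: a 1 : 1 : 1 : 1 ratio
```

If the genes were linked, the gametes would no longer be equally likely, which is exactly what the chi-square test is used to detect.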


When to use a chi-square test

• Your response variable is count data.

• You have more than one category of the response variable.

• You have a hypothesis for the responses you expect.

• You want to know if the difference between the responses you observe and the responses you expect is significant or not.


Turn a hypothesis into a number

Your hypothesis tells you what you expect any given response (observation) to be. Turn your expectation into a fraction or percentage.

• Example hypothesis: “The MSU football team will win every single game this season.” So, according to my hypothesis, I expect MSU’s chance of winning any game is ___%.

100%. Does this mean MSU will win 100% of their games?

• Example hypothesis: “The MSU football team’s number of wins and losses will be random.” So, according to this new hypothesis, I expect the team’s chance of winning any game is ___%.

50%


Turn a hypothesis into a number

Hyp.: “People over the age of 60 are 50% more likely to attend a baseball game than younger people.” So, according to my hypothesis, if I go to a baseball game and find out the ages of all the fans in the audience, I expect the chance of any one fan being > 60 to be…

x + (x - 50) = 100; solve for x: 2x = 150, so x = 75%, or 3 out of 4. (Here “50% more likely” is read as a 50-percentage-point gap: if x% of the fans are over 60, the younger fans make up the other (x - 50)%.)

What is the chance a fan will be < 60 years old?


Turn a hypothesis into a number

• “Pre-hypothesis”: Given the choice, people prefer red and blue M&M’s over the other 4 colors.

• But we don’t know how strong their preference might be. So we test the “null hypothesis”: people choose M&M colors at random, i.e., they show no preference (vs. the “alternative” or “experimental” hypothesis).

• So, according to my null hypothesis, if I hand around a bowl of M&M’s, I expect the chance of each color being chosen is…

1/6, or 16.67%.

• Use a chi-square test to see whether what you actually observe is significantly different from 1/6.
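As a sketch of turning the null hypothesis into expected counts, the Python snippet below splits a number of picks evenly across six colors. The names of the four “other” colors and the total of 90 picks are made-up illustrations, not data from the slides.

```python
from fractions import Fraction

# Hypothetical: only red and blue come from the slide; the other four colors are assumed.
COLORS = ["red", "blue", "green", "yellow", "orange", "brown"]


def expected_counts(total_picks, categories):
    """Null hypothesis: every category is equally likely (no preference)."""
    p = Fraction(1, len(categories))  # 1/6 for six colors
    return {color: total_picks * p for color in categories}


expected = expected_counts(90, COLORS)  # 90 picks is a made-up example
print(expected["red"])  # 15
```

Using `Fraction` keeps 1/6 exact, so the expected counts add back up to the total number of picks.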


The chi-square test

Game   Observed: % fans > 60 years old   Expected: % fans > 60 years old
1      69                                75
2      80                                75
3      20                                75
4      55                                75
5      67                                75
6      76                                75
7      47                                75
8      81                                75
9      70                                75
10     68                                75

The chi-square test determines whether or not the difference between the responses you observe and the responses you expect is significant.

Significant = not due to random chance alone.

Calculate the “strength of the difference” (the χ² value), then use it to get a probability (the p-value): the probability of seeing a difference at least this large if random chance (random noise) alone were at work.

If this probability is small (< 5%), we conclude there is a significant difference between observed and expected values (the difference is not simply due to chance).


Interpreting the chi-square test

(Observed and expected values: same as the previous table.)

Hypothesis: “People over the age of 60 are 50% more likely to attend a baseball game than younger people.”

If the test tells you your data are not significantly different from what you expect (observed ≈ expected; your data have a “good fit” to the expected values), you support the hypothesis. Note: no statistical test ever proves a hypothesis!

If the test tells you your data are significantly different from what you expect (observed ≠ expected), you reject the hypothesis.


What is chi-square?

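The “chi-square” symbol is χ² (the Greek letter chi, squared).

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Σ means “sum of”: compute (Observed - Expected)²/Expected for each category, then add the results. The expected values are based on your hypothesis!

Category     Observed  Expected  Obs-Exp  (Obs-Exp)²  (Obs-Exp)²/Exp
Category 1
Category 2
…
χ² total = sum of the (Obs-Exp)²/Exp column

Degrees of freedom = number of categories minus 1 = N - 1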
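The table recipe for χ² can be sketched as a small Python function that builds each row (Obs, Exp, Obs-Exp, (Obs-Exp)², (Obs-Exp)²/Exp), sums the last column, and reports N - 1 degrees of freedom. The function name `chi_square_table` and the toy counts in the usage line are hypothetical.

```python
def chi_square_table(observed, expected):
    """Return (rows, chi2_total, df) for a chi-square goodness of fit test.

    Each row is (Obs, Exp, Obs-Exp, (Obs-Exp)^2, (Obs-Exp)^2/Exp).
    Degrees of freedom = number of categories - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("need one expected value per observed category")
    rows = []
    for obs, exp in zip(observed, expected):
        diff = obs - exp
        rows.append((obs, exp, diff, diff ** 2, diff ** 2 / exp))
    chi2 = sum(row[-1] for row in rows)  # the "chi-square total"
    df = len(observed) - 1
    return rows, chi2, df


# Toy numbers (hypothetical): two categories, 15 expected in each.
rows, chi2, df = chi_square_table([10, 20], [15, 15])
print(round(chi2, 3), df)  # 3.333 1
```

Note that the expected values must be counts (not percentages) that add up to the same total as the observed counts.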


Example problem #1

A university biology department would like to hire a new professor. They advertised the opening and received 220 applications, 25% of which came from women. The department drew up a “short list” of its 25 favorite candidates, 5 women and 20 men. You want to know whether there is evidence that the search committee is biased against women. Note: if the committee is unbiased, the proportion of women on the short list should match the proportion of women among all the applications. Define your hypothesis. Set up the table.

Category  Observed  Expected  Obs-Exp  (Obs-Exp)²  (Obs-Exp)²/Exp
Women     5         6.25      -1.25    1.5625      0.25
Men       20        18.75     1.25     1.5625      0.08
Total     25        25
χ² total = 0.33    Degrees of freedom = 1

Expected values: Women: 25 × 0.25 = 6.25; Men: 25 × 0.75 = 18.75 (check: 6.25 + 18.75 = 25 = 25).
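The arithmetic for Example problem #1 can be checked with a few lines of Python; this is just a sketch of the table above, with the 25% applicant-pool share used to build the expected counts.

```python
# Example problem #1: short list of 25 vs. an applicant pool that is 25% women.
observed = {"Women": 5, "Men": 20}
pool_fraction = {"Women": 0.25, "Men": 0.75}  # shares among the 220 applications

n = sum(observed.values())  # 25 short-listed candidates
expected = {k: n * pool_fraction[k] for k in observed}  # 6.25 and 18.75

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 0.33, df = 1
```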


Chi-square probability table

(Table image: chi-square probabilities by degrees of freedom, with “Support hypothesis” and “Reject hyp.” regions.)

• p < 0.05: Observed values are significantly different from expected (differences are not just due to random chance). Reject hypothesis.

• p > 0.05: Observed values are not significantly different from expected (differences can be explained by random chance). Support hypothesis.


Probability range: 0.5 < p < 0.6. This means that, if random chance alone were at work, there would be a 50-60% probability of a difference between observed and expected values at least this large.

So, is the department biased against women applicants?
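Instead of reading a p-value range off a printed table, the p-value can be computed. The sketch below swaps in the standard closed-form upper-tail probability of the chi-square distribution for whole-number degrees of freedom (an erfc term plus a finite sum); the function name `chi2_sf` is our own. For Example problem #1 it gives p ≈ 0.56, inside the 0.5 < p < 0.6 range read from the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for a whole-number df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


# Example problem #1: chi2 = 0.33 (exactly 1/3), df = 1.
print(f"p = {chi2_sf(1 / 3, 1):.3f}")  # p = 0.564
```

As a sanity check, the familiar 5% cutoffs (3.841 for df = 1, 7.815 for df = 3) each return a p-value of about 0.05.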


Example problem #2

Work in groups


Example problem #2

Hypothesis: Body color and wing size are unlinked genes.

Expected ratio? 9:3:3:1.

Expected values:

Gray Normal wings (G_W_): 9/16 × 102 = 57.375

Gray Vestigial wings (G_ww): 3/16 × 102 = 19.125

Ebony Normal wings (ggW_): 3/16 × 102 = 19.125

Ebony Vestigial wings (ggww): 1/16 × 102 = 6.375

Phenotype        Observed  Expected  Obs-Exp  (Obs-Exp)²  (Obs-Exp)²/Exp
Gray Normal      53        57.375    -4.375   19.141      0.334
Gray Vestigial   16        19.125    -3.125   9.766       0.511
Ebony Normal     25        19.125    5.875    34.516      1.805
Ebony Vestigial  8         6.375     1.625    2.641       0.414
Total            102       102
χ² total = 3.063    Degrees of freedom = 3
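A short Python sketch reproduces the Example problem #2 table from the observed counts and the 9:3:3:1 ratio; `Fraction` keeps the expected values exact.

```python
from fractions import Fraction

# Example problem #2: 9:3:3:1 expected ratio, 102 flies in total.
phenotypes = ["Gray normal", "Gray vestigial", "Ebony normal", "Ebony vestigial"]
observed = [53, 16, 25, 8]
ratio = [9, 3, 3, 1]

n = sum(observed)  # 102
expected = [Fraction(r, sum(ratio)) * n for r in ratio]  # 57.375, 19.125, 19.125, 6.375

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

for name, o, e in zip(phenotypes, observed, expected):
    print(f"{name:16} obs={o:3} exp={float(e):7.3f}")
print(f"chi2 = {float(chi2):.3f}, df = {df}")  # chi2 = 3.063, df = 3
```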


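Chi-square probability table

(Table image: chi-square probabilities by degrees of freedom, with “Support hypothesis” and “Reject hyp.” regions.)

Probability range: 0.3 < p < 0.4. This means that, if random chance alone were at work, there would be a 30-40% probability of a difference between observed and expected values at least this large.

Biology?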
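The table lookup itself can be mimicked in code. This hedged sketch stores the standard chi-square critical values for p = 0.05 (1 to 5 degrees of freedom) and compares a χ² total against them; it only tells you which side of 0.05 you are on, not the exact p range. The names `CRITICAL_0_05` and `interpret` are hypothetical.

```python
# Standard chi-square critical values at p = 0.05, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def interpret(chi2, df, critical=CRITICAL_0_05):
    """A chi2 larger than the p = 0.05 cutoff for its df means p < 0.05."""
    if df not in critical:
        raise ValueError(f"no critical value stored for df = {df}")
    if chi2 > critical[df]:
        return "p < 0.05: significantly different. Reject hypothesis."
    return "p > 0.05: not significantly different. Support hypothesis."


print(interpret(3.063, 3))  # Example problem #2
```

For Example problem #2, 3.063 is well below the df = 3 cutoff of 7.815, matching the “Support hypothesis” reading.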


Example problem #3

Using chi-square to test for linked genes


Example problem #3

1. Hypothesis:

Squash color and shape are not linked genes. OR Squash color and shape are linked genes.

2. Describe the phenotypes and circle the recombinants.

LlGg llGg

llgg Llgg

3. If the two genes are not linked, the expected phenotype ratio is:

1:1:1:1

4. If the two genes are linked (with no recombination), the expected phenotype ratio is:

1:0:0:1
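Turning each ratio into expected counts is the same multiply-and-divide step as before. This sketch uses the total of 509 offspring from the calculations that follow; the helper name `expected_counts` is hypothetical.

```python
phenotypes = ["Wild Wild (LlGg)", "Wild Orange (Llgg)", "Round Wild (llGg)", "Round Orange (llgg)"]
n = 509  # total test-cross offspring in this example

ratios = {
    "not linked": [1, 1, 1, 1],
    "linked (no recombinants)": [1, 0, 0, 1],
}


def expected_counts(ratio, total):
    """Share the total out in proportion to the ratio."""
    return [total * r / sum(ratio) for r in ratio]


for hypothesis, ratio in ratios.items():
    print(hypothesis, dict(zip(phenotypes, expected_counts(ratio, n))))
```

The linked hypothesis puts an expected count of 0 in the two recombinant classes, which is why (Obs-Exp)²/Exp becomes undefined for them.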


Example problem #3

If you tested the hypothesis that squash shape and color ARE NOT LINKED (1:1:1:1):

5. Calculate the expected number of offspring for each phenotype:

Wild Wild (LlGg): 509/4 = 127.25
Wild Orange (Llgg): 127.25
Round Wild (llGg): 127.25
Round Orange (llgg): 127.25

Phenotype      Observed  Expected  Obs-Exp  (Obs-Exp)²  (Obs-Exp)²/Exp
Wild Wild      228       127.25    100.75   10150.56    79.8
Wild Orange    17        127.25    -110.25  12155.06    95.5
Round Wild     21        127.25    -106.25  11289.06    88.7
Round Orange   243       127.25    115.75   13398.06    105.3
χ² total = 369.3    Degrees of freedom = 3

Chi-square probability table

(Table image: chi-square probabilities by degrees of freedom, with “Support hypothesis” and “Reject hyp.” regions.)

Probability range: p < 0.01

Statistical meaning: if random chance alone were at work, there would be less than a 1% probability of a difference between observed and expected values this large. The observed and expected values are significantly different. Reject hypothesis.

Biological meaning?

Example problem #3

If you tested the hypothesis that squash shape and color ARE LINKED (1:0:0:1):

5. Calculate the expected number of offspring for each phenotype:

Wild Wild (LlGg): 509/2 = 254.5
Wild Orange (Llgg): 0
Round Wild (llGg): 0
Round Orange (llgg): 509/2 = 254.5

Phenotype      Observed  Expected  Obs-Exp  (Obs-Exp)²  (Obs-Exp)²/Exp
Wild Wild      228       254.5     -26.5    702.25      2.76
Wild Orange    17        0         17       289         undefined (counted as 0)
Round Wild     21        0         21       441         undefined (counted as 0)
Round Orange   243       254.5     -11.5    132.25      0.52
χ² total = 3.28    Degrees of freedom = 3

Chi-square probability table

(Table image: chi-square probabilities by degrees of freedom, with “Support hypothesis” and “Reject hyp.” regions.)

Probability range: 0.3 < p < 0.4

Statistical meaning: if random chance alone were at work, there would be a 30-40% probability of a difference between observed and expected values at least this large. The observed and expected values are not significantly different. Support hypothesis.

Biological meaning?


Example problem #3

Hypothesis “not linked”: p < 0.01. Reject hypothesis.

Hypothesis “linked”: 0.3 < p < 0.4, in other words, p > 0.05. Support hypothesis.

Are these test results in agreement?

So do these data show that the genes are linked or not?

If you weren’t very confident in your test results, what could you do next to improve your confidence?
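Both squash hypotheses can be checked side by side in Python. This sketch assumes the observed counts 228, 17, 21 and 243 from the tables and 3 degrees of freedom, and it follows the slides' shortcut of counting undefined (expected = 0) terms as 0, which is a classroom convention rather than part of the standard test. The names `chi2_total` and `chi2_sf_df3` are hypothetical; the p-value uses the closed form for exactly 3 degrees of freedom.

```python
import math

# Test-cross offspring counts, in this order:
# Wild Wild (LlGg), Wild Orange (Llgg), Round Wild (llGg), Round Orange (llgg)
observed = [228, 17, 21, 243]


def chi2_total(observed, ratio):
    """Chi-square total against an expected ratio.

    Categories with an expected count of 0 make (Obs-Exp)^2/Exp undefined;
    following the slides' shortcut, they are skipped (counted as 0).
    """
    n = sum(observed)
    total = 0.0
    for obs, r in zip(observed, ratio):
        exp = n * r / sum(ratio)
        if exp == 0:
            continue
        total += (obs - exp) ** 2 / exp
    return total


def chi2_sf_df3(x):
    """P(chi-square >= x) for exactly 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


for label, ratio in [("not linked (1:1:1:1)", [1, 1, 1, 1]),
                     ("linked (1:0:0:1)", [1, 0, 0, 1])]:
    x = chi2_total(observed, ratio)
    print(f"{label}: chi2 = {x:.2f}, p = {chi2_sf_df3(x):.2g}")
```

The “not linked” hypothesis gives χ² ≈ 369.29 with a p-value far below 0.01, while the “linked” hypothesis gives χ² ≈ 3.28 with p ≈ 0.35, in line with the conclusions above.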