Department of Zoology, UBC: mfscott/lectures/09_Contingency_Analysis.pdf


Midterm: Next Thursday, Oct. 22nd, at 2pm

Last Name Starts With    Building and Room Number
A-C                      MATX 1100
D-K                      MCML 166
L-Z                      WESB 100

Assignment #4

Chapter 7: 21, 22, 28. Due this Friday, Oct. 16th, by 2pm in your TA’s homework box.

Assignment #5

Chapter 8: 16, 19; Chapter 9: 19. Due two Fridays from now, Oct. 30th, by 2pm in your TA’s homework box.

Reading

For today: Chapter 9. For Thursday: Chapter 10.

Chapter 8 Review

Goodness-of-fit tests compare an observed frequency distribution with the frequency distribution expected under a simple probability model.

•  Binomial test: limited to categorical variables with only two possible outcomes.

•  χ2 test: can handle categorical and discrete numerical variables having more than two outcomes.

χ2 Goodness-of-fit test

Uses a test statistic called χ2 to measure the discrepancy between an observed discrete frequency distribution and the frequencies expected under a simple probability model serving as the null hypothesis.

Hypotheses for χ2 test

H0: The data come from a particular discrete probability distribution.
HA: The data do not come from that distribution.

Test statistic for χ2 test

$$\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$

Degrees of freedom for χ2 test

df = (Number of categories) – (Number of parameters estimated from the data) – 1
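The test statistic and the df rule translate directly into a few lines of code. A minimal Python sketch, assuming counts are supplied as two parallel lists (the function name `chi_square_gof` and its parameters are illustrative, not from the course software):

```python
def chi_square_gof(observed, expected, n_params_estimated=0):
    """Return (chi2, df) for a chi-square goodness-of-fit test.

    observed, expected: counts per category, in the same order.
    n_params_estimated: parameters estimated from the data
    (0 when the null model is fully specified in advance).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    # Sum of (Observed - Expected)^2 / Expected over all categories
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = (number of categories) - (parameters estimated) - 1
    df = len(observed) - n_params_estimated - 1
    return chi2, df


# Pooled coin-flip table from this lecture, using the slide's rounded Expected values
observed = [38, 105, 186, 236, 201, 98, 136]
expected = [35.16, 109.38, 218.75, 273.44, 218.75, 109.38, 35.16]
chi2, df = chi_square_gof(observed, expected, n_params_estimated=0)
print(f"chi2 = {chi2:.2f}, df = {df}")
```

With the coin data it reproduces the χ2 = 302.27 and df = 6 worked out by hand later in the lecture.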

Table A: χ2 distribution (the 5% critical value)
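Table A gives critical values only at fixed α levels. As a cross-check, the upper-tail probability of the χ2 distribution has a closed form when df is a whole number, so P-values can be computed with Python's standard library alone. This is a sketch; `chi2_sf` is a hypothetical helper name (with SciPy installed, `scipy.stats.chi2.sf(x, df)` would normally be used instead).

```python
import math


def chi2_sf(x, df):
    """Pr[chi-square with df degrees of freedom >= x], for integer df >= 1.

    Closed-form series for integer df:
      even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
      odd df:  erfc(sqrt(x/2))
               + sqrt(2/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k - 1/2) / (1*3*...*(2k-1))
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k  # builds (x/2)^k / k!
            total += term
        return math.exp(-x / 2) * total
    total = 0.0
    term = math.sqrt(x)  # k = 1 term: x^(1/2) / 1
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)  # next term: multiply by x, divide by next odd number
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


# Critical values quoted in this lecture should sit at the matching tail areas
for x, df in [(3.84, 1), (10.83, 1), (7.81, 3), (12.59, 6), (22.46, 6)]:
    print(f"Pr[chi2_{df} >= {x}] = {chi2_sf(x, df):.4f}")
```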

χ2 test as approximation of binomial test

•  χ2 goodness-of-fit test works even when there are only two categories, so it can be used as a substitute for the binomial test.

•  Very useful if the number of data points is large.
   –  Imagine if, in our red/blue wrestler example, rather than 16/20 wins by red, we had 1600/2000 wins by red. Imagine calculating:

$$\Pr[1600] = \frac{2000!}{1600!\,400!}\,(0.5)^{1600}(0.5)^{400}$$

   –  And then imagine calculating:

$$P = 2\left(\Pr[1600] + \Pr[1601] + \dots + \Pr[2000]\right)$$
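To see why the χ2 shortcut is attractive, here is a sketch that does the exact calculation from the slide next to the two-category χ2 statistic. Python's arbitrary-precision integers make the exact sum feasible; both function names are hypothetical, and the exact version assumes k is in the upper half (k ≥ n/2), as in the wrestler example.

```python
from math import comb


def exact_binomial_two_sided(k, n):
    """Two-sided exact binomial P-value under p = 0.5, as on the slide:
    P = 2 * (Pr[k] + Pr[k+1] + ... + Pr[n]) for k >= n/2, capped at 1.
    Exact integer arithmetic avoids underflow of 0.5**n for large n."""
    if 2 * k < n:
        raise ValueError("this sketch assumes k is in the upper half (k >= n/2)")
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))  # outcomes with >= k wins
    return min(1.0, 2 * upper_tail / 2**n)


def chi2_two_categories(k, n):
    """Chi-square goodness-of-fit statistic for k vs n - k under p = 0.5."""
    expected = n / 2
    return ((k - expected) ** 2 + ((n - k) - expected) ** 2) / expected


for k, n in [(16, 20), (1600, 2000)]:
    print(k, n, exact_binomial_two_sided(k, n), chi2_two_categories(k, n))
```

The χ2 statistic needs only two subtractions however large n gets (720 for 1600/2000, on 1 df), whereas the exact sum runs over 401 huge binomial coefficients.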

Assumptions of χ2 test

•  No more than 20% of categories have Expected < 5

•  No category with Expected < 1
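These two rules are easy to check mechanically. A minimal sketch (the function name is hypothetical). Applied to the coin example's nine unpooled binomial expectations later in this lecture, it flags a problem, because 2 of 9 categories (22%) have Expected < 5:

```python
def chi2_assumptions_ok(expected):
    """Rules of thumb for the chi-square approximation:
    no expected count below 1, and at most 20% of categories below 5."""
    if any(e < 1 for e in expected):
        return False
    n_below_5 = sum(1 for e in expected if e < 5)
    # Integer comparison avoids floating-point trouble with 0.2 * len(...)
    return 5 * n_below_5 <= len(expected)


coin_unpooled = [3.91, 31.25, 109.38, 218.75, 273.44, 218.75, 109.38, 31.25, 3.91]
coin_pooled = [35.16, 109.38, 218.75, 273.44, 218.75, 109.38, 35.16]
print(chi2_assumptions_ok(coin_unpooled), chi2_assumptions_ok(coin_pooled))
```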

Example: Fitting the Binomial Distribution

One thousand coins were flipped eight times, and the number of heads was recorded for each coin. Were they fair coins?

Number of heads   Number of coins
0                 6
1                 32
2                 105
3                 186
4                 236
5                 201
6                 98
7                 33
8                 103
Total             1000

H0: The number of heads has a binomial distribution with p = 0.5.
HA: The number of heads does not have a binomial distribution with p = 0.5.


$$\Pr[0] = \binom{8}{0}(0.5)^0(0.5)^8 = 0.0039$$

$$\Pr[1] = \binom{8}{1}(0.5)^1(0.5)^7 = 0.0313$$

Etc…

Number of heads   Number of coins   Binomial expectation
0                 6                 0.0039
1                 32                0.0313
2                 105               0.1094
3                 186               0.2188
4                 236               0.2734
5                 201               0.2188
6                 98                0.1094
7                 33                0.0313
8                 103               0.0039
Total             1000              1.0

Expected values = Expected probability × Total number of sets of trials

Expected[0 heads] = 0.00390625 × 1000 ≈ 3.91
Expected[1 head] = 0.03125 × 1000 = 31.25
Etc…
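The same arithmetic for all nine categories, as a short Python sketch (`binomial_expected` is a hypothetical helper name). Keeping the probabilities unrounded is why Expected[0 heads] comes out as 3.91 rather than 0.0039 × 1000 = 3.9:

```python
from math import comb


def binomial_expected(n_trials, p, n_sets):
    """Binomial probabilities of 0..n_trials successes, and the expected
    number of sets (here, coins) in each category."""
    probs = [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
             for k in range(n_trials + 1)]
    return probs, [pr * n_sets for pr in probs]


probs, expected = binomial_expected(n_trials=8, p=0.5, n_sets=1000)
for heads, (pr, e) in enumerate(zip(probs, expected)):
    print(f"{heads} heads: Pr = {pr:.5f}, Expected = {e:.2f}")
```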

Number of heads   Number of coins   Binomial expectation   Expected
0                 6                 0.0039                 3.91
1                 32                0.0313                 31.25
2                 105               0.1094                 109.38
3                 186               0.2188                 218.75
4                 236               0.2734                 273.44
5                 201               0.2188                 218.75
6                 98                0.1094                 109.38
7                 33                0.0313                 31.25
8                 103               0.0039                 3.91
Total             1000              1.0                    1000


Categories 0 and 8 each have Expected = 3.91 < 5. That is 2 of 9 categories (22%), more than the 20% allowed, so they are pooled with their neighbours:

Number of heads   Number of coins   Expected
0 or 1            38                35.16
2                 105               109.38
3                 186               218.75
4                 236               273.44
5                 201               218.75
6                 98                109.38
7 or 8            136               35.16
Total             1000              1000


Number of heads   Number of coins   Expected   (O − E)²/E
0 or 1            38                35.16      0.23
2                 105               109.38     0.18
3                 186               218.75     4.90
4                 236               273.44     5.13
5                 201               218.75     1.44
6                 98                109.38     1.18
7 or 8            136               35.16      289.2
Total             1000              1000

χ2 = 302.27

χ2 = 302.27

df = 7 – 0 – 1 = 6 (seven categories after pooling; no parameters estimated from the data, because H0 specifies p = 0.5)

$\chi^2_{0.05,6} = 12.59$ and $\chi^2_{0.001,6} = 22.46$

Since 302.27 > 22.46, P < 0.001. We reject the null hypothesis. The coins were not fair.
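The whole coin analysis fits in one short sketch: binomial expectations, pooling of the end categories, χ2, df and a P-value. Because it uses unrounded expected values it gives χ2 ≈ 302.32 rather than the slide's 302.27; the conclusion is the same. The tail formula in `chi2_sf_even` (a hypothetical helper) holds only for even df, which covers df = 6 here.

```python
import math
from math import comb

observed = [6, 32, 105, 186, 236, 201, 98, 33, 103]  # coins showing 0..8 heads
n_coins, n_flips, p = 1000, 8, 0.5                     # p = 0.5 is fixed by H0

# Binomial expectation for each number of heads
expected = [comb(n_flips, k) * p**k * (1 - p) ** (n_flips - k) * n_coins
            for k in range(n_flips + 1)]


def pool_ends(values):
    """Pool 0 with 1 head and 7 with 8 heads, as done on the slides."""
    return [values[0] + values[1]] + values[2:7] + [values[7] + values[8]]


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


obs_pooled, exp_pooled = pool_ends(observed), pool_ends(expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 0 - 1  # no parameters estimated from the data
p_value = chi2_sf_even(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.2g}")
```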

In-class Exercise

The parasitic nematode Camallanus oxycephalus infects many freshwater fish, including shad. Do they infect fish at random?

Number of nematodes   Number of fish
0                     103
1                     72
2                     44
3                     14
4                     3
5                     1
6                     1

H0: The number of nematodes per fish has a Poisson distribution.
HA: The number of nematodes per fish does not have a Poisson distribution.

The Poisson probability of X nematodes in a fish is

$$\Pr[X] = \frac{e^{-\mu}\mu^{X}}{X!}$$

The mean μ is not specified by H0, so it is estimated from the data by the sample mean:

$$\bar{Y} = \frac{103(0) + 72(1) + 44(2) + 14(3) + 3(4) + 1(5) + 1(6)}{238} = 0.945$$
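A sketch of the Poisson expectations in Python (variable names are illustrative). Because it keeps full precision, its expected counts differ slightly from the slide tables that follow, which multiply rounded probabilities (for example 92.47 rather than 92.58 fish with no nematodes).

```python
import math

# Number of nematodes per fish -> number of fish
counts = {0: 103, 1: 72, 2: 44, 3: 14, 4: 3, 5: 1, 6: 1}
n_fish = sum(counts.values())
mean = sum(x * f for x, f in counts.items()) / n_fish  # estimate of mu from the data


def poisson_pmf(x, mu):
    """Pr[X = x] = e^(-mu) * mu^x / x! for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)


for x in counts:
    pr = poisson_pmf(x, mean)
    print(f"{x}: Pr = {pr:.3f}, Expected = {pr * n_fish:.2f}")
```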

Number of nematodes   Number of fish   Poisson expectation
0                     103              0.389
1                     72               0.367
2                     44               0.174
3                     14               0.055
4                     3                0.013
5                     1                0.002
6                     1                0.000

Number of nematodes   Number of fish   Poisson expectation   Expected
0                     103              0.389                 92.58
1                     72               0.367                 87.35
2                     44               0.174                 41.41
3                     14               0.055                 13.09
4                     3                0.013                 3.09
5                     1                0.002                 0.48
6                     1                0.000                 0.00

Categories 4, 5 and 6 all have Expected < 5 (and two of them are below 1), so they are pooled into a single ≥4 class. After pooling, only one of five classes (20%) has Expected < 5, which the rules allow.

Number of nematodes   Number of fish   Expected   (O − E)²/E
0                     103              92.58      1.17
1                     72               87.35      2.70
2                     44               41.41      0.16
3                     14               13.09      0.06
≥4                    5                3.57       0.57

χ2 = 4.66

χ2 = 4.66

df = 5 – 1 – 1 = 3 (five categories after pooling; one parameter, the mean, estimated from the data)

$\chi^2_{0.05,3} = 7.81$, so P > 0.05.

We do not reject the null hypothesis. There is no evidence that nematodes infect fish non-randomly.
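The nematode test end to end, as a sketch. One deliberate difference from the slides: the ≥4 class is given the whole Poisson upper tail, Pr[X ≥ 4] = 1 − Pr[0] − Pr[1] − Pr[2] − Pr[3], so the expected values sum to 238. This, together with unrounded probabilities, gives χ2 ≈ 4.57 instead of 4.66; the conclusion (P > 0.05) does not change. `chi2_sf_df3` is a hypothetical helper valid only for 3 df.

```python
import math

# Observed fish with 0, 1, 2, 3 and >= 4 nematodes (4, 5, 6 pooled: 3 + 1 + 1)
observed = [103, 72, 44, 14, 3 + 1 + 1]
n_fish = sum(observed)
mean = (72 * 1 + 44 * 2 + 14 * 3 + 3 * 4 + 1 * 5 + 1 * 6) / n_fish  # estimated mu

# Poisson probabilities for 0..3; the last class takes the whole upper tail Pr[X >= 4]
probs = [math.exp(-mean) * mean**x / math.factorial(x) for x in range(4)]
probs.append(1 - sum(probs))
expected = [pr * n_fish for pr in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1  # one parameter (the mean) estimated from the data


def chi2_sf_df3(x):
    """Closed-form upper tail of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi2)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.2f}")
```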

Contingency analysis: associations between categorical variables

Chapter 9

Odds

The probability of success divided by the probability of failure:

$$O = \frac{p}{1-p}$$

Odds of survival:

$$O_{\text{men}} = \frac{0.20}{1-0.20} = \frac{0.20}{0.80} = 0.25$$

Or “1 to 4”

$$O_{\text{women}} = \frac{0.74}{1-0.74} = \frac{0.74}{0.26} = 2.85$$

Or roughly “3 to 1”

Odds ratio

The odds of success in one group divided by the odds of success in another group:

$$OR = \frac{O_1}{O_2}$$

Odds ratio of female to male survival:

$$OR = \frac{O_{\text{women}}}{O_{\text{men}}} = \frac{2.85}{0.25} = 11.4$$

Used often in medical research. If interested, see the text for how to calculate the standard error and confidence interval.
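The odds and odds-ratio formulas in code; a minimal sketch with hypothetical function names. With unrounded odds the ratio is 11.38, which rounds to the slide's 11.4.

```python
def odds(p):
    """Odds of success: Pr[success] / Pr[failure]."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds of success in group 1 divided by odds of success in group 2."""
    return odds(p1) / odds(p2)


o_men, o_women = odds(0.20), odds(0.74)
print(f"O_men = {o_men:.2f}, O_women = {o_women:.2f}, OR = {odds_ratio(0.74, 0.20):.1f}")
```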

Contingency analysis

•  Test the independence of two or more categorical variables

•  We’ll learn one kind: χ2 contingency analysis

Music and wine buying

OBSERVED                      French music playing   German music playing   Totals
Bottles of French wine sold   40                     12                     52
Bottles of German wine sold   8                      22                     30
Totals                        48                     34                     82

[Figure: mosaic plot of the music and wine data]

Hypotheses

•  H0: The nationality of the bottle of wine is independent of the nationality of the music played when it is sold.

•  HA: The nationality of the bottle of wine sold depends on the nationality of the music being played when it is sold.

Calculating the expectations

With independence, Pr[French wine AND French music] = Pr[French wine] × Pr[French music]

Calculating the expectations

Pr[French wine] = 52/82 = 0.634
Pr[French music] = 48/82 = 0.585

EXP.               French music   German music   Totals
French wine sold                                 52
German wine sold                                 30
Totals             48             34             82

If H0 is true, Pr[French wine AND French music] = (0.634)(0.585) = 0.371

Calculating the expectations

EXP.               French music        German music   Totals
French wine sold   0.371 × 82 = 30.4   21.6           52
German wine sold   17.6                12.4           30
Totals             48                  34             82

By H0, Pr[French wine AND French music] = (0.634)(0.585) = 0.371. The other three expected values then follow by subtraction from the row and column totals.

χ2

$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \frac{(40-30.4)^2}{30.4} + \frac{(12-21.6)^2}{21.6} + \frac{(8-17.6)^2}{17.6} + \frac{(22-12.4)^2}{12.4} = 20.0$$

Degrees of freedom

df = (# columns – 1)(# rows – 1)

For music/wine example, df = (2-1)(2-1) = 1

Conclusion

$\chi^2 = 20.0 \gg \chi^2_{1,\,\alpha=0.05} = 3.84$, so we can reject the null hypothesis of independence and say that the nationality of the wine sold did depend on what music was played.

Moreover, $\chi^2 = 20.0 \gg \chi^2_{1,\,\alpha=0.001} = 10.83$, so we can say P < 0.001.
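The whole music-and-wine calculation as a Python sketch (`contingency_chi2` is a hypothetical name). It computes each expectation as (row total)(column total)/(grand total), which equals Pr[row] × Pr[column] × grand total. With unrounded expected values χ2 ≈ 19.8 rather than 20.0. For the P-value it uses the fact that, for df = 1, Pr[χ2 ≥ x] = erfc(√(x/2)).

```python
import math


def contingency_chi2(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Under H0 (independence): Expected = (row total)(column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: French / German wine sold; columns: French / German music playing
wine = [[40, 12],
        [8, 22]]
chi2, df, expected = contingency_chi2(wine)
p_value = math.erfc(math.sqrt(chi2 / 2))  # Pr[chi2 >= x] when df = 1
print(f"chi2 = {chi2:.1f}, df = {df}, P = {p_value:.1g}")
```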

Assumptions

•  This χ2 test is just a special case of the χ2 goodness-of-fit test, so the same rules apply.

•  No expected frequency can be less than 1, and no more than 20% of the expected frequencies can be less than 5.

Fisher’s exact test

•  For 2 x 2 contingency analysis

•  Does not make assumptions about the size of expectations

•  JMP (or other programs) will do it, but it is cumbersome to do by hand

Winter Wren (Troglodytes troglodytes)

•  Are western and eastern forms (currently considered subspecies) actually reproductively isolated, and therefore separate species?

[Figure: T. (t.) pacificus and T. t. hiemalis; Tumbler Ridge, BC. Photos by D. Irwin]

Association of DNA and song: The winter wren contact zone

OBSERVED        Western song   Eastern song   Totals
Western mtDNA   12             0              12
Eastern mtDNA   0              4              4
Totals          12             4              16

Data from Toews & Irwin 2008, Molecular Ecology

Calculating the expectations

A shortcut for calculating expectations (assuming H0 is true):

EXP.            Western song   Eastern song   Totals
Western mtDNA                                 12
Eastern mtDNA                                 4
Totals          12             4              16

$$\text{Exp}[\text{row } i, \text{column } j] = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}$$

Exp[W mtDNA, W song] = 12 × 12 / 16 = 9

Comparing observed and expected

EXP.            Western song   Eastern song   Totals
Western mtDNA   9              3              12
Eastern mtDNA   3              1              4
Totals          12             4              16

OBS.            Western song   Eastern song   Totals
Western mtDNA   12             0              12
Eastern mtDNA   0              4              4
Totals          12             4              16

Three of the four expected values (75%) are below 5, far more than the 20% allowed, so we cannot use the χ2 contingency test. Instead, we use a computer to do Fisher’s exact test:

P = 0.00055, so we reject the H0 of no association.
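Fisher's exact test can also be sketched with the standard library. With all margins fixed, the count in the top-left cell follows a hypergeometric distribution. This sketch uses the common two-sided rule of adding up the probabilities of all tables no more likely than the one observed; that convention is an assumption here (the lecture does not say which rule JMP uses), and `fisher_exact_2x2` is a hypothetical name. For the wren data only the observed table qualifies, giving P = 1/1820 ≈ 0.00055.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With margins fixed, the top-left count is hypergeometric. The P-value
    sums the probabilities of every table no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given all margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


p = fisher_exact_2x2(12, 0, 0, 4)  # rows: western/eastern mtDNA; columns: western/eastern song
print(f"P = {p:.5f}")
```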

In-class Exercise

Do mosquitoes infected with malaria bite more people?

                 Infected   Uninfected   Total
Multiple bites   20         16           36
Single bite      69         157          226
Total            89         173          262

H0: Biting multiple times is independent of malaria infection.
HA: Biting multiple times is dependent on malaria infection.

Pr[Infected] = 89/262 = 0.340
Pr[Multiple] = 36/262 = 0.137

If H0 is true, Pr[Infected AND multiple bites] = (0.340)(0.137) = 0.047

With more decimal places, if H0 is true, Pr[Infected AND multiple bites] = (0.3397)(0.1374) = 0.0467

Expected:

                 Infected                 Uninfected   Total
Multiple bites   262(0.0467) = 12.23      23.77        36
Single bite      76.77                    149.23       226
Total            89                       173          262

χ2

$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} = \frac{(20-12.23)^2}{12.23} + \frac{(69-76.77)^2}{76.77} + \frac{(16-23.77)^2}{23.77} + \frac{(157-149.23)^2}{149.23} = 8.67$$

χ2 = 8.67, df = (2 – 1)(2 – 1) = 1

Table A: χ2 distribution

Since $\chi^2_{0.005,1} = 7.88 < 8.67 < \chi^2_{0.001,1} = 10.83$, 0.005 > P > 0.001. We reject the null hypothesis. Biting multiple times is dependent on malaria infection.
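For a 2 × 2 table there is an algebraic shortcut that gives the same χ2 (without any continuity correction) directly from the four counts a, b, c, d: n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]. A sketch applying it to the mosquito data, with a P-value from erfc for df = 1 (`chi2_2x2` is a hypothetical name):

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the 2 x 2 table
    [[a, b], [c, d]], via n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: multiple / single bites; columns: infected / uninfected
chi2 = chi2_2x2(20, 16, 69, 157)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with df = 1
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

The shortcut avoids computing the four expected values, and it reproduces the P range found from Table A.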
