Topic 17 Simple Hypotheses - University of Arizona, jwatkins/H2_hypothesis.pdf


Topic 17 Simple Hypotheses

Power and the Receiver Operator Characteristic


Outline

Mimicry

• Normal Observations
• Power
• Receiver Operator Characteristic
• Power versus Sample Size

Introduction to the Bayesian Approach

Comparison of Approaches

Formulation



Mimicry

• Mimicry is the similarity of one species to another in a manner that enhances the survivability of one or both species - the model and mimic.

• This similarity can be, for example, in appearance, behavior, sound, or scent.

• One method for producing a mimic species is hybridization. This results in the transfer of adaptations from the model species to the mimic.

The genetic signature of this has recently been discovered in Heliconius butterflies with a region of the chromosome that displays an almost perfect genotype by phenotype association across four species in the genus Heliconius.


Mimicry

Consider a model butterfly species with mean wingspan µ0 = 10 cm and a mimic species with mean wingspan µ1 = 7 cm. Both species have standard deviation σ0 = 3 cm. Collect 16 specimens to decide if the mimic species has migrated into a given region. If we assume, for the null hypothesis, that the habitat under study is populated by the model species, then

• a type I error is falsely concluding that the species is the mimic when indeed the model species is resident and

• a type II error is falsely concluding that the species is the model when indeed the mimic species has invaded.

If our action is to begin an eradication program if the mimic has invaded, then a type I error would result in the eradication of the resident model species and a type II error would result in letting the invasion by the mimic take its course.


Mimicry

To begin, we set a significance level. The choice of an α = 0.05 test means that we are accepting a 5% chance of having a type I error. If the goal is to design a test that has the lowest type II error probability, then the Neyman-Pearson lemma tells us that the critical region is determined by a threshold level kα for the likelihood ratio.

$$C = \left\{ x ;\ \frac{L(\mu_1|x)}{L(\mu_0|x)} \ge k_\alpha \right\}.$$

Our model is X = (X1, . . . , Xn), independent normal observations with unknown mean and known variance σ0². The hypothesis is

H0 : µ = µ0 versus H1 : µ = µ1.


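Normal Observations

We look to determine the critical region.

$$
\begin{aligned}
\frac{L(\mu_1|x)}{L(\mu_0|x)}
&= \frac{\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(x_1-\mu_1)^2}{2\sigma_0^2}\right)\cdots\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(x_n-\mu_1)^2}{2\sigma_0^2}\right)}
        {\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(x_1-\mu_0)^2}{2\sigma_0^2}\right)\cdots\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(x_n-\mu_0)^2}{2\sigma_0^2}\right)} \\
&= \frac{\exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i-\mu_1)^2\right)}
        {\exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^n (x_i-\mu_0)^2\right)} \\
&= \exp\left(-\frac{1}{2\sigma_0^2}\sum_{i=1}^n \left((x_i-\mu_1)^2-(x_i-\mu_0)^2\right)\right) \\
&= \exp\left(-\frac{\mu_0-\mu_1}{2\sigma_0^2}\sum_{i=1}^n (2x_i-\mu_1-\mu_0)\right)
\end{aligned}
$$

The last step uses (xi − µ1)² − (xi − µ0)² = (µ0 − µ1)(2xi − µ1 − µ0). Because the exponential function is increasing, the critical region consists of those x for which

$$\frac{\mu_1-\mu_0}{2\sigma_0^2}\sum_{i=1}^n (2x_i-\mu_1-\mu_0)$$

exceeds some critical value.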
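
As a numerical check on the algebra above, the following R sketch evaluates the likelihood ratio twice: directly from the normal densities, and from the closed form in the last line of the derivation. The sample x is simulated and the seed is arbitrary; both are assumptions made only for illustration, as are the variable names lr_direct and lr_closed.

```r
# Parameter values from the butterfly example; the sample itself is simulated.
mu0 <- 10; mu1 <- 7; sigma0 <- 3; n <- 16
set.seed(1)                                   # arbitrary seed (assumption)
x <- rnorm(n, mean = mu0, sd = sigma0)

# Likelihood ratio computed directly from the product of normal densities
lr_direct <- prod(dnorm(x, mu1, sigma0)) / prod(dnorm(x, mu0, sigma0))

# Closed form: exp(-(mu0 - mu1) / (2 sigma0^2) * sum(2 x_i - mu1 - mu0))
lr_closed <- exp(-(mu0 - mu1) / (2 * sigma0^2) * sum(2 * x - mu1 - mu0))

# The two values should agree up to floating-point error
all.equal(lr_direct, lr_closed)
```

The normalizing constants cancel in the ratio, which is why the closed form depends on the data only through the sum of the observations.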


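Normal Observations

Recall that the critical region consists of those x for which

$$\frac{\mu_1-\mu_0}{2\sigma_0^2}\sum_{i=1}^n (2x_i-\mu_1-\mu_0)$$

exceeds some critical value.

Because µ1 < µ0, this is equivalent to x̄ bounded above by some critical value,

$$\bar{x} \le \tilde{k}_\alpha,$$

where k̃α is chosen to satisfy

$$P_{\mu_0}\{\bar{X} \le \tilde{k}_\alpha\} = \alpha.$$

If we assume that µ = µ0, then X̄ is normal with mean µ0 and standard deviation σ0/√n, and consequently the standardized version of X̄,

$$Z = \frac{\bar{X}-\mu_0}{\sigma_0/\sqrt{n}},$$

is a standard normal. Set zα so that P{Z ≤ −zα} = α.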


Normal Observations

Under the null hypothesis, X̄ has a normal distribution with mean µ0 = 10 and standard deviation σ0/√n = 3/4. Thus, using the quantile function of the normal distribution, we can find critical values in R with

• qnorm(0.05,10,3/4), yielding k̃α = 8.766 for the test statistic x̄ or

• qnorm(0.05), yielding −zα = −1.645 for the test statistic z.

Now let’s look at data.

> x

[1] 6.8 9.5 6.0 8.5 11.7 9.7 7.6 8.0 8.4 6.7 10.5 9.3 6.2 14.4 12.6 9.7

> mean(x)

[1] 9.1

Then x̄ = 9.1 and

$$z = \frac{9.1-10}{3/\sqrt{16}} = -1.2.$$

Because k̃α = 8.766 < 9.1 or, equivalently, −zα = −1.645 < −1.2, we fail to reject the null hypothesis.
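
The test just carried out by hand can be sketched in a few lines of R. The data vector is the one printed above; the names reject_by_xbar and reject_by_z are illustrative and not part of the original slides.

```r
x <- c(6.8, 9.5, 6.0, 8.5, 11.7, 9.7, 7.6, 8.0,
       8.4, 6.7, 10.5, 9.3, 6.2, 14.4, 12.6, 9.7)
mu0 <- 10; sigma0 <- 3; alpha <- 0.05
n <- length(x)

xbar <- mean(x)                               # sample mean
z <- (xbar - mu0) / (sigma0 / sqrt(n))        # standardized test statistic
k <- qnorm(alpha, mu0, sigma0 / sqrt(n))      # critical value for xbar

reject_by_xbar <- xbar <= k                   # lower-tailed test on the xbar scale
reject_by_z <- z <= qnorm(alpha)              # the same test on the z scale
c(xbar = xbar, z = z, k = k)
c(reject_by_xbar, reject_by_z)
```

Both comparisons describe the same critical region, so they always give the same decision.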


Power

Exercise. Give an intuitive explanation why the power should

• increase as a function of |µ1 − µ0|,

• decrease as a function of σ0², and

• increase as a function of n.

Next we determine the type II error probability. We will be guided by the fact that

$$\frac{\bar{X}-\mu_1}{\sigma_0/\sqrt{n}}$$

is a standard normal random variable in the case that H1 : µ = µ1 is true.


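Power

Exercise. Let zα be the upper-α critical value of the standard normal, so that P{Z > zα} = α. Show that

$$\frac{\bar{X}-\mu_0}{\sigma_0/\sqrt{n}} < -z_\alpha \quad\text{if and only if}\quad \frac{\bar{X}-\mu_1}{\sigma_0/\sqrt{n}} < -z_\alpha + \frac{\mu_0-\mu_1}{\sigma_0/\sqrt{n}}.$$

The power is the probability of rejecting H0 when H1 is true. Assuming µ = µ1,

$$1-\beta = P_{\mu_1}\left\{\frac{\bar{X}-\mu_1}{\sigma_0/\sqrt{n}} < -z_\alpha + \frac{\mu_0-\mu_1}{\sigma_0/\sqrt{n}}\right\} = \Phi\left(-z_\alpha + \frac{\mu_0-\mu_1}{\sigma_0/\sqrt{n}}\right).$$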

> alpha<-0.05;zalpha<-qnorm(1-alpha);mu0<-10;mu<-7;sigma<-3;n<-16

> power<-pnorm(-zalpha+(mu0-mu)/(sigma/sqrt(n)))

> power

[1] 0.9907423

The type II error probability is β = 1 − 0.9907 = 0.0093, a bit under 1%.
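
The same formula can be wrapped in a function so that it is easy to evaluate at several alternatives. This is a minimal sketch; the function name power_z and its default arguments are assumptions chosen to match the butterfly example.

```r
# Power of the lower-tailed level-alpha z-test of H0: mu = mu0
# against an alternative mean mu1 < mu0, with known sigma0.
power_z <- function(mu1, mu0 = 10, sigma0 = 3, n = 16, alpha = 0.05) {
  zalpha <- qnorm(1 - alpha)
  pnorm(-zalpha + (mu0 - mu1) / (sigma0 / sqrt(n)))
}

mu1 <- c(9, 8, 7)
power <- power_z(mu1)         # vectorized over the alternative mean
beta <- 1 - power             # type II error probabilities
round(cbind(mu1, power, beta), 4)
```

Because pnorm and the arithmetic are vectorized, one call handles all three alternatives.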


Power

• Density of X̄ for normal data under the null hypothesis, µ0 = 10 (black), with standard deviation σ0/√n = 3/√16 = 3/4.

• With an α = 0.05 level test, the critical value is k̃α = µ0 − zασ0/√n = 8.766.

• The alternatives shown are
  • µ1 = 9 (blue) power 0.3777,
  • µ1 = 8 (blue) power 0.8466, and
  • µ1 = 7 (red) power 0.9907.

[Figure: densities of X̄ under the null hypothesis and the three alternatives. Horizontal axis: x, from 4 to 12. Vertical axis: density, from 0.0 to 0.6.]


Receiver Operator Characteristic

The corresponding receiver operator characteristic curves plot the power 1 − β versus the significance level α using the equation

$$1-\beta = \Phi\left(-z_\alpha + \frac{\mu_0-\mu_1}{\sigma_0/\sqrt{n}}\right).$$

The power for an α = 0.05 test is indicated by the intersection of the vertical dashed line and the curves.

• µ1 = 9 (blue) power 0.3777,

• µ1 = 8 (blue) power 0.8466, and

• µ1 = 7 (red) power 0.9907.

[Figure: receiver operator characteristic curves for the three alternatives. Horizontal axis: alpha, from 0.0 to 1.0. Vertical axis: power, from 0.0 to 1.0.]
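
Curves of this kind can be drawn with base R graphics. The sketch below is one possible way to do it, not the code used for the original figure; the grid of α values and the helper name roc are assumptions.

```r
mu0 <- 10; sigma0 <- 3; n <- 16
alpha <- seq(0.001, 0.999, length.out = 200)   # grid of significance levels

# Power as a function of alpha for a single alternative mean mu1
roc <- function(mu1) {
  pnorm(-qnorm(1 - alpha) + (mu0 - mu1) / (sigma0 / sqrt(n)))
}

plot(alpha, roc(7), type = "l", col = "red", ylim = c(0, 1),
     xlab = "alpha", ylab = "power")
lines(alpha, roc(8), col = "blue")
lines(alpha, roc(9), col = "blue")
abline(v = 0.05, lty = 2)                      # the alpha = 0.05 test
```

Alternatives farther from µ0 give curves that rise more steeply toward the upper left corner, reflecting higher power at every significance level.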


Power versus Sample Size

Power as a function of the number of observations n.

• An α = 0.01 level test is chosen to reflect a stringent criterion for rejecting the null hypothesis that the resident species is the model species.

• The null hypothesis is µ0 = 10.

• The alternatives shown are µ1 = 9, 8 (blue) and µ1 = 7 (red).

[Figure: power versus sample size for the three alternatives. Horizontal axis: n, from 0 to 100. Vertical axis: power, from 0.0 to 1.0.]
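
The dependence of power on sample size can be explored with the same power formula. In this sketch the 80% power target and the names power_n and n_needed are assumptions introduced for illustration; the slides do not state a target.

```r
mu0 <- 10; sigma0 <- 3; alpha <- 0.01
n <- 1:100                                     # candidate sample sizes

# Power at each sample size in n for a single alternative mean mu1
power_n <- function(mu1) {
  pnorm(-qnorm(1 - alpha) + (mu0 - mu1) / (sigma0 / sqrt(n)))
}

plot(n, power_n(7), type = "l", col = "red", ylim = c(0, 1),
     xlab = "n", ylab = "power")
lines(n, power_n(8), col = "blue")
lines(n, power_n(9), col = "blue")

# Smallest n in 1:100 that reaches the assumed 80% power target
n_needed <- sapply(c(9, 8, 7), function(mu1) min(n[power_n(mu1) >= 0.8]))
n_needed
```

The closer the alternative is to the null value, the more observations are needed to reach the same power.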


Introduction to the Bayesian Approach

As with other aspects of the Bayesian approach to statistics, hypothesis testing is closely aligned with Bayes theorem. For a simple hypothesis, we begin with a prior probability for each of the competing hypotheses.

π{θ0} = P{H0 is true} and π{θ1} = P{H1 is true}.

Naturally, π{θ0} + π{θ1} = 1. Although this is easy to state, the choice of a prior ought to be grounded in solid scientific reasoning.

As before, we collect data and with it compute the posterior probabilities of the two parameter values θ0 and θ1. This gives us the posterior probabilities that H0 is true and H1 is true.


Comparison of Approaches

Bayesian Approach

• Begins with a prior probability that H0 is true.

• Uses the data and Bayes formula to compute the posterior probability that H1 is true.

• The decision to reject H0 is based on minimizing risk using presumed values for losses for both type I and type II errors.

Classical Approach

• Begins with the assumption that H0 is true.

• Uses a significance level to construct a critical region to make a decision to reject H0.

• The decision to reject H0 is based on whether or not the data land in the critical region. The region is chosen to minimize the type II error probability.

The question "What is the probability that H1 is true?" has no meaning in the classical setting.


Formulation

Recall Bayes formula for events A and C,

$$P(C|A) = \frac{P(A|C)P(C)}{P(A|C)P(C) + P(A|C^c)P(C^c)},$$

with C = {Θ̃ = θ1} = {H1 is true} and A = {X = x}. For discrete data, we have the conditional probabilities for the alternative hypothesis.

$$P(A|C) = P_{\theta_1}\{X = x\} = f_X(x|\theta_1) = L(\theta_1|x).$$

Similarly, for the null hypothesis,

$$P(A|C^c) = P_{\theta_0}\{X = x\} = f_X(x|\theta_0) = L(\theta_0|x).$$

The posterior probability that H1 is true is

$$P(C|A) = f_{\tilde{\Theta}|X}(\theta_1|x) = P\{H_1 \text{ is true}\,|\,X = x\} = P\{\tilde{\Theta} = \theta_1\,|\,X = x\}.$$


Formulation

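Returning to Bayes formula, we make the substitutions,

$$P(C|A) = \frac{P(A|C)P(C)}{P(A|C)P(C) + P(A|C^c)P(C^c)},$$

$$f_{\tilde{\Theta}|X}(\theta_1|x) = \frac{L(\theta_1|x)\pi\{\theta_1\}}{L(\theta_1|x)\pi\{\theta_1\} + L(\theta_0|x)\pi\{\theta_0\}}.$$

Rewrite the expression above in terms of odds, i.e., as the ratio of probabilities.

$$\frac{f_{\tilde{\Theta}|X}(\theta_1|x)}{f_{\tilde{\Theta}|X}(\theta_0|x)} = \frac{P\{H_1 \text{ is true}\,|\,X = x\}}{P\{H_0 \text{ is true}\,|\,X = x\}} = \frac{P\{\tilde{\Theta} = \theta_1\,|\,X = x\}}{P\{\tilde{\Theta} = \theta_0\,|\,X = x\}} = \frac{L(\theta_1|x)}{L(\theta_0|x)} \cdot \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}.$$

With this expression we see that the posterior odds are equal to the likelihood ratio times the prior odds.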
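
To make the odds identity concrete, the sketch below applies it to the butterfly example with sample mean x̄ = 9.1. Two assumptions are made for illustration: the prior probability prior1 = 0.5 is not given in the slides, and because the data are continuous the likelihood is a density rather than a probability. The likelihood ratio uses the normal form derived earlier, with the sum over observations written as n(2x̄ − µ1 − µ0).

```r
mu0 <- 10; mu1 <- 7; sigma0 <- 3; n <- 16
xbar <- 9.1          # sample mean from the butterfly data
prior1 <- 0.5        # P{H1 is true}; an assumed value for illustration
prior0 <- 1 - prior1

# Likelihood ratio L(mu1|x) / L(mu0|x) for normal data with known sigma0
lr <- exp(-(mu0 - mu1) / (2 * sigma0^2) * n * (2 * xbar - mu1 - mu0))

# Posterior odds = likelihood ratio * prior odds
posterior_odds <- lr * prior1 / prior0
posterior1 <- posterior_odds / (1 + posterior_odds)   # P{H1 is true | X = x}
c(lr = lr, posterior_odds = posterior_odds, posterior1 = posterior1)
```

Changing prior1 rescales the posterior odds by the prior odds while leaving the likelihood ratio untouched.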


Formulation

Relying on these odds alone fails to take into account the impact of an incorrect decision.

The decision whether or not to reject H0 depends on the values assigned for the loss obtained in making such a conclusion. We begin by setting values for the loss.

Loss function table

decision    H0 is true    H1 is true
H0          0             ℓ_II
H1          ℓ_I           0

The Bayes procedure is to make the decision that has the smaller posterior expected loss, also known as the risk.


Formulation

If the decision is H0, the loss L0(x) takes on two values

$$L_0(x) = \begin{cases} 0 & \text{with probability } P\{H_0 \text{ is true}\,|\,X = x\}, \\ \ell_{II} & \text{with probability } P\{H_1 \text{ is true}\,|\,X = x\}. \end{cases}$$

In this case, the expected loss

$$EL_0(x) = \ell_{II}\, P\{H_1 \text{ is true}\,|\,X = x\}$$

is a product of the loss and the probability of incorrectly choosing H0.

Exercise. Show that, if the decision is H1, the expected loss is

$$EL_1(x) = \ell_{I}\, P\{H_0 \text{ is true}\,|\,X = x\}.$$


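Formulation

So, we reject H0 whenever the risk is greater for H0 than H1,

$$EL_0(x) > EL_1(x)$$

or

$$1 < \frac{EL_0(x)}{EL_1(x)} = \frac{\ell_{II}\, P\{H_1 \text{ is true}\,|\,X = x\}}{\ell_{I}\, P\{H_0 \text{ is true}\,|\,X = x\}}.$$

Stated in terms of odds,

$$\frac{\ell_{I}}{\ell_{II}} < \frac{P\{H_1 \text{ is true}\,|\,X = x\}}{P\{H_0 \text{ is true}\,|\,X = x\}} = \frac{L(\theta_1|x)}{L(\theta_0|x)} \cdot \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}},$$

and

$$\frac{L(\theta_1|x)}{L(\theta_0|x)} > \frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}.$$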


Formulation

$$\frac{L(\theta_1|x)}{L(\theta_0|x)} > \frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}.$$

Thus, the criterion for rejecting H0 is a level test on the likelihood ratio, exactly the same type of criterion used in classical statistics. However, the rationale, and thus the value for the ratio necessary to reject, can be quite different.

For normal observations with mean µ0 for the null hypothesis, mean µ1 for the alternative hypothesis, and a known standard deviation σ0, we have the likelihood ratio

$$\frac{L(\mu_1|x)}{L(\mu_0|x)} = \exp\left(-\frac{\mu_0-\mu_1}{2\sigma_0^2}\sum_{i=1}^n (2x_i-\mu_1-\mu_0)\right) = \exp\left(-\frac{\mu_0-\mu_1}{2\sigma_0^2}\, n(2\bar{x}-\mu_1-\mu_0)\right).$$
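
The Bayes procedure for the normal case can be sketched as a comparison of the two expected losses. The function name bayes_decision, its argument names, and the loss and prior values in the calls are all hypothetical choices made for illustration; the default parameters come from the butterfly example.

```r
# Compare posterior expected losses and report the Bayes decision.
# loss_I: loss for a type I error; loss_II: loss for a type II error;
# prior0: prior probability that H0 is true.
bayes_decision <- function(xbar, loss_I, loss_II, prior0,
                           mu0 = 10, mu1 = 7, sigma0 = 3, n = 16) {
  prior1 <- 1 - prior0
  lr <- exp(-(mu0 - mu1) / (2 * sigma0^2) * n * (2 * xbar - mu1 - mu0))
  post1 <- lr * prior1 / (lr * prior1 + prior0)  # P{H1 is true | X = x}
  el0 <- loss_II * post1                         # expected loss if we decide H0
  el1 <- loss_I * (1 - post1)                    # expected loss if we decide H1
  if (el0 > el1) "reject H0" else "do not reject H0"
}

d1 <- bayes_decision(xbar = 9.1, loss_I = 1, loss_II = 1, prior0 = 0.5)
d2 <- bayes_decision(xbar = 8.0, loss_I = 1, loss_II = 1, prior0 = 0.5)
c(d1, d2)
```

Comparing el0 with el1 is equivalent to comparing the likelihood ratio with the threshold (ℓ_I/ℓ_II) / (π{θ1}/π{θ0}), so the function applies the same rejection rule as the inequality above.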


Example

Exercise. For the example on the model and mimic butterfly species, µ0 = 10, µ1 = 7, σ0 = 3, and n = 16 observations.

• Solve the equation for rejecting H0 for x̄ in terms of ℓ_I, ℓ_II, π{θ0} and π{θ1}.

• Give the threshold values for x̄ in the table below.

              prior probability π{θ0}
ℓ_I/ℓ_II      0.05      0.10      0.20
1/2
1
2

• What situations give the lowest and highest threshold values for x̄? Explain your answer.


Example

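$$\frac{L(\theta_1|x)}{L(\theta_0|x)} = \exp\left(-\frac{\mu_0-\mu_1}{2\sigma_0^2}\, n(2\bar{x}-\mu_1-\mu_0)\right) > \frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}$$

Take logarithms and divide by n(µ1 − µ0)/(2σ0²). Because µ1 − µ0 < 0, the inequality reverses.

$$2\bar{x}-\mu_1-\mu_0 < \frac{2\sigma_0^2}{n(\mu_1-\mu_0)} \ln\left(\frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}\right)$$

$$\bar{x} < \frac{\sigma_0^2}{n(\mu_1-\mu_0)} \ln\left(\frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}\right) + \frac{\mu_1+\mu_0}{2}$$

With σ0² = 9, µ1 − µ0 = −3, and µ1 + µ0 = 17,

$$\bar{x} < -\frac{3}{n} \ln\left(\frac{\ell_{I}}{\ell_{II}} \Big/ \frac{\pi\{\theta_1\}}{\pi\{\theta_0\}}\right) + \frac{17}{2}$$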


Example

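> mu0<-10;mu1<-7;sigma2<-3^2;n<-16   # sigma2 is the variance: sigma0 = 3 gives 9

> lr<-c(1,2,4)/2   # loss ratios 1/2, 1, 2

> pi0<-0.05

> sigma2/(n*(mu1-mu0))*log(lr/((1-pi0)/pi0))+(mu0+mu1)/2

[1] 9.182047 9.052082 8.922117

> pi0<-0.10

> sigma2/(n*(mu1-mu0))*log(lr/((1-pi0)/pi0))+(mu0+mu1)/2

[1] 9.041945 8.911980 8.782015

> pi0<-0.20

> sigma2/(n*(mu1-mu0))*log(lr/((1-pi0)/pi0))+(mu0+mu1)/2

[1] 8.889895 8.759930 8.629965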
