Upload
shanna-haynes
View
219
Download
0
Tags:
Embed Size (px)
Citation preview
Hypothesis testing
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Hypothesis testing
• Want to know something about a population
• Take a sample from that population• Measure the sample• What would you expect the sample to
look like under the null hypothesis?• Compare the actual sample to this
expectation
population
sample
Y = 2675.4
Hypothesis testing
• Hypotheses are about populations
• Tested with data from samples
• Usually assume that sampling is random
Types of hypotheses
• Null hypothesis - a specific statement about a population parameter made for the purposes of argument
• Alternate hypothesis - includes other possible values for the population parameter besides the value states in the null hypothesis
The null hypothesis is usually the simplest
statement, whereas the alternative hypothesis
is usually the statement of greatest
interest.
A good null hypothesis would be interesting if proven wrong.
A null hypothesis is specific; an alternate hypothesis is not.
Hypothesis testing: exampleCan sheep recognize each other?
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
The experiment and the results
• Sheep were trained to get a reward near a certain other sheep’s picture
• Then placed in a Y-shaped maze
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
You must choose…
Stating the hypotheses
H0: Sheep go to each face with equal probability (p = 0.5).
HA: Sheep choose one face over the other (p ≠ 0.5).
Estimating the value
• 16 of 20 is a proportion of p = 0.8
• This is a discrepancy of 0.3 from the proportion proposed by the null hypothesis, p =0.5
Null distribution
• The null distribution is the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true
Proportion of correct choices
Probability
0 0.000001
1 0.00002
2 0.00018
3 0.0011
4 0.0046
5 0.015
6 0.037
7 0.074
8 0.12
9 0.16
10 0.18
11 0.16
12 0.12
13 0.074
14 0.037
15 0.015
16 0.0046
17 0.0011
18 0.00018
19 0.00002
20 0.000001
• Test statistic = a quantity calculated from the data that is used to evaluate how compatable the data are with the expectation under the null hypothesis
The null distribution of p
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Number of correct choices
Frequency
Test statistic = 16
The null distribution of p
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Number of correct choices
Frequency
Values at least as extreme as the test statistic
• P-value - the probability of obtaining the data* if the null hypothesis were true
*as great or greater difference from the null hypothesis
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Number of correct choices
Frequency
P = 0.012
P-value
P-value calculation
P
=2*(Pr[16]+Pr[17]+Pr[18]+Pr[19]+Pr[20])
=2*(0.005+0.001+0.0002+0.00002+0.000001)
= 0.012
How to find P-values
• Get test statistic
• Compare with null distribution from:– Simulation– Parametric tests– Non-parametric tests– Re-sampling
Statistical significance
The significance level, α, i s a probability used as a criterion
for rejecti ng the nul l hypothesi .s If th e P-valu e for a test is
less than or equal t o α, the n the null hypothesi s is rejecte .d
α is often 0.05
Significance for the sheep example
• P = 0.012
• P < α, so we can reject the null hypothesis
Larger samples give more information
• A larger sample will tend to give and estimate with a smaller confidence interval
• A larger sample will give more power to reject a false null hypothesis
Hypothesis testing: another exampleThe genetics of symmetry in flowers
Heteranthera - Mud plantain
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Stigma and anthers are asymmetric in different genotypes
Can the pattern of inheritance be explained by a single locus
with simple dominance? Model predicts a 3:1 ratio of right-handed
flowers
H0: Right- and left-handed offspring occur at a 3:1 ratio (the proportion of right-handed individuals in the offspring population is p = 3/4)
HA: Right- and left-handed offspring do not occur at a 3:1 ratio (p ≠ 3/4)
Data
Of 27 offspring, 21 were “right-handed” and 6 were “left-handed.”
Estimating the proportion
€
ˆ p =21
27= 0.778
* The “hat” notation denotes an estimate for apopulation parameter from a sample
Number of right-handed flowers in a random sampleof 27
Probability
0-8 09 0.00000510 0.00003111 0.00012412 0.00049213 0.00175214 0.00530115 0.01383016 0.03109417 0.06067318 0.10089119 0.14305120 0.17233921 0.17178222 0.14103423 0.09147724 0.04549925 0.01640926 0.00375927 0.000457
Sampling distribution of null hypothesis
P = 0.83.
The P-value:
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Rock-paper-scissors battle
Jargon
Significance level
• A probability used as a criterion for rejecting the null hypothesis
• Called α• If p < α, reject the null hypothesis
• For most purposes, α = 0.05 is acceptable
Type I error
• Rejecting a true null hypothesis
• Probability of Type I error is α (the significance level)
Type II error
• Not rejecting a false null hypothesis
• The probability of a Type II error is
• The smaller , the more power a test has
Power
• The probability that a random sample of a particular size will lead to rejection of a false null hypothesis
• Power = 1-
Reality
Result
Ho true Ho false
Reject Ho
Do not reject Ho correct
correctType I error
Type II error
One- and two-tailed tests
• Most tests are two-tailed tests
• This means that a deviation in either direction would reject the null hypothesis
• Normally α is divided into α/2 on one side and α/2 on the other
Test statistic
One-sided tests
• Also called one-tailed tests
• Only used when one side of the null distribution is nonsensical
• For example, comparing grades on a multiple choice test to that expected by random guessing
Critical value
• The value of a test statistic beyond which the null hypothesis can be rejected
“Statistically significant”
• P < α
• We can “reject the null hypothesis”
We never “accept the null hypothesis”