Upload
solomon-hamilton
View
228
Download
0
Embed Size (px)
Citation preview
Hypothesis Testing: One Sample Cases
Outline:
– The logic of hypothesis testing– The Five-Step Model– Hypothesis testing for single sample means
(z test)– One- vs. Two- tailed tests
Significant Differences• Hypothesis testing is designed to detect significant
differences: differences that did not occur by random chance.
• In the “one sample” case: we compare a random sample (from a large group) to a population.
• We compare a sample statistic to a population parameter to see if there is a significant difference.
Our Problem:
• The education department at a university has been accused of “grade inflation” so education graduates have much higher scores than students in general.
• Scores of all education graduates should be compared with the scores of all students.– There are 1000s of education graduates, far too many to
interview. – How can this be investigated without interviewing all education
graduates?
What we know:• The mean scores for all
students is 2.70 and the s.d = 0.7. These values are parameters.
The population value
• To the right is the statistical information for a random sample of education graduates:
= 2.70 = 0.70
Sample mean = 3.00
Sample size n = 117
X
Questions to ask:
• Is there a difference between the parameter (2.70) and the statistic (3.00) sample mean?
• Could the observed difference have been caused by random chance?
• Is the difference real (significant)?
Two Possibilities:
1. The sample mean (3.00) is the same as the pop. mean (2.70).
– The difference is trivial and caused by random chance.
2. The difference is real (significant).– Education graduates are different from all
students.
The Null and Alternative Hypotheses:1. Null Hypothesis (H0)
– The difference is caused by random chance.
– The H0 always states there is “no significant difference.” In this
case, we mean that there is no significant difference between the population mean and the sample mean.
2. Alternative hypothesis (H1)
– “The difference is real”.
– (H1) always contradicts the H0.
• One (and only one) of these explanations must be true. Which one?
Test the Explanations• We always test the Null Hypothesis.
• Assuming that the H0 is true:
– What is the probability of getting the sample mean (3.00) if the H0 is true and all education
graduates really have a mean of 2.70?
– In other words, the difference between the means is due to random chance.
– If the probability associated with this difference is less than 5%, reject the null hypothesis.
Test the Hypotheses• Use the 5% value as a guideline to identify differences that
would be rare or extremely unlikely if H0 is true. This
“alpha” value defines the “region of rejection.”
• Use the Z score formula for single samples and
• If the probability is less than 5%, the calculated or “observed” Z score will be beyond ±1.96 (the “critical” Z score).
Two-tailed Hypothesis Test:
When α = 5%, then 2.5% of the area is distributed on either side of the curve in area (C )
The 95% in the middle section represents no significant difference between the population and the sample mean.
The cut-off between the middle section and +/- 2.5% is represented by a Z-value of +/- 1.96.
Z= -1.96
c
Z = +1.96
c
Testing Hypotheses:Using The Five Step Model…
1. Make Assumptions and meet test requirements.
2. State the null hypothesis.3. Select the sampling distribution and
establish the critical region.4. Compute the test statistic.5. Make a decision and interpret results.
Step 1: Make Assumptions and Meet Test Requirements
• Random sampling– Hypothesis testing assumes samples were selected using random
sampling. – In this case, the sample of 117 cases was randomly selected from all
education graduates.
• Sampling Distribution is normal in shape– This is a “large” sample (n≥100).
Step 2 State the Null Hypothesis
• H0: μ = 2.7
• You can also state Ho: No difference between the sample mean and
the population parameter
– (In other words, the sample mean of 3.0 is really the same as the population mean of 2.7 – the difference is not real but is due to chance.)
– The sample of 117 comes from a population that has a score of 2.7. – The difference between 2.7 and 3.0 is trivial and caused by random
chance.
Step 2 (cont.) State the Alternate Hypothesis• H1: μ≠2.7
• Or H1: There is a difference between the sample mean and the population parameter – The sample of 117 comes from a population that does not have
a score of 2.7. In reality, it comes from a different population.– The difference between 2.7 and 3.0 reflects an actual difference
between education graduates and other students.– Note that we are testing whether the population the sample
comes from is from a different population or is the same as the general student population.
Step 3 Select Sampling Distribution and Establish the Critical Region
• Sampling Distribution= Z
– Alpha (α) = 5%
– α is the indicator of “rare” events.
– Any difference with a probability less than α is rare and will cause us to reject the H0.
Step 3 (cont.) Select Sampling Distribution and Establish the Critical Region
• Critical Region begins at Z= ± 1.96
– This is the critical Z score associated with α = 5%, two-tailed test.
– If the obtained Z score falls in the Critical Region, or “the region of rejection,” then we would reject the H0.
Step 4: Use Formula to Compute the Test Statistic Z for large samples (≥ 100)
Z
N
Test the Hypotheses
• Substituting the values into the formula, we calculate a Z score of 4.6357.
3.0 2.74.6357
0.7
117
Z
Step 5 Make a Decision and Interpret Results
• The obtained Z score falls in the Critical Region, so we reject the H0.
The Z value >1.96
– If the H0 were true, a sample outcome of 3.00 would be unlikely.
– Therefore, the H0 is false and must be rejected.
• Education graduates have a score that is significantly different from the general student body (Z = 4.62, α = 5%).*
• *Note: Always report significant statistics.
Looking at the curve:(Area C = Critical Region when α=.05)
Z= -1.96
c
Z = +1.96
c z= +4.62 I
Summary:
• The score of education graduates is significantly different from the score of the general student body.
• In hypothesis testing, we try to identify statistically significant differences that did not occur by random chance.
• In this example, the difference between the parameter 2.70 and the statistic 3.00 was large and unlikely
(p < 5%) to have occurred by random chance.
Summary (cont.)
• We rejected the H0 and concluded that the
difference was significant.
• It is very likely that Education graduates have scores higher than the general student body
Rule of Thumb:• If the test statistic is in the Critical Region
( α = 5%, beyond ±1.96):
– Reject the H0. The difference is significant.
• If the test statistic is not in the Critical Region (at α = 5%, between +1.96 and -1.96):
– Fail to reject the H0. The difference is not
significant.
Two-tailed vs. One-tailed Tests• In a two-tailed test, the direction of the
difference is not predicted.• A two-tailed test splits the critical region
equally on both sides of the curve.• In a one-tailed test, the researcher predicts
the direction (i.e. greater or less than) of the difference.
• All of the critical region is placed on the side of the curve in the direction of the prediction.
• Assumptions– Population Is Normally Distributed– If Not Normal, use large samples– Null Hypothesis Has =, , or Sign Only
• Z Test Statistic:
One-Tail Z Test for Mean (Known)
xz
n
Z
Reject H 0
Z
Reject H
0
H0: H1: <
H0: 0 H1: >
Must Be Significantly Below
=
Small values don’t contradict H0
Don’t Reject H0!
Rejection Region
•Does an average box of cereal contain 368 grams of cereal?
•A random sample of 25 boxes showed X = 372.5.
•The company has specified to be 15 grams.
•Test at the 5% level.
368 gm.
Example: One Tail Test
H0: 368 H1: > 368
_
Z0 1.645
What Is Z given = 5%?
= 5%
Finding Critical Values: One Tail
Critical Value = 1.645
InvNormal(.95) = 1.645
Use % points table below normal table in formula book
•= 5%
•n = 25
•Critical Value: 1.645
Test Statistic:
Decision:
Conclusion:
Do Not Reject H0 at = 5%
No Evidence True Mean Is More than 368
Z0 1.645
.05
Reject
Example Solution: One Tail
H0: 368 H1: > 368 372.5 368
1.5015
25
XZ
n
1.5<critical value of 1.645
1.5
Z0 1.50
p Value6.68%
From Z Table: Lookup InvNorm (1.50)
93.32%
Use the alternative hypothesis to find the direction of the test.
P(Z 1.50) = 6.68%
p Value Solution
0 1.50 Z
Reject
(p Value = 6.68%) ( = 5%). Do Not Reject.
p Value = 6.68%
= 5%
Test Statistic Is In the Do Not Reject Region
p Value Solution
•Does an average box of cereal contains 368 grams of cereal?
•A random sample of 25 boxes showed X = 372.5.
•The company has specified to be 15 grams.
•Test at the 5% level.
368 gm.
Example: Two Tail Test
H0: 368
H1:
368
•= 5%
•n = 25
•Critical Value: ±1.96
Test Statistic:
Decision:
Conclusion:
Do Not Reject Ho at = 5%
No Evidence that True Mean Is Not 368
Example Solution: Two Tail
Z0 1.96
2.5%
Reject
-1.96
2.5%
H0: 386
H1:
386372.5 368
1.5015
25
XZ
n
The Curve for Two- vs. One-tailed Tests at α = .05:
Two-tailed test:“is there a significant
difference?”
One-tailed tests:“is the sample meangreater than µ?”
“is the sample meanless than µ?”
11.36
Interpreting the p-value…• The smaller the p-value, the more statistical evidence exists
to support the alternative hypothesis.
• If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
• If the p-value is between 1% and 5%, there is a strong evidence that supports the alternative hypothesis.
• If the p-value is between 5% and 10% there is a weak evidence that supports the alternative hypothesis.
• If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
Type I error• Occurs when an experimenter thinks she/he has a significant
result, but it is really due to chance• Analogous to a “false positive” on a drug test.• Risk of a Type I error is the same as the significance level, e.g.,
p < 5%• Solutions: use a more stringent significance level, use
replication
Type II error• Occurs when a researcher fails to find a significant result
when, in fact, there was something significant going on.• Analogous to a “false negative” on a drug test.• Must be calculated with a test of statistical “power,” e.g.,
given the sample size, how big would an effect have to be in order to detect it?
• Solutions: increase sample size, use more sensitive precise measures, use replication
Type I and Type II errors
In the larger population, H0 is correct
In the larger population H1 is correct
Based on sample, H0 is supported
Correct Decision
Incorrect Decision: Type II Error
Base on sample, H1 is accepted
Incorrect Decision: Type I Error
Correct Decision
Implications• To some extant, Type I and Type II errors trade off with one
another– Decreasing the chance of a Type I error may increase the chance
of a Type II error.• A Type I error is the more shocking of the two
– Type I entails shouting “Eureka” when you haven’t really found it.
– Scientific skepticism makes Type II errors more palatable