One-Population Tests
One Population
- Mean: Z Test (1 & 2 tail), t Test (1 & 2 tail)
- Proportion: Z Test (1 & 2 tail)

One-sample test of proportion
- Z Test of Proportion
- Exact method using Binomial Distribution
Examples
Example 1. You’re an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed? Use the 0.05 significance level.
H0: p = 0.04 vs. H1: p ≠ 0.04
Example 2. A researcher claims that less than 20% of adults in the U.S. are allergic to an herbal medicine. In an SRS (simple random sample) of 25 adults, 3 say they have such an allergy. Does this support the researcher’s claim? Test at the 5% level.
H0: p = 0.2 vs. H1: p < 0.2
Binomial Distribution
X ~ Binomial(n, p), where n = number of trials and p = probability of a positive outcome; Mean(X) = np, Var(X) = np(1 − p)
$\hat{p} = X/n$ = proportion of positive outcomes in a sample of size n, with $E(\hat{p}) = p$ (the population proportion)
$\mathrm{Var}(\hat{p}) = p(1 - p)/n$
By the CLT, $\hat{p}$ can be approximated by a normal distribution if $np(1 - p) \ge 5$:
$$\hat{p} \sim N\!\left(p,\ \frac{p(1 - p)}{n}\right) \quad \text{(approximately)}$$
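A minimal Python sketch of the formulas above, using only the standard library: it returns $E(\hat{p})$, the standard deviation of $\hat{p}$, and whether the $np(1-p) \ge 5$ rule of thumb holds. The function name `phat_normal_approx` is hypothetical, chosen for illustration, and the two calls use the numbers from Examples 1 and 2.

```python
import math

def phat_normal_approx(n: int, p: float):
    """Sampling distribution of p_hat = X / n for X ~ Binomial(n, p).

    Returns (mean, sd, ok): E(p_hat) = p, SD(p_hat) = sqrt(p(1 - p)/n),
    and ok = True when the rule of thumb n*p*(1 - p) >= 5 holds.
    """
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    ok = n * p * (1 - p) >= 5
    return mean, sd, ok

# Example 1 (n = 500, p = 0.04): n*p*(1 - p) = 19.2, so ok is True
print(phat_normal_approx(500, 0.04))
# Example 2 (n = 25, p = 0.2): n*p*(1 - p) = 4, so ok is False
print(phat_normal_approx(25, 0.2))
```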
One-Sample Z Test for Proportion
Hypotheses: H0: p = p0 vs. H1: p ≠ p0
Assumptions:
- Two categorical outcomes
- The number of successes in the population follows a binomial distribution
- The normal approximation can be used if $n p_0 q_0 \ge 5$, where $q_0 = 1 - p_0$
Z-test statistic for proportion:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
where $p_0$ is the hypothesized population proportion.
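The Z statistic above can be sketched in Python as follows. This is an illustrative helper (`one_sample_prop_z` is a hypothetical name), not a library API; it also uses the standard identity 2·P(Z > |z|) = erfc(|z|/√2) to turn the statistic into a two-sided p-value, a step the slides themselves do not compute.

```python
import math

def one_sample_prop_z(x: int, n: int, p0: float):
    """One-sample Z test for a proportion.

    Returns (z, two_sided_p) where
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    and the two-sided p-value is 2 * P(Z > |z|) for Z ~ N(0, 1).
    """
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / se0
    # P(Z > |z|) = 0.5 * erfc(|z| / sqrt(2)), so doubling it gives erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

z, p = one_sample_prop_z(25, 500, 0.04)   # Example 1 data
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```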
One-Sample Test of Proportion Example 1
You’re an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed? Use the 0.05 significance level.
One-Sample Z Test of Proportion
H0: p = p0 = 0.04; Ha: p ≠ p0 = 0.04; α = 0.05; n = 500
Check: $n p_0 q_0 = 500 \times 0.04 \times (1 - 0.04) = 19.2 \ge 5$, so the Z test can be used.
Critical values: ±1.96 (rejection region of 0.025 in each tail)
Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{25/500 - 0.04}{\sqrt{0.04(1 - 0.04)/500}} \approx 1.14$$
Decision: Do not reject H0 at α = 0.05.
Conclusion: There is no evidence that the proportion has changed from 4%.
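As a cross-check of the Example 1 solution, the sketch below recomputes the condition $n p_0 q_0$, the two-tailed critical value and the test statistic. It assumes Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf(0.975)` gives the 1.96 used on the slide.

```python
import math
from statistics import NormalDist

alpha = 0.05
n, x, p0 = 500, 25, 0.04          # Example 1 data

npq = n * p0 * (1 - p0)           # condition for the normal approximation
if npq < 5:
    raise ValueError("n*p0*q0 < 5: use the exact binomial method instead")

# Two-tailed test: alpha/2 in each tail, so the critical values are ±z_{1 - alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
reject = abs(z) > z_crit

print(f"n*p0*q0 = {npq:.1f}, critical values = ±{z_crit:.2f}, Z = {z:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```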
One-sample test of Proportion Example 2
A researcher claims that less than 20% of adults in the U.S. are allergic to an herbal medicine. In an SRS of 25 adults, 3 say they have such an allergy. Does this support the researcher’s claim? Test at the 5% level.
Is $n p_0 q_0 \ge 5$? $25 \times 0.2 \times 0.8 = 4 < 5$, so the normal approximation should not be used.
Exact Method using Binomial Distribution: One-Sided p-value
If the normal approximation cannot be used, i.e., if $n p_0 q_0 < 5$, then, with X the number of successes in n trials:
- Ha: p < p0: one-sided p-value = $P(X \le x \mid H_0) = \sum_{k=0}^{x} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$; EXCEL: BINOMDIST(x,n,p0,TRUE)
- Ha: p > p0: one-sided p-value = $P(X \ge x \mid H_0) = \sum_{k=x}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$; EXCEL: 1-BINOMDIST(x-1,n,p0,TRUE)
Exact Method using Binomial Distribution: Two-Sided p-value
If the normal approximation cannot be used, i.e., if $n p_0 q_0 < 5$, then for Ha: p ≠ p0 the two-sided p-value can be calculated as follows:
- If $\hat{p} = x/n < p_0$: p-value = $2P(X \le x \mid H_0) = 2\sum_{k=0}^{x} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$; EXCEL: 2*BINOMDIST(x,n,p0,TRUE)
- If $\hat{p} = x/n > p_0$: p-value = $2P(X \ge x \mid H_0) = 2\sum_{k=x}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$; EXCEL: 2*(1-BINOMDIST(x-1,n,p0,TRUE))
NOTE: In BINOMDIST, TRUE returns the cumulative probability P(X ≤ x) and FALSE returns the probability mass P(X = x).
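A minimal sketch of the exact method in Python, assuming Python 3.8+ for `math.comb`. The helper names `binom_pmf` and `exact_prop_pvalue` are hypothetical; the sums correspond to the BINOMDIST formulas above, with `binom_pmf` playing the role of BINOMDIST(..., FALSE). The two-sided branch follows the doubling rule on the slide, plus a cap at 1 that is an added assumption.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); like Excel BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_prop_pvalue(x: int, n: int, p0: float, alternative: str) -> float:
    """Exact binomial p-value for H0: p = p0, given x successes in n trials.

    alternative: "less" (Ha: p < p0), "greater" (Ha: p > p0) or "two-sided".
    """
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))   # P(X <= x | H0)
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))   # P(X >= x | H0)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        # Double the tail on the side of p_hat = x/n. Capping at 1 is an
        # added convention (the slides only cover p_hat < p0 and p_hat > p0).
        tail = lower if x / n <= p0 else upper
        return min(1.0, 2 * tail)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

print(exact_prop_pvalue(3, 25, 0.2, "less"))
print(exact_prop_pvalue(3, 25, 0.2, "two-sided"))
```

For Example 2, `exact_prop_pvalue(3, 25, 0.2, "less")` reproduces the one-sided p-value of about 0.234.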
One-sample test of Proportion Example 2
n = 25, p0 = 0.2, x = 3, $\hat{p} = x/n = 3/25 = 0.12$
One-sample test of Proportion Example 2 Solution
H0: p = p0 = 0.2; Ha: p < p0 = 0.2; α = 0.05; n = 25; x = 3
P-value:
$$P(X \le 3 \mid H_0) = \sum_{k=0}^{3} \binom{25}{k} 0.2^k (1 - 0.2)^{25-k} = \binom{25}{0} 0.2^0\, 0.8^{25} + \binom{25}{1} 0.2^1\, 0.8^{24} + \binom{25}{2} 0.2^2\, 0.8^{23} + \binom{25}{3} 0.2^3\, 0.8^{22} = 0.234 > 0.05$$
EXCEL: BINOMDIST(3,25,0.2,TRUE)
Decision: Do not reject H0 at α = 0.05.
Conclusion: There is no evidence that the proportion is less than 20%.
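The term-by-term sum above can be reproduced with a few lines of Python (assuming Python 3.8+ for `math.comb`); the printed terms are the four binomial probabilities for k = 0 to 3, and their total is the p-value compared with α.

```python
from math import comb

n, p0, x, alpha = 25, 0.2, 3, 0.05    # Example 2 data

# Terms C(25, k) * 0.2^k * 0.8^(25 - k) for k = 0, 1, 2, 3
terms = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x + 1)]
p_value = sum(terms)                  # P(X <= 3 | H0), Ha: p < 0.2

for k, t in enumerate(terms):
    print(f"k = {k}: {t:.4f}")
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```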
Review for Hypothesis Testing
- One-sample tests for a population mean, μ:
  - Z-test if σ is known
  - T-test if σ is unknown
- One-sample tests for a population proportion, p:
  - Z-test if n p0 q0 ≥ 5
  - Exact method using the binomial distribution if n p0 q0 < 5
- Two-sample tests for a difference in population means, μ1 − μ2:
  - Independent samples:
    - Z-test if the σ's are known
    - T-test with pooled estimate of variance if the σ's are unknown and can be assumed equal
    - T-test with unequal variances if the σ's are unknown and cannot be assumed equal
  - Paired samples:
    - Paired Z-test for the difference if σ is known
    - Paired T-test for the difference if σ is unknown
- Two-sample test for a difference in population variances: F-test with df1 = n1 − 1 and df2 = n2 − 1
Analysis of Variance (ANOVA) Multisample Inference
Learning Objectives
Until now, we have considered two groups of individuals and we've wanted to know if the two groups were sampled from distributions with equal population means or medians.
Suppose we would like to consider more than two groups of individuals and, in particular, test whether the groups were sampled from distributions with equal population means.
How to use one-way analysis of variance (ANOVA) to test for differences among the means of several populations (“groups”)
Hypotheses of One-Way ANOVA
H0: μ1 = μ2 = ⋯ = μk
- All population means are equal
- No treatment effect (no variation in means among groups)
H1: Not all of the population means are the same
- At least one population mean is different
- There is a treatment effect
- This does not mean that all population means are different (some pairs may be the same)
One-Factor ANOVA
[Figure: all group means are the same; the null hypothesis is true (no treatment effect).]
[Figure: at least one group mean is different; the null hypothesis is NOT true (treatment effect is present).]
One-Way ANOVA: Model Assumptions
- The k random samples are drawn from k independent populations
- The variances of the populations are identical
- The underlying data are approximately normally distributed
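Basic Idea: Partitioning the Variation
Suppose there are k groups with $n_1, n_2, \ldots, n_k$ observations. Let $y_{ij}$ = the j-th observation in the i-th group, $\bar{y}$ = overall mean, and $\bar{y}_i$ = mean of group i. Then
$$y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$$
where $\bar{y}_i - \bar{y}$ is the deviation of the group mean from the grand mean and $y_{ij} - \bar{y}_i$ is the deviation of the observation from its group mean.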
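Partitioning the variation
$$y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$$
- $y_{ij} - \bar{y}_i$ = deviation of an observation from its group mean (within-group variability)
- $\bar{y}_i - \bar{y}$ = deviation of the group mean from the overall mean (between-group variability)
[Figure: response X for Group 1, Group 2 and Group 3, with the group means $\bar{y}_1, \bar{y}_2, \bar{y}_3$ and the overall mean $\bar{y}$ marked.]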
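Partitioning the variation
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2$$
Total variation (Total SS) = variation due to random sampling (Within SS) + variation due to factor (Between SS)
(Squaring the per-observation decomposition produces a cross-product term, which sums to zero because the deviations from each group mean sum to zero within every group.)
Total variation is the sum of within-group variability and between-group variability.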
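A quick numerical check of the identity Total SS = Within SS + Between SS, written as a Python sketch. The data are made up for illustration and `ss_partition` is a hypothetical helper name.

```python
def ss_partition(groups):
    """Return (total_ss, within_ss, between_ss) for groups of observations y_ij."""
    all_y = [y for g in groups for y in g]
    grand = sum(all_y) / len(all_y)                      # overall mean y-bar
    means = [sum(g) / len(g) for g in groups]           # group means y-bar_i
    total_ss = sum((y - grand) ** 2 for y in all_y)
    within_ss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Each of the n_i observations in group i contributes (y-bar_i - y-bar)^2
    between_ss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    return total_ss, within_ss, between_ss

# Hypothetical data: three groups of unequal size
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
total_ss, within_ss, between_ss = ss_partition(groups)
print(total_ss, within_ss + between_ss)   # equal up to floating-point rounding
```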
[Figure: response X for Group 1, Group 2 and Group 3, in two panels.]
- If between-group variability is large and within-group variability is small, reject H0.
- If between-group variability is small and within-group variability is large, do not reject H0.
Basic Idea of ANOVA: Partition of Total Variation
Total Variation (Total SS), d.f. = n − 1
  = Variation Due to Factor (Between SS), d.f. = k − 1
  + Variation Due to Random Sampling (Within SS), d.f. = n − k
Between SS is commonly referred to as: Sum of Squares Between, Sum of Squares Among, Sum of Squares Explained, Among-Groups Variation.
Within SS is commonly referred to as: Sum of Squares Within, Sum of Squares Error, Sum of Squares Unexplained, Within-Group Variation.
Total Sum of Squares
$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$
Where:
- Total SS = total sum of squares
- k = number of groups (levels or treatments)
- $n_i$ = number of observations in group i
- $y_{ij}$ = j-th observation from group i
- $\bar{y}$ = grand mean (mean of all data values)
Total SS = Between SS + Within SS
Total Variation
$$\text{Total SS} = (y_{11} - \bar{y})^2 + (y_{12} - \bar{y})^2 + \cdots + (y_{k n_k} - \bar{y})^2$$
[Figure: response X for Group 1, Group 2 and Group 3, with the overall mean $\bar{y}$ marked.]
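Between-Group Variation
$$\text{Between SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2 = n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_k(\bar{y}_k - \bar{y})^2$$
[Figure: response X for Group 1, Group 2 and Group 3, with the group means $\bar{y}_1, \bar{y}_2, \bar{y}_3$ and the overall mean $\bar{y}$ marked.]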
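Within-Group Variation
$$\text{Within SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^{k} (n_i - 1) S_i^2$$
where $S_i^2$ is the sample variance of group i.
[Figure: response X for Group 1, Group 2 and Group 3, with each group mean $\bar{y}_1, \bar{y}_2, \bar{y}_3$ marked.]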
Obtaining the Mean Squares
$$\text{Within MS} = \frac{\text{Within SS}}{n - k}$$
$$\text{Between MS} = \frac{\text{Between SS}}{k - 1}$$
$$\text{Total MS} = \frac{\text{Total SS}}{n - 1}$$
One-Way ANOVA Table
Source of Variation   df      SS                MS (Variance)        F ratio
Between Groups        k - 1   BSS               BMS = BSS/(k - 1)    F = BMS/WMS
Within Groups         n - k   WSS               WMS = WSS/(n - k)
Total                 n - 1   TSS = BSS + WSS
k = number of groups; n = sum of the sample sizes from all groups; df = degrees of freedom
One-Way ANOVA F Test Statistic
Test statistic (distribution under H0):
$$F = \frac{\text{Between MS}}{\text{Within MS}} \sim F_{df_1,\,df_2}$$
Degrees of freedom: df1 = k − 1 (k = number of groups); df2 = n − k (n = sum of sample sizes from all populations)
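The mean squares and F ratio above can be sketched in Python as below. `one_way_f` is a hypothetical helper; the group data are invented so that the arithmetic is easy to follow (Between SS = 50 and Within SS = 6, giving F = 25 with df1 = 2 and df2 = 6).

```python
def one_way_f(groups):
    """One-way ANOVA: return (F, df1, df2) with F = Between MS / Within MS."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between_ss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within_ss = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df1, df2 = k - 1, n - k
    between_ms = between_ss / df1      # Between MS = Between SS / (k - 1)
    within_ms = within_ss / df2        # Within MS = Within SS / (n - k)
    return between_ms / within_ms, df1, df2

# Hypothetical data: k = 3 groups, n = 9 observations
f_stat, df1, df2 = one_way_f([[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]])
print(f_stat, df1, df2)
```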
Interpreting One-Way ANOVA F Statistic
- The F statistic is the ratio of the between-group estimate of variance to the within-group estimate of variance.
- The ratio can never be negative, since both mean squares are sums of squares divided by positive degrees of freedom.
- df1 = k − 1 will typically be small; df2 = n − k will typically be large.
- F_U is the upper-tail critical value for α = 0.05.
Decision rule: Reject H0 if F > F_U; otherwise do not reject H0.
[Figure: F distribution starting at 0, with the rejection region (area α = 0.05) to the right of F_U.]
Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204
Example
[Figure: dot plot of distance (190 to 270) for Club 1, Club 2 and Club 3.]
Group means: $\bar{y}_1 = 249.2$, $\bar{y}_2 = 226.0$, $\bar{y}_3 = 205.8$; overall mean $\bar{y} = 227.0$
Example
$\bar{y}_1 = 249.2$, $\bar{y}_2 = 226.0$, $\bar{y}_3 = 205.8$, $\bar{y} = 227.0$; $n_1 = n_2 = n_3 = 5$, $n = 15$, $k = 3$
BSS = 5(249.2 − 227)² + 5(226 − 227)² + 5(205.8 − 227)² = 4716.4
WSS = (254 − 249.2)² + (263 − 249.2)² + … + (204 − 205.8)² = 1119.6
BMS = 4716.4/(3 − 1) = 2358.2
WMS = 1119.6/(15 − 3) = 93.3
F = BMS/WMS = 2358.2/93.3 = 25.275
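To double-check the arithmetic on this slide, the sketch below recomputes BSS, WSS, the mean squares and F from the raw club distances using Python's `statistics` module. It computes Within SS through the equivalent form $\sum_i (n_i - 1) S_i^2$ rather than observation by observation; the variable names are illustrative.

```python
from statistics import mean, variance

clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}
groups = list(clubs.values())
k = len(groups)                               # 3 clubs
n = sum(len(g) for g in groups)               # 15 drives
grand = mean([y for g in groups for y in g])  # overall mean

# Between SS = sum over groups of n_i * (y-bar_i - y-bar)^2
bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within SS = sum over groups of (n_i - 1) * S_i^2 (S_i^2 = sample variance)
wss = sum((len(g) - 1) * variance(g) for g in groups)

bms = bss / (k - 1)
wms = wss / (n - k)
f_stat = bms / wms

print(f"BSS = {bss:.1f}, WSS = {wss:.1f}")
print(f"BMS = {bms:.1f}, WMS = {wms:.1f}, F = {f_stat:.3f}")
```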
Example
H0: μ1 = μ2 = μ3; H1: μj not all equal; α = 0.05; df1 = 2, df2 = 12
Critical value: F_U = FINV(0.05,2,12) = 3.89
Test statistic: F = BMS/WMS = 2358.2/93.3 = 25.275
Decision: Reject H0 at α = 0.05, since F = 25.275 > F_U = 3.89.
Conclusion: There is evidence that at least one μj differs from the rest.
[Figure: F distribution with the rejection region (area α = 0.05) to the right of F_U = 3.89.]
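The critical value and p-value for this example can be checked without Excel. The sketch below relies on a standard closed form that holds only when df1 = 2 (true here, since k = 3): for F ~ F(2, df2), P(F > x) = (1 + 2x/df2)^(−df2/2). The helper names `f_sf_df1_2` and `f_crit_df1_2` are hypothetical; for general df1, a library routine such as `scipy.stats.f.ppf` or `scipy.stats.f.sf` would be the usual choice.

```python
def f_sf_df1_2(x: float, df2: int) -> float:
    """Upper-tail probability P(F > x) for F ~ F(2, df2).

    Closed form valid ONLY when df1 = 2: P(F > x) = (1 + 2x/df2) ** (-df2/2).
    """
    return (1 + 2 * x / df2) ** (-df2 / 2)

def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Critical value F_U with P(F > F_U) = alpha, again only for df1 = 2.

    Obtained by inverting f_sf_df1_2; plays the role of Excel FINV(alpha, 2, df2).
    """
    return df2 / 2 * (alpha ** (-2 / df2) - 1)

alpha, df2 = 0.05, 12          # golf example: k = 3 clubs, n = 15 drives
f_stat = 2358.2 / 93.3         # BMS / WMS from the slides
f_u = f_crit_df1_2(alpha, df2)
p_value = f_sf_df1_2(f_stat, df2)
print(f"F_U = {f_u:.2f}, F = {f_stat:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if f_stat > f_u else "Do not reject H0")
```

With these inputs the critical value comes out at about 3.885 and the p-value at about 5 × 10⁻⁵, in line with the FINV value on the slide and the Excel P-value of 4.99E-05.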
ANOVA Table
Source    SS       DF   MS       F       P-value
Between   4716.4   2    2358.2   25.28   <0.001
Within    1119.6   12   93.3
Total     5836.0   14
EXCEL ANOVA Analysis
EXCEL: Data > Data Analysis > ANOVA: Single Factor
EXCEL ANOVA Analysis Results
Anova: Single Factor

SUMMARY
Groups     Count  Sum   Average  Variance
Column 1   5      1246  249.2    108.2
Column 2   5      1130  226      77.5
Column 3   5      1029  205.8    94.2

ANOVA
Source of Variation  SS      df  MS      F         P-value   F crit
Between Groups       4716.4  2   2358.2  25.27546  4.99E-05  3.885294
Within Groups        1119.6  12  93.3
Total                5836    14
Comparisons of specific groups in One-way ANOVA
What happens when the null hypothesis is rejected? We conclude that the population means are not all equal, but we cannot be more specific than this. We often want to conduct additional tests to determine where the differences lie: a post hoc test is needed to confirm which groups differ. If the group variances are homogeneous, use Tukey’s honestly significant difference (HSD)