Basic Business Statistics, 10/efenyolab.org/presentations/Introduction_Biostatistics... · 2015-10-17 · Basic Idea . partitioning the variation . Suppose there are . K. groups with

One-Population Tests

One Population

Mean: Z Test (1 & 2 tail), t Test (1 & 2 tail)

Proportion: Z Test (1 & 2 tail)

One-sample test of proportion

Z Test of Proportion

Exact method using Binomial Distribution

Examples

Example 1. You’re an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed? Use the 0.05 significance level.

H0: p = 0.04 vs. H1: p ≠ 0.04

Example 2. A researcher claims that less than 20% of adults in the U.S. are allergic to an herbal medicine. In a SRS of 25 adults, 3 say they have such an allergy. Does this support the researcher’s claim? Test at the 5% level.

H0: p = 0.2 vs. H1: p < 0.2

Binomial Distribution

X ~ Binomial(n, p), where n = number of trials and p = probability of a positive outcome

Mean(X) = np, Var(X) = np(1 − p)

$\hat{p}$ = X/n = proportion of positive outcomes in a sample of size n

E($\hat{p}$) = p (population proportion)

Var($\hat{p}$) = p(1 − p)/n

By the CLT, $\hat{p}$ can be approximated by a Normal distribution if np(1 − p) ≥ 5:

$$\hat{p} \sim N\left(p,\ \frac{p(1-p)}{n}\right)$$

One-Sample Z Test for Proportion

Hypothesis: H0: p = p0 vs. H1: p ≠ p0

Assumptions

Two categorical outcomes

The number of successes follows a Binomial distribution

The Normal approximation can be used if n p0 q0 ≥ 5, where q0 = 1 − p0

Z-test statistic for proportion:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

where p0 is the hypothesized population proportion

One-Sample Test of Proportion Example 1

You’re an accounting manager. A year-end audit showed 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed? Use the 0.05 significance level.

One-Sample Z Test of Proportion

H0: p = p0 = 0.04
Ha: p ≠ p0 = 0.04
α = .05
n = 500

Check: n p0 q0 = 500 × 0.04 × (1 − 0.04) = 19.2 ≥ 5, so the Z test can be used

Critical Value(s): Z = ±1.96 (.025 in each tail); reject H0 if Z < −1.96 or Z > 1.96

Test Statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{\dfrac{25}{500} - 0.04}{\sqrt{\dfrac{0.04(1-0.04)}{500}}} \approx 1.14$$

Decision: Do not reject H0 at α = .05

Conclusion: There is no evidence that the proportion has changed from 4%
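The calculation in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `one_sample_proportion_z_test` is made up for this sketch, and the two-sided p-value (which the slides do not report) is computed from the standard normal CDF via `math.erf`.

```python
import math

def one_sample_proportion_z_test(x, n, p0):
    """Two-sided one-sample Z test for a proportion (normal approximation).

    x = observed successes, n = sample size, p0 = hypothesized proportion.
    The function name and return format are illustrative assumptions.
    """
    # Rule of thumb from the slides: require n * p0 * q0 >= 5.
    if n * p0 * (1 - p0) < 5:
        raise ValueError("n*p0*(1-p0) < 5: use the exact binomial method instead")
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    # Standard normal CDF evaluated at |z|, written with the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value

# Example 1: 25 errors in 500 transactions, H0: p = 0.04
z, p_value = one_sample_proportion_z_test(x=25, n=500, p0=0.04)
print(f"Z = {z:.2f}, two-sided p-value = {p_value:.3f}")
print("Reject H0" if abs(z) > 1.96 else "Do not reject H0")
```

Comparing |Z| with 1.96 and comparing the p-value with 0.05 lead to the same decision; the sketch reports both.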

One-sample test of Proportion Example 2

A researcher claims that less than 20% of adults in the U.S. are allergic to an herbal medicine. In a SRS of 25 adults, 3 say they have such an allergy. Does this support the researcher’s claim? Test at the 5% level.

Is n p0 q0 ≥ 5?

n p0 q0 = 25 × 0.2 × 0.8 = 4 < 5, so the Normal approximation cannot be used

Exact Method using Binomial Distribution-One sided p-value

If the Normal approximation cannot be used, i.e. if n p0 q0 < 5, then:

Ha: p < p0

one-sided p-value = P(X ≤ x successes in n trials | H0)

$$= \sum_{k=0}^{x} \binom{n}{k} p_0^{k} (1-p_0)^{n-k}$$

EXCEL: BINOMDIST(x,n,p0,TRUE)

Ha: p > p0

one-sided p-value = P(X ≥ x successes in n trials | H0)

$$= \sum_{k=x}^{n} \binom{n}{k} p_0^{k} (1-p_0)^{n-k}$$

EXCEL: 1-BINOMDIST(x-1,n,p0,TRUE)

Exact Method using Binomial Distribution-Two sided p-value

If the Normal approximation cannot be used, i.e. if n p0 q0 < 5, then for Ha: p ≠ p0 the two-sided p-value can be calculated as follows.

If $\hat{p} = x/n < p_0$:

p-value = 2 × P(X ≤ x successes in n trials | H0)

$$= 2\sum_{k=0}^{x} \binom{n}{k} p_0^{k} (1-p_0)^{n-k}$$

EXCEL: 2*BINOMDIST(x,n,p0,TRUE)

If $\hat{p} = x/n > p_0$:

p-value = 2 × P(X ≥ x successes in n trials | H0)

$$= 2\sum_{k=x}^{n} \binom{n}{k} p_0^{k} (1-p_0)^{n-k}$$

EXCEL: 2*(1-BINOMDIST(x-1,n,p0,TRUE))

NOTE: The last argument of BINOMDIST selects the output. TRUE: cumulative probability; FALSE: probability mass
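The exact p-value rules can also be evaluated outside Excel. The following is a minimal sketch using only the Python standard library; the function names `binom_cdf` and `exact_binomial_p_value` and the `alternative` labels are assumptions made for illustration. Capping the doubled tail at 1 is an added safeguard that the slides do not mention.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); plays the role of BINOMDIST(x, n, p, TRUE)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def exact_binomial_p_value(x, n, p0, alternative):
    """Exact binomial p-value; `alternative` is 'less', 'greater', or 'two-sided'."""
    lower_tail = binom_cdf(x, n, p0)          # P(X <= x | H0)
    upper_tail = 1 - binom_cdf(x - 1, n, p0)  # P(X >= x | H0)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return upper_tail
    if alternative == "two-sided":
        # Double the tail on the side where p_hat = x/n falls relative to p0.
        # The case p_hat == p0 is not covered by the slides; it uses the upper tail here.
        tail = lower_tail if x < n * p0 else upper_tail
        return min(1.0, 2 * tail)  # cap at 1 (assumption, not from the slides)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Numbers from Example 2: n = 25, x = 3, p0 = 0.2, Ha: p < p0
p_less = exact_binomial_p_value(3, 25, 0.2, "less")
print(f"one-sided p-value = {p_less:.3f}")
```

With the Example 2 inputs the one-sided value agrees with BINOMDIST(3,25,0.2,TRUE) to three decimals.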

One-sample test of Proportion Example 2

A researcher claims that less than 20% of adults in the U.S. are allergic to an herbal medicine. In a SRS of 25 adults, 3 say they have such an allergy. Does this support the researcher’s claim? Test at the 5% level.

n = 25, p0 = 0.2, x = 3

$$\hat{p} = \frac{x}{n} = \frac{3}{25} = 0.12$$

One-sample test of Proportion Example 2 Solution

H0: p = p0 = 0.2
Ha: p < p0 = 0.2
α = .05
n = 25, x = 3

P-value:

$$P(X \le 3 \mid H_0) = \sum_{k=0}^{3} \binom{25}{k} 0.2^{k} (1-0.2)^{25-k}$$

$$= \binom{25}{0} 0.2^{0}\, 0.8^{25} + \binom{25}{1} 0.2^{1}\, 0.8^{24} + \binom{25}{2} 0.2^{2}\, 0.8^{23} + \binom{25}{3} 0.2^{3}\, 0.8^{22} = 0.234 > 0.05$$

EXCEL: BINOMDIST(3,25,0.2,TRUE)

Decision: Do not reject H0 at α = .05

Conclusion: There is no evidence that the proportion is less than 20%

Review for Hypothesis Testing

One-sample tests for population mean, μ:
  Z-test if σ is known
  T-test if σ is unknown

One-sample test for population proportion, p:
  Z-test if n p0 q0 ≥ 5
  Exact method using Binomial distribution otherwise

Two-sample tests for difference in population means, μ1 – μ2:
  Independent samples:
    Z-test if σ’s are known
    T-test with pooled estimate of variance if σ’s are unknown and can be assumed equal
    T-test with unequal variances if σ’s are unknown and cannot be assumed equal
  Paired samples:
    Paired Z-test for the difference if σ is known
    Paired T-test for the difference if σ is unknown

Two-sample test for difference in population variances:
  F-test with df1 = n1 – 1 and df2 = n2 – 1

Analysis of Variance (ANOVA) Multisample Inference

Learning Objectives

Until now, we have considered two groups of individuals and we've wanted to know if the two groups were sampled from distributions with equal population means or medians.

Suppose we would like to consider more than two groups of individuals and, in particular, test whether the groups were sampled from distributions with equal population means.

Objective: learn how to use one-way analysis of variance (ANOVA) to test for differences among the means of several populations (“groups”)

Hypotheses of One-Way ANOVA

H0: µ1 = µ2 = … = µk
All population means are equal
No treatment effect (no variation in means among groups)

H1: Not all of the population means are the same
At least one population mean is different
There is a treatment effect
Does not mean that all population means are different (some pairs may be the same)

One-Factor ANOVA

All means are the same: the null hypothesis is true (no treatment effect)

At least one mean is different: the null hypothesis is NOT true (treatment effect is present)

One-Way ANOVA: Model Assumptions

The k random samples are drawn from k independent populations

The variances of the populations are identical

The underlying data are approximately normally distributed

Basic Idea: partitioning the variation

Suppose there are k groups with $n_1, n_2, \ldots, n_k$ observations.

$y_{ij}$ = j-th observation in i-th group, $\bar{y}$ = overall mean, $\bar{y}_i$ = mean of group i

$$y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$$

$(\bar{y}_i - \bar{y})$: deviation of group mean from grand mean

$(y_{ij} - \bar{y}_i)$: deviation of observations from group mean

Partitioning the variation

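$$(y_{ij} - \bar{y}) = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$$

$y_{ij} - \bar{y}_i$ = Deviation of observations from group mean (within group variability)

$\bar{y}_i - \bar{y}$ = Deviation of group mean from overall mean (between group variability)

[Figure: responses (axis: Response, X) for Group 1, Group 2, and Group 3, marking the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and the overall mean $\bar{y}$]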

Partitioning the variation

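$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2$$

Total variation (total SS) = Variation due to random sampling (within SS) + Variation due to factor (between SS)

Total variation is the sum of Within-group variability and Between-group variability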
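The slides state the sum-of-squares identity without showing why the squared deviations add up with no cross term. The derivation below is a sketch of the standard argument, written with $y_{ij}$ for the j-th observation in group i, $\bar{y}_i$ for the mean of group i, $\bar{y}$ for the overall mean, and $n_i$ for the size of group i.

```latex
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}-\bar{y})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}
     \bigl[(y_{ij}-\bar{y}_i) + (\bar{y}_i-\bar{y})\bigr]^2 \\
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i)^2
   + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i-\bar{y})^2
   + 2\sum_{i=1}^{k} (\bar{y}_i-\bar{y}) \sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i) \\
  &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i)^2}_{\text{Within SS}}
   + \underbrace{\sum_{i=1}^{k} n_i (\bar{y}_i-\bar{y})^2}_{\text{Between SS}}
\end{align*}
```

The cross term vanishes because, within each group, the deviations from the group mean sum to zero: $\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i) = 0$.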

[Figure: two panels of responses by group (axis: Response, X)]

If between-group variability is large and within-group variability is small => reject H0

If between-group variability is small and within-group variability is large => do not reject H0

Basic Idea of ANOVA

Partition of Total Variation

Total Variation (total SS) = Variation Due to Factor (Between SS) + Variation Due to Random Sampling (Within SS)

d.f.: n – 1 = (k – 1) + (n – k)

Between SS is commonly referred to as: Sum of Squares Between, Sum of Squares Among, Sum of Squares Explained, Among Groups Variation

Within SS is commonly referred to as: Sum of Squares Within, Sum of Squares Error, Sum of Squares Unexplained, Within-Group Variation

Total Sum of Squares

$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

Where:

Total SS = Total sum of squares

k = number of groups (levels or treatments)

$n_i$ = number of observations in group i

$y_{ij}$ = jth observation from group i

$\bar{y}$ = grand mean (mean of all data values)

Total SS = Between SS + Within SS

Total Variation


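$$\text{Total SS} = (y_{11} - \bar{y})^2 + (y_{12} - \bar{y})^2 + \dots + (y_{k n_k} - \bar{y})^2$$

[Figure: all observations plotted against the overall mean $\bar{y}$ (axis: Response, X)]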

Between-Group Variation

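$$\text{Between SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2 = n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \dots + n_k(\bar{y}_k - \bar{y})^2$$

[Figure: group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ plotted against the overall mean $\bar{y}$ (axis: Response, X)]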

Within-Group Variation

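$$\text{Within SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^{k} (n_i - 1)\, S_i^2$$

where $S_i^2$ is the sample variance of group i

[Figure: observations in each group plotted against their own group mean $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ (axis: Response, X)]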

Obtaining the Mean Squares

Within MS = Within SS / (n − k)

Between MS = Between SS / (k − 1)

Total MS = Total SS / (n − 1)

One-Way ANOVA Table

Source of Variation   df      SS                MS (Variance)         F ratio
Between Groups        k - 1   BSS               BMS = BSS / (k - 1)   F = BMS / WMS
Within Groups         n - k   WSS               WMS = WSS / (n - k)
Total                 n - 1   TSS = BSS + WSS

k = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

One-Way ANOVA F Test Statistic

Test statistic

$$F = \frac{\text{Between MS}}{\text{Within MS}} \sim F_{df_1,\, df_2} \quad \text{under } H_0$$

Degrees of freedom

df1 = k – 1 (k = number of groups)
df2 = n – k (n = sum of sample sizes from all populations)

Interpreting One-Way ANOVA F Statistic

The F statistic is the ratio of the among (between-group) estimate of variance to the within-group estimate of variance
The ratio can never be negative
df1 = k – 1 will typically be small
df2 = n – k will typically be large
FU is the upper-tail critical value for α = .05

Decision Rule: Reject H0 if F > FU; otherwise do not reject H0

[Figure: F distribution starting at 0, with the "Do not reject H0" region to the left of FU and the "Reject H0" region (α = .05) to the right]

Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

Example

[Figure: scatter plot of Distance (vertical axis, 190 to 270) by Club (1, 2, 3), marking the group means $\bar{y}_1$ = 249.2, $\bar{y}_2$ = 226.0, $\bar{y}_3$ = 205.8 and the overall mean $\bar{y}$ = 227.0]

Example

$\bar{y}_1$ = 249.2, $\bar{y}_2$ = 226.0, $\bar{y}_3$ = 205.8, $\bar{y}$ = 227.0

n1 = 5, n2 = 5, n3 = 5, n = 15, k = 3

BSS = 5 (249.2 – 227)² + 5 (226 – 227)² + 5 (205.8 – 227)² = 4716.4

WSS = (254 – 249.2)² + (263 – 249.2)² + … + (204 – 205.8)² = 1119.6

BMS = 4716.4 / (3 – 1) = 2358.2

WMS = 1119.6 / (15 – 3) = 93.3

F = BMS / WMS = 2358.2 / 93.3 = 25.275
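The arithmetic for the golf-club example can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary keys are chosen here for illustration and do not come from the slides.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA sums of squares, mean squares, and F ratio.

    `groups` is a list of lists, one inner list of observations per group.
    """
    k = len(groups)                            # number of groups
    n = sum(len(g) for g in groups)            # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between SS: group size times squared deviation of group mean from grand mean.
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within SS: squared deviations of observations from their own group mean.
    wss = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    bms = bss / (k - 1)
    wms = wss / (n - k)
    return {"BSS": bss, "WSS": wss, "BMS": bms, "WMS": wms,
            "F": bms / wms, "df1": k - 1, "df2": n - k}

# Driving distances for the three clubs in the example
club_1 = [254, 263, 241, 237, 251]
club_2 = [234, 218, 235, 227, 216]
club_3 = [200, 222, 197, 206, 204]

result = one_way_anova([club_1, club_2, club_3])
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

Because the function takes a list of lists, it also handles unequal group sizes, which the formulas on the earlier slides allow.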

Example

H0: µ1 = µ2 = µ3
H1: not all µi are equal
α = 0.05
df1 = 2, df2 = 12

Critical Value: FU = FINV(0.05,2,12) = 3.89

Test Statistic: F = BMS / WMS = 2358.2 / 93.3 = 25.275

Decision: Reject H0 at α = 0.05, since F = 25.275 > FU = 3.89

Conclusion: There is evidence that at least one µi differs from the rest

[Figure: F distribution with the rejection region (α = .05) to the right of FU = 3.89; the observed F = 25.275 falls in the rejection region]


ANOVA Table

Source     SS       DF   MS       F       P-value
Between    4716.4    2   2358.2   25.28   <0.001
Within     1119.6   12     93.3
Total      5836.0   14

EXCEL ANOVA Analysis

EXCEL: Data > Data Analysis > ANOVA: Single Factor

EXCEL ANOVA Analysis Results

Anova: Single Factor

SUMMARY
Groups     Count   Sum    Average   Variance
Column 1   5       1246   249.2     108.2
Column 2   5       1130   226       77.5
Column 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F          P-value    F crit
Between Groups        4716.4   2    2358.2   25.27546   4.99E-05   3.885294
Within Groups         1119.6   12   93.3
Total                 5836     14
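The P-value and F crit columns of the Excel output can be cross-checked by hand in this particular example, because the upper-tail probability of the F distribution has a simple closed form when df1 = 2: P(F > f) = (1 + 2f/df2)^(−df2/2). The sketch below relies on that special case. It is an addition for illustration, not a method from the slides, and the function names are made up. It does not apply when df1 ≠ 2.

```python
def f_upper_tail_df1_is_2(f, df2):
    """P(F > f) for F ~ F(2, df2). Closed form valid ONLY for df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_is_2(alpha, df2):
    """Upper-tail critical value FU with P(F > FU) = alpha, ONLY for df1 = 2.

    Obtained by solving (1 + 2*FU/df2) ** (-df2/2) = alpha for FU.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

# Values from the golf-club example: F = BMS / WMS with df1 = 2, df2 = 12
f_statistic = 2358.2 / 93.3
p_value = f_upper_tail_df1_is_2(f_statistic, df2=12)
f_crit = f_critical_df1_is_2(alpha=0.05, df2=12)

print(f"F = {f_statistic:.5f}")
print(f"P-value = {p_value:.3g}")
print(f"F crit = {f_crit:.6f}")
print("Reject H0" if f_statistic > f_crit else "Do not reject H0")
```

For other numbers of groups, a statistics library or Excel's FINV and FDIST functions would be the more general route.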

Comparisons of specific groups in One-way ANOVA

What happens when the null hypothesis is rejected?

We conclude that the population means are not all equal, but we cannot be more specific than this.

We often want to conduct additional tests to determine where the differences lie.

We need to perform a post hoc test to confirm where the differences occurred between groups.

If the group variances are homogeneous, use Tukey’s honestly significant difference (HSD).
