Chapter 8. Inferences about More Than Two Population Central Values

Source: web.cjcu.edu.tw/~jdwu/biostat01/lect008.pdf


Case Study: Effect of Timing of the Treatment of Port-Wine Stains with Lasers (1)

• To investigate whether treatment at a young age would yield better results than treatment at an older age.

• Data classified by age:


Case Study: Effect of Timing of the Treatment of Port-Wine Stains with Lasers (2)


Case Study: Effect of Timing of the Treatment of Port-Wine Stains with Lasers (3)

• From the boxplots, the four groups do not appear to differ greatly in improvement.

• We can now develop the analysis of variance procedure to determine whether a statistically significant difference exists among the four age groups.


Within-Sample Variation

• Because the variability among the sample means is large in comparison to the within-sample variation, we might conclude intuitively that the corresponding population means are different.


Between-Sample Variation

• In this table, the sample means are the same as given in the previous table, but the variability within a sample is much larger, and the between-sample variation is small relative to the within-sample variability.

• We would be less likely to conclude that the corresponding population means differ based on these data.


An Analysis of Variance (1)

• A statistical test about more than two population means.

• The t test is used to test the equality of two population means:

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$$

• $s_p^2$ is a pooled estimate of the common population variance $\sigma^2$.

• Now suppose that we wish to extend this method to test the equality of more than two population means. A more general method of data analysis is the analysis of variance.
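As a minimal sketch of the two-sample pooled t statistic described on this slide, the following Python computes the pooled variance and t from summary statistics. The function name `pooled_t` and the input numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Return (pooled variance, t statistic) from two samples' summary statistics."""
    # Pooled estimate of the common variance: weights are the degrees of freedom.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # t statistic for testing equality of the two population means.
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t

# Hypothetical summary statistics: two samples of size 10.
sp2, t_stat = pooled_t(10, 5.0, 4.0, 10, 3.0, 6.0)
print(sp2, t_stat)
```

With equal sample sizes the pooled variance is just the average of the two sample variances, which is a quick sanity check on the weights.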


An Analysis of Variance (2)

• Summary of the sample results for five populations

• If we are interested in testing the equality of the population means (i.e., $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$), we might be tempted to run all possible pairwise comparisons of two population means.

• If we confirm that the five distributions are approximately normal with the same variance $\sigma^2$, we could run 10 t tests comparing all pairs of means.

• Although we may have the probability of a Type I error fixed at α = 0.05 for each individual test, the probability of falsely rejecting at least one of those tests is larger than 0.05.
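To make the inflation of the Type I error rate concrete, here is a rough calculation in Python. It assumes, purely for illustration, that the 10 pairwise tests are independent; in practice they share data and are not independent, so the first number is only indicative. The Bonferroni bound holds regardless of dependence.

```python
import math

alpha = 0.05      # per-test Type I error rate from the slide
n_groups = 5      # five populations

# Number of pairwise comparisons among five means.
n_tests = math.comb(n_groups, 2)

# Chance of at least one false rejection IF the tests were independent
# (an illustrative assumption, not a property of pairwise t tests).
familywise = 1 - (1 - alpha) ** n_tests

# Upper bound that holds whether or not the tests are independent.
bonferroni_bound = min(1.0, n_tests * alpha)

print(n_tests, round(familywise, 3), bonferroni_bound)
```

Under the independence assumption the chance of at least one false rejection is roughly 0.40, far above the nominal 0.05.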


An Analysis of Variance (3)

• The analysis of variance procedures are developed under the following conditions:

– Each of the five populations has a normal distribution.

– The variances of the five populations are equal; that is, $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = \sigma^2$.

– The five sets of measurements are independent random samples from their respective populations.


An Analysis of Variance (4)

• Within-sample variance

$$s_W^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2 + (n_4-1)s_4^2 + (n_5-1)s_5^2}{(n_1-1) + (n_2-1) + (n_3-1) + (n_4-1) + (n_5-1)} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2 + (n_4-1)s_4^2 + (n_5-1)s_5^2}{n_1 + n_2 + n_3 + n_4 + n_5 - 5}$$

• Note that this quantity is merely an extension of

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$$

• $s_W^2$ represents a combined estimate of the common variance $\sigma^2$, and it measures the variability of the observations within the five populations.
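The within-sample variance can be sketched as a weighted average of the sample variances, with weights equal to the degrees of freedom. The function name `within_sample_variance` and the numbers below are hypothetical; the slide's own data table did not survive extraction.

```python
def within_sample_variance(sizes, variances):
    """Pool the sample variances s_i^2, weighting each by (n_i - 1)."""
    numerator = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    denominator = sum(n - 1 for n in sizes)  # equals n_1 + ... + n_k - k
    return numerator / denominator

# With two samples this reduces to the pooled variance of the two-sample t test.
two_sample = within_sample_variance([10, 10], [4.0, 6.0])

# Hypothetical five-sample case with 25 observations per sample.
five_sample = within_sample_variance([25] * 5, [2.0, 3.0, 4.0, 5.0, 6.0])
print(two_sample, five_sample)
```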


An Analysis of Variance (5)

• If the null hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ is true, then the populations are identical, with mean $\mu$ and variance $\sigma^2$.

• Drawing single samples from the five populations is then equivalent to drawing five different samples from the same population.

• To evaluate the variation in the five sample means, we need to know the sampling distribution of the sample mean computed from a random sample of 25 observations from a normal population.

• We can estimate the variance of the distribution of sample means, $\sigma^2/25$, using the sample variance of the five sample means:

$$\frac{\sum_{i=1}^{5} (\bar{y}_{i.} - \bar{y}_{..})^2}{5 - 1}$$


An Analysis of Variance (6)

• Between-sample variance

– The sample variance of the five sample means estimates $\sigma^2/25$, and hence 25 × (sample variance of the means) estimates $\sigma^2$.

– We denote this quantity as $s_B^2$. The subscript B denotes a measure of the variability among the sample means for the five populations.

• Under the null hypothesis that all five population means are identical, we have two estimates of $\sigma^2$, namely $s_W^2$ and $s_B^2$. Suppose the ratio $s_B^2 / s_W^2$ is used as the test statistic to test the hypothesis that $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$.

• Under this hypothesis, $s_B^2 / s_W^2$ follows an F distribution with degrees of freedom df1 = 4 for $s_B^2$ and df2 = 120 for $s_W^2$.


An Analysis of Variance (7)

• The test statistic used to test equality of the population means is

$$F = \frac{s_B^2}{s_W^2}$$

• When the null hypothesis is true, both $s_B^2$ and $s_W^2$ estimate $\sigma^2$, and we expect F to assume a value near F = 1.

• When the hypothesis of equality is false, $s_B^2$ will tend to be larger than $s_W^2$ due to the differences among the population means.

• If the calculated value of F falls in the rejection region, we conclude that not all five population means are identical.
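The construction of F for equal sample sizes can be sketched as follows: the between-sample variance is n times the sample variance of the group means, and the within-sample variance is the average of the group variances. The function name `f_ratio_equal_n` and the toy data are hypothetical; the sketch assumes every sample has the same size n, as in the five-sample setting above.

```python
from statistics import mean, variance

def f_ratio_equal_n(groups):
    """Return (s_B^2, s_W^2, F) for samples that all have the same size n."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    # With equal n, the pooled within-sample variance is the mean of the s_i^2.
    s_w2 = mean([variance(g) for g in groups])
    # n times the sample variance of the group means estimates sigma^2 under H0.
    s_b2 = n * variance([mean(g) for g in groups])
    return s_b2, s_w2, s_b2 / s_w2

# Hypothetical toy data: three samples of size 3.
s_b2, s_w2, f_stat = f_ratio_equal_n(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
)
print(s_b2, s_w2, f_stat)
```

In this toy example the group means (2, 3, 7) are far apart relative to the within-sample spread, so F is much larger than 1.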


Analysis of Variance ---Completely Randomized Design (1)

• Summary of sample data for a completely randomized design


Analysis of Variance ---Completely Randomized Design (2)

• Total sum of squares (TSS)

– Let $s_T^2$ be the sample variance of the $n_T$ measurements.

$$TSS = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2 = (n_T - 1)\, s_T^2$$

– It is possible to partition the total sum of squares as follows:

$$\sum_{i,j} (y_{ij} - \bar{y}_{..})^2 = \sum_{i,j} (y_{ij} - \bar{y}_{i.})^2 + \sum_{i} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$$

– Within-sample sum of squares (SSW; $s_W^2$)

$$SSW = \sum_{i,j} (y_{ij} - \bar{y}_{i.})^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_t - 1)s_t^2$$

– Between-sample sum of squares (SSB; $s_B^2$)

$$SSB = \sum_{i} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$$
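The partition TSS = SSW + SSB can be checked numerically. The sketch below computes the three sums of squares directly from their definitions for unequal sample sizes; the function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(groups):
    """Return (TSS, SSW, SSB) for a list of samples of possibly unequal size."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Total variation of every observation about the overall mean.
    tss = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation of observations about their own sample mean.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Variation of sample means about the overall mean, weighted by n_i.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    return tss, ssw, ssb

# Hypothetical data with n_1 = 3, n_2 = 4, n_3 = 2.
tss, ssw, ssb = sums_of_squares([[1, 2, 3], [2, 3, 4, 5], [6, 7]])
print(tss, ssw, ssb)
```

For these numbers SSW = 7.5 and SSB = 24.5, which add up to TSS = 32.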


Analysis of Variance ---Completely Randomized Design (3)

• Although the formulas for TSS, SSW and SSB are easily interpreted, they are not easy to use for calculations. Instead, we recommend using a computer software program.

• An analysis of variance for a completely randomized design with t populations has the following null and alternative hypotheses:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_t$$

$$H_a: \text{At least one of the } t \text{ population means differs from the rest.}$$

• The quantities $s_B^2$ and $s_W^2$ can be computed using the shortcut formulas

$$s_B^2 = \frac{SSB}{t - 1} \qquad s_W^2 = \frac{SSW}{n_T - t}$$
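The shortcut formulas can be turned into a small analysis of variance table. This is a sketch only: the row layout (source, sum of squares, degrees of freedom, mean square, F) is the conventional one and is an assumption here, since the slide's own table did not survive extraction. The function name `anova_table` and the input values are hypothetical.

```python
def anova_table(ssb, ssw, t, n_total):
    """Build AOV rows: (source, SS, df, mean square, F)."""
    df_between = t - 1
    df_within = n_total - t
    ms_between = ssb / df_between   # s_B^2
    ms_within = ssw / df_within     # s_W^2
    f_stat = ms_between / ms_within
    return [
        ("Between samples", ssb, df_between, ms_between, f_stat),
        ("Within samples", ssw, df_within, ms_within, None),
        ("Total", ssb + ssw, n_total - 1, None, None),
    ]

# Hypothetical sums of squares for t = 3 samples and n_T = 9 observations.
rows = anova_table(ssb=24.5, ssw=7.5, t=3, n_total=9)
for source, ss, df, ms, f in rows:
    ms_text = "" if ms is None else f"{ms:.2f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<16}{ss:>8.2f}{df:>4}{ms_text:>8}{f_text:>8}")
```

The F value would then be compared with the critical value of the F distribution with t − 1 and n_T − t degrees of freedom.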


Analysis of Variance Table


Example 8.2

• A clinical psychologist wished to compare three methods for reducing hostility levels in university students, and used a certain test (HLT) to measure the degree of hostility.


Answer to Example 8.2


The Model for Observation in a Completely Randomized Design (One-Way Classification)

• Assumptions

– Independent random samples

– Each sample is selected from a normal population.

– The mean and variance for population i are, respectively, $\mu_i$ and $\sigma^2$.

• Four distributions


Model for Analysis of Variance (1)

• $y_{ij}$: the jth sample measurement selected from population i, is the sum of three terms:

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

• $\mu$: denotes an overall mean that is an unknown constant.

• $\alpha_i$: denotes an effect due to population i. $\alpha_i$ is an unknown constant.

• $\varepsilon_{ij}$: represents the random deviation of $y_{ij}$ about the ith population mean, $\mu_i$. The $\varepsilon_{ij}$'s are often referred to as error terms. The term error simply refers to the fact that the observations from the t populations differ by more than just their means.
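One way to get a feel for the model is to simulate from it. The sketch below draws observations as an overall mean plus a population effect plus a normal error term; the function name `simulate_one_way`, the parameter values, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import mean

def simulate_one_way(mu, alphas, sigma, n_per_group, seed=0):
    """Draw y_ij = mu + alpha_i + eps_ij with eps_ij ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        [mu + a + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for a in alphas
    ]

# Hypothetical parameters: overall mean 10, three population effects.
groups = simulate_one_way(mu=10.0, alphas=[-1.0, 0.0, 1.0], sigma=2.0,
                          n_per_group=5000)
group_means = [mean(g) for g in groups]
print([round(m, 2) for m in group_means])
```

With large samples each simulated group mean should land close to the corresponding population mean (here 9, 10, and 11).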


Model for Analysis of Variance (2)

• The $\varepsilon_{ij}$'s are normally distributed with mean 0 and variance $\sigma_\varepsilon^2$. Also, the variance for each of the t populations can be shown to be $\sigma_\varepsilon^2$.

• Summary of some of the assumptions for a completely randomized design

$$\mu_i = E(y_{ij}) = E(\mu + \alpha_i + \varepsilon_{ij}) = \mu + \alpha_i + E(\varepsilon_{ij}) = \mu + \alpha_i$$


Model for Analysis of Variance (3)

• Using the model, the null and alternative hypotheses are:

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t = 0$$

$$H_a: \text{At least one of the } \alpha_i\text{'s differs from 0.}$$

• We need to verify that these conditions are satisfied prior to making inferences from the analysis of variance table.

– The normality condition is not as critical as the equal variance assumption when we have large sample sizes unless the populations are severely skewed or have very heavy tails.

– The assumption of homogeneity (equality) of population variances is less critical when the sample sizes are nearly equal, where the variances can be markedly different and the p-values for an analysis of variance will still be only mildly distorted.


Model for Analysis of Variance (4)--- Residual Analysis

• The normality condition will be evaluated using residual analysis.

• From the model, $\varepsilon_{ij} = y_{ij} - \mu_i$. If the condition of equal variances is valid, the $\varepsilon_{ij}$'s are a random sample from a normal population. However, $\mu_i$ is an unknown constant, so we estimate $\mu_i$ with $\bar{y}_{i.}$ and let

$$e_{ij} = y_{ij} - \bar{y}_{i.}$$

• Then we can use the $e_{ij}$'s to evaluate the normality assumption. Even when the individual $n_i$'s are small, we would have $n_T$ residuals, which would provide a sufficient number of values to evaluate the normality condition.

• We can plot the $e_{ij}$'s in a boxplot or a normal probability plot to evaluate whether the data appear to have been generated from normal populations.
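The residual calculation can be sketched in a few lines. The second function pairs the sorted, pooled residuals with standard normal quantiles, which are the coordinates of a normal probability plot. The plotting positions (k − 0.375)/(n_T + 0.25) are one common convention and an assumption here, not something the slides specify; the function names and data are hypothetical.

```python
from statistics import NormalDist

def residuals(groups):
    """Return e_ij = y_ij - (mean of sample i) for every observation."""
    result = []
    for g in groups:
        group_mean = sum(g) / len(g)
        result.append([y - group_mean for y in g])
    return result

def normal_plot_points(groups):
    """Pair each sorted pooled residual with a standard normal quantile."""
    pooled = sorted(e for g in residuals(groups) for e in g)
    n_total = len(pooled)
    std_normal = NormalDist()
    # Assumed plotting positions; other conventions exist.
    quantiles = [
        std_normal.inv_cdf((k - 0.375) / (n_total + 0.25))
        for k in range(1, n_total + 1)
    ]
    return list(zip(quantiles, pooled))

# Hypothetical data: two small samples.
scores = [[1.0, 2.0, 3.0], [10.0, 20.0]]
points = normal_plot_points(scores)
print(residuals(scores))
```

If the points fall close to a straight line, the normality condition looks plausible; marked curvature or outlying points suggest otherwise.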


Example 8.3

• An international organization wanted to determine whether the clerics from different religions have different levels of awareness with respect to the causes of mental illness. Three random samples were drawn, one containing ten Methodist ministers, a second containing ten Catholic priests, and a third containing ten Pentecostal ministers. Each of the 30 clerics was then examined, using a standard written test, to measure his or her knowledge about causes of mental illness. The test scores are listed in the following table.


Answer to Example 8.3 (1)

• Residuals $e_{ij} = y_{ij} - \bar{y}_{i.}$ for clerics' knowledge of mental illness


Answer to Example 8.3 (2)

• Normal probability plot for residuals

• The plot shows a lack of concentration of the residuals about the straight line.


Answer to Example 8.3 (3)


Answer to Example 8.3 (4)

• Equal variance test

– Levene's test statistic: L = MSB/MSW = 178.3/186.9 = 0.95 (< the critical value 3.35). Thus we fail to reject the null hypothesis that the standard deviations are equal.

• The Kruskal-Wallis test can be used when the populations are nonnormal but have identical distributions under the null hypothesis.
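For the equal variance test above, a statistic of the form L = MSB/MSW can be sketched as an analysis of variance applied to absolute deviations. The slides do not say whether deviations are taken from the sample mean or the sample median; this sketch assumes the median (the Brown-Forsythe variant of Levene's test). The function name `levene_statistic` and the data are hypothetical and are not the clerics data.

```python
from statistics import mean, median

def levene_statistic(groups):
    """Return L = MSB/MSW computed on absolute deviations from sample medians."""
    # Assumption: deviations are measured from the median of each sample.
    z = []
    for g in groups:
        center = median(g)
        z.append([abs(y - center) for y in g])
    t = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    grand_mean = mean([v for g in z for v in g])
    msb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, z_means)) / (t - 1)
    msw = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g) / (n_total - t)
    return msb / msw

# Hypothetical data: the second sample is visibly more spread out.
L = levene_statistic([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]])
print(round(L, 3))
```

The result would be compared with an F critical value with t − 1 and n_T − t degrees of freedom, as in the comparison with 3.35 above.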


Transformations of the Data ---An Alternative Analysis (1)

• A transformation of the sample data is defined to be a process in which the measurements on the original scale are systematically converted to a new scale of measurement.

• Transformation to achieve uniform variance


Transformations of the Data ---An Alternative Analysis (2)

• When it appears that $\sigma_i^2 = k\mu_i$ with $k \approx 1$, the transformation $y_T = \sqrt{y + 0.375}$ is appropriate.

• The logarithmic transformation ($y_T = \log(y)$) is appropriate any time the coefficient of variation $\sigma_i/\mu_i$ is constant across the populations of interest.

• The transformation $y_T = \arcsin\sqrt{y}$ is particularly appropriate for data recorded as percentages or proportions.
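The three transformations can be written as small helper functions. The function names are hypothetical; the logarithm is taken as the natural log here, which is an assumption (a different base only rescales the transformed values by a constant), and the arcsine version expects proportions between 0 and 1 rather than percentages.

```python
import math

def sqrt_transform(y):
    """Square-root transformation for data whose variance is roughly the mean."""
    return math.sqrt(y + 0.375)

def log_transform(y):
    """Log transformation for a constant coefficient of variation (natural log assumed)."""
    return math.log(y)

def arcsin_transform(p):
    """Arcsine square-root transformation for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

# Hypothetical values, one per transformation.
print(sqrt_transform(0.625), log_transform(1.0), arcsin_transform(0.25))
```

After transforming, the analysis of variance is run on the transformed values, and the residuals are rechecked on the new scale.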


A Nonparametric Alternative : The Kruskal-Wallis Test (1)

• Extension of the rank sum test for more than two populations

– H0: The k distributions are identical.

– Ha: Not all the distributions are the same.

$$H = \frac{12}{n_T (n_T + 1)} \sum_{i} \frac{T_i^2}{n_i} - 3(n_T + 1)$$

where $n_i$ is the number of observations from sample i (i = 1, 2, ..., k); $n_T$ is the combined (total) sample size, that is, $n_T = \sum_i n_i$; and $T_i$ denotes the sum of the ranks for the measurements in sample i after the combined sample measurements have been ranked.


A Nonparametric Alternative : The Kruskal-Wallis Test (2)

• Note: When there are a large number of ties in the ranks of the sample measurements, use

$$H' = \frac{H}{1 - \left[ \sum_{j} (t_j^3 - t_j) / (n_T^3 - n_T) \right]}$$

where $t_j$ is the number of observations in the jth group of tied ranks.
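The Kruskal-Wallis statistic and its tie-adjusted version can be sketched together. The sketch assumes tied observations receive the average of the ranks they occupy (midranks), which is the usual convention but is not stated on the slides. The function name `kruskal_wallis` and the data are hypothetical, and the sketch does not guard against the degenerate case where every observation is tied.

```python
def kruskal_wallis(groups):
    """Return (H, H') where H' is H adjusted for ties in the ranks."""
    # Pool the observations, remembering which sample each came from.
    pooled = sorted((y, i) for i, g in enumerate(groups) for y in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    pos = 0
    while pos < n_total:
        # Find the run of observations tied with pooled[pos].
        end = pos
        while end + 1 < n_total and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        t_j = end - pos + 1
        midrank = ((pos + 1) + (end + 1)) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            rank_sums[pooled[k][1]] += midrank
        if t_j > 1:
            tie_term += t_j ** 3 - t_j
        pos = end + 1
    h = (12 / (n_total * (n_total + 1))
         * sum(T ** 2 / len(g) for T, g in zip(rank_sums, groups))
         - 3 * (n_total + 1))
    h_adjusted = h / (1 - tie_term / (n_total ** 3 - n_total))
    return h, h_adjusted

# Hypothetical data without ties, then with ties.
h_no_ties, h_adj_no_ties = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_ties, h_adj_ties = kruskal_wallis([[1, 2, 2], [3, 4, 4]])
print(h_no_ties, h_adj_no_ties, h_ties, h_adj_ties)
```

When there are no ties the adjustment factor is 1 and the two statistics agree; with ties, H' is slightly larger than H.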


A Nonparametric Alternative : The Kruskal-Wallis Test (3)