Lecture 10
• Inference about the difference between population proportions (Chapter 13.6)
• One-way analysis of variance (Chapter 15.2)
Testing p1 – p2
• There are two cases to consider:
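Case 1: H0: p1 - p2 = 0
Calculate the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Case 2: H0: p1 - p2 = D (D is not equal to 0)
Do not pool the data
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$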
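The Case 1 calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `pooled_z` and the counts in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both samples estimate the same proportion, so pool them.
    p_pool = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error

# Hypothetical counts, for illustration only (not from the lecture).
z = pooled_z(60, 200, 45, 200)
print(f"z = {z:.3f}")
```

The result is compared with the critical value of the standard normal distribution for the chosen significance level.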
• Example 13.9 (Revisit Example 13.8)
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two supermarkets.
– For the brightly-colored design to be financially viable it has to outsell the simple design by at least 3%.
• Solution
– The hypotheses to test are
H0: p1 - p2 = .03
H1: p1 - p2 > .03
– We identify this application as case 2 (the hypothesized difference is not equal to zero).
Testing p1 – p2 (Case 2)
• Compute: Manually
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\frac{180}{904} - \frac{155}{1{,}038}\right) - .03}{\sqrt{\frac{.1991(1 - .1991)}{904} + \frac{.1493(1 - .1493)}{1{,}038}}} = 1.15$$
The rejection region is z > z_α = z_.05 = 1.645.
Conclusion: Since 1.15 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.
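The manual Case 2 computation can be checked with a short Python sketch. The counts (180 of 904 and 155 of 1,038) and D = .03 are taken from the slide; the function name `unpooled_z` and the one-sided p-value are additions for illustration. With unrounded proportions the statistic comes out near 1.145, which matches the slide's 1.15 once the proportions are rounded to four decimals.

```python
import math

def unpooled_z(x1, n1, x2, n2, d):
    """Z statistic for H0: p1 - p2 = d with d != 0 (no pooling)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1
                          + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / std_error

# Counts and hypothesized difference from Example 13.9.
z = unpooled_z(180, 904, 155, 1038, 0.03)

# Upper-tail p-value P(Z > z) for the one-sided alternative p1 - p2 > .03.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

reject_h0 = z > 1.645  # critical value z_.05 from the slide
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```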
Confidence Interval for p1 – p2
• (1 - α)100% confidence interval:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Estimating p1 – p2
• Estimating the cost of life saved
– Two drugs are used to treat heart attack victims:
• Streptokinase (available since 1959, costs $460)
• t-PA (genetically engineered, costs $2900).
– The maker of t-PA claims that its drug outperforms Streptokinase.
– An experiment was conducted in 15 countries.
• 20,500 patients were given t-PA
• 20,500 patients were given Streptokinase
• The number of deaths by heart attacks was recorded.
• Experiment results
– A total of 1497 patients treated with Streptokinase died.
– A total of 1292 patients treated with t-PA died.
• Estimate the cost per life saved by using t-PA instead of Streptokinase.
• Interpretation
– We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
– The difference in cost per patient treated is 2900 - 460 = $2440.
– The cost per life saved by switching to t-PA is estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
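The interval and the cost figures above can be reproduced with a few lines of Python. This is a sketch under one assumption that the slides do not state: a 95% confidence level (z = 1.96), which is the level consistent with the reported .51% to 1.49%. Because the slide divides by the rounded endpoints, the unrounded dollar figures computed here differ slightly from $163,758 and $478,431.

```python
import math

n = 20_500                      # patients per drug (from the slide)
deaths_streptokinase = 1497
deaths_tpa = 1292

p1_hat = deaths_streptokinase / n   # death rate with Streptokinase
p2_hat = deaths_tpa / n             # death rate with t-PA
diff = p1_hat - p2_hat              # estimated extra survival rate with t-PA

std_error = math.sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
Z_CRIT = 1.96  # ASSUMPTION: 95% confidence, not stated on the slide

lower = diff - Z_CRIT * std_error
upper = diff + Z_CRIT * std_error

extra_cost = 2900 - 460             # extra cost per patient treated, in dollars
cost_low = extra_cost / upper       # best case: most lives saved per patient
cost_high = extra_cost / lower      # worst case: fewest lives saved per patient

print(f"CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
print(f"Cost per life saved: ${cost_low:,.0f} to ${cost_high:,.0f}")
```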
15.2 One-way ANOVA
• Analysis of variance compares two or more populations of interval data.
• Specifically, we are interested in determining whether differences exist between the population means.
• We obtain independent samples from each population.
• Generalization of two sample problem to two or more populations
Examples
• Compare the effect of three different teaching methods on test scores.
• Compare the effect of four different therapies on how long a cancer patient lives.
• Compare the effect of using different amounts of fertilizer on the yield of a crop.
• Compare the amount of time that ten different tire brands last.
• Example 15.1
– An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
– The marketing manager has to decide how to market the new product.
– Three strategies are considered
• Emphasize convenience of using the product.
• Emphasize the quality of the product.
• Emphasize the product's low price.
One Way Analysis of Variance
• Example 15.1 - continued
– An experiment was conducted as follows:
• In three cities an advertisement campaign was launched.
• In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
• The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
Weekly sales (see file Xm15-01)
Convenience  Quality  Price
529          804      672
658          630      531
793          774      443
514          717      596
663          679      602
719          604      502
711          620      659
606          697      689
461          706      675
529          615      512
498          492      691
663          719      733
604          787      698
495          699      776
485          572      561
557          523      572
353          584      469
557          634      581
542          580      679
614          624      532
• Solution
– The data are interval.
– The problem objective is to compare sales in three cities.
– We hypothesize that the three population means are equal.
Defining the Hypotheses
H0: μ1 = μ2 = μ3
H1: At least two means differ
To build the statistic needed to test the hypotheses, use the following notation:
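Notation
Independent samples are drawn from k populations (treatments).
$$
\begin{array}{l|cccc}
\text{Treatment} & 1 & 2 & \cdots & k \\ \hline
\text{Observations} & x_{11} & x_{12} & \cdots & x_{1k} \\
 & x_{21} & x_{22} & \cdots & x_{2k} \\
 & \vdots & \vdots & & \vdots \\
 & x_{n_1,1} & x_{n_2,2} & \cdots & x_{n_k,k} \\ \hline
\text{Sample size} & n_1 & n_2 & \cdots & n_k \\
\text{Sample mean} & \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k
\end{array}
$$
Here x11 is the first observation of the first sample, and x22 is the second observation of the second sample.
X is the "response variable". The variable's values are called "responses".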
Terminology
• In the context of this problem…
Response variable – weekly sales
Responses – actual sale values
Experimental unit – weeks in the three cities when we record sales figures.
Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
Factor levels – the population (treatment) names. In this problem the factor levels are the three marketing strategies (convenience, quality, and price).
Rationale Behind Test Statistic
• Two types of variability are employed when testing for the equality of population means
– Variability of the sample means
– Variability within samples
• Test statistic is essentially (Variability of the sample means)/(Variability within samples)
The rationale behind the test statistic – I
• If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).
• If the alternative hypothesis is true, at least some of the sample means would differ.
• Thus, we measure variability between sample means.
• The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean.
This sum is called the Sum of Squares for Treatments (SST). In our example, treatments are represented by the different advertising strategies.
Variability between sample means
Sum of squares for treatments (SST)
$$SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$$
Here there are k treatments, n_j is the size of sample j, x̄_j is the mean of sample j, and x̄̄ is the grand mean.
Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, large SST indicates large variation between sample means, which supports H1.
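• Solution – continued
Calculate SST
$$\bar{x}_1 = 577.55 \qquad \bar{x}_2 = 653.00 \qquad \bar{x}_3 = 608.65$$
The grand mean is calculated by
$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$
$$SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$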
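The SST arithmetic can be verified from the summary statistics alone. The sketch below uses the three sample means and sample sizes given on the slide; the variable names are illustrative.

```python
# Sample sizes and sample means from Example 15.1
# (convenience, quality, price).
sample_sizes = [20, 20, 20]
sample_means = [577.55, 653.00, 608.65]

n_total = sum(sample_sizes)

# Grand mean: weighted average of the sample means.
grand_mean = sum(n * m for n, m in zip(sample_sizes, sample_means)) / n_total

# SST: weighted squared distances of the sample means from the grand mean.
sst = sum(n * (m - grand_mean) ** 2
          for n, m in zip(sample_sizes, sample_means))

print(f"grand mean = {grand_mean:.2f}, SST = {sst:,.2f}")
```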
Is SST = 57,512.23 large enough to reject H0 in favor of H1?
Large compared to what?
[Figure: two dot plots of Treatment 1, Treatment 2, and Treatment 3, both with sample means x̄1 = 10, x̄2 = 15, x̄3 = 20. One panel shows small within-sample variability, the other large within-sample variability.]
A small variability within the samples makes it easier to draw a conclusion about the population means.
The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.
• Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”.
The rationale behind test statistic – II
• The variability within samples is measured by adding all the squared distances between observations and their sample means.
This sum is called the Sum of Squares for Error (SSE). In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).
Within samples variability
Sum of squares for errors (SSE)
• Solution – continued
Calculate SSE
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$
$$s_1^2 = 10{,}775.00 \qquad s_2^2 = 7{,}238.11 \qquad s_3^2 = 8{,}670.24$$
$$SSE = (20 - 1)10{,}775.00 + (20 - 1)7{,}238.11 + (20 - 1)8{,}670.24 = 506{,}983.50$$
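The shortcut used above, replacing each inner sum of squared deviations by (n_j - 1)s_j², follows from the definition of the sample variance. A small sketch with the convenience-city sales from the data table illustrates it for one sample; the variable names are illustrative.

```python
import statistics

# Weekly sales in the city where convenience was emphasized (Example 15.1).
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]

n1 = len(convenience)
mean1 = sum(convenience) / n1

# Direct definition: sum of squared deviations from the sample mean.
ss_direct = sum((x - mean1) ** 2 for x in convenience)

# Shortcut: (n - 1) times the sample variance (n - 1 in the denominator).
var1 = statistics.variance(convenience)
ss_shortcut = (n1 - 1) * var1

print(f"s1^2 = {var1:,.2f}")
print(f"direct = {ss_direct:,.2f}, shortcut = {ss_shortcut:,.2f}")
```

Repeating this for the other two cities and adding the three contributions gives SSE.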
Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal?
The mean sum of squares
To perform the test we need to calculate the mean squares as follows:
Calculation of MST - Mean Square for Treatments
$$MST = \frac{SST}{k - 1} = \frac{57{,}512.23}{3 - 1} = 28{,}756.12$$
Calculation of MSE - Mean Square for Error
$$MSE = \frac{SSE}{n - k} = \frac{506{,}983.50}{60 - 3} = 8{,}894.45$$
Calculation of the test statistic
$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$
with the following degrees of freedom: ν1 = k - 1 and ν2 = n - k
Required Conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
And finally the hypothesis test:
H0: μ1 = μ2 = … = μk
H1: At least two means differ
Test statistic: $F = \frac{MST}{MSE}$
R.R.: $F > F_{\alpha,\,k-1,\,n-k}$
The F test rejection region
The F test
H0: μ1 = μ2 = μ3
H1: At least two means differ
Test statistic: $F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$
R.R.: $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3-1,\,60-3} \approx 3.15$
Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of H1, and argue that at least one of the mean sales is different from the others.
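The whole test can be reproduced from the raw weekly sales. The following is a minimal sketch in plain Python: the helper name `one_way_anova` is hypothetical, and the critical value 3.15 is taken from the slide rather than computed.

```python
# Weekly sales from Example 15.1, one list per marketing strategy.
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

def one_way_anova(samples):
    """Return (SST, SSE, MST, MSE, F) for a list of independent samples."""
    k = len(samples)                       # number of treatments
    n = sum(len(s) for s in samples)       # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-treatments variability.
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-samples variability.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return sst, sse, mst, mse, mst / mse

sst, sse, mst, mse, f_stat = one_way_anova([convenience, quality, price])

F_CRITICAL = 3.15  # F(.05; 2, 57) as quoted on the slide, not computed here
reject_h0 = f_stat > F_CRITICAL

print(f"SST = {sst:,.2f}, SSE = {sse:,.2f}")
print(f"MST = {mst:,.2f}, MSE = {mse:,.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Where SciPy is available, a library routine such as `scipy.stats.f_oneway` should give the same F statistic along with a p-value.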