Upload
kallie-baye
View
216
Download
0
Embed Size (px)
Citation preview
Copyright 2004 David J. Lilja 1
Comparing Two Alternatives Use confidence intervals for
Before-and-after comparisons Noncorresponding measurements
Copyright 2004 David J. Lilja 2
Comparing more than two alternatives
ANOVA Analysis of Variance
Partitions total variation in a set of measurements into Variation due to real differences in alternatives Variation due to errors
Copyright 2004 David J. Lilja 3
Comparing Two Alternatives
1. Before-and-afterDid a change to the system have a statistically significant impact on performance?
2. Non-corresponding measurementsIs there a statistically significant difference between two different systems?
Copyright 2004 David J. Lilja 4
Before-and-After Comparison
Assumptions Before-and-after measurements are not
independent Variances in two sets of measurements may not
be equal
→ Measurements are related Form obvious corresponding pairs
Find mean of differences
Copyright 2004 David J. Lilja 5
Before-and-After Comparison
n
stdcc
ds
dd
bad
a
b
dn
id
i
iii
i
i
1;2/121 ),(
ofdeviation standard
of mean value
tmeasuremenafter
tmeasuremen before
Copyright 2004 David J. Lilja 6
Before-and-After Comparison
Measurement
(i)
Before
(bi)
After
(ai)
Difference
(di = bi – ai)
1 85 86 -1
2 83 88 -5
3 94 90 4
4 90 95 -5
5 88 91 -3
6 87 83 4
Copyright 2004 David J. Lilja 7
Before-and-After Comparison
From mean of differences, appears that change reduced performance.
However, standard deviation is large.
15.4deviation Standard
1sdifference ofMean
ds
d
Copyright 2004 David J. Lilja 8
95% Confidence Interval for Mean of Differences
571.25;975.01;2/1 tt n
a
n 0.90 0.95 0.975
… … … …
5 1.476 2.015 2.571
6 1.440 1.943 2.447
… … … …
∞ 1.282 1.645 1.960
Copyright 2004 David J. Lilja 9
95% Confidence Interval for Mean of Differences
]36.3,36.5[
6
15.4571.21
571.2
2,1
2,1
5;975.01;2/1
1;2/12,1
c
c
ttn
stdc
n
dn
Copyright 2004 David J. Lilja 10
95% Confidence Interval for Mean of Differences
c1,2 = [-5.36, 3.36] Interval includes 0
→ With 95% confidence, there is no statistically significant difference between the two systems.
Copyright 2004 David J. Lilja 11
Noncorresponding Measurements
No direct correspondence between pairs of measurements
Unpaired observations n1 measurements of system 1
n2 measurements of system 2
Copyright 2004 David J. Lilja 12
Confidence Interval for Difference of Means
1. Compute means
2. Compute difference of means
3. Compute standard deviation of difference of means
4. Find confidence interval for this difference
5. No statistically significant difference between systems if interval includes 0
Copyright 2004 David J. Lilja 13
Confidence Interval for Difference of Means
2
22
1
21
21
:deviation standard Combined
:means of Difference
n
s
n
ss
xxx
x
Copyright 2004 David J. Lilja 14
Why Add Standard Deviations?
x1 x2
s2s1
x1 – x2
Copyright 2004 David J. Lilja 15
Number of Degrees of Freedom
11
2simply Not
2
2
222
1
2
121
2
2
22
1
21
21
nns
nns
ns
ns
n
nnn
df
df
Copyright 2004 David J. Lilja 16
Example – Noncorresponding Measurements
5.38
s 1243
tsmeasuremen 12
1
1
1
s
x
n
0.54
s 1085
tsmeasuremen 7
2
2
2
s
x
n
Copyright 2004 David J. Lilja 17
Example (cont.)
1062.9
17754
112125.38
754
125.38
24.237
54
12
5.38
15810851243
2222
222
22
21
df
x
n
s
xxx
Copyright 2004 David J. Lilja 18
Example (cont.): 90% CI
]200,116[
24.23813.1158
813.1
2,1
2,1
10;95.0;2/1
;2/12,1
c
c
tt
stxc
df
df
n
xn
Copyright 2004 David J. Lilja 19
A Special Case
If n1 <≈ 30 or n2 <≈ 30 And errors are normally distributed And s1 = s2 (standard deviations are equal)
OR
If n1 = n2
And errors are normally distributed Even if s1 is not equal s2
Then special case applies…
Copyright 2004 David J. Lilja 20
A Special Case
2
)1()1(
2
11),(
21
2221
21
21
21;2/121
nn
nsnss
nnn
nnstxcc
p
df
pndf
Typically produces tighter confidence interval Sometimes useful after obtaining additional
measurements to tease out small differences
Copyright 2004 David J. Lilja 21
Comparing Proportions
2 systemin events# total
2 systemin interest of events#
1 systemin events# total
1 systemin interest of events#
2
2
1
1
n
m
n
m
Copyright 2004 David J. Lilja 22
Comparing Proportions
Since it is a binomial distribution
i
ii
i
ii
n
pp
n
mp
)1(Variance
Mean
Copyright 2004 David J. Lilja 23
Comparing Proportions
p
p
szpcc
n
pp
n
pps
ppp
2/121
2
22
1
11
21
),(
)1()1(
)(
Copyright 2004 David J. Lilja 24
OS Example (cont)
Initial operating system n1 = 1,300,203 interrupts (3.5 hours) m1 = 142,892 interrupts occurred in OS code p1 = 0.1099, or 11% of time executing in OS
Upgrade OS n2 = 999,382 m2 = 84,876 p2 = 0.0849, or 8.5% of time executing in OS
Statistically significant improvement?
Copyright 2004 David J. Lilja 25
OS Example (cont)
p = p1 – p2 = 0.0250 sp = 0.0003911 90% confidence interval
(0.0242, 0.0257) Statistically significant difference?
Copyright 2004 David J. Lilja 26
Important Points
Use confidence intervals to determine if there are statistically significant differences Before-and-after comparisons
Find interval for mean of differences Noncorresponding measurements
Find interval for difference of means Proportions
If interval includes zero
→ No statistically significant difference
Copyright 2004 David J. Lilja 27
Comparing More Than Two Alternatives
Naïve approach Compare confidence intervals
Copyright 2004 David J. Lilja 28
One-Factor Analysis of Variance (ANOVA)
Very general technique Look at total variation in a set of measurements Divide into meaningful components
Also called One-way classification One-factor experimental design
Introduce basic concept with one-factor ANOVA
Generalize later with design of experiments
Copyright 2004 David J. Lilja 29
One-Factor Analysis of Variance (ANOVA)
Separates total variation observed in a set of measurements into:
1. Variation within one system Due to random measurement errors
2. Variation between systems Due to real differences + random error
Is variation(2) statistically > variation(1)?
Copyright 2004 David J. Lilja 30
ANOVA
Make n measurements of k alternatives yij = ith measurment on jth alternative Assumes errors are:
Independent Gaussian (normal)
Copyright 2004 David J. Lilja 31
Measurements for All Alternatives
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 32
Column Means
Column means are average values of all measurements within a single alternative Average performance of one alternative
n
yy
n
i ijj
1.
Copyright 2004 David J. Lilja 33
Column Means
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 34
Deviation From Column Mean
tsmeasuremenin error
meancolumn from ofdeviation
.
ijij
ijjij
ye
eyy
Copyright 2004 David J. Lilja 35
Error = Deviation From Column Mean
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 36
Overall Mean
Average of all measurements made of all alternatives
kn
yy
k
j
n
i ij 1 1..
Copyright 2004 David J. Lilja 37
Overall Mean
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 38
Deviation From Overall Mean
j
yy
j
jj
ealternativ ofeffect
mean overall frommean column ofdeviation
...
Copyright 2004 David J. Lilja 39
Effect = Deviation From Overall Mean
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 40
Effects and Errors
Effect is distance from overall mean Horizontally across alternatives
Error is distance from column mean Vertically within one alternative Error across alternatives, too
Individual measurements are then:
ijjij eyy ..
Copyright 2004 David J. Lilja 41
Sum of Squares of Differences: SSE
2
1 1.
2
1 1
.
.
k
j
n
ijij
k
j
n
iij
jijij
ijjij
yyeSSE
yye
eyy
Copyright 2004 David J. Lilja 42
Sum of Squares of Differences: SSA
2
1...
2
1
...
...
k
jj
k
jj
jj
jj
yynnSSA
yy
yy
Copyright 2004 David J. Lilja 43
Sum of Squares of Differences: SST
2
1 1..
2
1 1
..
..
k
j
n
iij
k
j
n
iij
ijijjij
ijjij
yytSST
yyet
eyy
Copyright 2004 David J. Lilja 44
Sum of Squares of Differences
2
1 1..
2
1 1.
2
1...
k
j
n
iij
k
j
n
ijij
k
jj
yySST
yySSE
yynSSA
Copyright 2004 David J. Lilja 45
Sum of Squares of Differences
SST = differences between each measurement and overall mean
SSA = variation due to effects of alternatives SSE = variation due to errors in measurments
SSESSASST
Copyright 2004 David J. Lilja 46
ANOVA – Fundamental Idea
Separates variation in measured values into:
1. Variation due to effects of alternatives• SSA – variation across columns
2. Variation due to errors• SSE – variation within a single column
If differences among alternatives are due to real differences,
• SSA should be statistically > SSE
Copyright 2004 David J. Lilja 47
Comparing SSE and SSA
Simple approach SSA / SST = fraction of total variation explained
by differences among alternatives SSE / SST = fraction of total variation due to
experimental error But is it statistically significant?
Copyright 2004 David J. Lilja 48
Statistically Comparing SSE and SSA
df
SSxsx
2
freedom of degrees
variationtotal
valuesquaremean Variance
Copyright 2004 David J. Lilja 49
Degrees of Freedom
df(SSA) = k – 1, since k alternatives df(SSE) = k(n – 1), since k alternatives, each
with (n – 1) df df(SST) = df(SSA) + df(SSE) = kn - 1
Copyright 2004 David J. Lilja 50
Degrees of Freedom for Effects
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 51
Degrees of Freedom for Errors
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 52
Degrees of Freedom for Errors
Alternatives
Measurements
1 2 … j … k
1 y11 y12 … y1j … yk1
2 y21 y22 … y2j … y2k
… … … … … … …
i yi1 yi2 … yij … yik
… … … … … … …
n yn1 yn2 … ynj … ynk
Col mean y.1 y.2 … y.j … y.k
Effect α1 α2 … αj … αk
Copyright 2004 David J. Lilja 53
Variances from Sum of Squares (Mean Square Value)
)1(
1
2
2
nk
SSEs
k
SSAs
e
a
Copyright 2004 David J. Lilja 54
Comparing Variances
Use F-test to compare ratio of variances
valuescritical tabulated)](),(;1[
2
2
denomdfnumdf
e
a
F
s
sF
Copyright 2004 David J. Lilja 55
F-test
If Fcomputed > Ftable
→ We have (1 – α) * 100% confidence that variation due to actual differences in alternatives, SSA, is statistically greater than variation due to errors, SSE.
Copyright 2004 David J. Lilja 56
ANOVA Summary
)]1(),1(;1[
22
22
Tabulated
Computed
)]1([)1(squareMean
1)1(1freedom Deg
squares of Sum
TotalErroresAlternativVariation
nkk
ea
ea
FF
ssF
nkSSEskSSAs
knnkk
SSTSSESSA
Copyright 2004 David J. Lilja 57
ANOVA Example
Alternatives
Measurements 1 2 3 Overall mean
1 0.0972 0.1382 0.7966
2 0.0971 0.1432 0.5300
3 0.0969 0.1382 0.5152
4 0.1954 0.1730 0.6675
5 0.0974 0.1383 0.5298
Column mean 0.1168 0.1462 0.6078 0.2903
Effects -0.1735 -0.1441 0.3175
Copyright 2004 David J. Lilja 58
ANOVA Example
89.3 Tabulated
4.660057.03793.0 Computed
0057.03793.0squareMean
14112)1(21freedom Deg
8270.00685.07585.0squares of Sum
TotalErroresAlternativVariation
]12,2;95.0[
22
FF
F
ss
knnkk
SSTSSESSA
ea
Copyright 2004 David J. Lilja 59
Conclusions from example
SSA/SST = 0.7585/0.8270 = 0.917→ 91.7% of total variation in measurements is due to
differences among alternatives
SSE/SST = 0.0685/0.8270 = 0.083→ 8.3% of total variation in measurements is due to noise in
measurements
Computed F statistic > tabulated F statistic→ 95% confidence that differences among alternatives are
statistically significant.
Copyright 2004 David J. Lilja 60
Contrasts
ANOVA tells us that there is a statistically significant difference among alternatives
But it does not tell us where difference is Use method of contrasts to compare subsets
of alternatives A vs B {A, B} vs {C} Etc.
Copyright 2004 David J. Lilja 61
Contrasts
Contrast = linear combination of effects of alternatives
k
jj
k
jjj
w
wc
1
1
0
Copyright 2004 David J. Lilja 62
Contrasts
E.g. Compare effect of system 1 to effect of
system 2
21
321
3
2
1
)0()1()1(
0
1
1
c
w
w
w
Copyright 2004 David J. Lilja 63
Construct confidence interval for contrasts
Need Estimate of variance Appropriate value from t table
Compute confidence interval as before If interval includes 0
Then no statistically significant difference exists between the alternatives included in the contrast
Copyright 2004 David J. Lilja 64
Variance of random variables
Recall that, for independent random variables X1 and X2
]Var[]Var[
]Var[]Var[]Var[
12
1
2121
XaaX
XXXX
Copyright 2004 David J. Lilja 65
Variance of a contrast c
k
j jj
k
j jj
k
j jj
w
w
wc
1
2
1
1
]Var[
]Var[
)](Var[]Var[
)1()(
)1(
)(
2
2
1
22
2
nksdf
nk
SSEs
kn
sws
c
e
k
j ej
c
Assumes variation due to errors is equally distributed among kn total measurements
Copyright 2004 David J. Lilja 66
Confidence interval for contrasts
)1(
)(
),(
2
1
22
)1(;2/121
nk
SSEs
kn
sws
stccc
e
k
j ej
c
cnk
Copyright 2004 David J. Lilja 67
Example
90% confidence interval for contrast of [Sys1- Sys2]
)0196.0,0784.0(),(:%90
0275.0)5(3
0)1(1
0294.0)1441.0(1735.0
3175.0
1441.0
1735.0
21
222
]21[
3
2
1
cc
ss
c
ec
Copyright 2004 David J. Lilja 68
Important Points
Use one-factor ANOVA to separate total variation into:– Variation within one system
Due to random errors
– Variation between systems Due to real differences (+ random error)
Is the variation due to real differences statistically greater than the variation due to errors?
Copyright 2004 David J. Lilja 69
Important Points
Use contrasts to compare effects of subsets of alternatives