Copyright 2004 David J. Lilja1 Comparing Two Alternatives Use confidence intervals for Before-and-after comparisons Noncorresponding measurements

Copyright 2004 David J. Lilja 1

Comparing Two Alternatives Use confidence intervals for

Before-and-after comparisons Noncorresponding measurements


Comparing more than two alternatives

ANOVA Analysis of Variance

Partitions total variation in a set of measurements into Variation due to real differences in alternatives Variation due to errors


Comparing Two Alternatives

1. Before-and-afterDid a change to the system have a statistically significant impact on performance?

2. Non-corresponding measurementsIs there a statistically significant difference between two different systems?


Before-and-After Comparison

Assumptions Before-and-after measurements are not

independent Variances in two sets of measurements may not

be equal

→ Measurements are related Form obvious corresponding pairs

Find mean of differences



n

stdcc

ds

dd

bad

a

b

dn

id

i

iii

i

i

1;2/121 ),(

ofdeviation standard

of mean value

tmeasuremenafter

tmeasuremen before



Measurement

(i)

Before

(bi)

After

(ai)

Difference

(di = bi – ai)

1 85 86 -1

2 83 88 -5

3 94 90 4

4 90 95 -5

5 88 91 -3

6 87 83 4



From mean of differences, appears that change reduced performance.

However, standard deviation is large.

15.4deviation Standard

1sdifference ofMean

ds

d


95% Confidence Interval for Mean of Differences

571.25;975.01;2/1 tt n

a

n 0.90 0.95 0.975

… … … …

5 1.476 2.015 2.571

6 1.440 1.943 2.447

… … … …

∞ 1.282 1.645 1.960



]36.3,36.5[

6

15.4571.21

571.2

2,1

2,1

5;975.01;2/1

1;2/12,1

c

c

ttn

stdc

n

dn



c1,2 = [-5.36, 3.36] Interval includes 0

→ With 95% confidence, there is no statistically significant difference between the two systems.


Noncorresponding Measurements

No direct correspondence between pairs of measurements

Unpaired observations n1 measurements of system 1

n2 measurements of system 2


Confidence Interval for Difference of Means

1. Compute means

2. Compute difference of means

3. Compute standard deviation of difference of means

4. Find confidence interval for this difference

5. No statistically significant difference between systems if interval includes 0


Confidence Interval for Difference of Means

2

22

1

21

21

:deviation standard Combined

:means of Difference

n

s

n

ss

xxx

x


Why Add Standard Deviations?

x1 x2

s2s1

x1 – x2


Number of Degrees of Freedom

11

2simply Not

2

2

222

1

2

121

2

2

22

1

21

21

nns

nns

ns

ns

n

nnn

df

df


Example – Noncorresponding Measurements

5.38

s 1243

tsmeasuremen 12

1

1

1

s

x

n

0.54

s 1085

tsmeasuremen 7

2

2

2

s

x

n


Example (cont.)

1062.9

17754

112125.38

754

125.38

24.237

54

12

5.38

15810851243

2222

222

22

21

df

x

n

s

xxx


Example (cont.): 90% CI

]200,116[

24.23813.1158

813.1

2,1

2,1

10;95.0;2/1

;2/12,1

c

c

tt

stxc

df

df

n

xn


A Special Case

If n1 <≈ 30 or n2 <≈ 30 And errors are normally distributed And s1 = s2 (standard deviations are equal)

OR

If n1 = n2

And errors are normally distributed Even if s1 is not equal s2

Then special case applies…


A Special Case

2

)1()1(

2

11),(

21

2221

21

21

21;2/121

nn

nsnss

nnn

nnstxcc

p

df

pndf

Typically produces tighter confidence interval Sometimes useful after obtaining additional

measurements to tease out small differences


Comparing Proportions

2 systemin events# total

2 systemin interest of events#

1 systemin events# total

1 systemin interest of events#

2

2

1

1

n

m

n

m



Since it is a binomial distribution

i

ii

i

ii

n

pp

n

mp

)1(Variance

Mean



p

p

szpcc

n

pp

n

pps

ppp

2/121

2

22

1

11

21

),(

)1()1(

)(


OS Example (cont)

Initial operating system n1 = 1,300,203 interrupts (3.5 hours) m1 = 142,892 interrupts occurred in OS code p1 = 0.1099, or 11% of time executing in OS

Upgrade OS n2 = 999,382 m2 = 84,876 p2 = 0.0849, or 8.5% of time executing in OS

Statistically significant improvement?


OS Example (cont)

p = p1 – p2 = 0.0250 sp = 0.0003911 90% confidence interval

(0.0242, 0.0257) Statistically significant difference?


Important Points

Use confidence intervals to determine if there are statistically significant differences Before-and-after comparisons

Find interval for mean of differences Noncorresponding measurements

Find interval for difference of means Proportions

If interval includes zero

→ No statistically significant difference


Comparing More Than Two Alternatives

Naïve approach Compare confidence intervals


One-Factor Analysis of Variance (ANOVA)

Very general technique Look at total variation in a set of measurements Divide into meaningful components

Also called One-way classification One-factor experimental design

Introduce basic concept with one-factor ANOVA

Generalize later with design of experiments


One-Factor Analysis of Variance (ANOVA)

Separates total variation observed in a set of measurements into:

1. Variation within one system Due to random measurement errors

2. Variation between systems Due to real differences + random error

Is variation(2) statistically > variation(1)?


ANOVA

Make n measurements of k alternatives yij = ith measurment on jth alternative Assumes errors are:

Independent Gaussian (normal)


Measurements for All Alternatives

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …

i yi1 yi2 … yij … yik

… … … … … … …

n yn1 yn2 … ynj … ynk

Col mean y.1 y.2 … y.j … y.k

Effect α1 α2 … αj … αk


Column Means

Column means are average values of all measurements within a single alternative Average performance of one alternative

n

yy

n

i ijj

1.


Column Means

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Deviation From Column Mean

tsmeasuremenin error

meancolumn from ofdeviation

.

ijij

ijjij

ye

eyy


Error = Deviation From Column Mean

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Overall Mean

Average of all measurements made of all alternatives

kn

yy

k

j

n

i ij 1 1..


Overall Mean

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Deviation From Overall Mean

j

yy

j

jj

ealternativ ofeffect

mean overall frommean column ofdeviation

...


Effect = Deviation From Overall Mean

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Effects and Errors

Effect is distance from overall mean Horizontally across alternatives

Error is distance from column mean Vertically within one alternative Error across alternatives, too

Individual measurements are then:

ijjij eyy ..


Sum of Squares of Differences: SSE

2

1 1.

2

1 1

.

.

k

j

n

ijij

k

j

n

iij

jijij

ijjij

yyeSSE

yye

eyy


Sum of Squares of Differences: SSA

2

1...

2

1

...

...

k

jj

k

jj

jj

jj

yynnSSA

yy

yy


Sum of Squares of Differences: SST

2

1 1..

2

1 1

..

..

k

j

n

iij

k

j

n

iij

ijijjij

ijjij

yytSST

yyet

eyy


Sum of Squares of Differences

2

1 1..

2

1 1.

2

1...

k

j

n

iij

k

j

n

ijij

k

jj

yySST

yySSE

yynSSA


Sum of Squares of Differences

SST = differences between each measurement and overall mean

SSA = variation due to effects of alternatives SSE = variation due to errors in measurments

SSESSASST


ANOVA – Fundamental Idea

Separates variation in measured values into:

1. Variation due to effects of alternatives• SSA – variation across columns

2. Variation due to errors• SSE – variation within a single column

If differences among alternatives are due to real differences,

• SSA should be statistically > SSE


Comparing SSE and SSA

Simple approach SSA / SST = fraction of total variation explained

by differences among alternatives SSE / SST = fraction of total variation due to

experimental error But is it statistically significant?


Statistically Comparing SSE and SSA

df

SSxsx

2

freedom of degrees

variationtotal

valuesquaremean Variance


Degrees of Freedom

df(SSA) = k – 1, since k alternatives df(SSE) = k(n – 1), since k alternatives, each

with (n – 1) df df(SST) = df(SSA) + df(SSE) = kn - 1


Degrees of Freedom for Effects

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Degrees of Freedom for Errors

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Degrees of Freedom for Errors

Alternatives

Measurements

1 2 … j … k

1 y11 y12 … y1j … yk1

2 y21 y22 … y2j … y2k

… … … … … … …


… … … … … … …





Variances from Sum of Squares (Mean Square Value)

)1(

1

2

2

nk

SSEs

k

SSAs

e

a


Comparing Variances

Use F-test to compare ratio of variances

valuescritical tabulated)](),(;1[

2

2

denomdfnumdf

e

a

F

s

sF


F-test

If Fcomputed > Ftable

→ We have (1 – α) * 100% confidence that variation due to actual differences in alternatives, SSA, is statistically greater than variation due to errors, SSE.


ANOVA Summary

)]1(),1(;1[

22

22

Tabulated

Computed

)]1([)1(squareMean

1)1(1freedom Deg

squares of Sum

TotalErroresAlternativVariation

nkk

ea

ea

FF

ssF

nkSSEskSSAs

knnkk

SSTSSESSA


ANOVA Example

Alternatives

Measurements 1 2 3 Overall mean

1 0.0972 0.1382 0.7966

2 0.0971 0.1432 0.5300

3 0.0969 0.1382 0.5152

4 0.1954 0.1730 0.6675

5 0.0974 0.1383 0.5298

Column mean 0.1168 0.1462 0.6078 0.2903

Effects -0.1735 -0.1441 0.3175


ANOVA Example

89.3 Tabulated

4.660057.03793.0 Computed

0057.03793.0squareMean

14112)1(21freedom Deg

8270.00685.07585.0squares of Sum

TotalErroresAlternativVariation

]12,2;95.0[

22

FF

F

ss

knnkk

SSTSSESSA

ea


Conclusions from example

SSA/SST = 0.7585/0.8270 = 0.917→ 91.7% of total variation in measurements is due to

differences among alternatives

SSE/SST = 0.0685/0.8270 = 0.083→ 8.3% of total variation in measurements is due to noise in

measurements

Computed F statistic > tabulated F statistic→ 95% confidence that differences among alternatives are

statistically significant.


Contrasts

ANOVA tells us that there is a statistically significant difference among alternatives

But it does not tell us where difference is Use method of contrasts to compare subsets

of alternatives A vs B {A, B} vs {C} Etc.


Contrasts

Contrast = linear combination of effects of alternatives

k

jj

k

jjj

w

wc

1

1

0


Contrasts

E.g. Compare effect of system 1 to effect of

system 2

21

321

3

2

1

)0()1()1(

0

1

1

c

w

w

w


Construct confidence interval for contrasts

Need Estimate of variance Appropriate value from t table

Compute confidence interval as before If interval includes 0

Then no statistically significant difference exists between the alternatives included in the contrast


Variance of random variables

Recall that, for independent random variables X1 and X2

]Var[]Var[

]Var[]Var[]Var[

12

1

2121

XaaX

XXXX


Variance of a contrast c

k

j jj

k

j jj

k

j jj

w

w

wc

1

2

1

1

]Var[

]Var[

)](Var[]Var[

)1()(

)1(

)(

2

2

1

22

2

nksdf

nk

SSEs

kn

sws

c

e

k

j ej

c

Assumes variation due to errors is equally distributed among kn total measurements


Confidence interval for contrasts

)1(

)(

),(

2

1

22

)1(;2/121

nk

SSEs

kn

sws

stccc

e

k

j ej

c

cnk


Example

90% confidence interval for contrast of [Sys1- Sys2]

)0196.0,0784.0(),(:%90

0275.0)5(3

0)1(1

0294.0)1441.0(1735.0

3175.0

1441.0

1735.0

21

222

]21[

3

2

1

cc

ss

c

ec


Important Points

Use one-factor ANOVA to separate total variation into:– Variation within one system

Due to random errors

– Variation between systems Due to real differences (+ random error)

Is the variation due to real differences statistically greater than the variation due to errors?


Important Points

Use contrasts to compare effects of subsets of alternatives

Documents

Copyright 2004 David J. Lilja1 Comparing Two Alternatives Use confidence intervals for Before-and-after comparisons Noncorresponding measurements