martin-jacobs
Today
Review bootstrap estimate of se (from homework).
Review sign and permutation tests for paired samples.
Lots of examples of hypothesis tests.
Recall ...
There is a true value of the statistic. But we don’t know it.
We can compute the sample statistic.
We know sample means are normally distributed (as n gets big):
$\widehat{se}(\bar{x}) = \hat{\sigma} / \sqrt{n}$
But we don’t know anything about the distribution of other sample statistics (medians, correlations, etc.)!
Bootstrap world
Real world:
  unknown distribution F
  -> observed random sample X
  -> statistic of interest $\hat{\theta} = s(X)$
Bootstrap world:
  empirical distribution $\hat{F}$
  -> bootstrap random sample X*
  -> bootstrap replication $\hat{\theta}^* = s(X^*)$
  -> statistics about the estimate (e.g., standard error)
Bootstrap estimate of se
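Run B bootstrap replicates, and compute the statistic each time: θ*[1], θ*[2], θ*[3], ..., θ*[B]
$\bar{\theta}^* = \frac{1}{B} \sum_{i=1}^{B} \hat{\theta}^*[i]$
(mean of θ* across replications)
$\widehat{se}_B = \sqrt{\frac{1}{B-1} \sum_{i=1}^{B} \left( \hat{\theta}^*[i] - \bar{\theta}^* \right)^2}$
(sample standard deviation of θ* across replications)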
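A minimal sketch of the bootstrap estimate of se described above, using only the Python standard library. The function name `bootstrap_se`, the fixed seed, the default B = 1000, and the data values are all assumptions made for illustration; they are not from the lecture or the homework.

```python
import random
import statistics

def bootstrap_se(sample, statistic, B=1000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Bootstrap sample X*: n draws from X, with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        # Bootstrap replication: the statistic evaluated on X*.
        replicates.append(statistic(resample))
    # Sample standard deviation (B - 1 denominator) across replications.
    return statistics.stdev(replicates)

# Hypothetical data, for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
se_median = bootstrap_se(data, statistics.median)
se_mean = bootstrap_se(data, statistics.mean)
```

For the mean, `se_mean` should land close to the usual sample-sd-over-root-n formula, which is a quick sanity check. For the median there is no such formula to compare against, which is the reason to bootstrap.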
Sign Test
H0: F and G have the same median
  median(F) – median(G) = 0
Under H0, Pr(x > y) = 0.5
sign(x – y) ~ binomial distribution
Let N+ and N- be the numbers of pairs with x > y and with x < y:
$n = N_+ + N_-$
$p = \Pr(\mathrm{Bin}(n, 0.5) \ge N_+)$
Example: gzip speed
build gzip with -O2 or with -O0
on about 650 files out of 1000, gzip -O2 was faster
binomial distribution, p = 0.5, n = 1000: p < 3 x 10^-24
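The sign test's binomial tail can be computed exactly with integer arithmetic. The sketch below assumes a one-sided test and that ties have already been dropped; the function name `sign_test_p` is hypothetical. The count 650 is the slide's rounded figure ("about 650"), so the result need not reproduce the slide's reported bound exactly.

```python
from math import comb

def sign_test_p(n_plus, n):
    """One-sided p-value Pr(Bin(n, 0.5) >= n_plus), computed exactly."""
    # Count the outcomes with at least n_plus successes among 2**n
    # equally likely sign patterns.
    tail = sum(comb(n, k) for k in range(n_plus, n + 1))
    return tail / 2 ** n

# "About 650" wins out of 1000 files, taken here as exactly 650.
p = sign_test_p(650, 1000)
```

The p-value is very sensitive to the exact count this far into the tail, so a few files more or fewer change it by orders of magnitude.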
Permutation Test
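H0: F = G
Suppose the difference in sample means is d.
How likely is this difference (or a greater one) under H0?
For i = 1 to P:
  Randomly permute each pair (xi, yi), i.e., swap xi and yi with probability 0.5
  Compute difference in sample means
p = fraction of the P permuted differences that are at least as extreme as d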
Example: gzip speed
1000 permutations: difference of sample means under H0 is centered on 0
-1579 is very extreme; p ≈ 0
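A minimal sketch of the paired permutation test, assuming a two-sided comparison (the slides do not say which side was used). The function name `paired_permutation_p`, the seed, and the runtimes below are hypothetical, not the gzip measurements.

```python
import random

def paired_permutation_p(xs, ys, P=1000, seed=0):
    """Return (observed mean difference, two-sided permutation p-value)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n = len(xs)
    diffs = [x - y for x, y in zip(xs, ys)]
    d_obs = sum(diffs) / n
    extreme = 0
    for _ in range(P):
        # Swapping x_i and y_i within a pair flips the sign of x_i - y_i.
        d = sum(diff if rng.random() < 0.5 else -diff for diff in diffs) / n
        if abs(d) >= abs(d_obs):
            extreme += 1
    return d_obs, extreme / P

# Hypothetical paired runtimes, for illustration only.
xs = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3, 5.0, 5.8]
ys = [4.0, 4.1, 4.9, 4.2, 4.0, 4.5, 5.0, 4.4, 4.1, 4.6]
d_obs, p_value = paired_permutation_p(xs, ys)
```

Working on the within-pair differences is equivalent to swapping each pair and recomputing both sample means, and it keeps the pairing by file intact.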
Comparing speed is tricky!
It is very difficult to control for everything that could affect runtime.
Solution 1: do the best you can.
Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents).
“Is there more variance between conditions than within conditions?”
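The question on the slide is what the one-way ANOVA F statistic measures: the ratio of between-condition variance to within-condition variance. The sketch below computes only that ratio, not a p-value; the function name `f_statistic` and the runtimes are hypothetical.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical runtimes under three conditions, for illustration only.
runs = [[2.8, 2.9, 2.7, 2.8], [2.7, 2.6, 2.8, 2.7], [3.1, 3.0, 3.2, 3.1]]
F = f_statistic(runs)
```

A large F means the conditions differ by more than run-to-run noise would explain. A p-value would come from the F distribution or, in the nonparametric spirit of this lecture, from permuting the condition labels.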
Order effects
Well-known in psychology. What the subject does at time t
will affect what she does at time t+1.
Sign and Permutation Tests
[Figure: Venn diagram over all distribution pairs (F, G), with regions labeled median(F) ≠ median(G) and F ≠ G, and marked areas showing where the sign test rejects H0 and where the permutation test rejects H0.]
There are other tests!
We have chosen two that are
  nonparametric
  easy to implement
Others include:
  Wilcoxon Signed Rank Test
  Kruskal-Wallis (nonparametric “ANOVA”)
Pre-increment?
Conventional wisdom:
“Better to use ++x than to use x++.”
Really, with a modern compiler?
Two (toy) programs
for (i = 0; i < (1 << 30); ++i) j = ++k;
for (i = 0; i < (1 << 30); i++) j = k++;
ran each 200 times (interleaved)
mean runtimes were 2.835 and 2.735
significant: p well below .05
What?
j = ++k (increment, then load):
  leal -8(%ebp), %eax
  incl (%eax)
  movl -8(%ebp), %eax
j = k++ (load, then increment):
  movl -8(%ebp), %eax
  leal -8(%ebp), %edx
  incl (%edx)
%edx is not used anywhere else
Pre-increment, take 2
Take gzip source code. Replace all post-increments with
pre-increments, in places where semantics won’t change.
Run on 1000 files, 10 times each. Compare average runtime by file.
Conclusion
Preincrementing is faster!
... but what about -O?
  sign test: p = 0.197
  permutation test: p = 0.672
Preincrement matters without an optimizing compiler.
Your programs ...
8 students had a working program both weeks.
6 people changed their code.
1 person changed nothing.
1 person changed to -O3.
3 people lossy in week 1.
Everyone lossy in week 2!
Your programs!
Was there an improvement on compression between the two versions?
H0: No. Find sampling distribution of
difference in means, using permutations.
Homework Assignment 2
6 experiments:
1. Does your program compress text or images better?
2. What about variance of compression?
3. What about gzip’s compression?
4. Variance of gzip’s compression?
5. Was there a change in the compression of your program from week 1 to week 2?
6. In the runtime?