Foundations of Statistics Brad Efron
“Why do some of the best Bayesian ideas
come from Frequentists?”
— B. Efron, October 2007

• More a comparison of different attitudes than of different philosophies

• Compromise Attitude: Empirical Bayes
Stein Estimation (1955-1961)
• Observe

  xi iid∼ N(µi, 1),  i = 1, 2, . . . , N  (N ≥ 3)

• Wish to estimate µ with squared-error loss

  L(µ, δ(x)) = ‖δ(x) − µ‖² = Σi (δi(x) − µi)²

• MLE:  δ0(x) = x

• James-Stein:  δ1(x) = [1 − (N − 2)/Σi xi²] x

• Theorem (James & Stein, 1961)  For every µ,

  Eµ L(µ, δ1(x)) < Eµ L(µ, δ0(x))
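The theorem is easy to check by simulation. A minimal sketch (the true means, seed, and N here are arbitrary choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 2000
mu = np.linspace(-1.0, 2.0, N)               # arbitrary true means

mle_loss = js_loss = 0.0
for _ in range(reps):
    x = rng.normal(mu, 1.0)                  # x_i ~ N(mu_i, 1)
    shrink = 1.0 - (N - 2) / np.sum(x ** 2)  # James-Stein factor
    mle_loss += np.sum((x - mu) ** 2)        # loss of delta0(x) = x
    js_loss += np.sum((shrink * x - mu) ** 2)

print(mle_loss / reps, js_loss / reps)       # MLE risk near N; JS risk smaller
```

The MLE's risk is exactly N; the James-Stein risk is uniformly smaller, with the biggest gains when the µi's are close together.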
Eighteen Baseball Players
Name          hits/AB   Observed Ave   "TRUTH"   James-Stein
Clemente        18/45       .400         .346       .290
F Robinson      17/45       .378         .298       .286
F Howard        16/45       .356         .276       .281
Johnstone       15/45       .333         .222       .277
Berry           14/45       .311         .273       .273
Spencer         14/45       .311         .270       .273
Kessinger       13/45       .289         .263       .268
L Alvarado      12/45       .267         .210       .264
Santo           11/45       .244         .269       .259
Swoboda         11/45       .244         .230       .259
Unser           10/45       .222         .264       .254
Williams        10/45       .222         .256       .254
Scott           10/45       .222         .303       .254
Petrocelli      10/45       .222         .264       .254
E Rodriguez     10/45       .222         .226       .254
Campaneris       9/45       .200         .286       .249
Munson           8/45       .178         .316       .244
Alvis            7/45       .156         .200       .239
Grand Average               .265         .265       .265
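A sketch of how the James-Stein column can be approximately reproduced, shrinking each average toward the grand average. The binomial variance plug-in and the (k − 3) factor (the grand mean is estimated) are assumptions made here for illustration; Efron and Morris actually worked on an arcsin-transformed scale, so these numbers only roughly match the table:

```python
import numpy as np

hits = np.array([18, 17, 16, 15, 14, 14, 13, 12, 11, 11,
                 10, 10, 10, 10, 10, 9, 8, 7])
ab = 45
avg = hits / ab                  # observed batting averages
gbar = avg.mean()                # grand average, about .265
k = len(avg)

# Approximate each average's variance by the binomial plug-in value
sigma2 = gbar * (1 - gbar) / ab

# Shrink toward the grand mean; k - 3 because the mean is estimated
shrink = 1 - (k - 3) * sigma2 / np.sum((avg - gbar) ** 2)
js = gbar + shrink * (avg - gbar)
print(np.round(js, 3))           # heavy shrinkage toward .265
```

Each James-Stein estimate sits much closer to .265 than the raw average does, which is exactly the pattern in the table's last column.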
Bayesian Interpretation (Lindley, 1962?)
• Bayes Model

  µ ∼ NN(0, A·I),  x|µ ∼ NN(µ, I)  ⇒

  x ∼ NN(0, (A + 1)I),  µ|x ∼ NN( (A/(A + 1)) x, (A/(A + 1)) I )

• Bayes Rule

  δA(x) = (A/(A + 1)) x = [1 − 1/(A + 1)] x

• Empirical Bayes  Don't know A:

  if x ∼ NN(0, (A + 1)I), then

  EA{ (N − 2)/‖x‖² } = 1/(A + 1)

  ⇒ δ̂A(x) = [1 − (N − 2)/‖x‖²] x  (= JS!)
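The key identity EA{(N − 2)/‖x‖²} = 1/(A + 1) is easy to verify by simulation (a sketch; N, A, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, A, reps = 20, 3.0, 5000

x = rng.normal(0.0, np.sqrt(A + 1), size=(reps, N))  # x ~ N_N(0, (A+1) I)
est = (N - 2) / np.sum(x ** 2, axis=1)               # unbiased for 1/(A+1)
print(est.mean(), 1 / (A + 1))                       # both near 0.25
```

Plugging this estimate into the Bayes rule [1 − 1/(A + 1)]x gives exactly the James-Stein estimator.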
Relative Savings Loss
(Efron & Morris, 1972)
• µ ∼ NN(0, A·I),  x|µ ∼ NN(µ, I)

• δA = (A/(A + 1)) x,  δ0 = x,  δ1 = JS

• RA = EA‖δA − µ‖² = N·A/(A + 1)

• R0 = N,  so  R0 − RA = N/(A + 1)

• Theorem  (R1 − RA)/(R0 − RA) = 2/N

• “JS loses 2/N of the possible savings of the Bayes Rule.”

• Question  Which estimation problems should be combined?
Example of Stein Estimation
• N = 10 independent cases, with
x[i] ∼ N (µ[i], 1) i = 1, 2, 3, . . . , 10
• 1000 simulations: table shows average squared error per trial for each case, and also the total.

• Stein is better overall but . . .

      µ      MSE_mle   MSE_stein
   -0.81       0.95       0.61
   -0.39       1.04       0.62
   -0.39       1.03       0.62
   -0.08       0.99       0.58
    0.69       1.06       0.67
    0.71       0.98       0.63
    1.28       0.95       0.71
    1.32       1.04       0.77
    1.89       1.00       0.88
    4.00       1.08       2.04  !!

  Total sq err  10.12       8.13
• Relevance Functions (Efron & Morris, 1972): Let individual coordinates opt out of the joint estimation procedure if far away from the others.
Large-Scale Hypothesis Testing
(Robbins 1951-1956)
Problem
• Observe xi ∼ N(µi, 1),  i = 1, 2, . . . , N  (not necessarily independent)

• Simultaneously test all null hypotheses H0i : µi = 0.
Bonferroni
• p-value  pi = Prob0{Xi ≥ xi} = 1 − Φ(xi)

• Reject all H0i with pi ≤ α/N

  ⇒ Prob{reject any true H0i} ≤ α.
Robbins
• Asymptotically achieve the Bayes risk, as if you knew the true distribution of the µi's.
Microarray Example (Efron 2006)
• Prostate Study: n = 102 men, 50 controls and 52 prostate cancer patients; N = 6033 genes in each microarray

• X (N × n) → two-sample t-statistics “ti”,  i = 1, 2, . . . , N

• z-values: zi = Φ⁻¹(F100(ti))  [F100 is the cdf of the t100 distribution]
• Theoretical Null Hypothesis
zi ∼ N(0, 1)
• Simple Model:
zi ∼ N(µi, 1),  H0i : µi = 0  [zi ⇔ “xi”]
Bayesian Two-Groups Model
• Each case (gene) is either “null” or “non-null”, with prior probabilities

  p0 = Prob{null},      f0(z) = density if null

  p1 = Prob{non-null},  f1(z) = density if non-null
• Simple Model:  f0(z) = ϕ(z),

  f1(z) = ∫ ϕ(z − µ) g1(µ) dµ  (integral over −∞ < µ < ∞),

  where g1(µ) is the prior density of the non-zero µ's
• Bayes Rule

  Prob{null | zi = z} = p0 f0(z)/f(z),

  where f(z) = p0 f0(z) + p1 f1(z), the mixture density.
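The posterior null probability is a two-line computation once f0, f1, and p0 are specified. A sketch with made-up values p0 = 0.9 and g1 = N(0, A), so that f1 is the N(0, A + 1) density:

```python
import math

def prob_null(z, p0=0.9, A=4.0):
    """Prob{null | z} under f0 = N(0, 1) and f1 = N(0, A + 1)."""
    dens = lambda z, v: math.exp(-z * z / (2 * v)) / math.sqrt(2 * math.pi * v)
    f0 = dens(z, 1.0)
    f1 = dens(z, A + 1.0)
    f = p0 * f0 + (1 - p0) * f1      # mixture density f(z)
    return p0 * f0 / f

print(prob_null(0.0))   # near 1: a central z looks null
print(prob_null(4.0))   # small: a large z looks non-null
```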
False Discovery Rates
(Benjamini & Hochberg, 1995)
• F0(z) = Prob{Z ≥ z} = 1− Φ(z)
• F̂ (z) = #{zi ≥ z}/N
• F̂dr(z) = p0F0(z)/F̂ (z)
• BH Rule  Reject all H0i for zi ≥ z0, where z0 = min{z : F̂dr(z) ≤ q}
• Theorem  The expected proportion of rejections that are falsely rejected nulls is less than q.
  [Figure: 2 × 2 outcome table]

                 Accept   Reject
  True nulls        a        b     (NTrue = a + b)
  False nulls       c        d
                          NReject

  E{ b/NTrue } = expected proportion of true nulls rejected
  Fdr = E{ b/NReject }
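A sketch of the BH rule in this F̂dr form (the null/non-null mixture below is simulated with made-up proportions, and p0 is taken as 1, the conservative default):

```python
import math
import numpy as np

def bh_threshold(z, q=0.1, p0=1.0):
    """Smallest observed z0 with Fdr_hat(z0) = p0*(1 - Phi(z0))/Fhat(z0) <= q."""
    zs = np.sort(z)
    N = len(zs)
    for i, z0 in enumerate(zs):
        F0 = 0.5 * math.erfc(z0 / math.sqrt(2.0))  # 1 - Phi(z0)
        Fhat = (N - i) / N                         # fraction of z_j >= z0
        if p0 * F0 / Fhat <= q:
            return z0                              # first (smallest) crossing
    return None

# Hypothetical data: 950 null z-values plus 50 shifted non-null ones
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(3.5, 1.0, 50)])
z0 = bh_threshold(z, q=0.1)
print(z0, int(np.sum(z >= z0)))                    # threshold, rejections
```

Everything to the right of z0 is rejected, and the expected proportion of those rejections that are true nulls is controlled at q.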
Empirical Bayes Interpretation
• Bayesian Fdr:
Fdr(z) = Prob{null |zi ≥ z} = p0F0(z)/F (z)
• F̂dr(z) estimates Fdr(z)
• BH rule: Reject H0i if the estimated Prob{null | zi ≥ z} is ≤ q.

• F̂dr(z) = p0 F0(z)/F̂(z) depends only on the proportion of zi's ≥ z.
F̂dr Example, Prostate Study
• N(3.3) = 49 of the genes have zi ≥ 3.3
• Expected number of nulls ≥ 3.3 is
Np0(1−Φ(3.3)) = 6033 · .93 · .000483 = 7.6
• F̂dr(3.3) = 7.6/49 = .16, so about 5/6 of the 49 are non-null
• Exchangeability  We don't know which of the 49 are the false discoveries: assign null probability 1/6 to each of the 49.
Relevance
• Hidden Assumption: All N cases are equally relevant to inference for any particular case.
• Brain Study
n = 12 children, 6 dyslexic and 6 controls
• Each: DTI brain image giving response
at N = 15443 voxels
• ti = two-sample t-stat for ith voxel
⇒ zi = Φ−1(F10(ti))
(Schwartzman, et al., 2005)
• Next Figure plots zi vs di, the distance of voxel i from the back of the brain
•Waves!
Relevance Functions (Efron 2007)
• ρi(d) ∈ [0, 1]: relevance of the voxel at d to the one at di

• E.g., ρi(d) = [1 + |d − di|/10]⁻¹

• Lemma  Fdr(zi) = Fdr(z) · E0{ρi(D) | Z ≥ z} / E{ρi(D) | Z ≥ z}
• Frequentists worry a lot about individualbad situations.
Wellcome SNP Study (2007)
• “. . . we do not subscribe to the view that one should correct significance levels for the number of tests performed . . .”
• Bayes: Prob{null |Z > z}
• Empirical Bayes: F̂dr(z) = p0(1−Φ(z))/F̂ (z)
• EB depends only on the proportion  F̂(z) = #{zi ≥ z}/N
• Suppose z1 < z2 < · · · < zN. Then

  F̂dr(zN) = p0(1 − Φ(zN))/F̂(zN)

           = p0 · pval(zN)/(1/N),

  so BH-significant if F̂dr(zN) ≤ q, i.e.

  pval(zN) ≤ (q/p0) · (1/N):  Bonferroni!
[Figure] The hierarchical Bayes model: hyperparameter η sampled from hyperprior density h(·). (JASA, 1996: 538–565)
Estimation compared to
Hypothesis Testing
(Efron 2006)
• Common Model
µi ∼ g(µ)  and  zi | µi ∼ N(µi, 1)
• Estimation: g(µ) smooth, e.g., N (0, A)
• Hypothesis Testing: g(µ) bumpy, e.g.,

  p0 δ0(µ) + (1 − p0) N(0, A)   [δ0 = point mass at 0]
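A sketch of sampling from the bumpy (“spike and slab”) prior, with made-up values p0 = 0.9 and A = 4:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p0, A = 10000, 0.9, 4.0

null = rng.random(N) < p0       # which cases are null (mu_i = 0)
mu = np.where(null, 0.0, rng.normal(0.0, np.sqrt(A), N))
z = rng.normal(mu, 1.0)         # z_i | mu_i ~ N(mu_i, 1)

print(null.mean(), z[null].var(), z[~null].var())  # ~p0, ~1, ~A + 1
```

The null z's are N(0, 1) while the non-null z's have marginal variance A + 1; the combined histogram of all N z-values is what large-scale testing procedures get to see.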
References
Benjamini & Hochberg (1995). JRSS-B 57, 289–300.
Efron (2007b). Microarrays, empirical Bayes, and the two-groups model. Available at http://www-stat.stanford.edu/~brad/papers/twogroups.pdf.

Efron (2007c). Simultaneous inference: When should hypothesis testing problems be combined? Available at http://www-stat.stanford.edu/~brad/papers/combinationpaper.pdf.
Efron & Morris (1972). JASA 67 130–139.
James & Stein (1961). Proc. 4th Berkeley Symp. 1, 361–379.
Robbins (1956). Proc. 3rd Berkeley Symp. 1, 157–163.
Schwartzman, Dougherty & Taylor (2005). Mag. Res. Med. 53, 1423–1431.
Wellcome Trust (2007). Nature 447, 661–678. Available at http://www.nature.com/nature/journal/v447/n7145/abs/nature05911.html.