Foundations of Statistics Brad Efron
“Why do some of the best Bayesian ideas
come from Frequentists?”
— B. Efron, October 2007

• More a comparison of different attitudes than of different philosophies

• Compromise Attitude: Empirical Bayes
Stein Estimation (1955-1961)
• Observe

  xi iid∼ N(µi, 1),  i = 1, 2, . . . , N  (N ≥ 3)

• Wish to estimate µ with squared-error loss

  L(µ, δ(x)) = ‖δ(x) − µ‖² = Σi (δi(x) − µi)²

• MLE:  δ0(x) = x

• James-Stein:  δ1(x) = [1 − (N − 2)/Σi xi²] x

• Theorem (James & Stein, 1961)  For every µ,

  Eµ L(µ, δ1(x)) < Eµ L(µ, δ0(x))
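The theorem is easy to check by simulation. A minimal sketch (the true means, seed, and N here are arbitrary choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 2000
mu = np.linspace(-1.0, 2.0, N)               # arbitrary true means

mle_loss = js_loss = 0.0
for _ in range(reps):
    x = rng.normal(mu, 1.0)                  # x_i ~ N(mu_i, 1)
    shrink = 1.0 - (N - 2) / np.sum(x ** 2)  # James-Stein factor
    mle_loss += np.sum((x - mu) ** 2)        # loss of delta0(x) = x
    js_loss += np.sum((shrink * x - mu) ** 2)

print(mle_loss / reps, js_loss / reps)       # MLE risk near N; JS risk smaller
```

The MLE's risk is exactly N; the James-Stein risk is uniformly smaller, with the biggest gains when the µi's are close together.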
Eighteen Baseball Players
Name          hits/AB   Observed Ave   "TRUTH"   James-Stein
Clemente        18/45       .400         .346       .290
F Robinson      17/45       .378         .298       .286
F Howard        16/45       .356         .276       .281
Johnstone       15/45       .333         .222       .277
Berry           14/45       .311         .273       .273
Spencer         14/45       .311         .270       .273
Kessinger       13/45       .289         .263       .268
L Alvarado      12/45       .267         .210       .264
Santo           11/45       .244         .269       .259
Swoboda         11/45       .244         .230       .259
Unser           10/45       .222         .264       .254
Williams        10/45       .222         .256       .254
Scott           10/45       .222         .303       .254
Petrocelli      10/45       .222         .264       .254
E Rodriguez     10/45       .222         .226       .254
Campaneris       9/45       .200         .286       .249
Munson           8/45       .178         .316       .244
Alvis            7/45       .156         .200       .239
Grand Average               .265         .265       .265
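A sketch of how the James-Stein column can be approximately reproduced, shrinking each average toward the grand average. The binomial variance plug-in and the (k − 3) factor (the grand mean is estimated) are assumptions made here for illustration; Efron and Morris actually worked on an arcsin-transformed scale, so these numbers only roughly match the table:

```python
import numpy as np

hits = np.array([18, 17, 16, 15, 14, 14, 13, 12, 11, 11,
                 10, 10, 10, 10, 10, 9, 8, 7])
ab = 45
avg = hits / ab                  # observed batting averages
gbar = avg.mean()                # grand average, about .265
k = len(avg)

# Approximate each average's variance by the binomial plug-in value
sigma2 = gbar * (1 - gbar) / ab

# Shrink toward the grand mean; k - 3 because the mean is estimated
shrink = 1 - (k - 3) * sigma2 / np.sum((avg - gbar) ** 2)
js = gbar + shrink * (avg - gbar)
print(np.round(js, 3))           # heavy shrinkage toward .265
```

Each James-Stein estimate sits much closer to .265 than the raw average does, which is exactly the pattern in the table's last column.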
Bayesian Interpretation (Lindley, 1962?)
• Bayes Model

  µ ∼ NN(0, A·I),  x|µ ∼ NN(µ, I)  ⇒

  x ∼ NN(0, (A + 1)I),  µ|x ∼ NN( (A/(A + 1)) x, (A/(A + 1)) I )

• Bayes Rule

  δA(x) = (A/(A + 1)) x = [1 − 1/(A + 1)] x

• Empirical Bayes  Don't know A:

  if x ∼ NN(0, (A + 1)I), then

  EA{ (N − 2)/‖x‖² } = 1/(A + 1)

  ⇒ δ̂A(x) = [1 − (N − 2)/‖x‖²] x  (= JS!)
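The key identity EA{(N − 2)/‖x‖²} = 1/(A + 1) is easy to verify by simulation (a sketch; N, A, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, A, reps = 20, 3.0, 5000

x = rng.normal(0.0, np.sqrt(A + 1), size=(reps, N))  # x ~ N_N(0, (A+1) I)
est = (N - 2) / np.sum(x ** 2, axis=1)               # unbiased for 1/(A+1)
print(est.mean(), 1 / (A + 1))                       # both near 0.25
```

Plugging this estimate into the Bayes rule [1 − 1/(A + 1)]x gives exactly the James-Stein estimator.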
Relative Savings Loss
(Efron & Morris, 1972)
• µ ∼ NN(0, A·I),  x|µ ∼ NN(µ, I)

• δA = (A/(A + 1)) x,  δ0 = x,  δ1 = JS

• RA = EA‖δA − µ‖² = N·A/(A + 1)

• R0 = N,  so  R0 − RA = N/(A + 1)

• Theorem  (R1 − RA)/(R0 − RA) = 2/N

• “JS loses 2/N of the possible savings of the Bayes Rule.”

• Question  Which estimation problems should be combined?
Example of Stein Estimation
• N = 10 independent cases, with
x[i] ∼ N (µ[i], 1) i = 1, 2, 3, . . . , 10
• 1000 simulations: table shows average squared error per trial for each case, and also the total.

• Stein is better overall but . . .

      µ      MSE_mle   MSE_stein
   -0.81       0.95       0.61
   -0.39       1.04       0.62
   -0.39       1.03       0.62
   -0.08       0.99       0.58
    0.69       1.06       0.67
    0.71       0.98       0.63
    1.28       0.95       0.71
    1.32       1.04       0.77
    1.89       1.00       0.88
    4.00       1.08       2.04  !!

  Total sq err  10.12       8.13
• Relevance Functions (Efron & Morris, 1972): Let individual coordinates opt out of the joint estimation procedure if far away from the others.
Large-Scale Hypothesis Testing
(Robbins 1951-1956)
Problem
• Observe xi ∼ N(µi, 1),  i = 1, 2, . . . , N  (not necessarily independent)

• Simultaneously test all null hypotheses H0i : µi = 0.
Bonferroni
• p-value  pi = Prob0{Xi ≥ xi} = 1 − Φ(xi)

• Reject all H0i with pi ≤ α/N

  ⇒ Prob{reject any true H0i} ≤ α.
Robbins
• Asymptotically achieve the Bayes risk, as if you knew the true distribution of the µi's.
Microarray Example (Efron 2006)
• Prostate Study: n = 102 men, 50 controls and 52 prostate cancer patients; N = 6033 genes in each microarray

• X (N × n) → two-sample t-statistics “ti”,  i = 1, 2, . . . , N

• z-values: zi = Φ⁻¹(F100(ti))  [F100 is the cdf of the t100 distribution]
• Theoretical Null Hypothesis
zi ∼ N(0, 1)
• Simple Model:
zi ∼ N(µi, 1),  H0i : µi = 0  [zi ⇔ “xi”]
Bayesian Two-Groups Model
• Each case (gene) is either “null” or “non-null”, with prior probabilities

  p0 = Prob{null},      f0(z) = density if null

  p1 = Prob{non-null},  f1(z) = density if non-null
• Simple Model:  f0(z) = ϕ(z),

  f1(z) = ∫ ϕ(z − µ) g1(µ) dµ  (integral over −∞ < µ < ∞),

  where g1(µ) is the prior density of the non-zero µ's
• Bayes Rule

  Prob{null | zi = z} = p0 f0(z)/f(z),

  where f(z) = p0 f0(z) + p1 f1(z), the mixture density.
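The posterior null probability is a two-line computation once f0, f1, and p0 are specified. A sketch with made-up values p0 = 0.9 and g1 = N(0, A), so that f1 is the N(0, A + 1) density:

```python
import math

def prob_null(z, p0=0.9, A=4.0):
    """Prob{null | z} under f0 = N(0, 1) and f1 = N(0, A + 1)."""
    dens = lambda z, v: math.exp(-z * z / (2 * v)) / math.sqrt(2 * math.pi * v)
    f0 = dens(z, 1.0)
    f1 = dens(z, A + 1.0)
    f = p0 * f0 + (1 - p0) * f1      # mixture density f(z)
    return p0 * f0 / f

print(prob_null(0.0))   # near 1: a central z looks null
print(prob_null(4.0))   # small: a large z looks non-null
```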
False Discovery Rates
(Benjamini & Hochberg, 1995)
• F0(z) = Prob{Z ≥ z} = 1− Φ(z)
• F̂ (z) = #{zi ≥ z}/N
• F̂dr(z) = p0F0(z)/F̂ (z)
• BH Rule  Reject all H0i for zi ≥ z0, where z0 = min{z : F̂dr(z) ≤ q}
• Theorem  The expected proportion of rejections that are falsely rejected nulls is less than q.
  [Figure: 2 × 2 outcome table]

                 Accept   Reject
  True nulls        a        b     (NTrue = a + b)
  False nulls       c        d
                          NReject

  E{ b/NTrue } = expected proportion of true nulls rejected
  Fdr = E{ b/NReject }
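A sketch of the BH rule in this F̂dr form (the null/non-null mixture below is simulated with made-up proportions, and p0 is taken as 1, the conservative default):

```python
import math
import numpy as np

def bh_threshold(z, q=0.1, p0=1.0):
    """Smallest observed z0 with Fdr_hat(z0) = p0*(1 - Phi(z0))/Fhat(z0) <= q."""
    zs = np.sort(z)
    N = len(zs)
    for i, z0 in enumerate(zs):
        F0 = 0.5 * math.erfc(z0 / math.sqrt(2.0))  # 1 - Phi(z0)
        Fhat = (N - i) / N                         # fraction of z_j >= z0
        if p0 * F0 / Fhat <= q:
            return z0                              # first (smallest) crossing
    return None

# Hypothetical data: 950 null z-values plus 50 shifted non-null ones
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(3.5, 1.0, 50)])
z0 = bh_threshold(z, q=0.1)
print(z0, int(np.sum(z >= z0)))                    # threshold, rejections
```

Everything to the right of z0 is rejected, and the expected proportion of those rejections that are true nulls is controlled at q.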
Empirical Bayes Interpretation
• Bayesian Fdr:
Fdr(z) = Prob{null |zi ≥ z} = p0F0(z)/F (z)
• F̂dr(z) estimates Fdr(z)
• BH rule: Reject H0i if the estimated Prob{null | zi ≥ z} is ≤ q.

• F̂dr(z) = p0 F0(z)/F̂(z) depends only on the proportion of zi's ≥ z.
F̂dr Example, Prostate Study
• N(3.3) = 49 of the genes have zi ≥ 3.3
• Expected number of nulls ≥ 3.3 is
Np0(1−Φ(3.3)) = 6033 · .93 · .000483 = 7.6
• F̂dr(3.3) = 7.6/49 = .16, so about 5/6 of the 49 are non-null
• Exchangeability  We don't know which of the 49 are the false discoveries: assign null probability 1/6 to each of the 49.
Relevance
• Hidden Assumption: All N cases are equally relevant to inference for any particular case.
• Brain Study
n = 12 children, 6 dyslexic and 6 controls
• Each: DTI brain image giving response
at N = 15443 voxels
• ti = two-sample t-stat for ith voxel
⇒ zi = Φ−1(F10(ti))
(Schwartzman, et al., 2005)
• Next Figure plots zi vs di, the distance of voxel i from the back of the brain
•Waves!
Relevance Functions (Efron 2007)
• ρi(d) ∈ [0, 1]: relevance of the voxel at d to the one at di

• E.g., ρi(d) = [1 + |d − di|/10]⁻¹

• Lemma  Fdr(zi) = Fdr(z) · E0{ρi(D) | Z ≥ z} / E{ρi(D) | Z ≥ z}
• Frequentists worry a lot about individualbad situations.
Wellcome SNP Study (2007)
• “. . . we do not subscribe to the view that one should correct significance levels for the number of tests performed . . .”
• Bayes: Prob{null |Z > z}
• Empirical Bayes: F̂dr(z) = p0(1−Φ(z))/F̂ (z)
• EB depends only on the proportion  F̂(z) = #{zi ≥ z}/N
• Suppose z1 < z2 < · · · < zN. Then

  F̂dr(zN) = p0(1 − Φ(zN))/F̂(zN)

           = p0 · pval(zN)/(1/N),

  so BH-significant if F̂dr(zN) ≤ q, i.e.

  pval(zN) ≤ (q/p0) · (1/N):  Bonferroni!
[Figure] The hierarchical Bayes model: hyperparameter η sampled from hyperprior density h(·). (JASA, 1996: 538–565)
Estimation compared to
Hypothesis Testing
(Efron 2006)
• Common Model
µi ∼ g(µ)  and  zi | µi ∼ N(µi, 1)
• Estimation: g(µ) smooth, e.g., N (0, A)
• Hypothesis Testing: g(µ) bumpy, e.g.,

  p0 δ0(µ) + (1 − p0) N(0, A)   [δ0 = point mass at 0]
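A sketch of sampling from the bumpy (“spike and slab”) prior, with made-up values p0 = 0.9 and A = 4:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p0, A = 10000, 0.9, 4.0

null = rng.random(N) < p0       # which cases are null (mu_i = 0)
mu = np.where(null, 0.0, rng.normal(0.0, np.sqrt(A), N))
z = rng.normal(mu, 1.0)         # z_i | mu_i ~ N(mu_i, 1)

print(null.mean(), z[null].var(), z[~null].var())  # ~p0, ~1, ~A + 1
```

The null z's are N(0, 1) while the non-null z's have marginal variance A + 1; the combined histogram of all N z-values is what large-scale testing procedures get to see.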
References
Benjamini & Hochberg (1995). JRSS-B 57, 289–300.
Efron (2007b). Microarrays, empirical Bayes, and the two-groups model. Available at http://www-stat.stanford.edu/~brad/papers/twogroups.pdf.

Efron (2007c). Simultaneous inference: When should hypothesis testing problems be combined? Available at http://www-stat.stanford.edu/~brad/papers/combinationpaper.pdf.
Efron & Morris (1972). JASA 67 130–139.
James & Stein (1961). Proc. 4th Berkeley Symp. 1, 361–379.
Robbins (1956). Proc. 3rd Berkeley Symp. 1, 157–163.
Schwartzman, Dougherty & Taylor (2005). Mag. Res. Med. 53, 1423–1431.
Wellcome Trust (2007). Nature 447, 661–678. Available at http://www.nature.com/nature/journal/v447/n7145/abs/nature05911.html.