Bayesians, empirical Bayesians and Frequentists

Brad Efron

“Foundations of Statistics”

October 2007




Foundations of Statistics Brad Efron

“Why do some of the best Bayesian ideas come from Frequentists?”

— B. Efron, October 2007

• More a comparison of different attitudes than of different philosophies

• Compromise Attitude: Empirical Bayes


Stein Estimation (1955-1961)

• Observe

xi ∼ N(µi, 1) independently, i = 1, 2, . . . , N  (N ≥ 3)

• Wish to estimate µ with squared-error loss

L(µ, δ(x)) = ‖δ(x) − µ‖² = Σ_{i=1}^N (δi(x) − µi)²

• MLE: δ0(x) = x

• James–Stein: δ1(x) = [1 − (N − 2)/‖x‖²] x

• Theorem (James & Stein, 1961): For every µ,

EµL(µ, δ1(x)) < EµL(µ, δ0(x))
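The theorem is easy to check by simulation. Below is a minimal sketch (assuming NumPy; the choice µ = 0, where shrinkage helps most, and the rep count are illustrative — the dominance holds for every µ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Average loss of the MLE delta0(x) = x versus the James-Stein rule
# delta1(x) = [1 - (N-2)/||x||^2] x over repeated draws x_i ~ N(mu_i, 1).
# mu = 0 is an illustrative choice; the theorem holds for every mu.
N, reps = 10, 2000
mu = np.zeros(N)
x = rng.normal(mu, 1.0, size=(reps, N))
shrink = 1.0 - (N - 2) / np.sum(x**2, axis=1, keepdims=True)
js = shrink * x

risk_mle = np.sum((x - mu) ** 2, axis=1).mean()
risk_js = np.sum((js - mu) ** 2, axis=1).mean()
print(risk_mle, risk_js)   # JS risk is strictly smaller
```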


Eighteen Baseball Players

Name           hits/AB  Observed Ave  “TRUTH”  James–Stein
Clemente        18/45      .400        .346       .290
F Robinson      17/45      .378        .298       .286
F Howard        16/45      .356        .276       .281
Johnstone       15/45      .333        .222       .277
Berry           14/45      .311        .273       .273
Spencer         14/45      .311        .270       .273
Kessinger       13/45      .289        .263       .268
L Alvarado      12/45      .267        .210       .264
Santo           11/45      .244        .269       .259
Swoboda         11/45      .244        .230       .259
Unser           10/45      .222        .264       .254
Williams        10/45      .222        .256       .254
Scott           10/45      .222        .303       .254
Petrocelli      10/45      .222        .264       .254
E Rodriguez     10/45      .222        .226       .254
Campaneris       9/45      .200        .286       .249
Munson           8/45      .178        .316       .244
Alvis            7/45      .156        .200       .239
Grand Average              .265        .265       .265


Pythagoras and Stein


Bayesian Interpretation (Lindley, 1962?)

• Bayes Model

µ ∼ NN(0, A·I)

x | µ ∼ NN(µ, I)  ⇒

x ∼ NN(0, (A + 1)I)

µ | x ∼ NN( (A/(A + 1)) x, (A/(A + 1)) I )

• Bayes Rule

δA(x) = (A/(A + 1)) x = [1 − 1/(A + 1)] x

• Empirical Bayes — Don’t know A: if x ∼ NN(0, (A + 1)I), then

EA{ (N − 2)/‖x‖² } = 1/(A + 1)

⇒ δ̂A(x) = [1 − (N − 2)/‖x‖²] x  (= JS!)
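The identity EA{(N − 2)/‖x‖²} = 1/(A + 1) can be checked numerically; a small sketch, with A = 3 and the other settings arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Marginally x ~ N_N(0, (A+1) I), so ||x||^2 / (A+1) ~ chi^2_N and
# E[(N-2)/||x||^2] = 1/(A+1): the James-Stein shrinkage factor is an
# unbiased estimate of the Bayes factor 1/(A+1). A = 3 is illustrative.
A, N, reps = 3.0, 50, 4000
x = rng.normal(0.0, np.sqrt(A + 1.0), size=(reps, N))
est = (N - 2) / np.sum(x**2, axis=1)
print(est.mean(), 1.0 / (A + 1.0))   # both close to 0.25
```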


Relative Savings Loss

(Efron & Morris, 1972)

• µ ∼ NN(0, A·I),  x | µ ∼ NN(µ, I)

• δA = (A/(A + 1)) x,  δ0 = x,  δ1 = JS

• Bayes risk: RA = EA‖δA − µ‖² = N·A/(A + 1)

• R0 = N,  so  R0 − RA = N/(A + 1)

• Theorem:  (R1 − RA)/(R0 − RA) = 2/N

• “JS loses 2/N of the possible savings of the Bayes Rule.”

• Question: Which estimation problems should be combined?


Example of Stein Estimation

• N = 10 independent cases, with

xi ∼ N(µi, 1),  i = 1, 2, 3, . . . , 10

• 1000 simulations: table shows average squared error per trial for each case, and also the total.

• Stein is better overall but . . .

   µ      MSE_mle   MSE_stein
 −0.81     0.95       0.61
 −0.39     1.04       0.62
 −0.39     1.03       0.62
 −0.08     0.99       0.58
  0.69     1.06       0.67
  0.71     0.98       0.63
  1.28     0.95       0.71
  1.32     1.04       0.77
  1.89     1.00       0.88
  4.00     1.08       2.04  !!

Total sq err  10.12       8.13

• Relevance Functions (Efron & Morris, 1972): Let individual coordinates opt out of the joint estimation procedure if far away from the others.
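A sketch reproducing this kind of table (NumPy assumed; the µ values are taken from the table above, the rep count is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-coordinate squared error of MLE vs James-Stein for N = 10 means,
# one of them (mu = 4.0) far from the rest. The mu values follow the
# table above; the rep count is illustrative.
mu = np.array([-0.81, -0.39, -0.39, -0.08, 0.69, 0.71, 1.28, 1.32, 1.89, 4.00])
N, reps = len(mu), 4000
x = rng.normal(mu, 1.0, size=(reps, N))
shrink = 1.0 - (N - 2) / np.sum(x**2, axis=1, keepdims=True)
js = shrink * x

mse_mle = ((x - mu) ** 2).mean(axis=0)
mse_js = ((js - mu) ** 2).mean(axis=0)
print(mse_mle.sum(), mse_js.sum())   # JS wins in total
print(mse_mle[-1], mse_js[-1])       # but loses on the outlying coordinate
```

The outlying coordinate is over-shrunk toward the others, which is exactly the motivation for letting it opt out.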


Large-Scale Hypothesis Testing

(Robbins 1951-1956)

Problem

• Observe xi ∼ N(µi, 1), i = 1, 2, . . . , N (not necessarily independent)

• Simultaneously test all null hypotheses H0i : µi = 0.

Bonferroni

• p-value pi = Prob0{Xi ≥ xi} = 1 − Φ(xi)

• Reject all H0i with pi ≤ α/N

⇒ Prob{reject any true H0i} ≤ α.

Robbins

• Asymptotically achieve the Bayes risk, as if you knew the true {µi} distribution.


Microarray Example (Efron 2006)

• Prostate Study: n = 102 men, 50 controls and 52 prostate cancer patients; N = 6033 genes in each microarray

• XN×n → two-sample t-statistics ti, i = 1, 2, . . . , N

• z-values: zi = Φ−1(F100(ti))  [F100 is the cdf of the t100 distribution]

• Theoretical Null Hypothesis

zi ∼ N(0, 1)

• Simple Model:

zi ∼ N (µi, 1) H0i : µi = 0 [zi ⇔ “xi”]


Bayesian Two-Groups Model

• Each case (gene) is either “null” or “non-null”, with prior probabilities

p0 = Prob{null},  f0(z) = density if null

p1 = Prob{non-null},  f1(z) = density if non-null

• Simple Model: f0(z) = ϕ(z),

f1(z) = ∫ ϕ(z − µ) g1(µ) dµ

where g1(µ) is the prior density of the non-zero µ’s

• Bayes Rule

Prob{null | zi = z} = p0 f0(z)/f(z)

where f(z) = p0 f0(z) + p1 f1(z), the mixture density.
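The two-groups posterior takes only a few lines; a sketch, where p0 = .95 and g1 = N(0, A) with A = 4 are illustrative choices, not estimates from any data:

```python
import numpy as np

def normal_pdf(z, sd=1.0):
    """Density of N(0, sd^2) at z."""
    return np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

p0, A = 0.95, 4.0   # illustrative values, not estimates from data

def prob_null(z):
    """Prob{null | z} = p0 f0(z) / f(z) under the Simple Model with g1 = N(0, A)."""
    f0 = normal_pdf(z)                       # null density phi(z)
    f1 = normal_pdf(z, sd=np.sqrt(A + 1.0))  # non-null marginal: N(0, A + 1)
    f = p0 * f0 + (1.0 - p0) * f1            # mixture density f(z)
    return p0 * f0 / f

print(prob_null(0.0))   # near 1: z = 0 looks null
print(prob_null(4.0))   # small: z = 4 looks non-null
```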


False Discovery Rates

(Benjamini & Hochberg, 1995)

• F0(z) = Prob{Z ≥ z} = 1− Φ(z)

• F̂ (z) = #{zi ≥ z}/N

• F̂dr(z) = p0F0(z)/F̂ (z)

• BH Rule: Reject all H0i for zi ≥ z0, where z0 = min{z : F̂dr(z) ≤ q}

• Theorem: The expected proportion of falsely rejected nulls is less than q.

              Accept   Reject
True null       a        b       NTrue = a + b
Non-null        c        d       NReject = b + d

E{ b/NTrue } = expected proportion of true nulls rejected;  Fdr = E{ b/NReject }
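The BH rule above can be sketched directly in its F̂dr form (NumPy assumed; p0 and the simulated null/non-null mixture are illustrative):

```python
import numpy as np
from math import erf, sqrt

def upper_tail(z):
    """F0(z) = Prob{Z >= z} = 1 - Phi(z) for a standard normal Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

def bh_reject(z, q, p0=1.0):
    """BH rule in Fdr-hat form: reject H0i for z_i >= z0, where z0 is the
    smallest threshold with Fdr_hat(z) = p0 * F0(z) / F_hat(z) <= q."""
    z = np.asarray(z)
    N = len(z)
    zs = np.sort(z)[::-1]                       # largest z-value first
    # F_hat at the k-th largest z-value is (k+1)/N, the proportion >= it
    surv = np.array([upper_tail(t) for t in zs])
    fdr_hat = p0 * surv * N / np.arange(1, N + 1)
    ok = np.nonzero(fdr_hat <= q)[0]
    if len(ok) == 0:
        return np.zeros(N, dtype=bool)          # nothing rejected
    z0 = zs[ok[-1]]                             # smallest z still meeting the bound
    return z >= z0

# Illustrative mixture: 900 nulls, 100 non-nulls centered at 3
rng = np.random.default_rng(3)
z = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(3.0, 1.0, 100)])
rejected = bh_reject(z, q=0.1, p0=0.9)
print(rejected.sum(), "rejections")
```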


Empirical Bayes Interpretation

• Bayesian Fdr:

Fdr(z) = Prob{null |zi ≥ z} = p0F0(z)/F (z)

• F̂dr(z) estimates Fdr(z)

• BH rule: Reject H0i if the estimated Prob{null | zi ≥ z} is ≤ q.

• F̂dr(z) = p0F0(z)/F̂(z) depends only on the proportion of zi’s ≥ z.


F̂dr Example, Prostate Study

• N(3.0) = 49 of the genes have zi ≥ 3.0

• Expected number of null zi’s ≥ 3.0 is

Np0(1 − Φ(3.0)) = 6033 · .93 · .00135 = 7.6

• F̂dr(3.0) = 7.6/49 = .16, so about 5/6 of the 49 are non-null

• Exchangeability

Don’t know which of the 49 are the false discoveries: assign probability 1/6 to each of the 49


Relevance

• Hidden Assumption: All N cases equally relevant to inference for any particular case.

• Brain Study

n = 12 children, 6 dyslexic and 6 controls

• Each: DTI brain image giving response

at N = 15443 voxels

• ti = two-sample t-stat for ith voxel

⇒ zi = Φ−1(F10(ti))

(Schwartzman, et al., 2005)

• Next figure plots zi vs. di, the distance of voxel i from the back of the brain

• Waves!

[Figure: zi plotted against di, the distance of the voxel from the back of the brain]

Relevance Functions (Efron 2007)

• ρi(d) ∈ [0, 1]: relevance of a voxel at d to the one at di

• E.g., ρi(d) = [1 + |d − di|/10]⁻¹

• Lemma: Fdri(z) = Fdr(z) · E0{ρi(D) | Z ≥ z} / E{ρi(D) | Z ≥ z}

• Frequentists worry a lot about individual bad situations.
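The example relevance function above is trivial to compute; the distances used here are arbitrary illustrative values:

```python
def rho(d, d_i):
    """The slide's example relevance function: [1 + |d - d_i|/10]^(-1)."""
    return 1.0 / (1.0 + abs(d - d_i) / 10.0)

# Arbitrary illustrative distances (same units as d_i)
print(rho(50.0, 50.0))   # 1.0: full relevance at zero distance
print(rho(90.0, 50.0))   # 0.2: a voxel 40 units away counts much less
```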


Wellcome SNP Study (2007)

• “. . . we do not subscribe to the view that one should correct significance levels for the number of tests performed . . .”

• Bayes: Prob{null |Z > z}

• Empirical Bayes: F̂dr(z) = p0(1−Φ(z))/F̂ (z)

• EB depends only on the proportion

F̂(z) = #{zi ≥ z}/N

• Suppose z1 < z2 < · · · < zN. Then F̂(zN) = 1/N, so

F̂dr(zN) = p0(1 − Φ(zN))/F̂(zN)

= p0 · pval(zN)/(1/N),

and BH declares zN significant if F̂dr(zN) ≤ q, i.e., if

pval(zN) ≤ (q/p0) · (1/N):  Bonferroni!
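A numeric sketch of this boundary case (NumPy assumed; p0 = .95 and q = .1 are illustrative): for the largest z-value the two criteria coincide exactly.

```python
import numpy as np
from math import erf, sqrt

def pval(z):
    """One-sided p-value 1 - Phi(z)."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

rng = np.random.default_rng(4)
z = np.sort(rng.normal(0.0, 1.0, 1000))   # z_1 < z_2 < ... < z_N
N, p0, q = len(z), 0.95, 0.1

# For the largest z-value, F_hat(z_N) = 1/N, so
#   Fdr_hat(z_N) = p0 * pval(z_N) / (1/N) = N * p0 * pval(z_N),
# and Fdr_hat(z_N) <= q is exactly the Bonferroni test pval(z_N) <= (q/p0)/N.
fdr_top = p0 * pval(z[-1]) / (1.0 / N)
bh_sig = fdr_top <= q
bonf_sig = pval(z[-1]) <= (q / p0) / N
print(bh_sig, bonf_sig)   # the two criteria always agree
```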


[Figure: the hierarchical Bayes model, with hyperparameter η sampled from the hyperprior density h(·). (JASA, 1996: 538–565)]


Estimation compared to

Hypothesis Testing

(Efron 2006)

• Common Model

µi ∼ g(µ) and zi | µi ∼ N(µi, 1)

• Estimation: g(µ) smooth, e.g., N (0, A)

• Hypothesis Testing: g(µ) bumpy, e.g.,

p0δ0(µ) + (1− p0)N (0, A)
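The two kinds of prior can be sampled side by side; a sketch with illustrative p0 and A (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
A, p0, size = 1.0, 0.9, 100_000   # illustrative values

# Estimation setting: smooth prior g = N(0, A)
mu_smooth = rng.normal(0.0, np.sqrt(A), size)

# Testing setting: bumpy prior g = p0 * delta_0 + (1 - p0) * N(0, A)
is_null = rng.random(size) < p0
mu_bumpy = np.where(is_null, 0.0, rng.normal(0.0, np.sqrt(A), size))

# z | mu ~ N(mu, 1) in both cases
z_smooth = rng.normal(mu_smooth, 1.0)
z_bumpy = rng.normal(mu_bumpy, 1.0)

# Marginal variances: A + 1 for the smooth prior, 1 + (1 - p0)A for the bumpy one
print(z_smooth.var(), z_bumpy.var())
```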


References

Benjamini & Hochberg (1995). JRSS-B 57, 289–300.

Efron (2007b). Microarrays, empirical Bayes, and the two-groups model. Available at http://www-stat.stanford.edu/∼brad/papers/twogroups.pdf.

Efron (2007c). Simultaneous inference: When should hypothesis testing problems be combined? Available at: http://www-stat.stanford.edu/∼brad/papers/combinationpaper.pdf.

Efron & Morris (1972). JASA 67, 130–139.

James & Stein (1961). Proc. 4th Berkeley Symp. 1, 361–379.

Robbins (1956). Proc. 3rd Berkeley Symp. 1, 157–163.

Schwartzman, Dougherty & Taylor (2005). Mag. Res. Med.53, 1423–1431.

Wellcome Trust (2007). Nature 447, 661–678. Available at http://www.nature.com/nature/journal/v447/n7145/abs/nature05911.html.
