Analysis of RNAseq data¹ using ShrinkBayes

Mark van de Wiel [email protected]
Department of Epidemiology & Biostatistics, VUmc, Amsterdam

¹and other high-dimensional data
Contents

1. Why go Bayesian?
2. INLA
3. ShrinkBayes: model
4. ShrinkBayes: shrinkage
5. ShrinkBayes: inference
6. ShrinkBayes: software
7. ShrinkBayes: plusses and minuses
Some Bayesian statistics
Source: http://xkcd.com/1132/
Bayes in a nutshell…
• Data: Y
• Parameter of interest: β
• Known: likelihood π(Y | β) and prior π(β)
• Unknown: posterior π(β | Y)
• Bayes' rule:

$$\pi(\beta \mid Y) = \frac{\pi(Y \mid \beta)\,\pi(\beta)}{\pi(Y)} = \frac{\pi(Y \mid \beta)\,\pi(\beta)}{\int \pi(Y \mid \beta)\,\pi(\beta)\,d\beta}$$

• 95% credibility interval: CI = [q0.025(π(β | Y)), q0.975(π(β | Y))]
• Simple inference: β ∈ CI?
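As a minimal numerical illustration (all numbers made up), Bayes' rule can be evaluated on a grid in R; the credibility interval is then read off the posterior quantiles:

```r
## Bayes' rule on a grid: posterior = likelihood x prior, renormalized.
## Toy setup: one observation y, N(0,1) prior, normal likelihood.
beta.grid <- seq(-5, 5, by = 0.01)
prior <- dnorm(beta.grid, mean = 0, sd = 1)      # pi(beta)
y <- 1.5
lik <- dnorm(y, mean = beta.grid, sd = 1)        # pi(y | beta)
post <- lik * prior
post <- post / (sum(post) * 0.01)                # divide by pi(y)
## 95% credibility interval from the posterior quantiles
cdf <- cumsum(post) * 0.01
c(lower = beta.grid[which(cdf >= 0.025)[1]],
  upper = beta.grid[which(cdf >= 0.975)[1]])
```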
Bayesian inference, single test

Parameter of interest: gene expression difference between two conditions, primary and metastasis. Posteriors: [figure]
Bayesian inference
Conjugate priors
Prior π(β) is conjugate to likelihood π(Y | β) when the posterior has an analytical form. Examples:
1. Normal-Normal
2. Normal-Normal-Inverse Gamma (for variance σ²)
3. Binomial-Beta (= Beta-binomial distribution)
4. Poisson-Gamma (= Negative Binomial)
5. ...

Mixtures of conjugates are also conjugate when the mixture components are known.
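A short R sketch of case 3 (Binomial-Beta): if X ~ Bin(n, p) and p ~ Beta(α, β), the posterior is Beta(α + k, β + n - k) in closed form, so no numerical integration is needed (the numbers match the coin example later in these slides):

```r
## Binomial-Beta conjugacy: posterior available in closed form.
alpha <- 3; beta <- 3          # prior Beta(3, 3)
n <- 3; k <- 1                 # observed: 1 head in 3 tosses
(alpha + k) / (alpha + beta + n)     # posterior mean E[p | k] = 4/9
curve(dbeta(x, alpha + k, beta + n - k), from = 0, to = 1,
      xlab = "p", ylab = "posterior density")
```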
Bayesian inference, conjugate priors, example
¹Ignoring the constant -1/2
Bayesian inference, approximations
In many practical cases, exact formulas for the posteriors are not available, hence approximations are necessary:
1. Markov Chain Monte Carlo (MCMC). Very popular and very general, but computationally intensive and needs to be developed for each new problem (a minimal sketch follows this list).
2. Variational Bayes approximations. Fast; approximate the posteriors by analytical formulas, assuming knowledge of the conditional posteriors. Require specific development for each problem.
3. Integrated Nested Laplace Approximation (INLA).
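For concreteness, a minimal random-walk Metropolis sampler (method 1) for the toy normal-normal posterior used earlier; this generic sketch is ours and is not part of ShrinkBayes:

```r
## Random-walk Metropolis: only the unnormalized posterior is needed.
set.seed(1)
y <- 1.5
log.target <- function(b)            # log likelihood + log prior
  dnorm(y, mean = b, sd = 1, log = TRUE) + dnorm(b, 0, 1, log = TRUE)
n.iter <- 10000
draws <- numeric(n.iter)
b <- 0
for (t in 2:n.iter) {
  prop <- b + rnorm(1, 0, 0.5)                        # propose a move
  if (log(runif(1)) < log.target(prop) - log.target(b)) b <- prop
  draws[t] <- b
}
quantile(draws[-(1:1000)], c(0.025, 0.975))           # drop burn-in
```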
INLA

Bayesian inference, GLM setting
INLA

• Approximations for marginal posteriors based on Laplace approximations
• Fast, accurate alternative to MCMC
• Applicable to a wide variety of models, including GLMs
• Can also be used for count data
• Implementation in R: www.r-inla.org
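A hedged sketch of a basic call (the INLA package is not on CRAN; see www.r-inla.org for installation). Data and formula are made-up toys; the point is the general inla(formula, family, data) pattern:

```r
## Toy Poisson GLM fitted with INLA.
library(INLA)
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- rpois(20, lambda = exp(1 + 0.5 * d$x))
fit <- inla(y ~ x, family = "poisson", data = d)
fit$summary.fixed                    # posterior summaries of fixed effects
marg <- fit$marginals.fixed[["x"]]   # marginal posterior of the x effect
```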
INLA
Why go Bayesian?
Tossing 10,000 coins: a simple example

• Setting: 10,000 coins, each flipped only three times
• Suppose 8,000 fair coins: p(head) = 0.5; 1,000 for which p(head) = 0.35 and 1,000 for which p(head) = 0.65
• Some computations, with X ~ Bin(n, p) and p ~ Beta(α, β):

$$\text{Frequentist estimate: } \frac{k}{n} \qquad\qquad \text{Bayesian estimate: } E[p \mid k] = \frac{k+\alpha}{n+\alpha+\beta}$$
Why go Bayesian?
Beta-distribution
[Figure: Beta densities on [0, 1]. Black: Beta(3,3); red: Beta(2,2); green: Beta(1,1) = U[0,1].]
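The figure is easy to reproduce in R:

```r
## Beta densities from the figure.
curve(dbeta(x, 3, 3), from = 0, to = 1, col = "black",
      ylim = c(0, 1.9), xlab = "x", ylab = "density")
curve(dbeta(x, 2, 2), col = "red", add = TRUE)
curve(dbeta(x, 1, 1), col = "green", add = TRUE)   # = U[0, 1]
```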
Why go Bayesian?
Estimates
$$\text{Frequentist estimate: } F(k, n) = \frac{k}{n} \qquad \text{Bayesian estimate: } E[p \mid k] = B(k, n, \alpha, \beta) = \frac{k+\alpha}{n+\alpha+\beta}$$

Suppose k = 1: F(1,3) = 1/3; B(1,3,1,1) = 2/5; B(1,3,3,3) = 4/9.
Suppose k = 0: F(0,3) = 0; B(0,3,1,1) = 1/5; B(0,3,3,3) = 3/9.
In particular, the informative prior renders better estimates for the majority of coins (not for all)
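These numbers are easy to check in R (the helper names Fest and Best are ours):

```r
## Frequentist vs Bayesian point estimates for k heads in n tosses.
Fest <- function(k, n) k / n
Best <- function(k, n, a, b) (k + a) / (n + a + b)
Fest(1, 3); Best(1, 3, 1, 1); Best(1, 3, 3, 3)   # 1/3, 2/5, 4/9
Fest(0, 3); Best(0, 3, 1, 1); Best(0, 3, 3, 3)   # 0, 1/5, 1/3
```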
Why go Bayesian?
Estimation also improves in terms of precision
Possibly more important: Bayesian estimates using an informative prior tend to be more precise (less variance). Show this for the beta-binomial coin tossing experiment. [exer] One possible simulation is sketched below.
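One possible simulation for this exercise, using the 10,000-coin setting above and a Beta(3, 3) prior (a sketch, not the official solution):

```r
## Compare spread and error of the two estimators over all coins.
set.seed(1)
m <- 10000; n <- 3
p <- c(rep(0.5, 8000), rep(0.35, 1000), rep(0.65, 1000))   # true p's
k <- rbinom(m, size = n, prob = p)
freq  <- k / n
bayes <- (k + 3) / (n + 3 + 3)                  # Beta(3, 3) prior
c(var(freq), var(bayes))                        # Bayes: much less variance
c(mean((freq - p)^2), mean((bayes - p)^2))      # and smaller MSE here
```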
Why go Bayesian?
Informative prior renders better estimates
Nice, but we don't know the prior. True, but we can estimate it! Empirical Bayes: estimate the prior from the data. Many different forms exist; the simplest one equates the theoretical moments of the data to the empirical moments.
Why go Bayesian?
Empirical Bayes
Equate theoretical moments of the data to empirical ones.
$$X \sim \mathrm{Bin}(n, p), \qquad p \sim \mathrm{Beta}(\alpha, \beta)$$

$$\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i \approx E[X] = E[E[X \mid p]] = n\,\frac{\alpha}{\alpha+\beta}$$

$$\hat{V} = \frac{1}{m-1}\sum_{i=1}^{m} (X_i - \bar{X})^2 \approx V[X] = V[E[X \mid p]] + E[V[X \mid p]] = n^2 V[p] + nE[p] - nE[p^2]$$

$$= (n^2 - n)\,\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} + n\,\frac{\alpha}{\alpha+\beta}\left(1 - \frac{\alpha}{\alpha+\beta}\right)$$

Solving these two equations in α and β yields the estimates.
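A small R implementation of this moment matching (the function name eb.beta is ours; with n = 3 the estimates are quite noisy):

```r
## Method-of-moments fit of the Beta(alpha, beta) prior from m counts,
## each X_i ~ Bin(n, p_i): solve the two moment equations above.
eb.beta <- function(X, n) {
  mu <- mean(X) / n                    # estimates alpha / (alpha + beta)
  V  <- var(X)
  s  <- (n - 1) / (V / (n * mu * (1 - mu)) - 1) - 1   # s = alpha + beta
  c(alpha = mu * s, beta = (1 - mu) * s)
}
set.seed(1)
p <- c(rep(0.5, 8000), rep(0.35, 1000), rep(0.65, 1000))
eb.beta(rbinom(10000, size = 3, prob = p), n = 3)
```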
ShrinkBayes: motivating example
Motivating example

• 25 libraries, 70,000 tags (tag clusters)
• 5 brain regions
• 7 donors
• Main interest: which tags differ between brain regions (Groups)
Random effects
Source: http://anythingbutrbitrary.blogspot.fr/2012/06/random-regression-coefficients-using.html
ShrinkBayes: model
Model

$$Y_{ij} \sim \text{ZI-NB}(p_{0i}, \mu_{ij}, \varphi_i) = p_{0i}\,\delta(0) + (1 - p_{0i})\,\text{NB}(\mu_{ij}, \varphi_i)$$
$$\mu_{ij} = g^{-1}(X_j \beta_i), \qquad \beta_{ik} \sim N(0, \tau_k^2) \quad \left[\text{or } \beta_{ik} \sim N(0, \tau_{ik}^2),\; \tau_{ik}^{-2} \sim \Gamma(\alpha, \beta)\right]$$
$$\theta_{il} \sim F_l$$

g: link function, e.g. g(x) = log(x)
θil: hyper-parameter (not linked to the mean): φi, p0i, etc.
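For intuition, a hedged sketch of how a single-feature version of this model could be fitted directly with R-INLA (ShrinkBayes fits many such models and shrinks the priors); the family name follows the INLA documentation and the data are simulated toys:

```r
## One feature: ZI-NB likelihood, group effect, donor as iid random effect.
library(INLA)
set.seed(1)
d <- data.frame(y     = rnbinom(25, mu = 10, size = 2),  # toy counts
                group = gl(5, 5),                        # 5 brain regions
                indiv = factor(rep(1:5, times = 5)))     # donors
fit <- inla(y ~ group + f(indiv, model = "iid"),
            family = "zeroinflatednbinomial1", data = d)
fit$summary.fixed
```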
Model: Poisson, Negative Binomial, Zero-Inflated Negative Binomial
ShrinkBayes: shrinkage

ShrinkBayes: How does Bayesian shrinkage work?
ShrinkBayes: Why is shrinkage useful?

• Dispersion: enables borrowing information across features, which leads to more stable estimates
• Parameters of interest (e.g. βi,group): accommodates multiplicity correction and corrects for 'selection bias' caused by regression to the mean
• Nuisance parameters (e.g. βi,batch): automatic 'model selection' property: the prior shrinks to the null when the parameter is unimportant
ShrinkBayes: shrinkage
Back to the example....

$$Y_{ij} \sim p_{0i}\,\delta(0) + (1 - p_{0i})\,\text{NB}(\mu_{ij}, \varphi_i), \qquad \varphi_i = \log(\nu_i)$$
$$\mu_{ij} = \exp(\beta_{i0} + \beta_{i,\text{group}} X_g + \beta_{i,\text{batch}} X_b + \beta_{i,\text{ind}_j} X_{\text{ind}_j})$$

Priors¹

¹Vague priors for p0i and βi0
ShrinkBayes: shrinkage
Priors for the example

How do we estimate the parameters of the priors?

ShrinkBayes: Bayesian Empirical Bayes

ShrinkBayes: Iterative estimation of the prior parameters
ShrinkBayes: Nonparametric prior

• Shape of the prior may be important, in particular for the central parameter of interest (e.g. θi = βi,group)
• How to get from θi ~ N(0, τ²) to θi ~ Fnp?
ShrinkBayes: Nonparametric prior

• Empirical Bayes fitting is easy; remember:

$$\pi(\theta) \approx \frac{1}{p} \sum_{i=1}^{p} \pi(\theta \mid \mathbf{Y}_i)$$

• So, we simply sample from this mixture (given an initial prior) and smoothen it (e.g. using 'density' in R), as sketched below:
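A toy sketch of that smoothing step; here the p per-feature posteriors are faked as normal draws, whereas in practice they would come from INLA:

```r
## Pool draws from all posteriors and smooth them into a prior estimate.
set.seed(1)
post.means <- c(rnorm(900, 0, 0.1), rnorm(100, 2, 0.5))  # fake posteriors
pooled <- unlist(lapply(post.means, function(m) rnorm(100, m, 0.3)))
f.np <- density(pooled)        # 'density' in R, as mentioned above
plot(f.np, main = "nonparametric prior estimate")
```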
ShrinkBayes: Nonparametric prior

• Nice, but INLA does not accept nonparametric priors!
• Solution:
  1. Use an iterative procedure to find a good parametric shrinkage prior: fp(θ)
  2. Obtain posteriors under fp(θ)
  3. Apply the same iterative procedure to obtain a nonparametric estimate fnp, computing posteriors by:

$$\pi_{np}(\theta_i \mid \mathbf{Y}) = C_i\, \pi_p(\theta_i \mid \mathbf{Y})\, \frac{f_{np}(\theta_i)}{f_p(\theta_i)} \quad [\text{exer}]$$
$$\Leftrightarrow \quad C_i^{-1} = \int \pi_p(\theta \mid \mathbf{Y})\, \frac{f_{np}(\theta)}{f_p(\theta)}\, d\theta$$
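On a grid, step 3 amounts to a one-line reweighting; a sketch with hypothetical inputs, all evaluated on the same θ grid with step d.theta:

```r
## Reweight the posterior obtained under f.p by f.np / f.p; the
## normalization implements the constant C_i from the formula above.
update.posterior <- function(post.p, f.np, f.p, d.theta) {
  w <- post.p * f.np / f.p
  w / (sum(w) * d.theta)       # renormalize to integrate to 1
}
```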
ShrinkBayes: inference

ShrinkBayes: inference
Hypothesis testing in a Bayesian setting
• In a Bayesian setting it is very natural to test the interval null hypothesis H0: |θi| ≤ δ
• Using δ ≠ 0 avoids detecting small, biologically non-relevant effects
• Testing H0: |θi| ≤ δ is trivial given the posteriors, and BFDR(t) = E[lfdr | lfdr ≤ t] is computed from the lfdr as before¹:

$$\text{lfdr}_i = P(H_0 \mid \mathbf{Y}) = P(|\theta_i| \le \delta \mid \mathbf{Y}) = \int_{-\delta}^{\delta} \pi(\theta_i \mid \mathbf{Y})\, d\theta_i$$
1More on False Discovery Rate (FDR) in later lectures
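A sketch of both quantities for gridded posteriors; the inputs (post, a features × grid matrix of posterior densities, theta.grid and the grid step d.theta) are hypothetical:

```r
## lfdr_i = P(|theta_i| <= delta | Y), approximated by a Riemann sum.
lfdr.interval <- function(post, theta.grid, delta, d.theta) {
  in.H0 <- abs(theta.grid) <= delta
  rowSums(post[, in.H0, drop = FALSE]) * d.theta
}
## BFDR(t) = E[lfdr | lfdr <= t]
BFDR <- function(t, lfdr) mean(lfdr[lfdr <= t])
```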
ShrinkBayes: inference
Bayesian hypothesis testing: multiple comparisons
• Multiple comparisons: θikl = θik - θil for all (k, l), e.g. (βi,group1 - βi,group2, βi,group1 - βi,group3, βi,group2 - βi,group3, etc.)
• H0: |θikl| ≤ δ for all (k, l). Then:

$$\text{lfdr}_i = P(H_0 \mid \mathbf{Y}) = P\Big(\max_{(k,l)} |\theta_{ikl}| \le \delta \,\Big|\, \mathbf{Y}\Big) = 1 - P\Big(\max_{(k,l)} |\theta_{ikl}| > \delta \,\Big|\, \mathbf{Y}\Big)$$
$$\le 1 - \max_{(k,l)} P(|\theta_{ikl}| > \delta \mid \mathbf{Y}) = \min_{(k,l)} P(|\theta_{ikl}| \le \delta \mid \mathbf{Y}) = \min_{(k,l)} \text{lfdr}_{ikl}$$
• So: if minimum lfdr < α then certainly global lfdr < α
• Also works for BFDR
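In code the bound is a row-wise minimum; lfdr.mat below is a hypothetical features × contrasts matrix of per-contrast lfdr values:

```r
## Upper bound on the global lfdr per feature; declare H1 where it is small.
lfdr.bound <- apply(lfdr.mat, 1, min)
```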
ShrinkBayes: inference
Point null-hypothesis in a Bayesian setting

• Recall that for testing H0: θi = 0 (e.g. θi = βi,group) the prior needs to contain mass at 0! E.g.:

$$\pi(\theta) = p_0\,\delta(0) + (1 - p_0)\,N(0, \tau^2)$$

Then¹:

$$\pi(\mathbf{Y}) = p_0\,\pi(\mathbf{Y} \mid 0) + (1 - p_0)\,\Pi(\mathbf{Y}) = p_0\,\pi(\mathbf{Y} \mid 0) + (1 - p_0) \int \pi'(\theta)\,\pi(\mathbf{Y} \mid \theta)\, d\theta$$

¹dropping index i
ShrinkBayes: inference
Point null-hypothesis in a Bayesian setting

π(Y | 0) and Π(Y) are marginal likelihoods, which are conveniently provided by INLA (where π(Y | 0) is obtained after fitting the null model, i.e. the model not containing βik). With π0(Y) = P(H0 | Y):

$$\text{BFDR}(t) = E\big[\pi_0(\mathbf{Y}) \mid \pi_0(\mathbf{Y}) \le t\big] = \frac{\sum_{i=1}^{p} \hat{\pi}_0(\mathbf{Y}_i)\, I\{\pi_0(\mathbf{Y}_i) \le t\}}{\sum_{i=1}^{p} I\{\pi_0(\mathbf{Y}_i) \le t\}}$$
ShrinkBayes: inference
Posterior

The posterior is [exer]:

$$P(\theta = 0 \mid \mathbf{Y}) = \frac{p_0\,\pi(\mathbf{Y} \mid 0)}{p_0\,\pi(\mathbf{Y} \mid 0) + (1 - p_0)\,\Pi(\mathbf{Y})}$$

$$f(\theta \mid \mathbf{Y}) = \frac{(1 - p_0)\, f'(\theta \mid \mathbf{Y})}{p_0\,\pi(\mathbf{Y} \mid 0)/\Pi(\mathbf{Y}) + (1 - p_0)}, \qquad \text{for } \theta \neq 0$$

where f'(θ | Y) is the posterior obtained by INLA under the Gaussian prior N(0, τ²).
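P(θ = 0 | Y) follows directly from the two marginal likelihoods; a small sketch on the log scale (INLA reports log marginal likelihoods, e.g. in fit$mlik):

```r
## Posterior null probability from log marginal likelihoods.
p.null <- function(log.ml0, log.ml1, p0) {
  ## log.ml0 = log pi(Y | 0), log.ml1 = log Pi(Y), p0 = prior null mass
  1 / (1 + (1 - p0) / p0 * exp(log.ml1 - log.ml0))
}
```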
ShrinkBayes: Software
ShrinkBayes: Demo

See the R vignette ShrinkBayesVignette.pdf for many examples.
ShrinkBayes: Plusses and minuses

Plusses and minuses, comparison with edgeR
+ Better (in terms of power) for small sample sizes
+ Can deal with pairs, random effects, excess of zeros
+ Automatic model selection
+ More reproducible results
- More difficult to use
- More time-consuming
- Bayesian concept, less familiar to most biologists