Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of Tumor Growth


Umberto Picchini

Centre for Mathematical Sciences, Lund University

MCQMC 14–19 August 2016, Stanford University (CA)

Umberto Picchini ([email protected])

This is joint ongoing work with Julie Lyng Forman (Biostatistics unit, University of Copenhagen).

This presentation is based on the working paper:

Forman and P. (2016). Stochastic differential equation mixed effects models for tumor growth and response to treatment, arXiv:1607.02633.


Nowadays there are several ways to deal with “intractable likelihoods”.

“Plug-and-play methods”: the only requirement is the ability to simulate from the data-generating model.

particle marginal methods (PMMH, PMCMC) based on SMC filters [Andrieu et al. 2010].

(improved) Iterated filtering [Ionides et al. 2015]

approximate Bayesian computation (ABC) [Marin et al. 2012].

Synthetic likelihoods [Wood 2010].

In the following I focus on Synthetic Likelihoods.

Andrieu et al. 2010. Particle Markov chain Monte Carlo methods. JRSS-B.

Ionides et al. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. PNAS.

Marin et al. 2012. Approximate Bayesian computational methods. Stat. Comput.

Wood 2010. Statistical inference for noisy nonlinear ecological dynamic systems. Nature.


In this talk:

We formulate a hierarchical/mixed effects state-space model for tumor growth.

the model can be treated with exact Bayesian inference using particle marginal methods.

we show that synthetic likelihoods work fine for this model.

This suggests that, should we decide to make our model more complex, we can seriously consider the synthetic likelihood approach for non-state-space models.


Our experiment: a tumor xenograft study

a tumor is grown in each mouse in the study.

3 groups of mice: 2 groups get an experimental treatment + 1 control group (no treatment).

experimental groups get treated with chemo or radiation therapy.

we wish to assess the effect of the treatments on tumor growth, that is, estimate model parameters.

Only 5–8 mice per group. Data are sparse.


Figure: Data of log-volumes (mm³) against time (days) for the three groups, in panels labelled group 1, group 3 and group 5. (top left and right) treatment groups; (bottom) no treatment.


Figure: Source http://www.nature.com/articles/srep04384


We use the population approach¹ for statistical estimation of longitudinal data.

Repeated measurements taken on a series of individuals/animals play an important role in biomedical research.

Say that we have measurements on M subjects.

It is often reasonable to assume that responses follow the same model form for all subjects, but model parameters φi vary randomly among individuals ⇒ Linear/Nonlinear Mixed-Effects models

φi ∼ p(φ|θ), i = 1, ..., M

it may be desirable to consider random variations in individual process dynamics (⇒ stochastic differential equations)

¹ M. Lavielle (2014), Mixed effects models for the population approach, CRC Press.


We formulate a state-space model accounting for:

intra-individual variation: explained via an SDE;

between-individuals variation: modelled by assuming “mixed effects” φi ∼ p(φ|θ). Interest is on θ.

residual variation.

Our data represent the size of the total volume Vi(t) at time t for subject i = 1, ..., M.

For subject i, a fraction αi of the tumor volume has cells killed by the treatment, 0 ≤ αi ≤ 1.

$$V_i(t) = V_i^{\mathrm{surv}}(t) + V_i^{\mathrm{kill}}(t)$$

$V_i^{\mathrm{kill}}(0) = \alpha_i v_{i,0}$: fraction of killed tumor volume

$V_i^{\mathrm{surv}}(0) = (1 - \alpha_i) v_{i,0}$: fraction of survived tumor volume

$v_{i,0} = 100$ mm³: known starting tumor volume


SDE mixed effects model

For subject i we take ni measurements.

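$$Y_{ij} = \log(V_{ij}) + \varepsilon_{ij}, \quad i = 1, ..., M; \; j = 1, ..., n_i$$

$$V_i(t) = V_i^{\mathrm{surv}}(t) + V_i^{\mathrm{kill}}(t),$$

$$dV_i^{\mathrm{surv}}(t) = (\beta_i + \gamma^2/2) V_i^{\mathrm{surv}}(t)\,dt + \gamma V_i^{\mathrm{surv}}(t)\,dB_i(t), \quad V_i^{\mathrm{surv}}(0) = (1 - \alpha_i) v_{i,0}$$

$$dV_i^{\mathrm{kill}}(t) = (-\delta_i + \tau^2/2) V_i^{\mathrm{kill}}(t)\,dt + \tau V_i^{\mathrm{kill}}(t)\,dW_i(t), \quad V_i^{\mathrm{kill}}(0) = \alpha_i v_{i,0}.$$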

We assume Gaussian random effects, one realization per individual:

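$$\beta_i \sim N(\beta, \sigma_\beta^2); \quad \delta_i \sim N(\delta, \sigma_\delta^2); \quad \alpha_i \sim N_{(0,1)}(\alpha, \sigma_\alpha^2)$$

Here $N_{(0,1)}$ denotes a Gaussian truncated to the interval (0,1).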

And Gaussian residual variation (independent of everything else)

$$\varepsilon_{ij} \sim_{iid} N(0, \sigma_\varepsilon^2)$$
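Both SDEs are geometric Brownian motions, so by Itô's formula their logarithms have closed-form solutions: $\log V_i^{\mathrm{surv}}(t) = \log((1-\alpha_i)v_{i,0}) + \beta_i t + \gamma B_i(t)$ and $\log V_i^{\mathrm{kill}}(t) = \log(\alpha_i v_{i,0}) - \delta_i t + \tau W_i(t)$. The following is a minimal simulation sketch built on that observation. The function names, the dictionary layout of θ, the observation times and all numerical values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def truncated_normal(mean, sd, rng, low=0.0, high=1.0):
    """Draw from a Gaussian truncated to (low, high) by simple rejection."""
    while True:
        x = rng.normal(mean, sd)
        if low < x < high:
            return x

def simulate_subject(times, beta_i, delta_i, alpha_i, gamma, tau, sd_eps, rng, v0=100.0):
    """Simulate noisy log-volumes Y_ij for one subject at the given times."""
    times = np.asarray(times, dtype=float)
    dt = np.diff(times, prepend=0.0)          # time increments, starting from t = 0
    # Brownian motions B_i and W_i evaluated at the observation times
    B = np.cumsum(np.sqrt(dt) * rng.standard_normal(times.size))
    W = np.cumsum(np.sqrt(dt) * rng.standard_normal(times.size))
    log_surv = np.log((1.0 - alpha_i) * v0) + beta_i * times + gamma * B
    log_kill = np.log(alpha_i * v0) - delta_i * times + tau * W
    log_v = np.logaddexp(log_surv, log_kill)  # log(V_surv + V_kill), computed stably
    return log_v + sd_eps * rng.standard_normal(times.size)

def simulate_group(times, theta, M, rng):
    """Simulate M subjects, drawing fresh random effects for each one."""
    data = []
    for _ in range(M):
        beta_i = rng.normal(theta["beta"], theta["sd_beta"])
        delta_i = rng.normal(theta["delta"], theta["sd_delta"])
        alpha_i = truncated_normal(theta["alpha"], theta["sd_alpha"], rng)
        data.append(simulate_subject(times, beta_i, delta_i, alpha_i,
                                     theta["gamma"], theta["tau"], theta["sd_eps"], rng))
    return np.array(data)

# Illustrative values only (not the estimates reported in the talk).
theta = dict(beta=0.05, delta=0.10, alpha=0.5, gamma=0.05, tau=0.05,
             sd_beta=0.01, sd_delta=0.02, sd_alpha=0.1, sd_eps=0.1)
rng = np.random.default_rng(1)
y = simulate_group(times=[1, 4, 8, 12, 16, 20], theta=theta, M=6, rng=rng)
print(y.shape)  # one row per subject, one column per observation time
```

Because the transitions are exact, no Euler-type discretisation is needed in this sketch; a more complex model without closed-form solutions would require a numerical scheme instead.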


Data Yij|Vi(tj) are conditionally independent.

Latent state {Vi(t)} is Markovian, conditionally on random effects.

The model is of state space type.

We wish to fit the model to the entire pool of data for M subjects. Different groups are fitted separately.

Notice that data are very sparse, which makes inference challenging.

We estimate all population parameters and residual variation:

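$$\theta = (\underbrace{\beta, \delta, \alpha}_{\text{means random effects}}, \underbrace{\gamma, \tau}_{\text{intra-subj variation}}, \underbrace{\sigma_\beta^2, \sigma_\delta^2, \sigma_\alpha^2}_{\text{variances random effects}}, \underbrace{\sigma_\varepsilon^2}_{\text{residual variance}})$$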


We estimate the vector parameter θ using both synthetic likelihoods and particle marginal methods.


a synthetic intro to Synthetic Likelihoods

Regardless of the specific application, assume the following:

y: observed data, from static or dynamic models

s(y): (vector of) summary statistics of data, e.g. mean, autocorrelations, marginal quantiles etc.

assume s(y) ∼ N(µθ, Σθ)

an assumption justifiable via second order Taylor expansion (same as in Laplace approximations).

µθ and Σθ unknown: estimate them via simulations.



Figure: from Wood 2010.

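For fixed θ we simulate N artificial datasets $y_1^*, ..., y_N^*$ and compute corresponding (possibly vector valued) summaries $s_1^*, ..., s_N^*$.

compute

$$\hat\mu_\theta = \frac{1}{N} \sum_{i=1}^{N} s_i^*, \qquad \hat\Sigma_\theta = \frac{1}{N-1} \sum_{i=1}^{N} (s_i^* - \hat\mu_\theta)(s_i^* - \hat\mu_\theta)'$$

compute the statistics $s_{obs}$ for the observed data y.

evaluate a multivariate Gaussian likelihood at $s_{obs}$

$$L_N(s_{obs}|\theta) := \exp(l_N(s_{obs}|\theta)) \propto \frac{1}{\sqrt{|\hat\Sigma_\theta|}} \, e^{-(s_{obs} - \hat\mu_\theta)' \hat\Sigma_\theta^{-1} (s_{obs} - \hat\mu_\theta)/2}$$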

This synthetic likelihood can be maximized w.r.t. θ or be plugged into a (marginal) MCMC algorithm for Bayesian inference

πN(θ|sobs) ∝ LN(sobs|θ)π(θ)
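The plug-in synthetic log-likelihood can be sketched in a few lines once the N simulated summaries are stacked in an array. This is a minimal illustration assuming NumPy; the function name `synthetic_loglik` and the toy numbers are not from the talk, and the additive constant is dropped as in the proportionality above.

```python
import numpy as np

def synthetic_loglik(s_obs, sim_summaries):
    """Gaussian synthetic log-likelihood l_N(s_obs | theta), up to an additive constant.

    sim_summaries has shape (N, d): one simulated summary vector per row,
    all simulated at the same theta.
    """
    s_obs = np.atleast_1d(np.asarray(s_obs, dtype=float))
    S = np.asarray(sim_summaries, dtype=float)
    if S.ndim == 1:                       # scalar summaries: treat as a single column
        S = S[:, None]
    mu = S.mean(axis=0)                                   # estimate of mu_theta
    Sigma = np.atleast_2d(np.cov(S, rowvar=False))        # estimate of Sigma_theta (divides by N - 1)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:                         # singular covariance: no usable Gaussian fit
        return -np.inf
    r = s_obs - mu
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(Sigma, r)

# Toy check with four hand-made summary vectors (purely illustrative).
S_toy = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
print(synthetic_loglik([1.0, 1.0], S_toy))
```

Working with the log-determinant and a linear solve, rather than an explicit inverse, keeps the evaluation stable when the summaries have very different scales.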


Bayesian synthetic likelihoods

Actually we follow Price et al. 2016² (see the appendix to these slides).

They construct an unbiased estimator LN for a Gaussian likelihood; this implies that for any statistic s

E(LN(s|θ)) = L(s|θ)

plug LN(sobs|θ) into an MCMC algorithm for inference on θ.

resulting draws have stationary distribution π(θ|sobs), not πN(θ|sobs).

The latter follows from Beaumont 2003, Andrieu and Roberts 2009.

² Price, Drovandi, Lee and Nott. Bayesian synthetic likelihood. 2016. http://eprints.qut.edu.au/92795/
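The pseudo-marginal idea can be illustrated with a generic random-walk Metropolis-Hastings sampler that only sees a noisy, unbiased likelihood estimate. This is a sketch on a hypothetical one-parameter toy problem, not the sampler used for the tumor model; the function names, prior, proposal step and noise level are all assumptions. The key detail is that the estimate attached to the current state is stored and reused, never recomputed.

```python
import math
import random

def pseudo_marginal_mh(log_lik_hat, log_prior, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings driven by a noisy likelihood estimate.

    log_lik_hat(theta, rng) returns the log of a non-negative, unbiased
    likelihood estimate. The value for the current state is kept as is.
    """
    theta, ll = theta0, log_lik_hat(theta0, rng)
    chain = [theta]
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)
        ll_prop = log_lik_hat(prop, rng)          # fresh estimate at the proposal only
        log_ratio = ll_prop + log_prior(prop) - ll - log_prior(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, ll = prop, ll_prop             # accept: keep the proposal's estimate
        chain.append(theta)
    return chain

# Hypothetical toy problem: one observation y_obs ~ N(theta, 1), prior theta ~ N(0, 10^2).
def log_prior(theta):
    return -0.5 * (theta / 10.0) ** 2

def noisy_log_lik(theta, rng, y_obs=1.0, noise_sd=0.5):
    exact = -0.5 * (y_obs - theta) ** 2           # exact log-likelihood, up to a constant
    # Multiply the likelihood by log-normal noise with mean one, so it stays unbiased.
    return exact + noise_sd * rng.gauss(0.0, 1.0) - 0.5 * noise_sd ** 2

chain = pseudo_marginal_mh(noisy_log_lik, log_prior, theta0=0.0,
                           n_iter=20000, step=2.0, rng=random.Random(42))
post_mean = sum(chain[2000:]) / len(chain[2000:])
print(round(post_mean, 2))   # should be near the exact posterior mean 1/1.01
```

Despite the noise in every likelihood evaluation, the chain targets the same posterior as a sampler using the exact likelihood; the noise only affects mixing.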


Recall we have not one but M subjects to fit simultaneously.

Data are y = (y1, ..., yM).

We construct the following vector-statistics:

$$s = (s_1^{\mathrm{indiv}}, ..., s_M^{\mathrm{indiv}}, s^{\mathrm{between}})$$

For subject i individual summaries $s_i^{\mathrm{indiv}}$ contain:

mean absolute deviation for subject i;

slope of the line segment connecting the first and the last observation, $(y_i(t_{n_i}) - y_i(t_1))/(t_{n_i} - t_1)$;

first two measurement values $y_i(t_1)$, $y_i(t_2)$;

the estimated slope $\beta_{i1}$ from the autoregression $E(y_{ij}) = \beta_{i0} + \beta_{i1} y_{i,j-1}$


Inter-individuals summaries sbetween include:

MAD{yi1}i=1:M, the mean absolute deviation between subjects at the first time point;

the same as above but for the second time point;

min{yi1}i=1:M and max{yi1}i=1:M, that is the minimum and maximum observed value across subjects at the first time point;

same as above but for the second time point;

SD{yi1}i=1:M, the standard deviation for the measurements recorded at the first time point, across subjects.
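The individual and inter-individual summaries listed above can be sketched as follows. This is one possible reading, with assumptions labelled: the mean absolute deviation is taken around the mean, the autoregression slope is an ordinary least-squares fit, the standard deviation uses the N − 1 divisor, and all function names are hypothetical.

```python
import numpy as np

def mean_abs_dev(x):
    """Mean absolute deviation around the mean (assumed definition)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

def individual_summaries(t, y):
    """Five summaries for one subject with times t and log-volumes y."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    end_slope = (y[-1] - y[0]) / (t[-1] - t[0])       # first-to-last segment slope
    ar_slope = np.polyfit(y[:-1], y[1:], 1)[0]        # OLS slope of y_j on y_{j-1}
    return np.array([mean_abs_dev(y), end_slope, y[0], y[1], ar_slope])

def between_summaries(ys):
    """Seven summaries across subjects at the first two time points."""
    first = np.array([y[0] for y in ys], dtype=float)
    second = np.array([y[1] for y in ys], dtype=float)
    return np.array([mean_abs_dev(first), mean_abs_dev(second),
                     first.min(), first.max(), second.min(), second.max(),
                     first.std(ddof=1)])              # sample SD (assumed N - 1 divisor)

def summary_vector(ts, ys):
    """Stack s = (s_1^indiv, ..., s_M^indiv, s^between); subjects may differ in length."""
    indiv = [individual_summaries(t, y) for t, y in zip(ts, ys)]
    return np.concatenate(indiv + [between_summaries(ys)])

# Two made-up subjects, for illustration only.
t = [0, 1, 2, 3]
s = summary_vector([t, t], [[1, 2, 4, 8], [3, 2, 1, 0]])
print(s.shape)   # 5 summaries per subject plus 7 between-subject summaries
```

With M subjects the summary vector has dimension 5M + 7, so the number N of simulations must be comfortably larger than that for the covariance estimate to be well conditioned.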


Our data (again)

Figure: Data of log-volumes (mm³) against time (days) for group 1, group 3 and group 5.


Therefore to run a single iteration of an MCMC algorithm using synthetic likelihoods we must:

simulate M independent realizations of the random effects, and corresponding M subjects trajectories;

do the above N times (can be done in parallel);

compute summary statistics for the M × N trajectories.
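The three steps above amount to a double loop over N replicates and M subjects. The sketch below shows the structure with a deliberately simple, hypothetical toy simulator standing in for the tumor-growth model; `simulate_summary_matrix`, `toy_subject` and `toy_summaries` are illustrative names, not part of the talk.

```python
import numpy as np

def simulate_summary_matrix(theta, M, N, simulate_subject, summarize, rng):
    """Return an (N, d) array: one summary vector per simulated dataset of M subjects."""
    rows = []
    for _ in range(N):   # the N replicates are independent, so this loop can run in parallel
        dataset = [simulate_subject(theta, rng) for _ in range(M)]
        rows.append(summarize(dataset))
    return np.vstack(rows)

# Hypothetical toy model: linear growth with a random slope per subject.
TIMES = np.arange(1.0, 6.0)

def toy_subject(theta, rng):
    mean_slope, sd_slope, sd_eps = theta
    slope_i = rng.normal(mean_slope, sd_slope)        # random effect of this subject
    return slope_i * TIMES + sd_eps * rng.standard_normal(TIMES.size)

def toy_summaries(dataset):
    y = np.array(dataset)                             # shape (M, number of time points)
    return np.array([y[:, 0].mean(), y[:, -1].mean(), y[:, -1].std(ddof=1)])

S = simulate_summary_matrix((1.0, 0.2, 0.1), M=8, N=200,
                            simulate_subject=toy_subject, summarize=toy_summaries,
                            rng=np.random.default_rng(0))
print(S.shape)   # N rows, one column per summary statistic
```

The resulting matrix is exactly what is needed to estimate the mean and covariance of the summaries at the current θ, so the cost of one MCMC iteration is dominated by these M × N simulations.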


R = 20,000 MCMC iterations. Synthetic likelihoods are evaluated using N = 5,000 simulations from the model.

Parameter  Method  group 1               group 3               group 5
β          exact   7.26 [5.02,9.17]      4.59 [3.52,5.70]      6.35 [2.22,10.95]
           SL      7.16 [5.63,8.87]      4.83 [3.87,5.96]      4.95 [1.47,9.05]
δ          exact   2.29 [0.73,5.84]      1.62 [0.71,3.23]      –
           SL      2.16 [0.66,5.48]      1.93 [0.60,4.49]      –
α          exact   0.39 [0.10,0.73]      0.62 [0.33,0.91]      –
           SL      0.42 [0.27,0.60]      0.54 [0.26,0.86]      –
γ          exact   1.16 [0.75,1.71]      1.00 [0.72,1.39]      3.05 [2.37,3.91]
           SL      1.12 [0.76,1.49]      1.14 [0.79,1.53]      2.14 [0.78,4.07]
τ          exact   1.36 [0.58,2.84]      2.61 [1.93,3.61]      –
           SL      1.32 [0.67,2.35]      2.64 [2.07,3.47]      –
σβ         exact   0.50 [0.21,1.10]      0.46 [0.21,1.05]      0.47 [0.19,1.13]
           SL      0.46 [0.20,0.97]      0.34 [0.22,0.53]      0.51 [0.20,1.33]
σδ         exact   0.49 [0.20,1.21]      0.47 [0.21,1.05]      –
           SL      0.47 [0.21,0.97]      0.35 [0.20,0.56]      –
σα         exact   0.33 [0.14,0.76]      0.32 [0.17,0.57]      –
           SL      0.35 [0.17,0.68]      0.37 [0.19,0.74]      –
σε         exact   0.112 [0.075,0.171]   0.072 [0.054,0.092]   0.154 [0.098,0.224]
           SL      0.149 [0.092,0.218]   0.112 [0.076,0.160]   0.81 [0.38,1.17]

Table: For each parameter, first exact Bayesian (row "exact"), then synthetic likelihoods (row "SL") estimation.

Group 3 results

Figure: nine panels for group 3: (a) log β, (b) log δ, (c) α, (d) γ, (e) τ, (f) σβ, (g) σδ, (h) σα, (i) σε.


Interesting to look at the convergence of the chains for the residual variability σε.

Figure: Inference for σε. Chains plotted against iteration (0 to 10,000) for the particle marginal method (exact Bayesian) and Bayesian synthetic likelihoods.

Of course we don’t know the value of σε. We let it start at σε = 1. It rapidly converges to similar values for both methods, with very little variation.


Particle marginal methods (if time allows...)

In the results table we have shown results obtained with SL and a particle marginal method. We now comment on the latter.

In our experiment, measurements on the M individuals are assumed independent.

y = (y1, ..., yM)

The likelihood function based on y is

$$L(y|\theta) = \prod_{i=1}^{M} p(y_i|\theta)$$

Each p(yi|θ) can be estimated unbiasedly using an SMC algorithm, such as the bootstrap filter.


φi = (βi, δi,αi) are random effects.

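$$p(y_i|\theta) = \int p(y_i|\phi_i; \theta)\, p(\phi_i|\theta)\, d\phi_i$$

$$= \int \left( \int p(y_i|X_i; \theta)\, p(X_i|\phi_i; \theta)\, dX_i \right) p(\phi_i|\theta)\, d\phi_i$$

$$= \int \left( \int \left\{ \prod_{j=1}^{n_i} p(y_{ij}|X_{ij}, \phi_i, \theta)\, p(X_{i,j}|X_{i,j-1}, \phi_i; \theta) \right\} p(X_{i0}|\phi_i, \theta)\, dX_i \right) \times p(\phi_i|\theta)\, d\phi_i.$$

The set of p(y1|θ), ..., p(yM|θ) can be estimated unbiasedly using the algorithm that follows.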


a particle marginal algorithm for exact Bayesian inference

Below, l = 1, ..., L indexes the particles, w denotes unnormalised weights, w̃ normalised weights and x̃ resampled particles.

for i = 1, ..., M do
    draw φ_i^l ∼ p(φ_i|θ)
    if j = 1 then
        Sample x_{i1}^l ∼ p(x_{i1}|x_0, φ_i^l; θ).
        Compute w_{i1}^l = p(y_{i1}|x_{i1}^l) and p̂(y_{i1}) = Σ_{l=1}^L w_{i1}^l / L.
        Normalization: w̃_{i1}^l := w_{i1}^l / Σ_{l=1}^L w_{i1}^l.
        Resampling: sample L times with replacement from {x_{i1}^l, w̃_{i1}^l}. Denote the sampled particles with x̃_{i1}^l.
    end if
    for j = 2, ..., n_i do
        Forward propagation: sample x_{ij}^l ∼ p(x_{ij}|x̃_{i,j−1}^l, φ_i^l; θ).
        Compute w_{ij}^l = p(y_{ij}|x_{ij}^l) and normalise w̃_{ij}^l := w_{ij}^l / Σ_{l=1}^L w_{ij}^l.
        Compute p̂(y_{ij}|y_{i,1:j−1}) = Σ_{l=1}^L w_{ij}^l / L.
        Resample L times with replacement from {x_{ij}^l, w̃_{ij}^l}. Sampled particles are x̃_{ij}^l.
    end for
end for
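For one subject, the algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the state is the pair (log V_surv, log V_kill), propagated with the exact log-normal transitions of the two geometric Brownian motions; each particle carries its own random effects, which are resampled together with the state; weights are handled on the log scale. Function names, the dictionary layout of θ and the values in the demo are hypothetical.

```python
import numpy as np

def truncated_normal(mean, sd, size, rng, low=0.0, high=1.0):
    """Vector of draws from a Gaussian truncated to (low, high), by rejection."""
    out, filled = np.empty(size), 0
    while filled < size:
        x = rng.normal(mean, sd, size)
        x = x[(x > low) & (x < high)]
        k = min(x.size, size - filled)
        out[filled:filled + k] = x[:k]
        filled += k
    return out

def bootstrap_loglik(y, times, theta, L, rng, v0=100.0):
    """Log of the bootstrap-filter estimate of p(y_i | theta) for one subject."""
    # One draw of the random effects per particle.
    beta = rng.normal(theta["beta"], theta["sd_beta"], L)
    delta = rng.normal(theta["delta"], theta["sd_delta"], L)
    alpha = truncated_normal(theta["alpha"], theta["sd_alpha"], L, rng)
    log_surv = np.log((1.0 - alpha) * v0)     # alpha only enters through the initial state
    log_kill = np.log(alpha * v0)
    loglik, t_prev = 0.0, 0.0
    for y_j, t_j in zip(y, times):
        dt = t_j - t_prev
        # Forward propagation with the exact transition of each component.
        log_surv = log_surv + beta * dt + theta["gamma"] * np.sqrt(dt) * rng.standard_normal(L)
        log_kill = log_kill - delta * dt + theta["tau"] * np.sqrt(dt) * rng.standard_normal(L)
        mean = np.logaddexp(log_surv, log_kill)           # log total volume per particle
        logw = (-0.5 * np.log(2.0 * np.pi * theta["sd_eps"] ** 2)
                - 0.5 * ((y_j - mean) / theta["sd_eps"]) ** 2)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())        # log of the average unnormalised weight
        idx = rng.choice(L, size=L, p=w / w.sum())        # multinomial resampling
        log_surv, log_kill = log_surv[idx], log_kill[idx]
        beta, delta = beta[idx], delta[idx]
        t_prev = t_j
    return loglik

# Demo on a synthetic placeholder series (not study data), with illustrative parameters.
theta = dict(beta=0.05, delta=0.10, alpha=0.5, gamma=0.05, tau=0.05,
             sd_beta=0.01, sd_delta=0.02, sd_alpha=0.1, sd_eps=0.1)
times = np.array([1.0, 4.0, 8.0, 12.0])
y_fake = np.log(100.0) + 0.02 * times
print(bootstrap_loglik(y_fake, times, theta, L=500, rng=np.random.default_rng(0)))
```

Summing `bootstrap_loglik` over the M subjects gives the log of the product of the individual estimates, which is the quantity plugged into the pseudo-marginal sampler.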


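Each iteration of the outer for loop gives the estimate $\hat p(y_i|\theta) = \hat p(y_{i1}) \prod_{j=2}^{n_i} \hat p(y_{ij}|y_{i,1:j-1})$.

Since $E[\hat p(y_i|\theta)] = p(y_i|\theta)$

and since the estimates $\hat p(y_i|\theta)$ are independent of one another,

then $E[\prod_{i=1}^{M} \hat p(y_i|\theta)] = \prod_{i=1}^{M} p(y_i|\theta)$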

The above means that the overall likelihood for our mixed effectsmodel can be estimated unbiasedly.

Therefore exact Bayesian inference can be obtained using pseudo-marginal arguments (e.g. Andrieu and Roberts 2009³).

³ Andrieu and Roberts 2009. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics: 697-725.


Summary

A simulation study (not yet in the paper) with a larger number of subjects confirms that SL and particle MCMC perform similarly.

Synthetic likelihoods need stronger assumptions than ABC (approximate Bayesian computation).

However, ABC methods are often difficult to tune, because

1. the ABC threshold ε > 0 has a strong impact on the computational performance (it is often selected too large to ease sampling);

2. with ABC, components of s need to be weighted by performing a pilot study [Prangle 2016⁴]. This is not necessary with synthetic likelihoods.

this was our very first experiment using synthetic likelihoods: given the complexity of the model it worked remarkably smoothly.

⁴ Prangle 2016. Adapting the ABC Distance Function. Bayesian Analysis.

We presented a work in progress.

These results make us confident that the method will remain usable when the model is extended to be of non-state-space type.

Preliminary results of our ongoing work are at:

Forman and P. (2016). Stochastic differential equation mixed effects models for tumor growth and response to treatment, arXiv:1607.02633.

Thank you


Appendix


Unbiased Gaussian likelihood estimate

Price et al. 2016 note that plugging the estimates $\mu_N(\theta)$ and $\Sigma_N(\theta)$ into the Gaussian likelihood p(s|θ) results in a biased estimate, while one could instead use the unbiased estimator given by

$$\hat p(s|\theta) = (2\pi)^{-d/2} \, \frac{c(d, N-2)}{c(d, N-1)\,(1 - 1/N)^{d/2}} \, |(N-1)\Sigma_N(\theta)|^{-(N-d-2)/2}$$

$$\times \left\{ \psi\left( (N-1)\Sigma_N(\theta) - (s - \mu_N(\theta))(s - \mu_N(\theta))'/(1 - 1/N) \right) \right\}^{(N-d-3)/2}$$

where d = dim(s), π denotes the mathematical constant, N > d + 3, and for a square matrix A the function ψ(A) is defined as ψ(A) = |A| if A is positive definite and ψ(A) = 0 otherwise. Finally,

$$c(k, v) = \frac{2^{-kv/2}\, \pi^{-k(k-1)/4}}{\prod_{i=1}^{k} \Gamma\left(\tfrac{1}{2}(v - i + 1)\right)}.$$
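The estimator is best evaluated on the log scale, since the determinants and Gamma functions overflow quickly. The sketch below is one possible implementation of the formula in this appendix, assuming NumPy; the function names are hypothetical, and positive definiteness for ψ is checked with a Cholesky factorisation.

```python
import math
import numpy as np

def log_c(k, v):
    """log c(k, v) = -(k v / 2) log 2 - (k (k - 1) / 4) log pi - sum_i log Gamma((v - i + 1) / 2)."""
    return (-0.5 * k * v * math.log(2.0)
            - 0.25 * k * (k - 1) * math.log(math.pi)
            - sum(math.lgamma(0.5 * (v - i + 1)) for i in range(1, k + 1)))

def unbiased_gaussian_loglik(s, sim_summaries):
    """Log of the unbiased estimator of the Gaussian likelihood at s.

    sim_summaries has shape (N, d) with N > d + 3. Returns -inf when psi(.) = 0.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    S = np.asarray(sim_summaries, dtype=float)
    N, d = S.shape
    if N <= d + 3:
        raise ValueError("need N > d + 3 simulated summaries")
    mu = S.mean(axis=0)
    M = (S - mu).T @ (S - mu)                 # equals (N - 1) * Sigma_N
    r = s - mu
    A = M - np.outer(r, r) / (1.0 - 1.0 / N)
    try:
        np.linalg.cholesky(A)                 # succeeds only if A is positive definite
    except np.linalg.LinAlgError:
        return -np.inf                        # psi(A) = 0
    logdet_M = np.linalg.slogdet(M)[1]
    logdet_A = np.linalg.slogdet(A)[1]
    return (-0.5 * d * math.log(2.0 * math.pi)
            + log_c(d, N - 2) - log_c(d, N - 1)
            - 0.5 * d * math.log(1.0 - 1.0 / N)
            - 0.5 * (N - d - 2) * logdet_M
            + 0.5 * (N - d - 3) * logdet_A)

# Illustration with standard Gaussian "summaries" (made-up input, not model output).
rng = np.random.default_rng(0)
S_demo = rng.standard_normal((5000, 2))
print(unbiased_gaussian_loglik(np.array([0.3, -0.2]), S_demo))
```

For large N the value is very close to the plug-in Gaussian log-density, so the difference between the two estimators matters mainly when N is small relative to d.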