
Poster for Information, probability and inference in systems biology (IPISB 2013)


DESCRIPTION

Interest lies in inference for the rate parameters in a complex stochastic biological model describing the aggregation of proteins within human cells. Protein aggregation is a factor in many age-related diseases such as Alzheimer's disease. Ideally, time-course measurements on all chemical species in the model would be available. However, current experimental techniques only allow noisy observations on the proportions of cell death at a few time points. Although the model has a large state space and is analytically intractable, realisations from the model can be obtained using a stochastic simulator. The time evolution of a cell can be simulated repeatedly, giving an estimate of the proportion of cell death. Unfortunately, simulation from the model is too slow to be used in an MCMC inference scheme. A Gaussian process emulator, which is very fast, can be used to approximate the simulator. An MCMC scheme can be constructed targeting the posterior distribution of interest; however, evaluating the marginal likelihood is challenging. A pseudo-marginal approach replaces the marginal likelihood with an easy-to-construct unbiased estimate while still targeting the true posterior. The methods will be illustrated using a toy birth-death model, allowing comparison with the exact model.


Inferring parameters in a large stochastic model using only proportions of cell death: insights from the birth-death model

Holly Ainsworth, Richard Boys & Colin Gillespie*

Newcastle University, UK


INTRODUCTION

• Expanded polyglutamine (PolyQ) proteins are known to be the causative agents in a number of neurodegenerative diseases, but they are still poorly understood.

• Aggregation of specific proteins is part of the normal process of ageing in the brain as well as of many age-related diseases.

• A causative link between aggregation and disease is not universally accepted.

STOCHASTIC MODEL

Stochastic kinetic model with uncertain parameters:

• 27 (chemical) species

• 70 reactions

• 40 rate constants, denoted $\theta$

The model aims to explore the relationship between PolyQ, p38MAPK activation, generation of reactive oxygen species (ROS), proteasome inhibition and inclusion body formation.

DATA

Data are proportions (of cell death), not quantitative trait measurements.

Scenario   24hrs   36hrs   48hrs
GFP        15.03   14.55   26.08
H25        18.97   18.07   22.50
H103       21.68   23.44   36.44

SIMULATION

Denote the probability of cell death at time $t$ by $p_t(\theta)$. Note that the probability of cell death depends on the parameters $\theta$. Given $\theta$, the Gillespie algorithm can be used to simulate the time evolution of a particular cell:

• The simulation gives us a binary time series for the cell, with 1 = death and 0 = no death.

• Repeating the above for $n$ cells gives us a handle on $p_t(\theta)$ via the observed proportion of cell death $\hat{p}_t(\theta)$, where

$$\hat{p}_t(\theta) \sim \frac{1}{n}\,\mathrm{Bin}\bigl(n,\, p_t(\theta)\bigr).$$
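To give a feel for how the number of simulated cells controls the precision of the observed proportion, the following minimal sketch draws from the scaled binomial distribution above and compares the empirical spread with the binomial standard deviation. The value p = 0.2 is an arbitrary stand-in for the probability of cell death, not a value from the PolyQ model, and the helper names `sample_proportion` and `proportion_sd` are hypothetical.

```python
import math
import random

def sample_proportion(p, n, rng):
    """One draw of p_hat = Bin(n, p) / n, built from n Bernoulli(p) cells."""
    deaths = sum(rng.random() < p for _ in range(n))
    return deaths / n

def proportion_sd(p, n):
    """Standard deviation of p_hat under the binomial model."""
    return math.sqrt(p * (1.0 - p) / n)

rng = random.Random(1)
p = 0.2  # arbitrary stand-in for p_t(theta); an assumption for illustration
for n in (10, 100, 1000):
    draws = [sample_proportion(p, n, rng) for _ in range(2000)]
    mean = sum(draws) / len(draws)
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))
    print(f"n={n:5d}  empirical sd={sd:.4f}  binomial sd={proportion_sd(p, n):.4f}")
```

The spread shrinks like one over the square root of n, which is why small n leaves substantial simulation noise in the observed proportion.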

BIRTH-DEATH MODEL

Let $x$ denote the number of individuals present in the population. In chemical kinetic notation, this system is represented as

$$R_1:\; x \xrightarrow{\;\lambda\;} 2x \quad \text{(birth)}$$

$$R_2:\; 2x \xrightarrow{\;\mu\;} x \quad \text{(death)}$$

Example simulations from the model:

[Figure: two panels. Left: simulated population trajectories (Population against Time, 0 to 12). Right: probability of extinction against Time (0 to 12); legend: Simulator, n = 10, n = 100, n = 1000, n = 10000.]
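Simulations of this kind can be sketched with a short Gillespie routine. The code below is an illustration under stated assumptions rather than the authors' implementation: it assumes a linear birth-death process in which each individual gives birth at rate λ and dies at rate μ (so the total hazards are λx and μx), and the parameter values, the initial population and the function names are hypothetical.

```python
import math
import random

def simulate_birth_death(lam, mu, x0, t_max, rng):
    """Gillespie simulation of a linear birth-death process.

    Assumes hazards lam * x (birth) and mu * x (death). Returns the
    extinction time, or None if the population survives until t_max.
    """
    t, x = 0.0, x0
    while x > 0:
        t += rng.expovariate((lam + mu) * x)   # time to the next event
        if t > t_max:
            return None
        if rng.random() < lam / (lam + mu):    # choose birth or death
            x += 1
        else:
            x -= 1
    return t

def extinction_proportions(lam, mu, x0, times, n, rng):
    """Observed proportion of extinct populations at each time, from n runs."""
    t_max = max(times)
    ext = [simulate_birth_death(lam, mu, x0, t_max, rng) for _ in range(n)]
    return [sum(e is not None and e <= t for e in ext) / n for t in times]

rng = random.Random(2013)
times = [4.0, 8.0, 12.0]
for n in (10, 100, 1000):
    # lam = 0.6, mu = 1.0 and x0 = 10 are hypothetical illustration values
    print(n, extinction_proportions(0.6, 1.0, 10, times, n, rng))
```

Each run yields a binary extinct or not-extinct series over the chosen times, and averaging over n runs gives the observed proportion of extinction.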

Comparisons with PolyQ model:

• Compare a single population governed by a birth-death process with a cell governed by the PolyQ model.

• A population becoming extinct in the birth-death process can be likened to a cell dying in the PolyQ model.

• Proportions of extinction from the birth-death model are comparable to proportions of cell death from the PolyQ model.

An analytic expression for the probability of extinction in the birth-death process is available for given $t$, $\lambda$, $\mu$ and initial population level.
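For reference, the standard closed form for a linear birth-death process can be coded in a few lines. This sketch assumes per-individual birth rate λ and death rate μ and independent lineages, so that the extinction probability from an initial population x0 is the single-ancestor probability raised to the power x0; the function name and the example values are hypothetical.

```python
import math

def extinction_probability(t, lam, mu, x0):
    """P(extinct by time t) for a linear birth-death process started at x0.

    Single ancestor: mu * (e^{(lam-mu)t} - 1) / (lam * e^{(lam-mu)t} - mu),
    with limit lam * t / (1 + lam * t) when lam == mu. Written with decaying
    exponentials so that large t does not overflow.
    """
    if t <= 0:
        return 1.0 if x0 == 0 else 0.0
    if math.isclose(lam, mu):
        single = lam * t / (1.0 + lam * t)
    elif lam > mu:
        a = math.exp(-(lam - mu) * t)
        single = mu * (1.0 - a) / (lam - mu * a)
    else:
        b = math.exp(-(mu - lam) * t)
        single = mu * (1.0 - b) / (mu - lam * b)
    return single ** x0

# lam = 0.6, mu = 1.0 and x0 = 10 are hypothetical illustration values
for t in (4.0, 8.0, 12.0):
    print(t, extinction_probability(t, 0.6, 1.0, 10))
```

Having this exact probability is what makes the birth-death model a useful test bed: inference based on simulated proportions can be checked against inference based on the exact value.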

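INFERENCE

Work with proportions on the logit scale and assume the data model

$$y_t = \operatorname{logit} x_t = \operatorname{logit} p_t(\theta) + \sigma\epsilon_t, \qquad t = 1, \dots, T,$$

where $\epsilon_t \sim N(0, 1)$ independently. The posterior of interest is

$$\pi(\theta, \sigma \mid y) \propto \pi(\theta)\,\pi(\sigma)\,\pi(y \mid \theta, \sigma).$$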
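When the probabilities of death are known exactly, the likelihood term in this posterior is a product of Gaussian densities on the logit scale. The sketch below evaluates its logarithm. It assumes that the tabulated GFP values are percentages, and the model probabilities `p_model` and the value of σ are invented for illustration; the function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def log_likelihood(y, p, sigma):
    """Sum over t of log N(y_t; logit p_t, sigma^2)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - 0.5 * ((yt - logit(pt)) / sigma) ** 2
        for yt, pt in zip(y, p)
    )

# GFP scenario at 24, 36 and 48 hours, assuming the table reports percentages
observed = [0.1503, 0.1455, 0.2608]
y = [logit(x) for x in observed]
p_model = [0.15, 0.18, 0.25]  # hypothetical values of p_t(theta)
print(log_likelihood(y, p_model, sigma=0.5))
```

Adding the log prior densities of the parameters to this quantity gives the log posterior up to an additive constant.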

Approaches to inference:

• Vanilla MCMC - approximate the distribution of $\hat{p}_t(\theta)$ and account for the uncertainty using

$$\operatorname{elogit} \hat{p}_t(\theta) \sim N\!\left(\operatorname{logit} p_t(\theta),\; \frac{1}{n\,p_t(\theta)\,[1 - p_t(\theta)]}\right) \quad \text{approximately.}$$

• Pseudo marginal 1 - construct a Monte Carlo estimate of the marginal likelihood.

• Pseudo marginal 2 - at each iteration of the MCMC scheme, use a (SIR) particle filter to construct an SMC approximation to the marginal likelihood.
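A pseudo-marginal scheme in the spirit of the first variant can be sketched for the toy birth-death model as follows. This is one possible reading, not the authors' scheme, and several choices are assumptions: linear hazards λx and μx, the empirical logit with 0.5 offsets, independent N(0, 1) priors on the log-parameters, a Gaussian random-walk proposal, and a population cap that keeps supercritical runs cheap. All names and numerical settings are hypothetical.

```python
import math
import random

def simulate_deaths(lam, mu, x0, times, n, rng, x_cap=200):
    """Count extinct populations (out of n) at each time in the sorted list `times`.

    Assumes linear hazards lam * x and mu * x. Runs that exceed x_cap are
    treated as surviving, a shortcut that slightly approximates the model.
    """
    counts = [0] * len(times)
    for _ in range(n):
        t, x = 0.0, x0
        while 0 < x <= x_cap:
            t += rng.expovariate((lam + mu) * x)
            if t > times[-1]:
                break
            x += 1 if rng.random() < lam / (lam + mu) else -1
        if x == 0:
            for i, ti in enumerate(times):
                counts[i] += t <= ti
    return counts

def elogit(k, n):
    """Empirical logit of k/n; the 0.5 offsets keep it finite at k = 0 or k = n."""
    return math.log((k + 0.5) / (n - k + 0.5))

def log_lik_estimate(y, theta, times, x0, n, n_rep, rng):
    """Log of a Monte Carlo average of prod_t N(y_t; elogit p_hat_t, sigma^2)."""
    lam, mu, sigma = (math.exp(v) for v in theta)
    logs = []
    for _ in range(n_rep):
        counts = simulate_deaths(lam, mu, x0, times, n, rng)
        logs.append(sum(
            -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * ((yt - elogit(k, n)) / sigma) ** 2
            for yt, k in zip(y, counts)))
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs) / n_rep)

def log_prior(theta):
    """Assumed independent N(0, 1) priors on log(lam), log(mu), log(sigma)."""
    return sum(-0.5 * v ** 2 for v in theta)

def pseudo_marginal_mh(y, times, x0, n, n_rep, n_iter, step, rng):
    """Random-walk Metropolis-Hastings on the log-parameters."""
    theta = [0.0, 0.0, 0.0]
    ll = log_lik_estimate(y, theta, times, x0, n, n_rep, rng)
    chain = []
    for _ in range(n_iter):
        prop = [v + rng.gauss(0.0, step) for v in theta]
        ll_prop = log_lik_estimate(y, prop, times, x0, n, n_rep, rng)
        log_alpha = ll_prop + log_prior(prop) - ll - log_prior(theta)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta, ll = prop, ll_prop  # the estimate travels with the state
        chain.append(list(theta))
    return chain

rng = random.Random(42)
times = [1.0, 2.0, 3.0]
y = [elogit(k, 20) for k in (5, 9, 12)]  # made-up observations on the logit scale
chain = pseudo_marginal_mh(y, times, x0=3, n=20, n_rep=5, n_iter=200, step=0.3, rng=rng)
print([sum(s[i] for s in chain) / len(chain) for i in range(3)])
```

The key design point is that the likelihood estimate for the current state is stored and reused rather than recomputed at each iteration; refreshing it would change the target distribution.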

RESULTS

[Figure: posterior densities (Density against Parameter value) for log(λ), log(μ) and log(σ), shown for n = 10, n = 100 and n = 1000; legend: Vanilla MCMC, Pseudo-marginal MCMC 1, Pseudo-marginal MCMC 2, Inference using exact probability of death.]

• Both pseudo-marginal schemes involve more work. They use n × #particles runs of the simulator at each iteration, compared to n runs for the original scheme.

• The pseudo-marginal approach has the advantage that it performs exact inference (for a particular choice of n).

• This will be useful particularly when the asymptotic distributional result for $\operatorname{elogit} \hat{p}_t(\theta)$ is poor for small n.

• The SMC approach to estimating the marginal likelihood appears to mix better than the MC approach.