Dual-purpose Bayesian design for parameter estimation and model discrimination of models with intractable likelihoods

M. B. Dehideniya 1, C. C. Drovandi 1,2, J. M. McGree 1,2

1 School of Mathematical Sciences, Queensland University of Technology, Australia
2 ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)

Bayes on the Beach - 2017
Experimental design in Epidemiology

Spread of a disease within a herd of cows (e.g., foot-and-mouth disease).

Competing models - SIR (Orsel et al., 2007) and SEIR (Backer et al., 2012).

It is not practical to observe the process continuously, so observations are taken at a set of distinct observational times {t_1, t_2, ..., t_n} - the design.
Background: Bayesian experimental designs

We consider Bayesian design
- due to the availability of important utilities (total entropy);
- to appropriately handle uncertainty about models and parameters.

Assume Bayesian inference will be undertaken upon observing data.

Optimal design: d* = arg max_d u(d), where

    u(d) = \sum_{m=1}^{K} p(m) \int_y u(d, y, m) \, p(y \mid d, m) \, dy.

u(d, y, m) is some measure of the information gained from d given model m and observed data y.
Background: Bayesian experimental designs

In most cases, u(d) cannot be solved analytically, but it can be approximated, e.g., by Monte Carlo integration,

    u(d) \approx \sum_{m=1}^{K} p(m) \frac{1}{B} \sum_{b=1}^{B} u(d, y_{mb}, m),

where y_{mb} \sim p(y \mid \theta_{mb}, m, d) and \theta_{mb} \sim p(\theta \mid m).

Here, some posterior inference is required to evaluate u(d, y_{mb}, m). Hence, K x B posterior distributions need to be approximated, or sampled from, to approximate u(d).

This is a computationally challenging task, both for
- approximating the expected utility;
- maximising the utility.
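The nested Monte Carlo approximation above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the `models` dictionary and its `sample_theta`, `simulate` and `utility` callables are hypothetical placeholders for the problem-specific prior sampler, data simulator and utility function.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_utility(d, models, B=200):
    """Monte Carlo estimate of u(d) = sum_m p(m) * (1/B) * sum_b u(d, y_mb, m).

    `models` maps a model label m to a tuple
    (prior model probability p(m), sample_theta, simulate, utility),
    all placeholders for the problem-specific pieces."""
    total = 0.0
    for m, (p_m, sample_theta, simulate, utility) in models.items():
        inner = 0.0
        for _ in range(B):
            theta = sample_theta(rng)      # theta_mb ~ p(theta | m)
            y = simulate(theta, d, rng)    # y_mb ~ p(y | theta_mb, m, d)
            inner += utility(d, y, m)      # u(d, y_mb, m), needs posterior inference
        total += p_m * inner / B
    return total
```

In practice each `utility` evaluation itself involves approximating a posterior, which is where the K x B cost on the slide comes from.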
Total entropy

The total entropy utility function can be defined as follows:

    u_T(d, y, m) = \int_\theta p(\theta \mid m, y, d) \log p(y \mid \theta, m, d) \, d\theta - \log p(y \mid d).

In general, total entropy is a computationally challenging utility, and it has seen limited use (Borth, 1975; McGree, 2017). For models with intractable likelihoods, p(y | θ, m, d) is difficult to evaluate, and hence p(θ | m, y, d) and log p(y | d) are even more difficult to approximate. This motivates the use of a synthetic likelihood approach.
Other utilities

Estimation (KLD) (McGree, 2017):

    u_P(d, y, m) = \int_\theta p(\theta \mid m, y, d) \log p(y \mid \theta, m, d) \, d\theta - \log p(y \mid m, d).

Model discrimination (Drovandi, McGree and Pettitt, 2014):

    u_M(d, y, m) = \log p(m \mid y, d).

Again, p(y | θ, m, d) is difficult to evaluate for models with intractable likelihoods, and hence p(θ | m, y, d), log p(y | m, d) and log p(m | y, d) are more difficult to approximate. This motivates the use of a synthetic likelihood approach more generally than just with total entropy.
Synthetic likelihood approach

Wood (2010) approach to approximating p(y | θ, m, d).
Synthetic likelihood approach

Counts will be observed from our experiments, so we use an extension of the synthetic likelihood for discrete data. In our case, no summary statistics are considered; the mean and variance of the simulated data are used directly. The idea is to apply the normal distribution with a continuity correction. The likelihood for discrete data is thus

    p_{SL}(Y = y \mid \theta, m, d) = P(y_1 - c < Y_1 < y_1 + c, \ldots, y_p - c < Y_p < y_p + c),

where (Y_1, \ldots, Y_p) \sim N(\mu(\theta, m, d), \Sigma(\theta, m, d)) and c = 0.5.
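The discrete synthetic likelihood can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: μ and Σ are the sample mean and covariance of model simulations at the design points, and the rectangle probability under N(μ, Σ) is estimated here by plain Monte Carlo for simplicity (a quadrature routine for multivariate normal rectangle probabilities would typically be used instead).

```python
import numpy as np

def synthetic_likelihood(y, sims, n_mc=10_000, c=0.5, rng=None):
    """Discrete synthetic likelihood with continuity correction c = 0.5.

    `sims` is an (n_sim, p) array of simulated counts at the design points;
    mu and Sigma below are their sample mean and covariance.  The rectangle
    probability P(y_j - c < Y_j < y_j + c for all j) under N(mu, Sigma) is
    estimated by Monte Carlo over draws from the fitted normal."""
    rng = rng or np.random.default_rng()
    y = np.atleast_1d(y).astype(float)
    mu = sims.mean(axis=0)
    cov = np.cov(sims, rowvar=False).reshape(len(y), len(y))
    draws = rng.multivariate_normal(mu, cov, size=n_mc)
    inside = np.all(np.abs(draws - y) < c, axis=1)   # within +/- c of y in every coordinate
    return inside.mean()
```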
Approximating utility functions

The marginal likelihood can be approximated as follows:

    p(y \mid m, d) \approx \frac{1}{B} \sum_{b=1}^{B} p_{SL}(y \mid \theta_b, m, d),

where \theta_b \sim p(\theta \mid m). Similarly, for p(y | d),

    p(y \mid d) = \sum_{m=1}^{K} p(y \mid m, d) \, p(m).

Then, the posterior model probabilities are

    p(m \mid y, d) = \frac{p(y \mid m, d) \, p(m)}{p(y \mid d)}.
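These three identities chain together directly. A minimal sketch, assuming the synthetic likelihood values p_SL(y | θ_b, m, d) at the B prior draws have already been computed for each model:

```python
import numpy as np

def posterior_model_probs(sl_values, prior_model_probs):
    """Approximate p(m | y, d) from synthetic-likelihood evaluations.

    `sl_values[m]` holds p_SL(y | theta_b, m, d) for B prior draws theta_b,
    so p(y | m, d) is their average; p(y | d) is the prior-weighted sum over
    models; and p(m | y, d) = p(y | m, d) p(m) / p(y | d)."""
    marginal = {m: float(np.mean(v)) for m, v in sl_values.items()}
    evidence = sum(marginal[m] * prior_model_probs[m] for m in marginal)
    return {m: marginal[m] * prior_model_probs[m] / evidence for m in marginal}
```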
Approximating utility functions

We employ importance sampling to approximate posterior distributions, using the prior as the importance distribution:
- Sample \theta_b \sim p(\theta), b = 1, \ldots, B (equal weights).
- Update the weights via the synthetic likelihood to yield W_b, the normalised importance weights.
- p(\theta \mid y, m, d) can then be approximated by the particle set \{\theta_b, W_b\}_{b=1}^{B}.
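With the prior as the importance distribution, the weight update is one line: each normalised weight is proportional to the synthetic likelihood at that particle. A minimal sketch, where `sl_fn` stands in for the problem-specific synthetic likelihood evaluation:

```python
import numpy as np

def posterior_particles(thetas, sl_fn):
    """Prior-as-importance-distribution weights.

    `thetas` are B draws from the prior (equal initial weights); `sl_fn(theta)`
    returns p_SL(y | theta, m, d).  The returned normalised weights W_b, paired
    with `thetas`, approximate p(theta | y, m, d)."""
    w = np.array([sl_fn(t) for t in thetas], dtype=float)
    return w / w.sum()
```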
Approximating utility functions

Estimation:

    u_P(d, y, m) = \sum_{b=1}^{B} W_m^b \log p(y \mid \theta_b, m, d) - \log p(y \mid m, d).

Discrimination:

    u_M(d, y, m) = \log p(m \mid y, d).

Total entropy:

    u_T(d, y, m) = \sum_{b=1}^{B} W_m^b \log p(y \mid \theta_b, m, d) - \log p(y \mid d).
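Given the particle approximations, the three utilities share the same weighted expected log-likelihood term. A minimal sketch (the argument names are illustrative, not from the paper):

```python
import numpy as np

def utilities_from_particles(log_lik, weights, log_evidence_m, log_evidence, log_prior_m):
    """Particle estimates of the three utilities.

    `log_lik[b]` = log p(y | theta_b, m, d), `weights[b]` = W_m^b,
    `log_evidence_m` = log p(y | m, d), `log_evidence` = log p(y | d),
    `log_prior_m` = log p(m).  Returns (u_P, u_M, u_T)."""
    e = np.dot(weights, log_lik)               # sum_b W_m^b log p(y | theta_b, m, d)
    u_P = e - log_evidence_m                   # KLD (estimation) utility
    u_M = log_evidence_m + log_prior_m - log_evidence   # = log p(m | y, d)
    u_T = e - log_evidence                     # total entropy utility
    return u_P, u_M, u_T
```

Note the decomposition this makes visible: u_T = u_P + u_M - log p(m), i.e. total entropy combines the estimation and discrimination utilities up to the model-prior constant.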
SIR model

Given that at time t there are s susceptibles and i infectious individuals in a closed population of size N, the probabilities of the possible events in the next time period \Delta t are:

a Susceptible becomes an Infectious individual,

    P[s - 1, i + 1 \mid s, i] = \beta \frac{s i}{N} \Delta t + o(\Delta t),

an Infectious individual Recovers,

    P[s, i - 1 \mid s, i] = \alpha i \Delta t + o(\Delta t).
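A continuous-time Markov chain with these event rates can be simulated exactly with the Gillespie algorithm, which is one standard way to generate data from such models. This is an illustrative sketch, not the authors' simulator; the observation scheme (recording the number of infectious individuals at each design time) is a hypothetical choice for the example.

```python
import numpy as np

def simulate_sir(beta, alpha, N, i0, obs_times, rng=None):
    """Gillespie simulation of the Markov SIR model.

    Event rates follow the slide: infection at beta*s*i/N, recovery at
    alpha*i.  Returns the number of infectious individuals at each time
    in `obs_times` (assumed sorted increasing)."""
    rng = rng or np.random.default_rng()
    s, i, t = N - i0, i0, 0.0
    out, k = [], 0
    while k < len(obs_times):
        rate_inf = beta * s * i / N
        rate_rec = alpha * i
        total = rate_inf + rate_rec
        # time of the next event; infinite if the epidemic has died out
        t_next = t + (rng.exponential(1.0 / total) if total > 0 else np.inf)
        while k < len(obs_times) and obs_times[k] < t_next:
            out.append(i)          # state is constant between events
            k += 1
        if total == 0:
            break
        t = t_next
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1    # S -> I
        else:
            i -= 1                 # I -> R
    return out
```

The SEIR model on the next slide is simulated the same way, with three event types instead of two.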
SEIR model

The probabilities of the possible events in the next time period \Delta t are:

a Susceptible becomes an Exposed individual,

    P[s - 1, e + 1, i \mid s, e, i] = \beta \frac{s i}{N} \Delta t + o(\Delta t),

an Exposed individual becomes an Infectious individual,

    P[s, e - 1, i + 1 \mid s, e, i] = \alpha_I e \Delta t + o(\Delta t),

an Infectious individual Recovers,

    P[s, e, i - 1 \mid s, e, i] = \alpha_R i \Delta t + o(\Delta t).
Application

Prior predictive distribution under the SIR and SEIR models (prior for the SEIR model taken from Backer et al., 2012):

SEIR: \beta \sim LN(0.44, 0.16^2), \alpha_I \sim G(25.55, 0.02), \alpha_R \sim G(7.25, 0.04).

SIR: \beta \sim LN(-0.09, 0.19^2), \alpha \sim G(10.30, 0.02).
Optimal designs

Refined coordinate exchange algorithm (Dehideniya et al., 2017).

Utility | Optimal design d*        | U(d*)
--------|--------------------------|------
KLD     | (11.6)                   |  0.91
        | (9.4, 19.1)              |  1.27
        | (7.4, 14.2, 27.1)        |  1.47
        | (7.3, 10.9, 16.4, 27.1)  |  1.60
MI      | (3.1)                    | -0.43
        | (4.1, 16)                | -0.34
        | (0.7, 4.1, 18.4)         | -0.30
        | (0.7, 4.1, 10.1, 25.3)   | -0.28
TE      | (7.0)                    |  0.97
        | (6.7, 17.5)              |  1.56
        | (6.5, 13.5, 27.1)        |  1.81
        | (5.5, 10.8, 16.3, 27.1)  |  1.97
Performance of optimal designs in model discrimination

[Figure: boxplots of the posterior model probability of the data-generating model, p(m = SIR | y) in panel (a) (SIR model) and p(m = SEIR | y) in panel (b) (SEIR model), for the MI, TE, KLD and RD optimal designs with 1-4 observation times.]
Performance of optimal designs in parameter estimation

[Figure: boxplots of log(1/det(cov(theta | y))) under (a) the SIR model and (b) the SEIR model, for the MI, TE, KLD and RD optimal designs with 1-4 observation times.]
Discussion

We have presented an approach to designing experiments for models with intractable likelihoods. The approach is flexible in that a variety of utility functions can be efficiently estimated.
Future research

- Is the normal approximation reasonable in general? (Other distributions were considered.)
- How small can the sample size (number of individuals) be?
- How can this method be extended to high-dimensional Bayesian design problems for models with intractable likelihoods?
  - Suitable posterior approximations.
  - Possible computational resources (GPU).
Key references

Backer, J., Hagenaars, T., Nodelijk, G., and van Roermund, H. (2012). Vaccination against foot-and-mouth disease I: Epidemiological consequences. Preventive Veterinary Medicine 107, 27-40.

Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrimination and parameter estimation. Journal of the Royal Statistical Society, Series B (Methodological) 37 (1), 77-87.

Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2017). Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology. Technical report, Queensland University of Technology.

Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2014). A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of Computational and Graphical Statistics 23 (1), 3-24.

McGree, J. M. (2017). Developments of the total entropy utility function for the dual purpose of model discrimination and parameter estimation in Bayesian design. Computational Statistics & Data Analysis 113, 207-225.

Orsel, K., de Jong, M., Bouma, A., Stegeman, J., and Dekker, A. (2007). The effect of vaccination on foot and mouth disease virus transmission among dairy cows. Vaccine 25, 327-335.

Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102-1104.