Upload
hakiet
View
235
Download
1
Embed Size (px)
Citation preview
Bayesian Optimization for Likelihood-FreeInference
Michael Gutmann
https://sites.google.com/site/michaelgutmann
University of Edinburgh
14th September 2016
Reference
For further information:
M.U. Gutmann and J. CoranderBayesian optimization for likelihood-free inference ofsimulator-based statistical modelsJournal of Machine Learning Research, 17(125): 1–47, 2016
J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. CoranderFundamentals and Recent Developments in Approximate BayesianComputationSystematic Biology, in press, 2016
Michael Gutmann BOLFI 2 / 23
Overall goal
I Inference: Given data yo , learn about properties of its source
I Enables decision making, predictions, . . .
yo
Data space
Observation
Inference
Data source
Unknown properties
Michael Gutmann BOLFI 3 / 23
Approach
I Set up a model with potential properties θ (hypotheses)
I See which θ are in line with the observed data yo
yo
Data space
Observation
Inference
Data source
Unknown properties
Model
M(θ)
Michael Gutmann BOLFI 4 / 23
The likelihood function L(θ)
I Measures agreement between θ and the observed data yo
I Probability to generate data like yo if hypothesis θ holds
yo
Data space
ObservationData source
Unknown properties
Model
M(θ)Data generation
εy|θ
Michael Gutmann BOLFI 5 / 23
Performing statistical inference
I If L(θ) is known, inference is straightforward
I Maximum likelihood estimation
θ̂ = argmaxθ L(θ)
I Bayesian inference
p(θ|y) ∝ p(θ) × L(θ)
posterior ∝ prior× likelihood
Allows us to learn from data by updating probabilities
Michael Gutmann BOLFI 6 / 23
Likelihood-free inference
Statistical inference for models where
1. the likelihood function is too costly to compute
2. sampling – simulating data – from the model is possible
Michael Gutmann BOLFI 7 / 23
Importance of likelihood-free inference
One reason: Such generative / simulator-based models occur widely
I Astrophysics:Simulating the formation ofgalaxies, stars, or planets
I Evolutionary biology:Simulating the evolution of life
I Neuroscience:Simulating neural circuits
I Computer vision:Simulating natural scenes
I Health science:Simulating the spread of aninfectious disease
I . . .
Simulated neural activity in rat somatosensory cortex
(Figure from https://bbp.epfl.ch/nmc-portal)
Michael Gutmann BOLFI 8 / 23
Flavors of likelihood-free inference
I There are several flavors of likelihood-free inference. InBayesian setting e.g.
I Approximate Bayesian computation (ABC)I Synthetic likelihood (Wood, 2010)
I General idea: Identify the values of the parameters of interestθ for which simulated data resemble the observed data
I Simulated data resemble the observed data if some distancemeasure d ≥ 0 is small.
Here: Focus on ABC, see
JMLR paper for synthetic likelihood
Michael Gutmann BOLFI 9 / 23
Meta ABC algorithm
I Let yo be the observed data.I Iterate many times:
1. Sample θ from a proposal distribution q(θ)2. Sample y |θ according to the model3. Compute distance d(y , yo) between simulated and observed
data4. Retain θ if d(y , yo) ≤ ε
I Different choices for q(θ) give different algorithms
I Produces samples from the (approximate) posterior when ε issmall
Michael Gutmann BOLFI 10 / 23
Implicit likelihood approximation
Likelihood: Probability to generate data like yo if hypothesis θ holds
yo
ε
Data spaceyθ(1)
yθ(2)
yθ(3)
yθ(4)
yθ(5)
yθ(6)
Model
M(θ)
Likelihood L(θ) ≈ proportion of green outcomes
L(θ) ≈ 1N
∑Ni=1 1
(d(y
(i)θ , y o) ≤ ε
)Michael Gutmann BOLFI 11 / 23
Example: Bacterial infections in child care centers
I Likelihood intractable for cross-sectional dataI But generating data from the model is possible
Individual
Stra
in
5 10 15 20 25 30 35
5
10
15
20
25
30
Individual
Stra
in
5 10 15 20 25 30 35
5
10
15
20
25
30
Individual
Stra
in
5 10 15 20 25 30 35
5
10
15
20
25
30
Time
Individual
Stra
in
5 10 15 20 25 30 35
5
10
15
20
25
30
Individual
Str
ain
Parameters of interest:- rate of infections within a center- rate of infections from outside- competition between the strains
(Numminen et al, 2013)
Michael Gutmann BOLFI 12 / 23
Example: Bacterial infections in child care centers
I Data: Streptococcus pneumoniae colonization for 29 centers
I Inference with Population Monte Carlo ABC
I Reveals strong competition between different bacterial strains
Expensive:
I 4.5 days on a cluster with200 cores
I More than one millionsimulated data sets
0 0.2 0.4 0.6 0.8 10
2
4
6
8
10
12
14
16
18
strong weak
Competition
Competition parameter
prob
abilit
y de
nsity
func
tion
priorposterior
Michael Gutmann BOLFI 13 / 23
Why is the ABC algorithm so expensive?
1. It rejects most samples when ε is small
2. It does not make assumptions about the shape of L(θ)
3. It does not use all information available
4. It aims at equal accuracy for all parameters
L(θ) ≈ 1N
∑Ni=1 1
(d(y
(i)θ , y o) ≤ ε
)
Approximate lik function for competition
parameter. N = 300.
0 0.05 0.1 0.15 0.20
1
2
3
4
5
6
Competition parameter
Approximate likelihood function(rescaled)
Threshold ε
Average distance
Variabilitydistances
Michael Gutmann BOLFI 14 / 23
Proposed solution
(Gutmann and Corander, 2016)
1. It rejects most samples when ε is small⇒ Don’t reject samples – learn from them
2. It does not make assumptions about the shape of L(θ)⇒ Model the distances, assume average distance is smooth
3. It does not use all information available⇒ Use Bayes’ theorem to update the model
4. It aims at equal accuracy for all parameters⇒ Prioritize parameter regions with small distances
equivalent strategy applies toinference with synthetic likelihood
Michael Gutmann BOLFI 15 / 23
Modeling (points 1 & 2)
I Data are tuples (θi , di ), where di = d(y(i)θ , yo)
I Model the conditional distribution of d given θ
I Estimated model yields approximation L̂(θ) for any choice of ε
L̂(θ) ∝ P̂r (d ≤ ε | θ)
P̂r is probability under the estimated model.I Here: Use (log) Gaussian process as model (with squared
exponential covariance function)
I Approach not restricted to Gaussian processes.
Michael Gutmann BOLFI 16 / 23
Data acquisition (points 3 & 4)
I Samples of θ could be obtained by sampling from the prior orsome adaptively constructed proposal distribution
I Give priority to regions in the parameter space where distanced tends to be small.
I Use Bayesian optimization to find such regions
I Here: Use lower confidence bound acquisition function (e.g. Cox
and John, 1992; Srinivas et al, 2012)
At(θ) = µt(θ)︸ ︷︷ ︸post mean
−√
η2t︸︷︷︸
weight
vt(θ)︸ ︷︷ ︸post var
(1)
t: number of samples acquired so far
I Approach not restricted to this acquisition function.
Michael Gutmann BOLFI 17 / 23
Bayesian optimization for likelihood-free inference
0 0.05 0.1 0.15 0.2-15
-10
-5
0
5
Competition parameter
Model based on 2 data points
Acquisition function
20%
10%
5%
80%
90%
95%
0 0.05 0.1 0.15 0.2-3
-2
-1
0
1
2
3
4
5
6
Competition parameter
Model based on 3 data points
0 0.05 0.1 0.15 0.2-1
0
1
2
3
4
5
6
Competition parameter
Model based on 4 data points
Next parameterto try
DataModel
Exploration vs exploitation
Bayes' theorem
dis
tance
dis
tance
50%mean
Michael Gutmann BOLFI 18 / 23
Example: Bacterial infections in child care centers
I Comparison of the proposed approach with a standardpopulation Monte Carlo ABC approach.
I Roughly equal results using 1000 times fewer simulations.
4.5 days with 200 cores↓
90 minutes with seven cores
Posterior means: solid lines,
credibility intervals: shaded areas or dashed lines.
2 2.5 3 3.5 4 4.5 5 5.5 6
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
Computational cost (log10)
Co
mp
etitio
n p
ara
me
ter
Developed Fast Method
Standard Method
(Gutmann and Corander, 2016)
Michael Gutmann BOLFI 19 / 23
Example: Bacterial infections in child care centers
I Comparison of the proposed approach with a standardpopulation Monte Carlo ABC approach.
I Roughly equal results using 1000 times fewer simulations.
2 2.5 3 3.5 4 4.5 5 5.5 6
1
2
3
4
5
6
7
8
9
10
11
Computational cost (log10)
Inte
rnal in
fection p
ara
mete
r
Developed Fast Method
Standard Method
2 2.5 3 3.5 4 4.5 5 5.5 6
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
Computational cost (log10)
Exte
rnal in
fection p
ara
mete
r
Developed Fast Method
Standard Method
Posterior means are shown as solid lines, credibility intervals as shaded areas or dashed lines.
Michael Gutmann BOLFI 20 / 23
Further benefits
I The proposed method makes the inference more efficient.
I Allowed us to perform far more comprehensive data analysisthan with standard approach (Numminen et al, 2016)
I Enables inference for models which were out of reach till now
I model of evolution where simulating a single data set took us12-24 hours (Marttinen et al, 2015)
I Enables easier assessment of parameter identifiability forcomplex models
I model about transmission dynamics of tuberculosis(Lintusaari et al, 2016)
Michael Gutmann BOLFI 21 / 23
Open questions
I Model: How to best model the distance between simulatedand observed data?
I Acquisition function: Can we find strategies which are optimalfor parameter inference?
I Efficient high-dimensional inference: Can we use the approachto infer the joint distribution of 1000 variables?
see JMLR paper for a discussion
Michael Gutmann BOLFI 22 / 23
Summary
I Topic: Inference for models where the likelihood is intractablebut sampling is possible
I Inference principle: Find parameter values for which thedistance between simulated and observed data is small
I Problem considered: Computational cost
I Proposed approach: Combine statistical modeling of thedistance with decision making under uncertainty (Bayesianoptimization)
I Outcome: Approach increases the efficiency of the inferenceby several orders of magnitude
Michael Gutmann BOLFI 23 / 23
References
I M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-freeinference of simulator-based statistical models, Journal of Machine LearningResearch, 17(125): 1–47, 2016
I J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander.Fundamentals and Recent Developments in Approximate Bayesian Computation,Systematic Biology, in press, 2016
I E. Numminen, M.U. Gutmann, M. Shubin, et al. The impact of hostmetapopulation structure on the population genetics of colonizing bacteriaJournal of Theoretical Biology, 396: 53–62, 2016
I J. Lintusaari, M.U. Gutmann, S. Kaski, and J. Corander. On the identifiabilityof transmission dynamic models for infectious disease Genetics, 202(3):911–918, 2016
I P. Marttinen, N.J. Croucher, M.U. Gutmann, J. Corander, and W.P. Hanage.Recombination produces coherent bacterial species clusters in both core andaccessory genomes, Microbial Genomics, 1(5), 2015
I Numminen et al. Estimating the Transmission Dynamics of Streptococcuspneumoniae from Strain Prevalence Data. Biometrics 9, 2013.
I N. Srinivas, A. Krause, S.M. Kakade, and M. Seeger. Information-theoreticregret bounds for Gaussian process optimization in the bandit setting. IEEETransactions on Information Theory, 58(5):3250–3265, 2012.
I S.N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems,Nature, 466: 1102–1104, 2010
I D. Cox and S. John. A statistical method for global optimization, Proc. IEEEConference on Systems, Man and Cybernetics, 2: 1241–1246, 1992