Bayesian tools for analysing and reducing uncertainty
Tony O’Hagan
University of Sheffield
Or …
Uncertainty, Complexity and Predictive Reliability
of (environmental/biological) process models
Summary
Uncertainty
Complexity
Predictive Reliability
Uncertainty
is everywhere …
Internal parameters
Initial conditions
Forcing inputs
Model structure
Observational error
Code uncertainty
Uncertainty (2)
And all sources of uncertainty must be
recognised
quantified
Otherwise we don’t know
how good model predictions are
how to use data
Tasks involving uncertainty
Whether or not we have data
Sensitivity analysis
Uncertainty analysis
Interacting with observational data
Calibration
Data assimilation
Discrepancy estimation
Validation
Complexity
This is already a big task
It is massively exacerbated by model complexity
High dimensionality
Long model run times
But there are powerful statistical tools available
It’s a big task
Quantifying uncertainty is often difficult
Unfamiliar task
Need for expert statistical skills
Statistical modelling
Elicitation
It deserves to be recognised as a task of comparable status to developing the model
And EMS is all about respecting each other’s expertise
Computational complexity
All the tasks involving uncertainty can be computed by simple (MC)MC methods if the model runs quickly enough (as sketched below)
Otherwise emulation is needed
Requires orders of magnitude fewer model runs
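A minimal sketch of the plain Monte Carlo route, assuming a hypothetical fast-running toy `simulator` and purely illustrative input distributions. With a real process model each evaluation may be far too slow for this many runs, which is exactly where emulation becomes necessary.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulator(x1, x2):
    # Hypothetical fast-running toy model standing in for a real process model.
    return np.exp(-x1) * np.sin(3.0 * x2) + 0.5 * x2

# Uncertainty about the inputs, expressed as probability distributions.
n = 100_000
x1 = rng.normal(loc=1.0, scale=0.2, size=n)   # uncertain internal parameter
x2 = rng.uniform(low=0.0, high=1.0, size=n)   # uncertain forcing input

# Uncertainty analysis: push the input uncertainty through the model.
y = simulator(x1, x2)
print(f"output mean  = {y.mean():.3f}")
print(f"output sd    = {y.std(ddof=1):.3f}")
print(f"95% interval = ({np.quantile(y, 0.025):.3f}, {np.quantile(y, 0.975):.3f})")
```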
Emulation
A computer model encodes a function that takes inputs and produces outputs
An emulator is a statistical approximation of that function
NOT just an approximation
Estimates what outputs would be obtained from given inputs
With a statistically valid measure of uncertainty
Emulators
Multiple regression models
Do not make valid uncertainty statements
Neural networks
Can make valid uncertainty statements but complex
Data based mechanistic models
Do not make valid uncertainty statements
Gaussian processes
GPs
Gaussian process emulators
are nonparametric
make no assumptions other than smoothness
estimate the code accurately with small uncertainty
and run “instantly”
So we can do uncertainty-based tasks fast and efficiently
Conceptually, we use model runs to learn about the function, then derive any desired properties of the model
2 code runs
Consider one input and one output
Emulator estimate interpolates data
2 code runs
Emulator uncertainty grows between data points
3 code runs
Adding another point changes estimate and reduces uncertainty
5 code runs
And so on
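The behaviour shown on these slides can be reproduced with a bare-bones Gaussian process emulator. The sketch below is a minimal numpy implementation with a fixed length-scale and a hypothetical toy `model` standing in for the expensive simulator; it is not the authors' software, just an illustration that the emulator interpolates the runs and that the predictive uncertainty between design points shrinks as runs are added.

```python
import numpy as np

def sq_exp(a, b, length=0.3, var=1.0):
    # Squared-exponential covariance: the smoothness assumption behind the GP.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulate(x_train, y_train, x_new, length=0.3):
    # Condition a zero-mean GP on the model runs (noise-free interpolation).
    K = sq_exp(x_train, x_train, length) + 1e-10 * np.eye(len(x_train))
    Ks = sq_exp(x_new, x_train, length)
    Kss = sq_exp(x_new, x_new, length)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def model(x):
    # Hypothetical slow simulator, one input and one output.
    return np.sin(2.0 * np.pi * x)

x_new = np.linspace(0.0, 1.0, 11)
for n_runs in (2, 3, 5):
    x_train = np.linspace(0.0, 1.0, n_runs)
    mean, sd = gp_emulate(x_train, model(x_train), x_new)
    print(f"{n_runs} runs: max predictive sd between design points = {sd.max():.3f}")
```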
Smoothness
It is the basic assumption of a (homogeneously) smooth, continuous function that gives the GP its computational advantages
The actual degree of smoothness concerns how rapidly the function “wiggles”
A rough function responds strongly to quite small changes in inputs
We need many more data points to emulate a rough function accurately over a given range
Effect of Smoothness
Smoothness determines how fast the uncertainty increases between data points
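A small illustration of that effect, using scikit-learn's `GaussianProcessRegressor` as a stand-in emulator with two arbitrary fixed length-scales: with the rougher (shorter) length-scale the predictive standard deviation midway between runs is much larger.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # five model runs
y = np.sin(2.0 * np.pi * X).ravel()
x_mid = np.array([[0.125]])                   # midway between the first two runs

for length in (0.5, 0.1):                     # smooth vs rough assumption
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=length),
                                  optimizer=None).fit(X, y)
    _, sd = gp.predict(x_mid, return_std=True)
    print(f"length-scale {length}: predictive sd midway between runs = {sd[0]:.3f}")
```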
Estimating smoothness
We can estimate the smoothness from the data
This is obviously a key Gaussian process parameter to estimate
But tricky
Need robust estimate
Validate by predicting left-out data points
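One common way to do this, sketched here with scikit-learn rather than any bespoke software: estimate the length-scale by maximising the marginal likelihood, then check it by leave-one-out prediction of the design points. The toy `model`, design size and kernel choice are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def model(x):
    # Hypothetical simulator; in practice each evaluation is expensive.
    return np.sin(2.0 * np.pi * x).ravel()

X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # design of 8 model runs
y = model(X)

# Maximum-likelihood estimate of the length-scale (the "smoothness" parameter).
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print("fitted kernel:", gp.kernel_)

# Leave-one-out validation: each left-out run should fall inside its own
# predictive interval if the emulator (and its length-scale) is trustworthy.
for i in range(len(X)):
    keep = np.arange(len(X)) != i
    gp_i = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                    normalize_y=True).fit(X[keep], y[keep])
    m, s = gp_i.predict(X[i:i + 1], return_std=True)
    z = (y[i] - m[0]) / s[0]
    print(f"run {i}: standardised prediction error = {z:+.2f}")
```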
Code uncertainty
Emulation, like MC, is just a computational device
But a highly efficient one!
Like MC, quantities of interest are computed subject to error
Statistically quantifiable and validatable
Reducible if we can do more model runs
This is code uncertainty
And finally … Predictive Reliability
What can we do with observational data?
Model validation: check observations against predictive distributions based on current knowledge
Calibration: learn about values of uncertain model parameters (possibly including model structure)
Data assimilation: for dynamic models, learn about the current value of the state vector
Model correction: learn about the model discrepancy function
Do all of these (in one coherent Bayesian system)
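A deliberately simplified sketch of calibration with a discrepancy allowance, assuming a hypothetical one-parameter toy model and synthetic observations. The discrepancy is collapsed into an extra variance term purely for illustration; a full treatment would model the discrepancy as a function (for example, another Gaussian process) and learn it alongside the parameters.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def model(x, theta):
    # Hypothetical simulator with one calibration parameter theta.
    return theta * np.sin(x)

# Synthetic field observations: the true process differs from the model
# (a systematic discrepancy) and is measured with noise.
x_obs = np.linspace(0.0, 3.0, 12)
truth = 1.5 * np.sin(x_obs) + 0.2 * x_obs          # model plus discrepancy
z_obs = truth + rng.normal(scale=0.05, size=x_obs.size)

# Grid posterior for theta, with the discrepancy collapsed to extra variance.
theta_grid = np.linspace(0.5, 2.5, 201)
sigma_e, sigma_d = 0.05, 0.3                       # observation and discrepancy sd
log_post = np.array([
    norm.logpdf(z_obs, loc=model(x_obs, t),
                scale=np.sqrt(sigma_e**2 + sigma_d**2)).sum()
    for t in theta_grid
])
log_post += norm.logpdf(theta_grid, loc=1.0, scale=1.0)   # prior on theta

weights = np.exp(log_post - log_post.max())
weights /= weights.sum()
print("posterior mean of theta:", (theta_grid * weights).sum())
```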
Doing it all
It’s crucial to model uncertainties carefully
to avoid using data twice
to apportion observation error between parameters, state vector and model discrepancy
to get appropriate learning about all these
Data assimilation alone is useful only for short term prediction
This is challenging
We (Sheffield and Durham) have developed theory and serious case studies
Growing practical experience
But still lots to do, both theoretically and practically
Each new model poses new challenges
Our science is as exciting and challenging as any other
Sorry …
We are not yet at the stage where implementation is routine
Very limited software
Most publications in the statistics literature
But we’re working on it
And we’re very willing to interact with modellers/users in any discipline
Particularly if you have resources!
Who we are
Sheffield
Tony O’Hagan ([email protected]) http://shef.ac.uk/~st1ao
Marc Kennedy, Stefano Conti, Jeremy Oakley
Durham
Michael Goldstein ([email protected])
Peter Craig, Jonathan Rougier, Alan Seheult