Bayesian tools for analysing and reducing uncertainty
Tony O’Hagan
University of Sheffield
Or …
Uncertainty, Complexity and Predictive Reliability
of (environmental/biological) process models
Summary
Uncertainty
Complexity
Predictive Reliability
Uncertainty
is everywhere …
Internal parameters
Initial conditions
Forcing inputs
Model structure
Observational error
Code uncertainty
Uncertainty (2)
And all sources of uncertainty must be
recognised
quantified
Otherwise we don’t know
how good model predictions are
how to use data
Tasks involving uncertainty
Whether or not we have data
Sensitivity analysis
Uncertainty analysis
Interacting with observational data
Calibration
Data assimilation
Discrepancy estimation
Validation
Complexity
This is already a big task
It is massively exacerbated by model complexity
High dimensionality
Long model run times
But there are powerful statistical tools available
It’s a big task
Quantifying uncertainty is often difficult
Unfamiliar task
Need for expert statistical skills
Statistical modelling
Elicitation
It deserves to be recognised as a task of comparable status to developing the model
And EMS is all about respecting each other’s expertise
Computational complexity
All the tasks involving uncertainty can be computed by simple (MC)MC methods if the model runs quickly enough (as sketched below)
Otherwise emulation is needed
Requires orders of magnitude fewer model runs
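A minimal sketch of the plain Monte Carlo route, assuming a hypothetical fast-running toy `simulator` and purely illustrative input distributions. With a real process model each evaluation may be far too slow for this many runs, which is exactly where emulation becomes necessary.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulator(x1, x2):
    # Hypothetical fast-running toy model standing in for a real process model.
    return np.exp(-x1) * np.sin(3.0 * x2) + 0.5 * x2

# Uncertainty about the inputs, expressed as probability distributions.
n = 100_000
x1 = rng.normal(loc=1.0, scale=0.2, size=n)   # uncertain internal parameter
x2 = rng.uniform(low=0.0, high=1.0, size=n)   # uncertain forcing input

# Uncertainty analysis: push the input uncertainty through the model.
y = simulator(x1, x2)
print(f"output mean  = {y.mean():.3f}")
print(f"output sd    = {y.std(ddof=1):.3f}")
print(f"95% interval = ({np.quantile(y, 0.025):.3f}, {np.quantile(y, 0.975):.3f})")
```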
Emulation
A computer model encodes a function that takes inputs and produces outputs
An emulator is a statistical approximation of that function
NOT just an approximation
Estimates what outputs would be obtained from given inputs
With a statistically valid measure of uncertainty
Emulators
Multiple regression models
Do not make valid uncertainty statements
Neural networks
Can make valid uncertainty statements but complex
Data based mechanistic models
Do not make valid uncertainty statements
Gaussian processes
GPs
Gaussian process emulators
are nonparametric
make no assumptions other than smoothness
estimate the code accurately with small uncertainty
and run “instantly”
So we can do uncertainty-based tasks fast and efficiently
Conceptually, we use model runs to learn about the function, then derive any desired properties of the model
2 code runs
Consider one input and one output
Emulator estimate interpolates data
2 code runs
Emulator uncertainty grows between data points
3 code runs
Adding another point changes estimate and reduces uncertainty
5 code runs
And so on
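The behaviour shown on these slides can be reproduced with a bare-bones Gaussian process emulator. The sketch below is a minimal numpy implementation with a fixed length-scale and a hypothetical toy `model` standing in for the expensive simulator; it is not the authors' software, just an illustration that the emulator interpolates the runs and that the predictive uncertainty between design points shrinks as runs are added.

```python
import numpy as np

def sq_exp(a, b, length=0.3, var=1.0):
    # Squared-exponential covariance: the smoothness assumption behind the GP.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulate(x_train, y_train, x_new, length=0.3):
    # Condition a zero-mean GP on the model runs (noise-free interpolation).
    K = sq_exp(x_train, x_train, length) + 1e-10 * np.eye(len(x_train))
    Ks = sq_exp(x_new, x_train, length)
    Kss = sq_exp(x_new, x_new, length)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def model(x):
    # Hypothetical slow simulator, one input and one output.
    return np.sin(2.0 * np.pi * x)

x_new = np.linspace(0.0, 1.0, 11)
for n_runs in (2, 3, 5):
    x_train = np.linspace(0.0, 1.0, n_runs)
    mean, sd = gp_emulate(x_train, model(x_train), x_new)
    print(f"{n_runs} runs: max predictive sd between design points = {sd.max():.3f}")
```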
Smoothness
It is the basic assumption of a (homogeneously) smooth, continuous function that gives the GP its computational advantages
The actual degree of smoothness concerns how rapidly the function “wiggles”
A rough function responds strongly to quite small changes in inputs
We need many more data points to emulate a rough function accurately over a given range
Effect of Smoothness
Smoothness determines how fast the uncertainty increases between data points
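A small illustration of that effect, using scikit-learn's `GaussianProcessRegressor` as a stand-in emulator with two arbitrary fixed length-scales: with the rougher (shorter) length-scale the predictive standard deviation midway between runs is much larger.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # five model runs
y = np.sin(2.0 * np.pi * X).ravel()
x_mid = np.array([[0.125]])                   # midway between the first two runs

for length in (0.5, 0.1):                     # smooth vs rough assumption
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=length),
                                  optimizer=None).fit(X, y)
    _, sd = gp.predict(x_mid, return_std=True)
    print(f"length-scale {length}: predictive sd midway between runs = {sd[0]:.3f}")
```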
Estimating smoothness
We can estimate the smoothness from the data
This is obviously a key Gaussian process parameter to estimate
But tricky
Need robust estimate
Validate by predicting left-out data points
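One common way to do this, sketched here with scikit-learn rather than any bespoke software: estimate the length-scale by maximising the marginal likelihood, then check it by leave-one-out prediction of the design points. The toy `model`, design size and kernel choice are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def model(x):
    # Hypothetical simulator; in practice each evaluation is expensive.
    return np.sin(2.0 * np.pi * x).ravel()

X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # design of 8 model runs
y = model(X)

# Maximum-likelihood estimate of the length-scale (the "smoothness" parameter).
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print("fitted kernel:", gp.kernel_)

# Leave-one-out validation: each left-out run should fall inside its own
# predictive interval if the emulator (and its length-scale) is trustworthy.
for i in range(len(X)):
    keep = np.arange(len(X)) != i
    gp_i = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                    normalize_y=True).fit(X[keep], y[keep])
    m, s = gp_i.predict(X[i:i + 1], return_std=True)
    z = (y[i] - m[0]) / s[0]
    print(f"run {i}: standardised prediction error = {z:+.2f}")
```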
Code uncertainty
Emulation, like MC, is just a computational device
But a highly efficient one!
Like MC, quantities of interest are computed subject to error
Statistically quantifiable and validatable
Reducible if we can do more model runs
This is code uncertainty
And finally … Predictive Reliability
What can we do with observational data?
Model validation: check observations against predictive distributions based on current knowledge
Calibration: learn about values of uncertain model parameters (possibly including model structure)
Data assimilation: for dynamic models, learn about the current value of the state vector
Model correction: learn about the model discrepancy function
Do all of these (in one coherent Bayesian system)
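A deliberately simplified sketch of calibration with a discrepancy allowance, assuming a hypothetical one-parameter toy model and synthetic observations. The discrepancy is collapsed into an extra variance term purely for illustration; a full treatment would model the discrepancy as a function (for example, another Gaussian process) and learn it alongside the parameters.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def model(x, theta):
    # Hypothetical simulator with one calibration parameter theta.
    return theta * np.sin(x)

# Synthetic field observations: the true process differs from the model
# (a systematic discrepancy) and is measured with noise.
x_obs = np.linspace(0.0, 3.0, 12)
truth = 1.5 * np.sin(x_obs) + 0.2 * x_obs          # model plus discrepancy
z_obs = truth + rng.normal(scale=0.05, size=x_obs.size)

# Grid posterior for theta, with the discrepancy collapsed to extra variance.
theta_grid = np.linspace(0.5, 2.5, 201)
sigma_e, sigma_d = 0.05, 0.3                       # observation and discrepancy sd
log_post = np.array([
    norm.logpdf(z_obs, loc=model(x_obs, t),
                scale=np.sqrt(sigma_e**2 + sigma_d**2)).sum()
    for t in theta_grid
])
log_post += norm.logpdf(theta_grid, loc=1.0, scale=1.0)   # prior on theta

weights = np.exp(log_post - log_post.max())
weights /= weights.sum()
print("posterior mean of theta:", (theta_grid * weights).sum())
```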
Doing it all
It’s crucial to model uncertainties carefully
to avoid using data twice
to apportion observation error between parameters, state vector and model discrepancy
to get appropriate learning about all these
Data assimilation alone is useful only for short term prediction
This is challenging
We (Sheffield and Durham) have developed theory and serious case studies
Growing practical experience
But still lots to do, both theoretically and practically
Each new model poses new challenges
Our science is as exciting and challenging as any other
Sorry …
We are not yet at the stage where implementation is routine
Very limited software
Most publications in the statistics literature
But we’re working on it
And we’re very willing to interact with modellers/users in any discipline
Particularly if you have resources!
Who we are
Sheffield
Tony O’Hagan ([email protected]) http://shef.ac.uk/~st1ao
Marc Kennedy, Stefano Conti, Jeremy Oakley
Durham
Michael Goldstein ([email protected])
Peter Craig, Jonathan Rougier, Alan Seheult