
Emulation, Uncertainty, and Sensitivity

Building an Emulator

UQ12 minitutorial – session 2

Outline

• Recipe for building an emulator – MUCM toolkit
  • Screening – which simulator inputs matter
  • Design – where to run the simulator
  • Model structure – mean and covariance functions
  • Estimation / inference – building the emulator
  • Validation – making sure the emulator is OK
• Possible extensions
  • Multiple outputs
  • Dynamic simulators
  • Bayes linear methods
• Summary of emulation

The ‘standard’ problem

The MUCM toolkit recipe: www.mucm.ac.uk

Step 0: Know your simulator

Before attempting to create an emulator, it is important that you understand your simulator
• What are the plausible input ranges?
• What constraints are there on input combinations?
• What is the output behaviour like?

Ideally you would elicit beliefs about the distributions of the inputs if these are not known
• At the very least, ranges are needed for all inputs

Step 1: Screening – active inputs

• All serious simulators require more than one input
  • The norm is anything from a few to thousands
  • All of the basic emulation theory in the toolkit assumes multiple inputs
• Large numbers of inputs pose computational problems
  • Dimension reduction techniques have been developed
  • Output typically depends principally on a few inputs
• Screening seeks to identify the most important inputs for a given output
  • Most often the Morris method is used, which is a cheap approximation to a sensitivity analysis

Screening: the Morris method

• Basic idea: develop a design that changes one input at a time, while filling the space
  • Morris designs are based on a series of repeated trajectories, each of which changes only one input at each step
• Example (3 inputs)
  • Each trajectory changes in one direction at a time
  • Do several trajectories
• Rate of change in output along each leg:
  • The “elementary effect” of that input

• Compute for each input:
  • Average of the elementary effects (μ)
  • Variance of the elementary effects (σ)
• Active inputs to retain:
  • High variance indicates a non-linear effect or interaction
  • High mean indicates a strong linear effect
• Screen out inputs with low mean and variance

[Figure: regions labelled “no effect”, “linear effect” and “non-linear effect”]
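The screening recipe above can be sketched in a few lines. This is a simplified illustration, not the toolkit's procedure: inputs are assumed to be scaled to [0, 1], every step moves upward by a fixed amount, the spread σ is computed as a standard deviation, and `toy_simulator` is a hypothetical stand-in with one linear input, one non-linear input and one inactive input.

```python
import numpy as np

def morris_elementary_effects(f, n_inputs, n_trajectories=20, n_levels=4, seed=0):
    """Return an (n_trajectories, n_inputs) array of elementary effects."""
    rng = np.random.default_rng(seed)
    delta = n_levels / (2.0 * (n_levels - 1))          # step size on the unit grid
    # Start levels chosen so that x + delta never leaves [0, 1].
    start_levels = np.arange(n_levels // 2) / (n_levels - 1)
    effects = np.empty((n_trajectories, n_inputs))
    for t in range(n_trajectories):
        x = rng.choice(start_levels, size=n_inputs)    # random start of the trajectory
        y = f(x)
        for i in rng.permutation(n_inputs):            # change one input per leg
            x_next = x.copy()
            x_next[i] += delta
            y_next = f(x_next)
            effects[t, i] = (y_next - y) / delta       # rate of change along this leg
            x, y = x_next, y_next
    return effects

def toy_simulator(x):
    # Hypothetical simulator: x[0] linear, x[1] non-linear, x[2] inactive.
    return 2.0 * x[0] + x[1] ** 2

effects = morris_elementary_effects(toy_simulator, n_inputs=3)
mu = effects.mean(axis=0)            # average elementary effect per input
sigma = effects.std(axis=0, ddof=1)  # spread of the elementary effects per input
print(mu, sigma)
```

In this toy case the linear input should show a large μ with essentially zero σ, the non-linear input a non-zero σ, and the inactive input zero for both, so it would be screened out.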


Step 2: Design

• To build an emulator, we use a set of simulator runs
  • Our training data are y1 = f(x1), ..., yn = f(xn)
  • Where x1, x2, ..., xn are n different points in the space of inputs
  • This set of n points is a design
• A good design will provide us with maximum information about the simulator
  • And hence an emulator that is as good as possible

Design principles

Design is all about choosing where to run the simulator to develop a good emulator
• We also need to consider validation and calibration

There are many options for design and many issues
• In the absence of additional information, space-filling designs are used
  • But this is an area of ongoing research
• Grids are infeasible for all but trivial simulators

Latin hypercube designs

• LHC designs
  • Use n values for each input
  • Combine them randomly
  • Here 50 x-values are combined randomly with 50 y-values
• Advantages
  • Doesn’t necessarily require a large number of points
  • Nothing is lost if some inputs are inactive
• Disadvantages
  • Random choice may not produce an even spread of points
  • Need to generate many LHC designs and pick the best
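A minimal sketch of "generate many LHC designs and pick the best", assuming inputs scaled to [0, 1]. The slide does not say what "best" means; the maximin criterion used here (largest minimum distance between points) is one common choice and is an assumption, as are the function names.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """One random LHC: each input uses each of its n strata exactly once."""
    strata = np.array([rng.permutation(n) for _ in range(d)]).T   # shape (n, d)
    return (strata + rng.random((n, d))) / n                       # jitter within strata

def min_pairwise_distance(X):
    """Smallest Euclidean distance between any two design points."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(X), k=1)].min()

def maximin_lhc(n, d, n_candidates=200, seed=0):
    """Generate several LHC designs and keep the most evenly spread one."""
    rng = np.random.default_rng(seed)
    candidates = [latin_hypercube(n, d, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)

design = maximin_lhc(n=50, d=2)   # 50 x-values combined with 50 y-values
print(design.shape)
```

Because every input takes one value in each of its 50 strata, projecting the design onto any single input still gives 50 distinct values, which is why nothing is lost if an input turns out to be inactive.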


Some more design choices

• Various formulae and algorithms exist to generate space-filling designs for any number of inputs
  • The Sobol sequence is often used
  • Quick and convenient
  • Not always good when some inputs are inactive
• Optimal designs maximise/minimise some criterion
  • E.g. maximum entropy designs
  • Can be hard to compute, often without massive gains
• Hybrid designs try to satisfy two criteria
  • Space-filling but also having a few points closer together
  • In order to estimate correlation lengths well

Step 3: Building the emulator

• In deciding on the structure of the emulator we have some choices to make:
  • The mean function
  • The covariance function
  • The prior specifications
• There are no universal solutions here
  • So judgement and validation play an important role

The technical part (overview!)

• The emulator is a Gaussian process
  • The conditional distribution of the simulator output, y, given the input, x, and “hyperparameters” is multivariate normal
• The following initial choices are generally made:
  • Mean function m(x) = h(x)ᵀβ, with h(x) typically [1, x]
  • Covariance function σ²c(x, x′) = σ² exp{−(x − x′)ᵀ C (x − x′)}
  • C is a diagonal matrix of inverse squared length scales 1/δ²
• The hyperparameters are (β, σ², δ)
• The choices we make can be important
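The two model choices above translate directly into code. The sketch below assumes inputs are stored as rows of a NumPy array and that `delta` holds one length scale per input; the function names are illustrative only.

```python
import numpy as np

def mean_basis(X):
    """h(x) = [1, x] for each row x of X, so that m(x) = h(x)^T beta."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def sq_exp_cov(X1, X2, delta, sigma2=1.0):
    """sigma^2 * exp{-(x - x')^T C (x - x')} with C = diag(1 / delta^2)."""
    scaled = (X1[:, None, :] - X2[None, :, :]) / delta   # divide each input by its length scale
    return sigma2 * np.exp(-(scaled ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0], [1.0, 2.0]])
delta = np.array([1.0, 2.0])
K = sq_exp_cov(X, X, delta, sigma2=3.0)
print(mean_basis(X))
print(K)
```

A larger δ for an input makes the covariance decay more slowly in that direction, so the emulated response is smoother in that input.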


The GP mean function

• We can use this to say what kind of shape we would expect the output to take as a function of the inputs
• Most simulator outputs exhibit some overall trend in response to varying a single input
  • So we usually specify a linear mean function
  • Slopes (positive or negative) are estimated from the training data
  • The emulator mean smooths the residuals after fitting the linear terms
• We can generalise to other kinds of mean function if we have a clear idea of how the simulator will behave
  • The better the mean function, the less the GP has to do

Example

• Simulator is the solid line
• Dashed line is the linear fit
• Blue arrows indicate fitted residuals
• Without the linear mean function, we’d have a horizontal (constant) fit
  • and larger residuals
  • leading to larger emulator uncertainty

The GP covariance function

The covariance function determines how ‘wiggly’ the response is to each input

There’s a lot of flexibility here, but standard covariance functions have a parameter for each input
• these ‘correlation length’ parameters are also estimated from the training data
• but some care is needed

For predicting output at untried x, correlation lengths are important
• they determine how much information comes from nearby training points
• and hence the emulator accuracy

Prior distributions

Prior information enters through the form of the mean function
• And to a lesser extent the covariance function

But we can also supply prior information through the prior distributions for “hyperparameters”
• For slope/regression parameters and correlation lengths
• Also the overall variance parameter

Putting in genuine prior information here generally improves emulator performance
• Compared with standard ‘non-informative’ priors
  • e.g. [prior formula not recoverable from the extracted text]

Step 4: Learning the emulator

• We normally proceed using Bayesian inference
  • Just how Bayesian depends on the size of the problem
  • Ideally we would ‘integrate out’ all unknown parameters, but this can be difficult, requiring MCMC
• Details are in the toolkit, but in summary
  • Typically one can integrate out the regression coefficients (β) and variance parameter (σ²)
  • Optimise (maximum likelihood, or MAP) the covariance length scales (δ)
• Ignoring uncertainty in length scales can be a problem if they are not well identified
  • Which is often the case

Prediction with the emulator

• Once the (hyper)parameters of the emulator have been learnt (or integrated out), one can use the emulator to predict what the simulator output would have been at a new input
  • This is always a predictive distribution
• Example with 6 (left) and 12 (right) training points
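A compact sketch of prediction with a plug-in emulator, under several assumptions: the length scale δ is fixed by hand rather than estimated, β is replaced by its generalised least squares estimate, σ² by the estimate with divisor n − q − 2, and the one-input `simulator` is hypothetical. The formulas are the standard ones for a GP with a linear mean and the covariance function given earlier; the toolkit should be consulted for the procedure it actually recommends.

```python
import numpy as np

def h(X):
    """Linear mean basis h(x) = [1, x]."""
    return np.hstack([np.ones((len(X), 1)), X])

def corr(X1, X2, delta):
    """c(x, x') = exp{-sum(((x - x') / delta)^2)}."""
    d2 = (((X1[:, None, :] - X2[None, :, :]) / delta) ** 2).sum(axis=-1)
    return np.exp(-d2)

def fit_predict(X, y, X_new, delta, jitter=1e-10):
    n, q = len(X), X.shape[1] + 1
    A = corr(X, X, delta) + jitter * np.eye(n)       # training correlation matrix
    H = h(X)
    Ainv_H = np.linalg.solve(A, H)
    G = H.T @ Ainv_H                                  # H^T A^-1 H
    beta = np.linalg.solve(G, H.T @ np.linalg.solve(A, y))   # GLS estimate of beta
    Ainv_r = np.linalg.solve(A, y - H @ beta)
    sigma2 = (y - H @ beta) @ Ainv_r / (n - q - 2)
    T = corr(X_new, X, delta)                         # correlations with training points
    mean = h(X_new) @ beta + T @ Ainv_r               # linear trend plus smoothed residuals
    R = h(X_new) - T @ Ainv_H
    var = (1.0
           - np.einsum("ij,ji->i", T, np.linalg.solve(A, T.T))
           + np.einsum("ij,ji->i", R, np.linalg.solve(G, R.T)))
    return mean, sigma2 * np.maximum(var, 0.0)

def simulator(x):
    # Hypothetical cheap stand-in for an expensive simulator.
    return np.sin(3.0 * x) + x

X_train = np.linspace(0.0, 1.0, 6).reshape(-1, 1)     # 6 training points
y_train = simulator(X_train[:, 0])
delta = np.array([0.3])                                # assumed correlation length

mean_train, var_train = fit_predict(X_train, y_train, X_train, delta)
mean_mid, var_mid = fit_predict(X_train, y_train, np.array([[0.1]]), delta)
print(mean_mid, np.sqrt(var_mid))
```

At the training inputs the predictive mean reproduces the simulator runs and the variance collapses to (numerically) zero; between training points the variance grows, which is the behaviour shown in the 6-point and 12-point example.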


Step 5: Validating the emulator

• Validating the emulator is essential
  • Full probabilistic assessment of fitness for purpose
• First examine the standardised residuals, with +/- 2 std intervals
• Visual assessment is often very helpful and provides diagnostic information
• More sophisticated diagnostics make use of correlation structure

What is validation?

• What does it mean to validate an emulator?
  • Compare the emulator’s predictions with the simulator output
  • Make a validation sample of runs at new input configurations
• The emulator mean is the best prediction and is always wrong
  • But the emulator predicts uncertainty around that mean
• The emulator is valid if its expressions of uncertainty are correct
  • Actual outputs should fall in 95% intervals 95% of the time
    • No less and no more than 95% of the time
  • Standardised residuals should have zero mean and unit variance
• See MUCM toolkit
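The first validation checks can be computed directly from a validation sample. The sketch below assumes we already have the simulator outputs at the validation inputs together with the emulator's predictive means and variances there; the three numbers used are made up purely to show the arithmetic.

```python
import numpy as np

def standardised_residuals(y_sim, pred_mean, pred_var):
    """(simulator output - emulator mean) / emulator standard deviation."""
    return (y_sim - pred_mean) / np.sqrt(pred_var)

def coverage(residuals, width=2.0):
    """Fraction of validation points inside the +/- width std interval."""
    return float(np.mean(np.abs(residuals) <= width))

# Made-up validation data: three runs, unit predictive variance.
y_sim = np.array([0.0, 1.0, 5.0])
pred_mean = np.zeros(3)
pred_var = np.ones(3)

res = standardised_residuals(y_sim, pred_mean, pred_var)
print(res, coverage(res))
```

With a real validation sample, coverage well below about 95% for the ±2 std band would suggest over-confidence, and coverage close to 100% with very small residuals would suggest the emulator is under-confident.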

Measures for validation

The Mahalanobis distance on a test set
• Accounts for the predictive covariance on the test set
• Follows an F-distribution, so we can check the value is close to the theoretical one for a given test set size

A useful diagnostic is the pivoted Cholesky decomposition of the predictive covariance

[Figure: diagnostic plots annotated “Suggests non-stationary / poor predictive variance”, “Suggests poor length scale / covariance function” and “Should look like a normal distribution”]
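Both diagnostics can be sketched with NumPy, assuming the emulator supplies a predictive mean vector and a full predictive covariance matrix on the test set. One simplification to note: NumPy only provides the ordinary Cholesky factorisation, so the pivoting step mentioned above is omitted here. The numbers are illustrative.

```python
import numpy as np

def mahalanobis_distance(y_sim, pred_mean, pred_cov):
    """(y - m)^T V^-1 (y - m), using the full predictive covariance V."""
    err = y_sim - pred_mean
    return float(err @ np.linalg.solve(pred_cov, err))

def cholesky_errors(y_sim, pred_mean, pred_cov):
    """Decorrelated errors L^-1 (y - m), where V = L L^T (unpivoted)."""
    L = np.linalg.cholesky(pred_cov)
    return np.linalg.solve(L, y_sim - pred_mean)

# Illustrative test set of two points with correlated predictive errors.
y_sim = np.array([1.2, 0.4])
pred_mean = np.array([1.0, 0.5])
pred_cov = np.array([[0.04, 0.01],
                     [0.01, 0.09]])

D = mahalanobis_distance(y_sim, pred_mean, pred_cov)
e = cholesky_errors(y_sim, pred_mean, pred_cov)
print(D, e)
```

The squared decorrelated errors add up to the Mahalanobis distance, so the individual elements show which validation points contribute most when the overall distance looks too large or too small.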

Steps in building an emulator

• Specify the Gaussian process model
• Select the prior distributions for the GP hyperparameters
• Choose a design for training and validation
• Fit the emulator to the simulator runs
• Validate and re-fit if needed

Simple Example: Energy Balance Model

Inputs: 18 initial surface temperatures + 8 others

Outputs: 18 final temperatures + 4 others

We use the mean surface temperature as our output

Vary solar constant

Extensions

What types of simulator are amenable to emulation?


Many outputs

• Most simulators also produce multiple outputs
  • For instance, a climate simulator may predict temperature on a grid, etc.
• Usually, for any given use of the simulator we are interested in just one output
  • So we can just emulate that one, particularly if it is some combination of the others, e.g. mean global surface temperature
• But some problems require multi-output emulation
  • Again, there are dimension reduction techniques
• All described in the MUCM toolkit

Multi-output emulators

• When we need to emulate several simulator outputs, there are a number of available approaches
  • Single output GP with added input(s) indexing the outputs
    • For temperature outputs on a grid, make grid coordinates 2 additional inputs
  • Independent GPs
  • Multivariate GP
  • Independent GPs for a linear transformation
    • E.g. principal components
    • Possibility for dimension reduction
• These are all documented in the MUCM toolkit

Dynamic emulation

• Many simulators predict a process evolving in time
  • At each time-step the simulator updates the system state
  • Often driven by external forcing variables at each time-step
  • Climate models are usually dynamic in this sense
• We are interested in emulating the simulator’s time series of outputs
  • The various forms of multi-output emulation can be used
  • Or a dynamic emulator, emulating the single time-step
    • And then iterating the emulator
• Also documented in the MUCM toolkit

Stochastic emulation

• Other simulators produce non-deterministic outputs
  • Running a stochastic simulator twice with the same input x produces randomly different outputs
• Different emulation strategies arise depending on what aspect of the output is of interest
  • Interest focuses on the mean
    • Output has added noise
    • Which we allow for when building the emulator
  • Interest focuses on risk of exceeding a threshold
    • Emulate the distribution and derive the risk
    • Emulate the risk
• This is not yet covered in the MUCM toolkit

Bayes linear methods

• So far we have assumed a fully Bayesian framework
• But there is an alternative framework – Bayes linear methods
  • Based only on first and second order moments
    • Means, variances, covariances
  • Avoids making assumptions about distributions
  • Its predictions are also first and second order moments
    • Means, variances, covariances but no distributions
• The toolkit contains theory and procedures for Bayes linear emulators

Bayes linear emulators

• Much of the mathematics is very similar
  • A Bayes linear emulator is not a GP but gives the same mean and variance predictions
    • For given correlation lengths, mean function parameters
    • Although these are handled differently
  • But the emulator predictions no longer have distributions
    • Compared with GP emulators
• Advantages – simpler and may be feasible for more complex problems
• Disadvantages – absence of distributions limits many of the uses of emulators
  • Compromises made

Summary and Limitations

Why emulation is not a panacea


Some caveats on emulation

• Not all simulators are suitable for emulation
  • With very large numbers of outputs (>50), specific emulators and large training sets are needed
    • For the problem you are solving, are all outputs needed?
  • For dynamic simulators with high dimensional state spaces there remain computational issues
  • With discrete inputs and outputs, Gaussian processes are not well suited
• But these issues are being addressed actively in research projects across the world, including MUCM

Typical sequence of emulation

• Define the problem you want to solve, identify the simulator
• Identify the inputs, define ranges and screen to select
• Design the training set and run the simulator
• Choose the emulator (mean and covariance) and define priors
• Train the emulator using the training set and inference method
• Use the emulator and if necessary refine
• Validate the emulator and if necessary refine
• Modify the simulator or refine it, maybe using observations

Summary

• Before you emulate, know your simulator!
• Think carefully about the problem you really want to solve
  • Emulation is a tool to solve interesting problems and not an aim in itself
• The more prior knowledge you bring, the easier the task will be
  • Choosing mean and covariance, eliciting priors
• Spend time on validation and refinement
• Building an emulator will help you understand your simulators … not replace them!

We have built an emulator. What now?


Outline

• So we’ve built an emulator – what can we use it for?
  • Prediction
    • What would the simulator output y be at an untried input x?
  • Uncertainty analysis
    • Given uncertainty in x, what is the implied uncertainty in y?
  • Sensitivity analysis
    • Which inputs influence the output most?
    • Which inputs are responsible for most output uncertainty?
  • Calibration
    • Given observation of the real system, how can we use that to learn about the best input values?

Prediction and UA


Prediction

• Prediction is simple because that’s precisely what the emulator does
  • For any given x, the emulator mean E[f(x)] is an estimate
  • The emulator variance var[f(x)] expresses uncertainty
    • Known as code uncertainty
• Similarly, given x and some threshold c we can evaluate P[f(x) > c]

Uncertainty analysis

[Figure: density of the Monte Carlo sample of y; sample size 10000]

• If X has distribution g(x) then UA looks at the implied distribution of Y = f(X)
  • How do we evaluate that?
• In Session 1 we used Monte Carlo for a simple nonlinear simulator
  • Mean = 0.117
  • Median = 0.122
  • Std. dev. = 0.048
• But all these are estimates
  • Accuracy depends on the size of the Monte Carlo sample
  • 95% interval for the mean is (0.116, 0.118)
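The quoted interval can be reproduced from the summary statistics alone, assuming the usual normal approximation for a Monte Carlo mean (standard error = std. dev. / √n) and the sample size of 10000 shown in the figure.

```python
import math

n_samples = 10_000      # Monte Carlo sample size
mc_mean = 0.117         # estimated mean of Y
mc_std = 0.048          # estimated standard deviation of Y

std_error = mc_std / math.sqrt(n_samples)     # standard error of the mean
lower = mc_mean - 1.96 * std_error
upper = mc_mean + 1.96 * std_error
print(round(lower, 3), round(upper, 3))       # 0.116 0.118
```

Halving the width of this interval would take four times as many simulator runs, which is what makes plain Monte Carlo impractical for expensive simulators.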


UA with an emulator

• Consider the expected output M = E[Y]
  • It is uncertain because of code uncertainty
• The emulator mean value for M is
    E[M] = ∫ E[f(x)] g(x) dx
• We can evaluate this by Monte Carlo
  • Sample many values of x, evaluate the emulator mean E[f(x)] for each and average them
  • This is already much faster than making many simulator runs to evaluate f(x)
• But we can often do the integral exactly
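The Monte Carlo route to E[M] might look like the sketch below. Two things are assumptions made for illustration: `emulator_mean` is a hypothetical stand-in for a fitted emulator's mean function (chosen to be linear so the exact answer is known), and g(x) is taken to be independent normals with mean 0.5 and standard deviation 0.1.

```python
import numpy as np

def emulator_mean(x):
    # Hypothetical stand-in for the fitted emulator mean E[f(x)].
    return 1.0 + 2.0 * x[:, 0] - x[:, 1]

def expected_output(mean_fn, n_samples=100_000, seed=0):
    """Estimate E[M] = integral of E[f(x)] g(x) dx by averaging over draws from g."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.5, scale=0.1, size=(n_samples, 2))   # assumed input distribution g(x)
    values = mean_fn(x)                                        # cheap emulator evaluations
    return values.mean(), values.std(ddof=1) / np.sqrt(n_samples)

m_hat, mc_error = expected_output(emulator_mean)
print(m_hat, mc_error)
```

The second returned value is only the Monte Carlo error of the integral, which can be driven down cheaply by drawing more samples; it is not the code uncertainty var[M], which depends on how many simulator runs were used to build the emulator.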


Why emulation is more efficient

• Similarly we can evaluate var[M]
  • This is code uncertainty and depends on the number of simulator runs used to build the emulator
• We want to compute/estimate M sufficiently accurately, so we want var[M] sufficiently small
• Emulation is more efficient because we can typically achieve the desired accuracy using far fewer simulator runs to build the emulator than using traditional methods
  • For the simple nonlinear model, using only 25 simulator runs to build the emulator, a 95% interval for M is (0.1173, 0.1179)
• Using the emulator we can also compute/estimate all those other quantities of interest, like var[Y] or P[Y > c]

Sensitivity analysis


Sensitivity analysis

• Which inputs affect the output most?

• This is a common question

• Sensitivity analysis (SA) attempts to address it

• There are various forms of SA

• The methods most frequently used are not the most helpful!


Recap – the nonlinear model

• The simple nonlinear model of the first session
    y = sin(x1)/{1+exp(x1+x2)}
  • Just two inputs
• Uncertainty analysis:
  • Normal distributions on inputs
  • Output mean = 0.117, median = 0.122
  • Std. dev. = 0.048
• Which of these two inputs influences output most?
  • And in what ways?

Local sensitivity analysis

• To measure the sensitivity of y to input xi, compute the derivative of y with respect to xi
• Nonlinear model:
  • At x1 = x2 = 0.5, the derivatives are: wrt x1, 0.142; wrt x2, –0.094
• How useful is this?
  • Derivatives evaluated only at the central estimate
    • Could be quite different at other points nearby
  • Doesn’t capture interactions between inputs
    • E.g. sensitivity of y to increasing both x1 and x2 could be greater or less than the sum of their individual sensitivities
  • Not invariant to change of units
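The two derivatives quoted above can be checked numerically. The sketch uses a central finite difference on the simple nonlinear model; the step size and helper name are arbitrary choices.

```python
import math

def simulator(x1, x2):
    """The simple nonlinear model y = sin(x1) / {1 + exp(x1 + x2)}."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))

def central_difference(f, x, i, step=1e-6):
    """Approximate the partial derivative of f with respect to input i at x."""
    up, down = list(x), list(x)
    up[i] += step
    down[i] -= step
    return (f(*up) - f(*down)) / (2.0 * step)

x0 = (0.5, 0.5)                                   # central estimate
d_x1 = central_difference(simulator, x0, 0)
d_x2 = central_difference(simulator, x0, 1)
print(round(d_x1, 3), round(d_x2, 3))             # 0.142 -0.094
```

Re-running this at a different point, for example x1 = 1.0, gives a noticeably different derivative with respect to x1, which illustrates why a purely local measure can mislead.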


One-way SA

• Vary inputs one at a time from the central estimate
• Nonlinear model:
  • Vary x1 to 0.25, 0.75: output is 0.079, 0.152
  • Vary x2 to 0.25, 0.75: output is 0.154, 0.107
• Is this more useful?
  • Depends on how far we vary each input
    • Relative sensitivities of different inputs change if we change the ranges
    • But ranges are arbitrary
  • Also fails to capture interactions
• Statisticians have known for decades that varying factors one at a time is bad experimental design!
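The one-way figures above can be reproduced by evaluating the simple nonlinear model while holding the other input at its central value, assumed here to be 0.5 as in the local analysis.

```python
import math

def simulator(x1, x2):
    """The simple nonlinear model y = sin(x1) / {1 + exp(x1 + x2)}."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))

central = 0.5                      # assumed central estimate for both inputs
levels = (0.25, 0.75)

vary_x1 = [simulator(v, central) for v in levels]   # x2 held at 0.5
vary_x2 = [simulator(central, v) for v in levels]   # x1 held at 0.5

print([round(y, 3) for y in vary_x1])   # [0.079, 0.152]
print([round(y, 3) for y in vary_x2])   # [0.154, 0.107]
```

Changing `levels` to a wider or narrower pair changes the apparent ranking of the two inputs, which is the dependence on arbitrary ranges noted above.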


Multi-way SA

• Vary factors two or more at a time
  • Maybe statistical factorial design
  • Full factorial designs require very many runs
• Can find interactions but hard to interpret
  • Often just look for the biggest change of output among all runs
• Still dependent on how far we vary each input

Probabilistic SA

• Inputs varied according to their probability distributions
  • As in Uncertainty Analysis (UA)
• Sensitivities still depend on ranges of distributions (variances), but these are now not necessarily arbitrary
• Gives an overall picture and can identify interactions

Variance decomposition

• One way to characterise the sensitivity of the output to individual inputs is to compute how much of the UA variance is due to each input

• For the simple non-linear model, we have

Input                Contribution
X1                   80.30 %
X2                   16.77 %
X1.X2 interaction     2.93 %
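A variance decomposition of this kind can be approximated by Monte Carlo directly on the simple nonlinear model. The sketch below uses a standard pick-and-freeze estimator of the main-effect shares. The input distributions are an assumption (independent normals with mean 0.5 and standard deviation 0.25), since this session does not state the ones used, so the output is not expected to match the table exactly.

```python
import numpy as np

def simulator(x):
    """Vectorised simple nonlinear model y = sin(x1) / {1 + exp(x1 + x2)}."""
    return np.sin(x[:, 0]) / (1.0 + np.exp(x[:, 0] + x[:, 1]))

def sample_inputs(rng, n, d):
    # Assumed input distribution: independent N(0.5, 0.25^2) on each input.
    return rng.normal(0.5, 0.25, size=(n, d))

def main_effect_shares(f, sample, n=200_000, d=2, seed=0):
    """Estimate the share of var[Y] due to each input alone."""
    rng = np.random.default_rng(seed)
    A, B = sample(rng, n, d), sample(rng, n, d)      # two independent input samples
    yA, yB = f(A), f(B)
    var_y = np.var(np.concatenate([yA, yB]), ddof=1)
    shares = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                          # A with input i taken from B
        shares[i] = np.mean(yB * (f(ABi) - yA)) / var_y
    return shares

S = main_effect_shares(simulator, sample_inputs)
interaction = 1.0 - S.sum()                          # remainder, for two inputs
print(S, interaction)
```

This needs hundreds of thousands of evaluations, which is affordable here only because the model is a one-line formula; with an expensive simulator the same quantities would be computed from the emulator instead.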


Main effects

• We can also plot the effect of varying one input averaged over the others
• Nonlinear model
  • Averaging y = sin(x1)/{1+exp(x1+x2)} with respect to the uncertainty in x2, we can plot it as a function of x1
  • Similarly, we can plot it as a function of x2 averaged over uncertainty in x1
• We can also plot interaction effects

Main effects in the simple nonlinear model

[Figure: main effects, y plotted against x]

• Red is main effect of x1 (averaged over x2)
• Blue is main effect of x2 (averaged over x1)

Joint effect in the simple nonlinear model

[Figure: joint effect plotted against x1 and x2]

A more complex example

• 5 inputs have appreciable influence, and account for 57% of the total UA variance
• Interactions account for 28%

[Figure: two panels of percentage contributions to the UA variance. True main effects: < 1%, 2%, 2%, 3%, 5%, 2%, 10%, 14%, 10%, 11%, 12%, 28%. Estimated main effects: 1%, 2%, 2%, 3%, 4%, 1%, 9%, 13%, 11%, 10%, 13%, 29%.]

Amplifying on variances

• Main effect plots amplify on the information given in the variance decomposition
• The variance component associated with input xi is equal to the amount by which its main effect varies over the range of uncertainty in xi

[Figure: main effect plots; the 5 inputs with most influence are dashed]

Sensitivity – an example


GEM-SA

• GEM-SA is a user-friendly piece of software that does many of the things we’ve been discussing
  • Can create some kinds of design
  • Fits an emulator to simulator output
  • Computes uncertainty and sensitivity analyses
• It’s freely available
  • http://tonyohagan.co.uk/academic/GEM
• It’s really useful for experimenting with relatively simple simulators
  • But not always reliable

Example

• ForestETP vegetation simulator
  • 7 input parameters
  • 120 simulator runs
• Objective: conduct a variance-based sensitivity analysis
  • To identify which uncertain inputs are driving the output uncertainty

Exploratory scatter plots

[Figure: exploratory scatter plots]

• Looks like X6 is most important, and probably also X5
• But these plots are hard to read because of the scatter
• So we fitted an emulator and carried out the uncertainty and sensitivity analyses

Variance of main effects

Main effects for each input. Input 6 has the greatest individual contribution to the variance.

Main effects sum to 66% of the total variance.

Main effect plots

[Figure: main effect plots, with the annotations below]

Fixing X6 = 18, this point shows the expected value of the output (obtained by averaging over all other inputs).

Simply fixing all the other inputs at their central values and comparing X6=10 with X6=40 would underestimate the influence of this input.

(The thickness of the band shows emulator uncertainty)

Interactions and total effects

• Main effects explain only 2/3 of the variance
  • Model must contain interactions
• Any input can have a small main effect but a large interaction effect, so overall it is still an ‘important’ input
• We can compute all pair-wise interaction effects
  • 435 in total for a 30 input model – can take some time!
  • Useful to know what to look for

• For each input Xi
    Total effect = main effect for Xi + all interactions involving Xi
  • Assumes independent inputs
• Main effects and total effects normalised by variance
• Total effect >> main effect implies interactions in the model
• Look for inputs with large total effects relative to main effects
  • Investigate possible interactions involving those inputs
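As a worked illustration of the total-effect definition, the sketch below applies it to the two-input variance decomposition of the simple nonlinear model (X1 80.30%, X2 16.77%, X1.X2 interaction 2.93%) and also checks the count of pair-wise interactions for a 30-input model. The dictionary layout is just one convenient way to hold the numbers.

```python
import math

# Variance contributions (percent) for the simple nonlinear model.
main_effects = {"X1": 80.30, "X2": 16.77}
interactions = {("X1", "X2"): 2.93}

def total_effect(name):
    """Main effect for an input plus every interaction that involves it."""
    shared = sum(v for pair, v in interactions.items() if name in pair)
    return main_effects[name] + shared

totals = {name: round(total_effect(name), 2) for name in main_effects}
print(totals)                      # {'X1': 83.23, 'X2': 19.7}

n_pairs = math.comb(30, 2)         # pair-wise interactions in a 30-input model
print(n_pairs)                     # 435
```

Because each interaction is counted once for every input it involves, total effects add up to more than 100% whenever interactions are present.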

Total effects for inputs 4 and 7 are much larger than their main effects, which implies the presence of interactions.

Interaction effects

• Compute pair-wise joint effect variances
  • All interactions between X4, X5, X6, X7

Main and interaction effects

Note interactions involving inputs 4 and 7

Main effects and selected interactions now sum to almost 92% of the total variance

What have we learnt here?

• Most important inputs are X4, X5, X6, X7
  • We can more or less ignore X1, X2, X3
    • Together these 3 account for < 10% of overall variance
• X6 is the most important single input
  • 36% of variance
  • Has only minor interactions with other inputs
• X4, X5, X7 interact in complex ways
  • But together account for over 50% of variance
• Main effect plots are useful
  • Particularly for X6
  • But less so for the others! Need to look at how they interact

SA summary

Why SA?

1. For the model user: SA identifies which inputs it would be most useful to reduce uncertainty about

2. For the model builder: main effect and interaction plots demonstrate how the simulator is behaving
   • Sometimes surprisingly!

Calibration


Calibration

• Simulator users often want to tune the simulator using observations of the real system
  • Adjust the input parameters so that the simulator output matches observations as well as possible
• Two very important points
  1. Calibration will reduce uncertainty about x but will not eliminate it
  2. It is necessary to understand how the simulator relates to reality
     • Model discrepancy

Model discrepancy

• Simulator output y = f(x) will not equal the real system value z
  • Even with best/correct inputs x
• Model discrepancy is the difference z – f(x)
• As discussed in Session 1, model discrepancy is due to
  • Wrong or incomplete science
  • Programming errors, rounding errors
  • Inaccuracy in numerically solving systems of equations
• Ignoring model discrepancy leads to poor calibration
  • Over-fitting of parameter estimates
  • Over-confidence in the fitted values
• We’ll look at calibration in the next session

References

• For variance-based sensitivity analysis
  • http://mucm.ac.uk/toolkit
  • Toolkit has a “topic thread” on this subject
• GEM-SA
  • http://tonyohagan.co.uk/academic/GEM
• Other software packages implementing many of the toolkit methods exist, but none are comprehensive or guaranteed to be stable