SIMULATION MODELLING IN NATURAL RESOURCE MANAGEMENT

Photo: Kendra Holt

For i = minE To maxE Step Estep
    Cells(Erow, Ecol) = i
    SolverOk SetCell:="$B$26", MaxMinVal:=2,
    SolverSolve True
    neg_log_L(i) = Cells(L_row, L_col)
    If neg_log_L(i) <= min_L Then
        min_L = neg_log_L(i)
        Ebest = i
    End If
Next i

for (i=1;i<=nobs;i++)
{
  ad_begin_funnel();
  x=S(i);
  y=R(i);
  lim1=x*mfexp(-6*sx);
  lim2=x*mfexp(8*sx);
  p=adromb(&model_parameters::fz,lim1,lim2,nsteps);
  resid+=log(p+1.e-300);
}

Statistical Simulation

Statistics

Power analysis

Bootstrapping

Randomization

Simulation/Estimation for testing statistical methods

Fitting models to data

Statistics

“The ability to simplify means to eliminate the unnecessary so that the necessary may speak.”

Hans Hofmann

Unfortunately, acquisition of statistical knowledge is blocked by a formidable wall of mathematics

It is equally unfortunate that much of this mathematical basis relies on relatively strong assumptions about such things as large sample sizes and asymptotic normality

On the positive side, modern computing power allows us to relax many of these assumptions, particularly those that commonly arise in natural resource applications

Statistics

Three main objectives of Statistics:

1. Data Collection: How should data be collected in order to
   a. test hypotheses?
   b. estimate parameters for a model?
   c. make decisions?

2. Analysis: How should data be summarized and analyzed?

3. Inference: How accurate are the summaries and analyses? Do they adequately reflect the “truth”?

Statistical Inference

[Figure, left panel: A sample of 15 (X,Y) pairs from the universe of 100 possible pairs]

[Figure, right panel: The complete universe of 100 possible pairs, “The Truth”]

Statistics tries to infer the picture on the right from the picture on the left

Statistical Inference – the mean

X = {76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9}

mean(X) = 96.7, sd(X) = 10.6

How accurate is the mean of X?

Accuracy formula for the mean (and only the mean):

$$ \mathrm{s.e.} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{10.6^2}{9}} = 3.53 $$
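
The accuracy formula is easy to check numerically. The following is a minimal Python sketch that uses only the summary values quoted on the slide (sd = 10.6, n = 9); the function name `standard_error` is ours, not the author's.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sqrt(s^2 / n)."""
    return math.sqrt(sd ** 2 / n)

# Summary values quoted on the slide
se = standard_error(10.6, 9)
print(round(se, 2))  # 3.53
```

No comparably simple formula exists for most other statistics, which is the motivation for the bootstrap.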

Bootstrapping

That was easy! But try that for the correlation coefficient

A statistical formula might be available, but it will require assumptions about the distribution of the data

Bootstrapping – what’s the big idea?

X = (x1, x2, …, xn): original dataset

X*1, X*2, X*3: bootstrap samples

s(X*1), s(X*2), s(X*3): bootstrap replications of the function s(X*)

There is no real limit to what the function s() represents!!

Bootstrapping

For example, s() could be the mean or another summary statistic. It could be a correlation coefficient, a regression estimate, or the parameter estimates of a non-linear model.

The data could be multivariate, with more than one response variable observed for each sample. Thus, s() could be a Principal Component statistic.

Most, if not all, statistical functions are built into software such as R, SAS, Systat, and S-PLUS. Therefore, you can make inferences just by (1) resampling the data, (2) repeating the analysis, (3) summarizing your estimates, and (4) drawing inferences, all without much mathematical or statistical training.
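
The four steps can be sketched with the statistic s() passed in as an ordinary function. This is an illustrative Python sketch using only the standard library, not the author's code: the names `bootstrap` and `correlation` are ours, and the (x, y) data are made up. The correlation coefficient is used as s() because it has no simple accuracy formula.

```python
import math
import random

def bootstrap(data, s, n_boot=1000, seed=1):
    """Return n_boot bootstrap replications of the statistic s()."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        # (1) resample the rows with replacement
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # (2) repeat the analysis on the resampled data
        reps.append(s(sample))
    return reps

def correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (x, y) data, made up for illustration only
pairs = [(1, 2.1), (2, 2.9), (3, 3.2), (4, 4.8), (5, 5.1),
         (6, 5.7), (7, 7.4), (8, 7.9), (9, 9.3), (10, 9.6)]

reps = sorted(bootstrap(pairs, correlation))
# (3) summarize the estimates and (4) draw inferences
boot_mean = sum(reps) / len(reps)
lower, upper = reps[24], reps[974]   # 25th and 975th sorted values
print(boot_mean, lower, upper)
```

Swapping `correlation` for any other function of the data (a regression slope, a fitted non-linear parameter) leaves `bootstrap` unchanged, which is the sense in which there is no real limit on s().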

Bootstrapping - Example

Suppose we wish to infer the regression slope for our hypothetical universe of possible data using only the sample

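1. Resample the 12 data points 1000 times

2. Compute the regression slope for each resample

3. Compute the mean of the 1000 slope estimates

4. Compute 95% confidence limits by
   a. sorting the slope estimates
   b. taking the 25th sorted estimate as the lower limit (L95)
   c. taking the 975th sorted estimate as the upper limit (U95)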

Bootstrapping – R code for regression

# X, Y (the data) and nobs = 12 are assumed to be defined already
# initialise vector to hold slope estimates
slope.boot <- numeric(1000)
# generate 1000 bootstrap replications of X, Y, slope
for (i in 1:1000) {
  # bootstrap reps of X*, Y*: randomly choose nobs = 12 index values
  boot <- sample(seq(1, nobs, 1), size = nobs, replace = TRUE)
  # create X, Y bootstrap pair
  Xboot <- X[boot]
  Yboot <- Y[boot]
  # the function s(X, Y): perform regression using lm() function
  reg.boot <- lm(Yboot ~ Xboot)
  # extract slope from resulting list
  slope.boot[i] <- reg.boot$coefficients[2]
}
# the distribution F*: generate histogram
hist(slope.boot, main = "")
# the inference: generate summary stats
print(mean(slope.boot))
# sort first, then read the limits from the sorted vector
slope <- sort(slope.boot)
L95 <- slope[25]; U95 <- slope[975]
print(c(L95, U95))

Bootstrapping - Example

[Figure: two panels, “Original Data” and “The Distribution F*”. Annotated slope values: true = 0.50; bootstrap estimate = 0.56; 2.5% limit = 0.42; 97.5% limit = 0.67]

Model fitting

Curve fitting and Model fitting including some important definitions

Probability models and Likelihood Functions

Likelihood functions that obey constraints

“Safe” parameterization of non-linear models

Linear vs. non-linear estimation

Estimation

Curve fitting

A toxicologist has performed an experiment on accumulation of a chemical pollutant in tissue where she obtained the following data:

She then wishes to summarize this data into a more comprehensible and reduced form.

She selects from a class of functions, e.g.,

$$ C = \theta_0 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_m B^m $$

the curve and parameters that best fit her data.

This is called CURVE FITTING

Curve fitting

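Curve fitting typically involves polynomials

$$ \hat{C} = \theta_0 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_m B^m $$

Values for the parameters $\theta_0, \theta_1, \ldots, \theta_m$ are chosen so as to get the best possible fit to the data

In curve fitting, the most common objective function used to judge the fit between curve and data is a least-squares criterion

$$ SS = \sum_{j=1}^{n} \left( C_j - \hat{C}_j \right)^2 = \sum_{j=1}^{n} \left( C_j - \sum_{i=0}^{m} \theta_i x_{ij} \right)^2 $$

where $C_j$ is the observed value and $\hat{C}_j$ is the predicted value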

Curve fitting

Curve fitting involves two levels of arbitrariness:

1. The function used to predict the data is arbitrary, being dictated only to a minor extent by the process from which the data came

2. The best fit criterion is arbitrary, being independent of statistical considerations (e.g., a sum of squares is not a probability distribution)

Curve fitting

Curve fitting as described is also easy because equations like:

$$ C = \theta_0 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_m B^m $$

are linear functions of the parameters that can be solved analytically

To see why this is linear, examine the independent variables in tabular or experimental design form

X0   X1   X2    …   Xm
1    B    B^2   …   B^m

Curve fitting

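The original equation can now be rewritten from

$$ C = \theta_0 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_m B^m $$

to

$$ C = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_m X_m $$

which is just a linear model (e.g., multiple linear regression) that can be solved by setting all derivatives

$$ \frac{dSS}{d\theta_i} = 0 $$

and solving the set of simultaneous linear equations for $\theta_0, \theta_1, \ldots, \theta_m$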
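
Setting the derivatives of SS to zero gives the normal equations, a set of simultaneous linear equations in the parameters. The following Python sketch (standard library only) builds the design columns 1, B, B^2, ..., B^m and solves those equations by Gaussian elimination. The helper names `solve_linear` and `fit_polynomial` are ours, and the data are hypothetical, generated without noise from C = 1 + 2B + 0.5B^2 so that the fit should recover roughly those coefficients.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_polynomial(B, C, m):
    """Least-squares estimates of theta_0..theta_m for C = sum_i theta_i * B**i."""
    # Design matrix: X_0 = 1, X_1 = B, X_2 = B^2, ..., X_m = B^m
    X = [[b ** i for i in range(m + 1)] for b in B]
    # Normal equations (X'X) theta = X'C, which follow from dSS/dtheta_i = 0
    XtX = [[sum(row[i] * row[k] for row in X) for k in range(m + 1)]
           for i in range(m + 1)]
    XtC = [sum(row[i] * c for row, c in zip(X, C)) for i in range(m + 1)]
    return solve_linear(XtX, XtC)

# Hypothetical noise-free data generated from C = 1 + 2B + 0.5B^2
B = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
C = [1 + 2 * b + 0.5 * b ** 2 for b in B]
theta = fit_polynomial(B, C, m=2)
print([round(t, 6) for t in theta])
```

No iteration or starting guess is needed here; that convenience is what is lost once the model stops being linear in its parameters.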

Model fitting

Suppose now that our toxicologist friend is familiar with the biophysical laws governing accumulation of contaminants in tissue

She may then choose to derive an equation that obeys these laws, e.g.,

$$ C = \frac{\theta_1 B}{\theta_2 + B} $$

The parameters $\theta_1$ and $\theta_2$ now have a biophysical interpretation

$\theta_1$ = contaminant scaling constant

$\theta_2$ = body size at which contaminant reaches half the maximum

The function also obeys the biological constraint C ≥ 0.

Model fitting

When an equation is derived based on theoretical considerations, the procedure of finding the best fitting parameters is called MODEL FITTING

She may choose similar goodness of fit criteria, but the form of the equation is no longer guided by computational convenience

In this case, the model

$$ C = \frac{\theta_1 B}{\theta_2 + B} $$

is no longer linear in the parameters

Computing the “best-fit” must involve some numerical procedure
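
One very simple numerical procedure is a brute-force search: evaluate the sum of squares on a grid of candidate parameter values and keep the smallest. The Python sketch below does this for the saturating model. It is a crude stand-in for a real optimizer, not the method used in the course; the function names, the grids, and the noise-free data (generated with theta1 = 0.25, theta2 = 40) are all hypothetical.

```python
def predict(theta1, theta2, B):
    """Saturating model C = theta1 * B / (theta2 + B)."""
    return [theta1 * b / (theta2 + b) for b in B]

def sum_of_squares(theta1, theta2, B, C):
    """Least-squares criterion between observed C and model predictions."""
    return sum((c - p) ** 2 for c, p in zip(C, predict(theta1, theta2, B)))

def fit_by_grid(B, C, theta1_grid, theta2_grid):
    """Evaluate SS at every grid point and keep the smallest."""
    best = None
    for t1 in theta1_grid:
        for t2 in theta2_grid:
            ss = sum_of_squares(t1, t2, B, C)
            if best is None or ss < best[0]:
                best = (ss, t1, t2)
    return best

# Hypothetical noise-free data generated with theta1 = 0.25, theta2 = 40
B = [5, 10, 20, 40, 80, 160, 320]
C = predict(0.25, 40, B)

theta1_grid = [i / 100 for i in range(5, 51)]   # 0.05, 0.06, ..., 0.50
theta2_grid = list(range(10, 101, 2))            # 10, 12, ..., 100
ss, t1, t2 = fit_by_grid(B, C, theta1_grid, theta2_grid)
print(ss, t1, t2)
```

A grid search scales badly with the number of parameters, which is why gradient-based or derivative-free optimizers are normally used instead.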

Model estimation

Because parameters from model fitting have some natural interpretation, we may wish to ask: “What is the true value of $\theta_2$ in nature?”

The imprecise nature of the measurements means that we can never answer this question with absolute certainty

Also, if she performed this experiment on a new set of subjects, she might get a different best-fitting $\theta_2$ value

The process of finding parameter values that

a. fit the data well
b. come close on average to the true values
c. do not vary excessively from one experiment to the next

is called MODEL ESTIMATION

Model estimation is a critical component of Simulation Modelling!

Statistical estimation

Determining the parameters of a probability distribution is called STATISTICAL ESTIMATION

For example, the observed value of a random variable h may be the height of trees from an even-aged stand of lodgepole pine

If we assume that h has a normal distribution with mean $h_0$ and standard deviation $\sigma$, then the probability density function (pdf) of h is

$$ p(h) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} (h - h_0)^2 \right] $$

with the usual estimates

$$ \hat{h}_0 = \frac{1}{n} \sum_{i=1}^{n} h_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (h_i - \hat{h}_0)^2 $$

Statistical estimation is also a critical component of Simulation Modelling!
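
The usual estimates for the normal distribution are available in most languages. A minimal Python sketch follows, using the standard `statistics` module; the tree heights are made up for illustration and `normal_pdf` is our own helper name.

```python
import math
import statistics

def normal_pdf(h, h0, sigma):
    """Normal density p(h) with mean h0 and standard deviation sigma."""
    return (math.exp(-(h - h0) ** 2 / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

# Hypothetical tree heights (m), made up for illustration
heights = [18.2, 21.5, 19.8, 22.1, 20.4, 17.9, 20.9, 19.3]

h0_hat = statistics.mean(heights)       # (1/n) * sum of h_i
sigma_hat = statistics.stdev(heights)   # square root of the (n - 1) variance estimate
print(h0_hat, sigma_hat, normal_pdf(20.0, h0_hat, sigma_hat))
```

Note that `statistics.stdev` uses the n - 1 divisor, matching the variance estimate above; `statistics.pstdev` would divide by n instead.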

Parameter estimation

Model estimation can be combined with Statistical estimation in the following way:

Suppose the measured concentration of pollutant $C_i$ taken at body size $B_i$ is a random variable whose mean is given by

$$ \bar{C}_i = \frac{\theta_1 B_i}{\theta_2 + B_i} $$

If many measurements were taken at body size $B_i$ we would expect C values to fluctuate around this mean with standard deviation $\sigma$.

If we assume that these variations have a normal distribution, then the probability density function for $C_i$ is

$$ p(C_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} \left( C_i - \frac{\theta_1 B_i}{\theta_2 + B_i} \right)^2 \right] $$

Parameter estimation

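[Figure: $C_i$ plotted against $B_i$, showing the mean curve and the normal pdf of $C_i$ around it]

$$ \bar{C}_i = \frac{\theta_1 B_i}{\theta_2 + B_i} $$

$$ p(C_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} \left( C_i - \frac{\theta_1 B_i}{\theta_2 + B_i} \right)^2 \right] $$

Each value of $C_i$ will have a mean that depends on $B_i$, and the spread of $C_i$ values depends on the standard deviation $\sigma$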

Parameter estimation

For several measurements (Ci, Bi) where i={1,2,…n}, we have a pdf for each:

Parameter estimation

Although each value of $C_i$ will have a mean that depends on $B_i$, the parameters

$$ \theta_1, \theta_2, \sigma $$

are common to all measurements!

Estimating parameters that are common to both the model and the probability density function of the observations is called PARAMETER ESTIMATION

Because parameter estimation encompasses all other forms of estimation, it is the most critical component of Simulation Modelling!
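
Because the same three parameters appear in every observation's pdf, the pdfs can be combined into one function of those parameters. The Python sketch below sums the negative logs of the normal pdfs for the contaminant model. It assumes independent observations; the function name and the measurements are hypothetical, while theta1 = 0.25 and sigma = 0.20 are the values shown later in the slides.

```python
import math

def neg_log_likelihood(theta1, theta2, sigma, B, C):
    """Negative log of the product of normal pdfs p(C_i), one per observation."""
    nll = 0.0
    for b, c in zip(B, C):
        mean = theta1 * b / (theta2 + b)           # mean depends on B_i
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2)
        nll += (c - mean) ** 2 / (2 * sigma ** 2)  # same sigma for every i
    return nll

# Hypothetical measurements, made up for illustration
B = [10, 20, 40, 80, 160]
C = [0.06, 0.07, 0.14, 0.15, 0.21]

# Smaller values indicate parameters under which the data are more likely
print(neg_log_likelihood(0.25, 40, 0.20, B, C))
print(neg_log_likelihood(0.25, 60, 0.20, B, C))
```

Working with the negative log keeps the numbers in a manageable range and turns the product over observations into a sum.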

Simulation vs. Estimation

Simulation generates predicted values of state variables (observations) from a known set of parameter values

[Flowchart of the simulation]

Inputs: parameters of function equations (parameters in); hunting effort in year t

1. Initial conditions, t = 0
2. Spring juvenile production
3. Summer juvenile survival
4. Update fall abundance
5. Calculate harvest
6. Winter survival
7. Adult spring abundance
8. t = T? If yes, end. If no, set t = t + 1 and repeat from step 2

Outputs (observations out): echo of parameter input; initial conditions; results for year t

Simulation vs. Estimation

The key elements of simulation models are Parameters, State Variables, and Controls

Parameters         Θ   Survival, production rates, error variances, etc.
Controls           Z   Hunting effort, harvest rates
State Variables*   Y   Duck abundance, or index

*“State Variable” in this context includes a measurement of state such as an index. In that case, the parameter set includes parameters of the measurement system

We can use the notation:

$$ \Theta, Z \rightarrow Y $$

to state that The Simulation Problem is to go from known Parameters and Controls to unknown State Variables
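
The simulation problem can be sketched as a function that takes parameters and controls and returns state variables. The Python example below follows the seasonal sequence of the duck flowchart, but every functional form and number in it is an assumption made for illustration: the dictionary keys, the proportional harvest rule, and the parameter values do not come from the slides.

```python
def simulate(theta, effort, n0):
    """Go from parameters (theta) and controls (effort) to state variables.

    Returns the adult spring abundance after each simulated year.
    """
    adults = n0
    abundance = []
    for e in effort:                                  # one pass per year t
        juveniles = adults * theta["production"]      # spring juvenile production
        juveniles *= theta["summer_survival"]         # summer juvenile survival
        fall = adults + juveniles                     # update fall abundance
        rate = min(1.0, theta["catchability"] * e)    # assumed harvest rule
        harvest = fall * rate                         # calculate harvest
        adults = (fall - harvest) * theta["winter_survival"]  # winter survival
        abundance.append(adults)                      # adult spring abundance
    return abundance

# Hypothetical parameter values and hunting effort, for illustration only
theta = {"production": 1.2, "summer_survival": 0.6,
         "catchability": 0.002, "winter_survival": 0.7}
effort = [100, 100, 150, 150, 200]   # controls Z: hunting effort in each year
Y = simulate(theta, effort, n0=1000.0)
print([round(y) for y in Y])
```

Here `theta` plays the role of Θ, `effort` the role of Z, and the returned list the role of Y.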

Simulation vs. Estimation

Simulation generates predicted values of state variables (observations) from a known set of parameter values

[Flowchart repeated with the notation added: parameters in = Θ, hunting effort in year t = Z, observations out = Y]

Simulation vs. Estimation

Estimation is concerned with finding the models and parameter values that generate the observed data.

The Estimation Problem is to go from known (observed) State Variables and Controls to unknown Parameters:

$$ Y, Z \rightarrow \Theta $$

In this case the “known” state variables Y represent the data or observations

Simulation vs. Estimation

Simulation: $\Theta, Z \rightarrow Y$

Estimation: $Y, Z \rightarrow \Theta$

Yeah, but how does estimation work?

For the contaminant problem, we have

State Variables: $Y = C + e$

Controls: $Z = B$

Parameters: $\theta_1, \theta_2, \sigma$

Model:

$$ Y_i = \frac{\theta_1 Z_i}{\theta_2 + Z_i} $$

Yeah, but how does estimation work?

Estimation: $Y, Z \rightarrow \Theta$

1. Guess initial parameters $\Theta_{i=0}$ at iteration i=0

2. Use simulation step to “predict” state variables given parameters at current iteration

$$ \Theta_i, Z \rightarrow \hat{Y}_i $$

3. Calculate likelihood that data would have arisen if these parameters were true

$$ p(Y \mid \Theta, Z) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} \left( Y_i - \frac{\theta_1 Z_i}{\theta_2 + Z_i} \right)^2 \right] $$

4. Repeat 1-3 after adjusting parameters in “best-fit” direction
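
The loop above can be sketched in a few lines. This Python example is a deliberately crude hill climb on $\theta_2$ alone, with $\theta_1$ and $\sigma$ held fixed. It illustrates the guess, simulate, evaluate, adjust cycle and is not the optimizer used in the course. The function names, the step sizes, and the noise-free observations (generated with theta2 = 20) are all hypothetical.

```python
import math

def simulate(theta1, theta2, Z):
    """Simulation step: predict Y from the parameters and controls."""
    return [theta1 * z / (theta2 + z) for z in Z]

def log_likelihood(theta1, theta2, sigma, Z, Y):
    """Log of the product of normal pdfs, one per observation."""
    pred = simulate(theta1, theta2, Z)
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - p) ** 2 / (2 * sigma ** 2) for y, p in zip(Y, pred))

def estimate_theta2(Z, Y, theta1, sigma, theta2=60.0, step=8.0, tol=1e-6):
    """Crude hill climb on theta2 with theta1 and sigma held fixed."""
    best = log_likelihood(theta1, theta2, sigma, Z, Y)     # initial guess
    while step > tol:
        moved = False
        for candidate in (theta2 - step, theta2 + step):   # adjust parameter
            if candidate <= 0:
                continue
            ll = log_likelihood(theta1, candidate, sigma, Z, Y)
            if ll > best:                                  # "best-fit" direction
                theta2, best, moved = candidate, ll, True
                break
        if not moved:
            step /= 2                                      # refine the search
    return theta2, best

# Hypothetical noise-free observations generated with theta2 = 20
Z = [5, 10, 20, 40, 80, 160]
Y = simulate(0.25, 20.0, Z)
theta2_hat, ll = estimate_theta2(Z, Y, theta1=0.25, sigma=0.20)
print(theta2_hat, ll)
```

Starting from 60, the search walks down toward the generating value, much like the sequence of parameter sets shown on the slides that follow.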

The simulated predictions and likelihood function

[Figures: simulated predictions and likelihood function for five successive parameter sets]

$\theta_1 = 0.25,\ \theta_2 = 60,\ \sigma = 0.20$

$\theta_1 = 0.25,\ \theta_2 = 52,\ \sigma = 0.20$

$\theta_1 = 0.25,\ \theta_2 = 40,\ \sigma = 0.20$

$\theta_1 = 0.25,\ \theta_2 = 28,\ \sigma = 0.20$

$\theta_1 = 0.25,\ \theta_2 = 20,\ \sigma = 0.20$

The total likelihood function

[Figure: Log-Likelihood plotted against $\theta_2$]

Optimization procedure used to find the value $\theta_2^*$ that maximizes the likelihood

The “Model” can be anything used to predict data

[Diagram: Parameters and Controls feed the Simulation Model, which produces Predicted Data (Ypred). Predicted Data and Observed Data (Yobs) meet in the Likelihood Function: the confrontation between Model and Data]

The Likelihood Function

The likelihood function used depends on the type of data/observations

Most stats books have Appendices listing distributions for particular data types, pdfs, expected values (means, variances), and random generation

The likelihood can only tell you which hypotheses or parameter values are more likely than others. It cannot tell you the probability that a given hypothesis is true; that requires a Bayesian approach
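
The distinction can be written out explicitly. The following is a sketch in our own notation, where $P(\Theta)$ is a prior distribution that the analyst must supply (it does not appear in the slides). On the left, the likelihood supports only a relative comparison between two parameter sets $\Theta_a$ and $\Theta_b$; on the right, Bayes' theorem combines the likelihood with the prior to give a probability.

```latex
\frac{p(Y \mid \Theta_a, Z)}{p(Y \mid \Theta_b, Z)}
\qquad \text{versus} \qquad
P(\Theta \mid Y, Z) =
  \frac{p(Y \mid \Theta, Z)\, P(\Theta)}
       {\int p(Y \mid \Theta', Z)\, P(\Theta')\, d\Theta'}
```

The ratio on the left needs no prior, which is why likelihood alone can rank hypotheses but cannot assign them probabilities.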