Page 1: James L. Crooks (SAMSI, Duke University)

Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes

(Work in Progress)

James L. Crooks (SAMSI, Duke University)


Background

● We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios.

● Toward this end we can use the forest simulator developed by J. Clark and P. Agarwal’s joint research group.

● This simulator models the life-cycle of individual trees within a tree stand of pre-specified area.

● Growth and fecundity are in part mediated by the climate-influenced variables temperature and soil moisture.


Motivation

● The forest simulator has the following properties that make emulation both important and difficult:
– Its speed limits the physical area that can be simulated in reasonable time (the current standard is 128 m x 128 m).
– Its output is stochastic.
– Its output distribution can be non-Gaussian.
– Its output distribution can vary over the input space.

● Thus there is a need for a local, nonparametric statistical method to emulate the entire output distribution across the input space.


Objectives

● Run simulator with 3 species under “standard” climatic conditions for 1000+ years to establish equilibrium initial conditions.

● Run simulator for a further 100 years at each of various points in the climate input space (temperature and soil moisture increase rates).

● Emulate the output over this input space using the Kernel Stick-Breaking Processes idea of Dunson and Park (2006).


Summary of Input and Output Variables

i indexes the run of the simulator.

Simulator Climate Input Variables
$x_{i1}$ = Mean Temperature Increase / Century
$x_{i2}$ = Mean Soil Moisture Increase / Century

Design Matrix
$$X_i^T = \left(1,\; x_{i1},\; x_{i2},\; x_{i1}x_{i2},\; x_{i1}^2,\; x_{i2}^2\right)$$

Simulator Output Variables
$y_{i1}$ = Final Number of Adult Trees of Species 1
$y_{i2}$ = Final Number of Adult Trees of Species 2
$y_{i3}$ = Final Number of Adult Trees of Species 3


Forest simulator output for the 1001-year initialization run. We will focus on the number of adult trees.

Legend: Total, Species 1, Species 2, Species 3


Justifying the Choice of Model

• We expect that the mean response will be suppressed at extreme values of climate variables.

→ Therefore we model the mean response as:

$$\mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right), \quad i \in \{1,\dots,N\},\; j \in \{1,2,3\},\; k \in \{1,\dots,6\},$$

with a design matrix having up to quadratic terms, where:

i indexes simulator run

j indexes the tree species

k indexes the regression coef.

[Figure: Single Regression Surface. x-axis: Climate Variable (Temperature or Soil Moisture Increase Rate); y-axis: Number of Trees]
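As a rough illustration of why a log-linear mean with quadratic terms gives suppression at the extremes, here is a minimal Python sketch for one species. The ordering of the design-row terms, the function names, and the coefficient values are all assumptions made for the example; they are not taken from the simulator or from the fitted model.

```python
import math

def design_row(x1, x2):
    """Design row with terms up to quadratic (term order is an assumption)."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

def mean_response(x1, x2, beta):
    """Log-linear mean: mu = exp(sum_k X_k * beta_k) for a single species."""
    return math.exp(sum(x * b for x, b in zip(design_row(x1, x2), beta)))

# Hypothetical coefficients: intercept log(50), negative quadratic terms.
beta = [math.log(50.0), 0.0, 0.0, 0.0, -0.5, -0.5]

center = mean_response(0.0, 0.0, beta)     # mean at the centre of the input space
edge_hot = mean_response(2.0, 0.0, beta)   # mean at a large temperature rate
edge_cold = mean_response(-2.0, 0.0, beta) # mean at a large negative rate
```

With negative coefficients on the squared terms the mean peaks in the interior of the input space and falls off symmetrically toward both extremes, while the exponential keeps it positive.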


● We do not a priori expect the output distribution to be Gaussian anywhere on the input space.

→ Use a non-parametric (Dirichlet Process) infinite mixture of regression surfaces instead of a single surface.

● We do not a priori expect the shape of the output distribution to be constant over the input space.

→ Use the Kernel Stick-Breaking Process of Dunson and Park (2006) to allow the DP mixture to be predictor-dependent.

[Figure: Finite (Truncated) Mixture of Regression Surfaces. x-axis: Climate Variable (Temperature or Soil Moisture Increase Rate); y-axis: Number of Trees]


Negative Binomial Likelihood

● The output variable of interest is the number of adult trees of each species. Why not use a Poisson likelihood?

● Preliminary data show Var[y] scales roughly like $E[y]^2$, not E[y], and Var[y] is also inversely dependent on the forest area.

→ Use the negative binomial distribution, which has pmf:

$$f(y \mid \mu, \nu) = \frac{\Gamma(y+\nu)}{\Gamma(y+1)\,\Gamma(\nu)} \left(\frac{\nu}{\nu+\mu}\right)^{\nu} \left(\frac{\mu}{\nu+\mu}\right)^{y}$$

and moments:

$$E[y \mid \mu, \nu] = \mu, \qquad \mathrm{Var}[y \mid \mu, \nu] = \mu + \frac{\mu^2}{\nu},$$

where the prior range of $\nu$ can be increased with area.
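The mean/dispersion form of the negative binomial can be checked numerically. The sketch below, in plain Python, evaluates the pmf on the log scale and sums it over a long range of counts to recover the normalisation, the mean, and the variance. The function names and the example values of mu and nu are assumptions chosen for illustration only.

```python
import math

def negbin_logpmf(y, mu, nu):
    """Log pmf of the negative binomial with mean mu > 0 and dispersion nu > 0."""
    return (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
            + nu * math.log(nu / (nu + mu))
            + y * math.log(mu / (nu + mu)))

def negbin_moments(mu, nu, y_max=2000):
    """Total mass, mean and variance by direct summation over 0..y_max."""
    probs = [math.exp(negbin_logpmf(y, mu, nu)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# Example values (hypothetical): mean of 20 trees, dispersion nu = 4.
total, mean, var = negbin_moments(20.0, 4.0)
```

For these values the variance formula gives 20 + 20^2 / 4 = 120, six times the Poisson variance, which is the kind of quadratic scaling in the mean that motivates the choice. A larger nu shrinks the extra term, which is how a wider prior range for nu can accommodate larger stand areas.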


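The Full Model

$$f(y_i \mid \nu) = \int f(y_i \mid \beta_i, \nu)\, dG_{x_i}(\beta_i), \quad i = 1,\dots,N$$

$$f(y_i \mid \beta_i, \nu) = \prod_{j=1}^{3} \mathrm{NegBin}(y_{ij};\, \nu_j, \mu_{ij}), \qquad \mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right)$$

Kernel Stick-Breaking Process:

$$G_{x_i}(\beta) = \sum_{l=1}^{\infty} W(x_i; V_l, \Gamma_l) \prod_{m=1}^{l-1} \left[1 - W(x_i; V_m, \Gamma_m)\right] G_l(\beta)$$

$$W(x_i; V_l, \Gamma_l) = V_l\, K(x_i, \Gamma_l)$$

$$K(x_i, \Gamma_l) = \exp\left[-\psi_1 (x_{i1} - \Gamma_{l1})^2 - \psi_2 (x_{i2} - \Gamma_{l2})^2\right]$$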
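To make the predictor-dependent weights concrete, the following Python sketch computes truncated kernel stick-breaking weights at a single input point, assuming a Gaussian-type kernel with one rate parameter per climate variable. The truncation level, the function names, and every numerical value are assumptions for illustration; this is not the MATLAB implementation used in the talk.

```python
import math

def kernel(x, gamma, psi):
    """K(x, Gamma) = exp(-sum_d psi_d * (x_d - Gamma_d)^2), a value in (0, 1]."""
    return math.exp(-sum(p * (xd - gd) ** 2 for p, xd, gd in zip(psi, x, gamma)))

def ksbp_weights(x, V, Gamma, psi):
    """Truncated stick-breaking weights at input x.

    Component l gets W_l * prod_{m<l} (1 - W_m), with W_l = V_l * K(x, Gamma_l).
    Returns the weights and the mass left over after truncation.
    """
    weights = []
    remaining = 1.0  # running product of (1 - W_m) over earlier components
    for v_l, gamma_l in zip(V, Gamma):
        w = v_l * kernel(x, gamma_l, psi)
        weights.append(remaining * w)
        remaining *= 1.0 - w
    return weights, remaining

# Hypothetical three-component truncation on a 2-D climate input space.
V = [0.6, 0.5, 0.9]
Gamma = [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)]
psi = (1.0, 1.0)

near_first, _ = ksbp_weights((0.0, 0.0), V, Gamma, psi)
near_last, _ = ksbp_weights((2.0, -2.0), V, Gamma, psi)
```

Components whose locations lie close to the input point receive most of the mass, so the mixture changes smoothly as the input moves across the climate space. In a truncated sampler the leftover mass is usually handled by setting the last stick fraction so that the weights sum to one.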


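$$\begin{aligned}
\Gamma_l &\sim \text{DiscreteGrid},\\
V_l &\sim \text{Beta}(1, \alpha),\\
G_l &\sim \text{DP}(\eta\, G_0),\\
G_0 &\sim \text{MatrixNormal}(\beta_0, \Sigma_0, \Phi_0),\\
\beta_0 &\sim \text{MatrixNormal},\\
\Sigma_0^{-1},\ \Phi_0^{-1} &\sim \text{Wishart},\\
\psi_{1,2} &\sim \text{LogNormal},\\
\nu_j &\sim \text{DiscreteGrid}, \quad j \in \{1,2,3\}.
\end{aligned}$$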


Comments on the Model

● This model, unlike Dunson and Park’s original, lacks conjugacy between f and $G_0$; thus two changes must be made to their algorithm:
– We no longer have the full conditional for $\beta_i$, so we must use a Metropolis-Hastings step to update it.
– The integral $\int f(y_i \mid \beta_i, \nu)\, dG_0(\beta_i)$ cannot be evaluated exactly, so we must approximate it numerically using (e.g.) Monte Carlo integration.

● The original MATLAB code is itself not fast, but once a posterior sample has been generated it is cheap to predict the output pmf at new points in the input space.
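The Monte Carlo approximation of the marginal likelihood can be sketched as follows for a single species. As a simplifying assumption, the base measure is taken to be a product of independent normals on the six coefficients rather than the matrix normal of the full model, and the function names, prior values, and number of draws are all hypothetical.

```python
import math
import random

def negbin_logpmf(y, mu, nu):
    """Log pmf of the negative binomial with mean mu and dispersion nu."""
    return (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
            + nu * math.log(nu / (nu + mu))
            + y * math.log(mu / (nu + mu)))

def mc_marginal(y, x_row, nu, prior_mean, prior_sd, n_draws=2000, seed=0):
    """Estimate the integral of f(y | beta, nu) dG0(beta) by averaging over draws.

    G0 is assumed here to be independent normals (a simplification).
    """
    rng = random.Random(seed)
    log_terms = []
    for _ in range(n_draws):
        beta = [rng.gauss(m, s) for m, s in zip(prior_mean, prior_sd)]
        mu = math.exp(sum(x * b for x, b in zip(x_row, beta)))
        log_terms.append(negbin_logpmf(y, mu, nu))
    # Average on the log scale (log-sum-exp) to avoid underflow.
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms) / n_draws

# Hypothetical inputs: design row at the centre of the input space.
x_row = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
prior_mean = [math.log(20.0), 0.0, 0.0, 0.0, 0.0, 0.0]
prior_sd = [0.3] * 6
estimate = mc_marginal(15, x_row, 4.0, prior_mean, prior_sd)
```

The estimate is noisy for small numbers of draws, so in practice the number of draws trades accuracy against the cost of each sampler iteration.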


Generating Simple Climate Change Scenarios

● The ballpark estimates of today’s (soil moisture, temperature) mean and covariance are:

$$\text{mean} = \begin{pmatrix} 16.61 \\ 18.94 \end{pmatrix}, \qquad \text{cov} = \begin{pmatrix} 1.78 & 0.19 \\ 0.19 & 14.38 \end{pmatrix}$$

● The 1000+ year initialization run has temperature and soil moisture generated by a MVN with this mean and covariance.

● Temperature is measured in °C and soil moisture in %.


● Future 100-year scenarios are generated assuming the means change linearly in time, with rates given by the points on the plot below:

• GCMs generally predict hotter, drier conditions for the Southeastern US.

• Accordingly, ranges were: [-1,+2]*SD/century for Temperature and [-2,+1]*SD/century for Soil Moisture.
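A scenario of this kind can be sketched in a few lines of Python: the yearly (soil moisture, temperature) pair is drawn from a bivariate normal whose mean drifts linearly at a rate expressed in standard deviations per century. The assignment of the numbers to soil moisture versus temperature follows the order stated on the slide and should be treated as an assumption, as should the choice to hold the covariance fixed while the mean drifts. The function names are hypothetical.

```python
import math
import random

# Assumed order: (soil moisture in %, temperature in deg C).
MEAN = (16.61, 18.94)
COV = ((1.78, 0.19), (0.19, 14.38))

def cholesky_2x2(cov):
    """Lower-triangular factor [[a, 0], [b, c]] of a 2x2 covariance matrix."""
    a = math.sqrt(cov[0][0])
    b = cov[1][0] / a
    c = math.sqrt(cov[1][1] - b * b)
    return a, b, c

def drifted_mean(year, rates_sd_per_century):
    """Mean after `year` years of linear drift, rates given in SD per century."""
    sds = (math.sqrt(COV[0][0]), math.sqrt(COV[1][1]))
    return tuple(MEAN[d] + rates_sd_per_century[d] * sds[d] * year / 100.0
                 for d in range(2))

def simulate_scenario(rates_sd_per_century, n_years=100, seed=0):
    """Draw one (soil moisture, temperature) pair per year around the drifting mean."""
    rng = random.Random(seed)
    a, b, c = cholesky_2x2(COV)
    path = []
    for year in range(n_years):
        m0, m1 = drifted_mean(year, rates_sd_per_century)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        path.append((m0 + a * z1, m1 + b * z1 + c * z2))
    return path

# A hotter/drier corner of the design: moisture -2 SD/century, temperature +2 SD/century.
hot_dry = simulate_scenario((-2.0, 2.0))
```

Setting both rates to zero reproduces the stable climate used for the initialization run.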


Shown are the generated soil moisture and temperature used in the initialization run, and three generated future scenarios. Climate change begins at year 1052.

Legend: Stable Climate, Hotter/Drier, Cooler/Wetter


Results

● I just got the initialization run back last week, so ask me in 3 months.

Other Thoughts

● May need to continue the initialization run another 500-1000 years to get a better equilibrium.

● Need a lot more runs when using nonparametrics anyway, so the benefits of using a Latin hypercube design are less obvious (in 2-D anyway).


Acknowledgements

● Jim Clark’s group for use of their simulator, and especially Sean McMahon for his invaluable assistance.

● David Dunson and Ju-Hyun Park for explaining their paper to me and letting me use their algorithm.

● The SAMSI Methodology and Terrestrial Models Working Groups for fruitful discussions.

References

Dunson, D. B., and J.-H. Park, “Kernel Stick-Breaking Processes”, ISDS Discussion Paper 22 (2006) and Biometrika (accepted)

Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable simulator for forest dynamics”, Symposium on Computational Geometry 2004: 106-115

Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable algorithm for dispersing populations”, Journal of Intelligent Information Systems 2004 (online)