Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes
(Work in Progress)
James L. Crooks (SAMSI, Duke University)
Background
● We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios.
● Toward this end we can use the forest simulator developed by J. Clark and P. Agarwal’s joint research group.
● This simulator models the life-cycle of individual trees within a tree stand of pre-specified area.
● Growth and fecundity are in part mediated by the climate-influenced variables temperature and soil moisture.
Motivation
● The forest simulator has the following properties that make emulation both important and difficult:
– Its speed limits the physical area that can be simulated in reasonable time (the current standard is 128 m x 128 m)
– Its output is stochastic
– Its output distribution can be non-Gaussian
– Its output distribution can vary over the input space
● Thus there is a need for a local, nonparametric statistical method to emulate the entire output distribution across the input space.
Objectives
● Run simulator with 3 species under “standard” climatic conditions for 1000+ years to establish equilibrium initial conditions.
● Run simulator for a further 100 years at each of various points in the climate input space (temperature and soil moisture increase rates).
● Emulate the output over this input space using the Kernel Stick-Breaking Processes idea of Dunson and Park (2006).
Summary of Input and Output Variables
i indexes the run of the simulator
Simulator Climate Input Variables:
$x_{i1}$ = Mean Temperature Increase / Century
$x_{i2}$ = Mean Soil Moisture Increase / Century
Design Matrix (row for run i):
$$X_i^T = \left(1,\; x_{i1},\; x_{i2},\; x_{i1}x_{i2},\; x_{i1}^2,\; x_{i2}^2\right)$$
Simulator Output Variables:
$y_{i1}$ = Final Number of Adult Trees of Species 1
$y_{i2}$ = Final Number of Adult Trees of Species 2
$y_{i3}$ = Final Number of Adult Trees of Species 3
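As a minimal sketch of how one row of the design matrix could be built from the two climate inputs: the function name `design_row` and the ordering of the six columns are assumptions made for illustration, since the slide only fixes that there are six terms up to quadratic order.

```python
def design_row(temp_rate, moisture_rate):
    """Return the six design-matrix entries for one simulator run.

    Assumed column order: intercept, x1, x2, x1*x2, x1^2, x2^2.
    """
    x1, x2 = temp_rate, moisture_rate
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


# Hypothetical run: temperature rising 2 units/century, soil moisture falling 1.
row = design_row(2.0, -1.0)
print(row)
```

Each simulator run contributes one such row, so N runs give an N x 6 design matrix.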
[Figure: Forest Simulator output for the 1001 year initialization run. Legend: Total, Species 1, Species 2, Species 3.]
We will focus on number of adult trees.
• We expect that the mean response will be suppressed at extreme values of climate variables.
→ Therefore we model the mean response as:
$$\mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right), \quad i \in \{1,\dots,N\},\; j \in \{1,2,3\},\; k \in \{1,\dots,6\}$$
with a design matrix having up to quadratic terms.
i indexes simulator run
j indexes the tree species
k indexes the regression coef.
[Figure: Single Regression Surface. Horizontal axis: Climate Variable (Temperature or Soil Moisture Increase Rate); vertical axis: Number of Trees.]
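The log-linear mean with quadratic terms can be sketched as follows. To keep the illustration small it uses a single climate variable (intercept, linear, and quadratic terms only), and the coefficient values are hypothetical. A negative quadratic coefficient is what suppresses the mean at extreme climate values.

```python
import math


def mean_response(design_row, beta):
    """Log-linear mean: mu = exp(sum_k X_k * beta_k)."""
    return math.exp(sum(x * b for x, b in zip(design_row, beta)))


def design_row_1d(x):
    # One climate variable only, for illustration: intercept, linear, quadratic.
    return [1.0, x, x * x]


beta = [3.0, 0.0, -0.5]  # hypothetical coefficients with negative curvature
means = [mean_response(design_row_1d(x), beta) for x in (-2.0, 0.0, 2.0)]
print(means)  # the middle value is the largest
```

The exponential link keeps the mean positive, which a count likelihood requires.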
Justifying the Choice of Model
● We do not a priori expect the output distribution to be Gaussian anywhere on the input space.
→ Use a non-parametric (Dirichlet Process) infinite mixture of regression surfaces instead of a single surface.
● We do not a priori expect the shape of the output distribution to be constant over the input space.
→ Use the Kernel Stick-Breaking Process of Dunson and Park (2006) to allow the DP mixture to be predictor-dependent.
[Figure: Finite (Truncated) Mixture of Regression Surfaces. Horizontal axis: Climate Variable (Temperature or Soil Moisture Increase Rate); vertical axis: Number of Trees.]
Negative Binomial Likelihood
● The output variable of interest is number of adult trees of each species. Why not use a Poisson likelihood?
● Preliminary data show Var[y] scales roughly like E[y]^2, not E[y], and Var[y] is also inversely dependent on the forest area.
→ Use the negative binomial distribution, which has pmf:
$$f(y \mid \mu, \nu) = \frac{\Gamma(y+\nu)}{\Gamma(y+1)\,\Gamma(\nu)} \left(\frac{\nu}{\nu+\mu}\right)^{\nu} \left(\frac{\mu}{\nu+\mu}\right)^{y}$$
and moments:
$$E[y \mid \mu, \nu] = \mu, \qquad \mathrm{Var}[y \mid \mu, \nu] = \mu + \frac{\mu^2}{\nu}$$
where the prior range of $\nu$ can be increased with area.
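A quick numerical check of this parameterization, written as a sketch: the pmf is evaluated on the log scale with `math.lgamma`, then summed over a truncated range to recover the total mass, the mean, and the variance. The function names and the values mu = 20, nu = 4 are illustrative assumptions.

```python
import math


def negbin_logpmf(y, mu, nu):
    """Log pmf of the negative binomial with mean mu and size parameter nu."""
    return (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
            + nu * math.log(nu / (nu + mu))
            + y * math.log(mu / (nu + mu)))


def negbin_moments(mu, nu, y_max=2000):
    """Sum the pmf over 0..y_max to approximate total mass, mean, and variance."""
    probs = [math.exp(negbin_logpmf(y, mu, nu)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


total, mean, var = negbin_moments(mu=20.0, nu=4.0)
print(total, mean, var)  # approximately 1, mu, and mu + mu**2 / nu
```

For large mu the variance is dominated by mu^2 / nu, which matches the observed scaling of Var[y] with E[y]^2. A larger nu shrinks that term, which is why its prior range can grow with forest area.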
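The Full Model
$$f(y_i \mid \nu) = \int f(y_i \mid \beta_i, \nu)\, dG_{x_i}(\beta_i), \quad i = 1,\dots,N$$
$$f(y_i \mid \beta_i, \nu) = \prod_{j=1}^{3} \mathrm{NegBin}\left(y_{ij} \mid \mu_{ij}, \nu_j\right), \quad \mu_{ij} = \exp\left(\sum_{k=1}^{6} X_{ik}\,\beta_{ijk}\right)$$
Kernel Stick-Breaking Process:
$$G_{x_i}(\beta_i) = \sum_{l=1}^{\infty} W(x_i; V_l, \Gamma_l) \prod_{m<l} \left[1 - W(x_i; V_m, \Gamma_m)\right] G_l(\beta_i)$$
$$W(x_i; V_l, \Gamma_l) = V_l\, K(x_i, \Gamma_l)$$
$$K(x_i, \Gamma_l) = \exp\left\{-\psi_1 (x_{i1} - \Gamma_{l1})^2 - \psi_2 (x_{i2} - \Gamma_{l2})^2\right\}$$
Priors:
$$\Gamma_l \sim \mathrm{DiscreteGrid}$$
$$V_l \sim \mathrm{Beta}(1, \alpha)$$
$$G_l \sim \mathrm{DP}(\eta\, G_0)$$
$$G_0 \sim \mathrm{MatrixNormal}(\beta_0, \Sigma_0, \Phi_0)$$
$$\beta_0 \sim \mathrm{MatrixNormal}$$
$$\Sigma_0^{-1},\ \Phi_0^{-1} \sim \mathrm{Wishart}$$
$$\psi_{1,2} \sim \mathrm{LogNormal}$$
$$\nu_j \sim \mathrm{DiscreteGrid}, \quad j \in \{1,2,3\}$$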
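The predictor-dependent mixture weights of the kernel stick-breaking construction can be sketched as below, under several assumptions not in the slides: the infinite sum is truncated at 25 components, the kernel locations are drawn uniformly from a hypothetical 7 x 7 grid over the climate input ranges, and alpha and psi are set to 1. Names such as `ksbp_weights` are illustrative, and this is not the Dunson and Park code.

```python
import math
import random


def kernel(x, gamma, psi):
    """K(x, Gamma) = exp(-sum_d psi_d * (x_d - Gamma_d)^2)."""
    return math.exp(-sum(p * (xd - gd) ** 2 for p, xd, gd in zip(psi, x, gamma)))


def ksbp_weights(x, V, Gamma, psi):
    """Truncated kernel stick-breaking weights at predictor value x."""
    weights, remaining = [], 1.0
    for v, gamma in zip(V, Gamma):
        w = v * kernel(x, gamma, psi)   # W(x; V_l, Gamma_l)
        weights.append(w * remaining)   # times prod over m < l of (1 - W_m)
        remaining *= 1.0 - w
    return weights, remaining           # remaining = mass beyond the truncation


rng = random.Random(0)
n_components, alpha = 25, 1.0           # assumed truncation level and Beta parameter
grid = [(-1.0 + 0.5 * a, -2.0 + 0.5 * b) for a in range(7) for b in range(7)]
V = [rng.betavariate(1.0, alpha) for _ in range(n_components)]
Gamma = [rng.choice(grid) for _ in range(n_components)]
psi = (1.0, 1.0)

w_a, rest_a = ksbp_weights((0.0, 0.0), V, Gamma, psi)
w_b, rest_b = ksbp_weights((2.0, -2.0), V, Gamma, psi)
print(sum(w_a) + rest_a, sum(w_b) + rest_b)  # each totals 1 up to rounding
```

Because the kernel shrinks a component's stick fraction when x is far from its location, two input points generally receive different weight vectors. That is what lets the shape of the output distribution change across the input space.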
Comments on the Model
● This model, unlike Dunson and Park’s original, lacks conjugacy between f and $G_0$; thus two changes must be made to their algorithm:
– We no longer have the full conditional for $\beta_i$, so we must use a Metropolis-Hastings step to update it.
– The integral $\int f(y_i \mid \beta_i, \nu)\, dG_0(\beta_i)$ cannot be evaluated exactly, so we must approximate it numerically using (e.g.) Monte Carlo integration.
● The original MATLAB code is itself not fast, but once a posterior sample has been generated it is cheap to predict the output pmf at new points in the input space.
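A rough sketch of the Monte Carlo approximation to this integral, for a single species and a short coefficient vector: draw coefficients from the base measure, evaluate the negative binomial pmf at each draw, and average. Independent normal draws stand in for the matrix-normal $G_0$ here, and every numerical value is hypothetical.

```python
import math
import random


def negbin_pmf(y, mu, nu):
    """Negative binomial pmf with mean mu and size parameter nu."""
    log_p = (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + mu)) + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)


def mc_marginal(y, x_row, nu, beta_mean, beta_sd, n_draws, rng):
    """Estimate the integral of f(y | beta, nu) dG0(beta) by averaging over G0 draws."""
    total = 0.0
    for _ in range(n_draws):
        # Stand-in for a draw from G0: independent normals per coefficient.
        beta = [rng.gauss(m, s) for m, s in zip(beta_mean, beta_sd)]
        mu = math.exp(sum(x * b for x, b in zip(x_row, beta)))
        total += negbin_pmf(y, mu, nu)
    return total / n_draws


rng = random.Random(1)
x_row = [1.0, 0.5, -0.5]
estimate = mc_marginal(y=15, x_row=x_row, nu=4.0, beta_mean=[3.0, 0.0, 0.0],
                       beta_sd=[0.3, 0.1, 0.1], n_draws=5000, rng=rng)
print(estimate)
```

The estimator's error falls as one over the square root of the number of draws, so the cost of this step is set mainly by how accurately the marginal has to be known.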
Generating Simple Climate Change Scenarios
● The ballpark estimates of today’s (soil moisture, temperature) mean and covariance are:
$$\text{mean} = \begin{pmatrix} 16.61 \\ 18.94 \end{pmatrix}, \qquad \text{cov} = \begin{pmatrix} 1.78 & 0.19 \\ 0.19 & 14.38 \end{pmatrix}$$
● The 1000+ year initialization run has temperature and soil moisture generated by a MVN with this mean and covariance.
● Temperature is measured in °C and soil moisture in %.
● Future 100 year scenarios are generated assuming the means change linearly in time, with rates given by the points on the plot below:
• GCMs generally predict hotter, drier conditions for the Southeastern US.
• Accordingly, ranges were: [-1,+2]*SD/century for Temperature and [-2,+1]*SD/century for Soil Moisture.
[Figure: Generated soil moisture and temperature used in the initialization run, and three generated future scenarios. Climate change begins at year 1052. Legend: Stable Climate, Hotter/Drier, Cooler/Wetter.]
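One way such a scenario could be generated, as a sketch only: yearly values are drawn from a bivariate normal whose mean drifts linearly at a rate given in standard deviations per century. The function name is hypothetical, the pairing of the slide's numbers with (soil moisture, temperature) is an assumption, and the authors' actual generator may differ.

```python
import math
import random


def simulate_climate(mean, cov, rate_sd_per_century, n_years, rng):
    """Draw yearly bivariate-normal climate values whose mean drifts linearly."""
    (a, b), (_, d) = cov
    sd = (math.sqrt(a), math.sqrt(d))
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = sd[0]
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    path = []
    for year in range(n_years):
        # Rates are in standard deviations per century, so scale by year / 100.
        m = [mean[k] + rate_sd_per_century[k] * sd[k] * year / 100.0 for k in range(2)]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        path.append((m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2))
    return path


# Assumed component order: (soil moisture, temperature).
mean = (16.61, 18.94)
cov = ((1.78, 0.19), (0.19, 14.38))
stable = simulate_climate(mean, cov, (0.0, 0.0), 100, random.Random(7))
hot_dry = simulate_climate(mean, cov, (-2.0, 2.0), 100, random.Random(7))
print(stable[99], hot_dry[99])
```

Reusing the same seed for both scenarios gives them identical year-to-year noise, so they differ only by the imposed linear trend.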
Results
● I just got the initialization run back last week, so ask me in 3 months.
Other Thoughts
● May need to continue the initialization run another 500-1000 years to get a better equilibrium.
● Need a lot more runs when using nonparametrics anyway, so the benefits of using a Latin hypercube design are less obvious (in 2-D anyway).
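For reference, a Latin hypercube design over the two climate-rate ranges could be sketched like this. It is a generic textbook construction, not the design used for these runs, and the sample size of 10 is arbitrary.

```python
import random


def latin_hypercube(n, ranges, rng):
    """Return n points with exactly one point in each of n equal bins per dimension."""
    columns = []
    for low, high in ranges:
        width = (high - low) / n
        # One uniform draw inside each bin, then shuffle the bin order.
        col = [low + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return list(zip(*columns))


rng = random.Random(42)
# Ranges in SD/century: temperature in [-1, +2], soil moisture in [-2, +1].
design = latin_hypercube(10, [(-1.0, 2.0), (-2.0, 1.0)], rng)
for point in design:
    print(point)
```

With only two inputs and many runs, a plain grid or random design already fills the space reasonably well, consistent with the remark above.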
Acknowledgements
● Jim Clark’s group for use of their simulator, and especially Sean McMahon for his invaluable assistance.
● David Dunson and Ju-Hyun Park for explaining their paper to me and letting me use their algorithm.
● The SAMSI Methodology and Terrestrial Models Working Groups for fruitful discussions.
References
Dunson, D. B., and J.-H. Park, “Kernel Stick-Breaking Processes”, ISDS Discussion Paper 22 (2006) and Biometrika (accepted)
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable simulator for forest dynamics”, Symposium on Computational Geometry 2004: 106-115
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable algorithm for dispersing populations”, Journal of Intelligent Information Systems 2004 (online)