Ecological Modelling 235–236 (2012) 81–94

Pollen-based climate reconstruction: Calibration of the vegetation–pollen processes

V. Garreta a,b,c,*, J. Guiot a, F. Mortier d, J. Chadœuf c, C. Hély a

a CEREGE UMR6635, CNRS Université Aix-Marseille, Europôle de l’Arbois BP 80, 13545 Aix en Provence, France
b Department of Statistics, Trinity College Dublin, Dublin 2, Ireland
c INRA, Département Biométrie et Processus Spatiaux, Domaine Saint Paul, 84914 Avignon, France
d CIRAD, Campus international de Baillarguet, 34398 Montpellier, France

* Corresponding author at: Department of Statistics, Trinity College Dublin, Dublin 2, Ireland. E-mail address: [email protected] (V. Garreta).

Article history: Received 10 August 2011. Received in revised form 26 March 2012. Accepted 27 March 2012. Available online 4 May 2012.

Keywords: Palaeoclimate reconstruction; Pollen production; Pollen dispersal; Vegetation model error; Multinomial data; Hierarchical Bayesian model

http://dx.doi.org/10.1016/j.ecolmodel.2012.03.031
© 2012 Elsevier B.V. All rights reserved.

Abstract

Palaeoclimate reconstructions are based on the relationship between climate and sediment pollen assemblages. This model is called the transfer function (TF). Process-based TF emerge as an opportunity to better quantify past climate changes. For example, when a process-based model of vegetation dynamics is part of the TF, atmospheric CO2 concentration and plant–plant interactions can be included as factors affecting the reconstruction. We propose the missing piece for a fully process-based TF: the model linking, at a continental scale, vegetation model outputs and pollen sampled in sediments. We perform its calibration and we explore the quality of fit.

The model represents the error of the vegetation model LPJ-GUESS and four main processes: pollen production, dispersal, accumulation and sampling. Accumulation and sampling processes are modelled using either a multinomial-Poisson (MP) or a multinomial-negative binomial (MNB) model, both models allowing for overdispersion and structural zeros in the sense of null multinomial probabilities. We perform inference for a European pollen dataset by parallelising a Monte Carlo Markov Chain algorithm. Model fitness diagnostics indicate that the MP model is not supported by the European dataset. The MNB model is also detected as inconsistent, but with a p-value of 0.014 and without stationarity or overdispersion problems. At this stage, the MNB model is considered as a robust alternative to more complex models. We finally discuss the challenge of the TF inversion for palaeoclimate reconstruction and vegetation model re-calibration.

1. Introduction

Pollen-based palaeoclimate reconstructions are obtained using models of the relationship between climate and pollen assemblages sampled in sediment cores. These models are called transfer functions (TF). See for example the pioneering work of Webb and Bryson (1972), and Guiot and De Vernal (2007), Birks et al. (2010) and Bartlein et al. (2011) for recent reviews and applications of the methods.

The reconstruction process is twofold. The calibration step consists in designing and fitting the TF to a modern dataset that is typically massive (thousands of multivariate counts) and spatially distributed at a continental scale. This is imposed by the need to infer a robust link between climate and pollen over a large climate range, at least as large as expected for past variations. The reconstruction step consists in using the calibrated model for the inference of past climate states corresponding to a given set of pollen assemblages.

Most existing TF are correlative in that they describe a static (in time and space) modern association between climate and pollen without accounting for the processes expected to link them. The first step toward the development of more natural and transparent TF has been the creation of direct or forward TF, i.e. TF modelling pollen as a function of climate (the ‘response surface’ method, Bartlein et al., 1986). With direct TF, the calibration step is the inference of a statistical model and the reconstruction step is more difficult because it is associated with the ‘inverse’ problem of reconstructing the covariates (drivers) associated with a given pollen response. The inverse problem was heuristically solved in the original response surface method. The natural solution to the inverse problem is provided by the Bayesian approach, first used in Vasko et al. (2000) and Guiot et al. (2000). Haslett et al. (2006) showed that forward TF in a Bayesian framework allow a major improvement by properly quantifying and propagating the uncertainties entering the calibration and reconstruction steps.


Process-based TF form another branch, which appeared thanks to the Bayesian approach. The first TF of this sort was proposed by Guiot et al. (2000) and includes a vegetation model. The method is often called ‘model inversion’ because the forward nature of a vegetation model, and of processes in general, requires solving the inversion problem. The strength of this TF is that it describes ecological processes and interactions more realistically. Guiot et al. (2000) showed that it makes it possible to control the effect of CO2 forcing on reconstructions. The new branch has slowly burgeoned, demonstrating its capacity for linking multiple proxies (Hatté and Guiot, 2005; Rousseau et al., 2006; Guiot et al., 2009), incorporating the new class of dynamic vegetation models (Garreta et al., 2010) and being adopted in dendroclimatology (Tolwinski-Ward et al., 2011). Now, various conjectures exist about the benefits of using process-based TF (Anderson et al., 2006; Hughes and Ammann, 2009; Jones et al., 2009; Guiot et al., 2009; Hughes et al., 2010; Garreta, 2010). The main idea is perhaps that replacing simple descriptive relationships by richer structures based on processes should allow more, or more precise, information to be extracted from data. On the practical side, a process-based TF is often more complex than a descriptive TF, but since its parameters control physical or biological processes, they have values that can be discussed. The transparency and great potential of this type of TF counterbalance the large, interdisciplinary effort required to develop them.

Existing process-based TF still use a correlative model to describe the link between simulated vegetation and pollen assemblages in sediments. The processes expected to link vegetation and pollen accumulated in sediments have been the subject of discussion since von Post (1916), Tauber (1965) and Webb (1974). Today, several process-based models exist (Sugita, 2007a,b; Paciorek and McLachlan, 2009) but they must be adapted to account for continental-scale pollen datasets such as those used in palaeoclimatology.

We propose the coupling of (i) a dynamic vegetation model, LPJ-GUESS (Smith et al., 2001), to simulate net primary production (NPP) of vegetation per species based on climate inputs, with (ii) an original statistical model representing the main processes expected to link vegetation and pollen at a continental scale. We frame the whole model in the Bayesian hierarchical paradigm, which allows us to formally account for the vegetation model error along with a parametric representation of processes. In this article we focus only on the calibration step of the new composite model. The process-based nature of the model makes explicit the links between ecology, ecophysiology and palaeoclimatology, and even in this first step of a palaeoclimatological work, conclusions are relevant to a wider audience than palaeoclimatologists. The reconstruction step is discussed and a first solution is proposed in Garreta et al. (2010) and Garreta (2010). The calibration and reconstruction steps are illustrated in Fig. 1.

The modelling and inference of the processes linking vegetation and pollen at a continental scale raise two challenges in statistics. The first challenge consists in building a model with sufficient complexity to represent the main features of the processes but whose calibration using thousands of data points remains feasible. Second, we have to consider the modelling of multinomial overdispersion in the context of structural zeros. Indeed, pollen counts that are naturally modelled using the multinomial distribution (Haslett et al., 2006) are over-dispersed (i.e. with a variance larger than that of the multinomial distribution) and contain ‘structural’ zeros in the sense that the probabilities controlling the multinomial, for a certain component, can be exactly zero (due to true absence of the taxa in the sampled region). Overdispersion of the multinomial is classically accounted for by using the multinomial-Dirichlet (MD) model (e.g. Paciorek and McLachlan, 2009), which does not allow for structural zeros.

We propose a multinomial-Poisson (MP) model and its extension, the multinomial-negative binomial (MNB) model, to represent multinomial overdispersion in the context of structural zeros. Many zeros (often referred to as zero-inflation when compared to the amount prescribed by a specific distribution) is a common feature of data in many domains and has recently received particular attention in ecology (review in Martin et al., 2005). Null observations have different causes: (i) structural zeros relate to the (true) absence of a species in unsuitable habitats, whereas (ii) random zeros arise by chance from ecological processes (e.g. randomness in the dispersal), or from sampling or observation error (Martin et al., 2005). We propose to model the multinomial probabilities as arising from the normalising of k latent random variables $(X^1, \ldots, X^k)$ following discrete distributions, either Poisson or negative binomial, where k is the number of multinomial components considered. We demonstrate identifiability of the MP model, we interpret its parameters and we test both MP and MNB models using simulated datasets before performing inference for the European dataset.

We present data in Section 2.1 and the model in Section 2.2 through its latent levels. The last level of the model (Section 2.2.3) is made of the MP or the MNB model. In Section 3 we explain the parallelisation of the MCMC algorithm and we present inference tests using simulated datasets. We present the results for the European dataset in Section 4, followed by a discussion and a conclusion in Sections 5 and 6.

Table 1
The k = 15 groups of pollen and vegetation model outputs.

| j | Pollen taxa | Vegetation group from model outputs |
|---|---|---|
| 1 | Abies | Abies alba |
| 2 | Alnus | Alnus incana |
| 3 | Betula | Betula pendula + Betula pubescens |
| 4 | Carpinus | Carpinus betulus |
| 5 | Corylus | Corylus avellana |
| 6 | Fagus | Fagus sylvatica |
| 7 | Fraxinus | Fraxinus excelsior |
| 8 | Picea | Picea abies |
| 9 | Pinus | Pinus sylvestris + Pinus halepensis |
| 10 | Quercus evergreen | Quercus coccifera + Quercus ilex |
| 11 | Quercus deciduous | Quercus robur |
| 12 | Tilia | Tilia cordata |
| 13 | Ulmus | Ulmus glabra |
| 14 | Populus | Populus tremula |
| 15 | Grasses and Shrubs | C3 grasses |

2. Material and methods

2.1. Data and simulations

We used the pollen surface sample database compiled by Bordon (2008). It consists of pollen grain abundances counted per taxon. We extracted all n = 1301 sites whose coordinates are between −10° and 32°E and 34° and 71°N, the geographic region we call ‘Europe’ in the following. The total number of pollen grains counted per site, $N_i$, ranges between 100 and 500. A pollen assemblage sampled at site i = 1, . . ., n will be noted $Y_i = (Y_i^1, \ldots, Y_i^k)$. Then, $Y_i^j$ is the number of pollen grains of taxon j at site i. Taxa have been defined to obtain the best agreement with vegetation model outputs, see Table 1.

For the simulation of the vegetation corresponding to pollen samples, we used the 20th century climate dataset CRU TS 1.2 (New et al., 2002), which has a monthly temporal resolution and a 10 arc-minute spatial resolution. These series have been interpolated at the pollen sites using ordinary kriging with the altitude as external drift (Cressie, 1991).

We simulated the vegetation net primary production (NPP) in kg m$^{-2}$ yr$^{-1}$ per species considered in the model at each of the pollen sites. These simulations are obtained using LPJ-GUESS (Smith et al., 2001) parameterised with 17 major tree species in Europe plus one group representing all grasses and shrubs. Some of these 18 species are summed to reduce the number of taxa to 15 for a better agreement with pollen data, see Table 1. For a full description of the species in terms of vegetation model parameters, see Miller et al. (2008) and Garreta et al. (2010). The vector of the (absolute) net primary production simulated by LPJ-GUESS at site i will be noted $\mathrm{NPP}_i = (\mathrm{NPP}_i^1, \ldots, \mathrm{NPP}_i^k)$.

Fig. 1. Diagram of the calibration (left) and reconstruction (right) steps in a climate reconstruction problem. $C_i$ is a set of climate variables at site and time point (i). $Y_i$ is a pollen assemblage recorded at the same point i. Variables in a squared box are known, variables in a rounded box are to be calibrated or reconstructed. Arrows represent directed links between variables and processes, e.g. ‘climate drives LPJ-GUESS’ is represented as ‘C → LPJ-GUESS’. The calibration consists in inferring the value of the parameters controlling the climate-pollen link. The reconstruction step uses these parameter values to reconstruct, i.e. infer, past climate.

Modern pollen samples are compared to vegetation obtained with climate reaching until 1990, i.e. obtained by ‘transient’ simulations controlled by the 20th century climate. A pseudo code for running LPJ-GUESS, for each site i, is:

1. Spin up the model for 500 years by repeating the detrended 1901–1930 climate chronology at site i, with the CO2 concentration of 1901.

2. Run the model for the years 1901–1990 using the climate chronology at site i for the period 1901–1990 and the yearly CO2 concentration measured over this period.

3. Retain the average NPP for the years 1961–1990 to form $\mathrm{NPP}_i$.
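The three steps above can be sketched as follows for the forcing series of a single site. This is only a bookkeeping illustration and does not call LPJ-GUESS: the function names are hypothetical, climate is reduced to one annual value per year, and the detrending is assumed to be the removal of a least-squares linear trend, which the text does not specify.

```python
import numpy as np

def detrend_keep_mean(x):
    """Remove the least-squares linear trend from x while preserving its mean (assumed method)."""
    t = np.arange(len(x), dtype=float)
    slope = np.polyfit(t, x, 1)[0]          # polyfit returns [slope, intercept]
    return x - slope * (t - t.mean())

def build_forcing(years, climate, co2, n_spinup=500):
    """Assemble the climate and CO2 series (spin-up followed by transient run) for one site."""
    years = np.asarray(years)
    climate = np.asarray(climate, dtype=float)
    co2 = np.asarray(co2, dtype=float)
    spin = (years >= 1901) & (years <= 1930)
    run = (years >= 1901) & (years <= 1990)
    # Step 1: cycle the detrended 1901-1930 chronology for n_spinup years, CO2 fixed at 1901.
    spin_climate = np.resize(detrend_keep_mean(climate[spin]), n_spinup)
    spin_co2 = np.full(n_spinup, co2[years == 1901][0])
    # Step 2: transient run over 1901-1990 with the observed climate and yearly CO2.
    return (np.concatenate([spin_climate, climate[run]]),
            np.concatenate([spin_co2, co2[run]]))

def mean_npp_1961_1990(run_years, npp):
    """Step 3: average the simulated NPP (rows = years, columns = taxa) over 1961-1990."""
    run_years = np.asarray(run_years)
    keep = (run_years >= 1961) & (run_years <= 1990)
    return np.asarray(npp, dtype=float)[keep].mean(axis=0)
```

The vegetation model itself would be run on the concatenated series, and only the last 30 years of its NPP output would be averaged.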

2.2. Process-based statistical model

We build a hierarchical Bayesian model representing the main processes linking LPJ-GUESS simulated vegetation and pollen data sampled over Europe. Each of these processes is implemented in a latent level whose succession is causative: starting from the simulated vegetation → actual vegetation → produced, dispersed and accumulated pollen → sampled pollen. See the Directed Acyclic Graph in Fig. 2.

In the next sections, the graph in Fig. 2 is translated into a series of probability density functions (pdf) linking the latent random variables. One such pdf is noted [A|B, C] for ‘the probability density function of the random variable A given the values of the variables B and C’. The Bayesian hierarchical model containing A, B and C is fully specified when, e.g. the pdf [A|B, C], [B|C] and [C] are specified. In the Bayesian context, the inference of B and C given an observed dataset $A = a_{obs}$ is obtained as the posterior pdf,

$$
[B, C \mid a_{obs}] = \frac{[a_{obs} \mid B, C]\,[B \mid C]\,[C]}{\int [a_{obs} \mid B, C]\,[B \mid C]\,[C]\, dB\, dC}
$$

The numerical evaluation of such a posterior may be extremely difficult if the dimensions of the variables A, B and C are large. For further reading about applied Bayesian modelling and inference see Gelman et al. (2004).
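For low-dimensional B and C the posterior can be evaluated directly on a grid, which may help to fix ideas before turning to MCMC. The sketch below uses a toy setting that is not part of the model of this article: the data, the grids and the form of the prior are all hypothetical choices made for illustration.

```python
import numpy as np

def grid_posterior(likelihood, prior_b_given_c, prior_c):
    """Posterior [B, C | a_obs] on a grid: rows index values of B, columns values of C."""
    joint = likelihood * prior_b_given_c * prior_c[np.newaxis, :]
    return joint / joint.sum()      # the sum plays the role of the integral over B and C

# Toy setting (hypothetical): a_obs = 7 successes out of 10 trials,
# B is the success probability, C a concentration controlling the prior on B.
b_grid = np.linspace(0.0, 1.0, 101)
c_grid = np.array([1.0, 5.0])
likelihood = (b_grid**7 * (1.0 - b_grid)**3)[:, np.newaxis] * np.ones((1, c_grid.size))
prior_b_given_c = (b_grid[:, np.newaxis] * (1.0 - b_grid[:, np.newaxis])) ** (c_grid - 1.0)
prior_b_given_c /= prior_b_given_c.sum(axis=0)   # normalise [B | C] for each value of C
prior_c = np.array([0.5, 0.5])

posterior = grid_posterior(likelihood, prior_b_given_c, prior_c)
marginal_b = posterior.sum(axis=1)               # [B | a_obs], with C summed out
```

The cost of such a grid grows exponentially with the number of variables, which is why sampling algorithms are needed for models with thousands of latent variables.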


2.2.1. From potential and uncertain to actual vegetation

The vegetation simulated using LPJ-GUESS is potential vegetation. This means that it is controlled by climate, soil properties and CO2, and is not disturbed by human activities. Moreover, the vegetation model outputs are expected to be uncertain due to uncertainty on the processes and parameter values governing plant species. See Kennedy and O’Hagan (2001) for a statistical approach to ‘model uncertainty’, Tang and Bartlein (2008) in the context of vegetation models, and Zaehle et al. (2005) and Wramneby et al. (2008) with LPJ-GUESS. We model the modern vegetation composition as a noisy image of the simulated NPP. Let $V_i = (V_i^1, \ldots, V_i^k)$ be the unobserved modern or ‘actual’ vegetation producing pollen. We assume that, given the simulated vegetation, its distribution is a mixture of gamma distributions:

$$
[V_i^j \mid \mathrm{NPP}_i^j, \sigma^j, m^j, q^j] =
\begin{cases}
G\left(\dfrac{(\mathrm{NPP}_i^j)^2}{\sigma^j}, \dfrac{\mathrm{NPP}_i^j}{\sigma^j}\right) & \text{if } \mathrm{NPP}_i^j > 0 \\[2ex]
q^j \delta_0 + (1 - q^j)\, G\left(\dfrac{(m^j)^2}{\sigma^j}, \dfrac{m^j}{\sigma^j}\right) & \text{if } \mathrm{NPP}_i^j = 0
\end{cases}
\qquad (1)
$$

where $\delta_0$ is the Dirac mass at 0 and G(s, r) the gamma distribution with shape and rate parameters s and r. This modelling of the anthropogenic disturbance and LPJ-GUESS uncertainty is interpreted as follows: when taxon j is simulated at site i ($\mathrm{NPP}_i^j > 0$), the vegetation is distributed following a gamma distribution (showing no probability mass at 0) centred on the simulated value $\mathrm{NPP}_i^j$ with variance $\sigma^j$. Thus we assume that, if simulated, a species is present, at least in a very small proportion, around the considered site i. When not simulated at site i ($\mathrm{NPP}_i^j = 0$), the actual vegetation species $V_i^j$ has a probability $q^j$ ($\in [0, 1]$) of being absent. If present regardless of $\mathrm{NPP}_i^j = 0$, it is distributed following a gamma distribution centred on $m^j$ and with variance $\sigma^j$. The overall mean and variance of $[V_i^j \mid \mathrm{NPP}_i^j = 0]$ are $(1 - q^j) m^j$ and $(1 - q^j)(\sigma^j + q^j (m^j)^2)$.
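As a check on this parametrisation, the mixture can be sampled for one taxon over many sites. The sketch below is illustrative only: the function name and the argument name `sigma` (for the variance parameter) are hypothetical, and NumPy's gamma sampler takes a scale, i.e. the inverse of the rate used in the text.

```python
import numpy as np

def sample_actual_vegetation(npp, sigma, m, q, rng):
    """Draw the actual vegetation V of one taxon at many sites from the gamma mixture."""
    npp = np.asarray(npp, dtype=float)
    simulated = npp > 0
    centre = np.where(simulated, npp, m)      # centred on NPP if simulated, on m otherwise
    # Shape centre^2/sigma and rate centre/sigma give mean `centre` and variance `sigma`.
    v = rng.gamma(shape=centre**2 / sigma, scale=sigma / centre)
    # Where the taxon is not simulated, it is truly absent with probability q.
    absent = (~simulated) & (rng.random(npp.shape) < q)
    v[absent] = 0.0
    return v

rng = np.random.default_rng(0)
v_not_simulated = sample_actual_vegetation(np.zeros(200_000), sigma=0.04, m=0.3, q=0.6, rng=rng)
v_simulated = sample_actual_vegetation(np.full(100_000, 0.5), sigma=0.04, m=0.3, q=0.6, rng=rng)
```

For the non-simulated case, the law of total variance gives a mean of $(1 - q) m$ and a variance of $(1 - q)(\sigma + q m^2)$, which the empirical moments of `v_not_simulated` should approach.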

This parsimonious representation of LPJ-GUESS error is based on the hypothesis that parameters q, m and σ are constant over the spatial domain. LPJ-GUESS error is thus said to be ‘stationary’ in the sense that it has the same distribution over the whole spatial domain (Europe). Stationarity, in the sense of no dependence on spatial location, is crucial for palaeoclimatology because TF use spatial variations to reconstruct past temporal ones. It can be weakened by linking parameters q, m and σ to surrogates for the vegetation model error, such as, for example, indicators of the anthropogenic disturbance. We will test and comprehensively discuss this assumption in the application.
Fig. 2. Directed Acyclic Graph of the hierarchical Bayesian model linking modern climate ($C_i$) to modern pollen samples ($Y_i$). For clarity, we represent only three instances of each variable, i.e. i = 1, 2, 3. Squares represent data (observed variables), rounded boxes represent latent variables. Arrows represent conditional probability dependencies between variables. Parameters are presented on the right column of the diagram. The first five parameters (q, m, σ, b and γ) are common to both the multinomial-Poisson (MP) and multinomial-negative binomial (MNB) models we propose. The last parameter, τ, is specific to the MNB model. Latent variables are located on a two-dimensional space (e.g. Europe).

We centred the actual vegetation on the simulated values because we had no quantitative vegetation data at a continental scale and at the level of the species. Even if the simulated means per taxon are slightly biased, this only has an impact on the pollen production parameters because pollen data only inform us about the relative proportions of each taxon. We also considered independence between taxa, which corresponds to assuming that LPJ-GUESS correctly simulates the processes driving the inter-species relations (e.g. competition). This is a simplification we found necessary because introducing a prior dependence between species makes the inference intractable.

LPJ-GUESS error is expected to be spatially correlated due to the lack of representation of spatial dynamics and (spatial) human disturbances inside the vegetation model. In this section we first modelled a spatially independent error. The spatial dependence in the error is captured in the next latent level at the same time as pollen dispersal.

2.2.2. Linear production and Gaussian dispersal of the pollen

The absolute quantity of pollen produced at site i by species j is the absolute actual vegetation times a production parameter per species, $b^j$: $b^j V_i^j$.

The assumption of a linear relationship between the absolute pollen quantity and the absolute vegetation quantity is classical in palaeo-ecology (e.g. Sugita, 2007a). Nevertheless, if the variable chosen to characterise the absolute vegetation quantity is not well chosen, the approximation can be very crude. For example, the number of trees is not useful because a large quantity of saplings may be present which are not yet mature and do not produce pollen. We centred V on the total NPP. This could be refined by using only the NPP from mature trees. Such a variable can be computed using LPJ-GUESS but it typically contains more variability than the total NPP. In this work we selected a robust but less precise indicator of the absolute vegetation quantity and we postpone the comparison of both possibilities to future work.

Next, the pollen produced by each species is dispersed following a Gaussian kernel whose dispersal parameter depends on the species. For each spatial location i and each species j, the pollen brought by dispersal, $b^j S_i^j$, is equal to the convolution of the vegetation with the Gaussian kernel $\alpha^j(\cdot)$ centred on i:

$$
b^j S_i^j = b^j \sum_{l=1}^{n} \alpha^j(d(i, l))\, V_l^j \qquad (2)
$$

where $\alpha^j(x) \propto (1/\gamma^j) \exp\left(-x^2 / (2 (\gamma^j)^2)\right)$ is the kernel, $d(x, y) = \sqrt{(x - y)^2}$ is the Euclidean distance, $\gamma^j$ (j = 1, . . ., k) are the dispersal parameters and l = 1, . . ., n is an index scanning the whole set of spatial locations.
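The sum in Eq. (2) can be written as a dense matrix operation, as in the minimal sketch below. Several choices are assumptions made for illustration: sites are given planar coordinates, the proportionality constant of the kernel is set to 1, and the names `coords`, `V` and `gamma` (one dispersal parameter per taxon) are hypothetical.

```python
import numpy as np

def dispersed_vegetation(coords, V, gamma):
    """Return S with S[i, j] = sum_l alpha_j(d(i, l)) * V[l, j] for a Gaussian kernel per taxon."""
    coords = np.asarray(coords, dtype=float)      # shape (n, 2): site coordinates
    V = np.asarray(V, dtype=float)                # shape (n, k): actual vegetation
    gamma = np.asarray(gamma, dtype=float)        # shape (k,): dispersal parameters
    diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    d2 = (diff**2).sum(axis=-1)                   # (n, n) squared Euclidean distances
    g = gamma[:, np.newaxis, np.newaxis]
    # kernel[j, i, l] = (1 / gamma_j) * exp(-d(i, l)^2 / (2 * gamma_j^2))
    kernel = np.exp(-d2[np.newaxis, :, :] / (2.0 * g**2)) / g
    return np.einsum('jil,lj->ij', kernel, V)

coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
V = [[1.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
S = dispersed_vegetation(coords, V, gamma=[1.0, 2.0])
```

The pollen brought by dispersal is then `b * S` for a vector `b` of production parameters. The dense kernel array holds k × n × n values, so a larger application would probably loop over taxa or truncate the kernel at long distances.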

As mentioned in the previous section, kernels account at the same time for pollen dispersal and for spatial dependencies in the vegetation model errors. Indeed, the $V_i^j$, i = 1, . . ., n, represent actual vegetation and carry independent vegetation model errors. They could be smoothed first, using a kernel $k_1$, to create a spatially structured ‘model error’ field, in the manner introduced in Higdon (1998). This first field could itself be smoothed using a kernel $k_2$, to represent the dispersal process. Since we have no vegetation data, at a European scale and at the species or genus level, to identify the parameter of kernel $k_1$, we use the associative property of convolution to use a single kernel representing the convolution of kernels $k_1$ and $k_2$. Then, dispersal parameters $\gamma^j$ have to be interpreted as a mixture of pollen dispersal distance and spatial correlation range for the vegetation model errors. For example, if we assume a Gaussian kernel for both spatial structures with parameters $\gamma_1^j$ and $\gamma_2^j$, then $(\gamma^j)^2 = (\gamma_1^j)^2 + (\gamma_2^j)^2$, because the convolution of two Gaussian kernels is a Gaussian kernel whose variance is the sum of the two variances.

This representation of the production and dispersal processes along with the spatial correlation of the vegetation model error is based on the hypothesis of stationarity of all processes over the spatial domain. For the pollen production, experiments demonstrate a change of pollen production with increasing CO2 (e.g. LaDeau and Clark, 2006 and review in Jackson and Williams, 2004). This could be included in the reconstruction step for periods of lower CO2 by linking $b^j$ to the CO2 concentration, but requires experimental data for calibration. The Gaussian, i.e. isotropic light-tailed, kernel derives from the Sutton equation, which is classical in the study of plant–pollen relations (Tauber, 1965; Prentice, 1985; Sugita, 2007a; Paciorek and McLachlan, 2009). It amounts to ignoring long-range dispersal and preferential directions for dispersal. A fat-tailed kernel could be included in the model at no cost if its number of parameters is the same as for the Gaussian. In this first approach we use the classical model to allow comparison with other studies. Including anisotropy is an attractive perspective, which could be achieved by using spatially varying kernels k(. , s) related, for example, to topography and wind maps in the manner of Higdon (1998).

2.2.3. Accumulation and sampling of the pollen

Pollen accumulation in natural traps (mosses, lakes, peat bogs) is seen as the capture of airborne pollen over the area of the trap (Prentice, 1985). This main process is perturbed by local conditions creating, for example, anisotropy in the local dispersal function, heterogeneity of the trap capture and differential concentration of the incoming fluxes. Moreover, information about the trap is often lacking or loose (e.g. the size of the lake in the past, the strength of past wind fields). Since the processes are too local, numerous and poorly informed to be modelled, we will represent them by adding noise to $S_i = (S_i^1, \ldots, S_i^k)$, the pollen brought by dispersal, to form $X_i = (X_i^1, \ldots, X_i^k)$, the pollen quantities accumulated at site i. We propose two models for linking $S_i$ to $X_i$ in the next sections.

Conditional on the pollen proportions trapped, $p_i = (p_i^1, \ldots, p_i^k)$, arising from the local normalising of $X_i$, i.e. $p_i^j = X_i^j / \sum_{m=1}^{k} X_i^m$, the sampling process is naturally modelled using a multinomial distribution (Mosimann, 1965; Parsons and Prentice, 1981; Haslett et al., 2006). Then, $Y_i = (Y_i^1, \ldots, Y_i^k)$, the pollen assemblage at site i, follows the multinomial distribution

$$
[Y_i \mid X_i] = M(p_i, N_i)
$$

with $N_i$ the total pollen number at site i and $p_i = (p_i^1, \ldots, p_i^k)$ the proportions of pollen accumulated in the trap.

In Bayesian hierarchical models, the multinomial proportions, $p_i$, are always assumed to follow a Dirichlet distribution (Vasko et al., 2000; Haslett et al., 2006; Paciorek and McLachlan, 2009). This is convenient because the Dirichlet distribution is conjugate to the multinomial, which makes inference much easier. Nevertheless, using a multinomial-Dirichlet (MD) model, one assumes, first, that no zero data is generated by the accumulation process, because the Dirichlet distribution cannot produce exact zeros, and, second, that all species are present over all the space considered. Neither hypothesis is consistent with our data. For example, evergreen oak, a species of southern Europe, cannot be sampled in northern Europe because it is truly absent. In this sense, continental-scale ecological datasets contain ‘structural’ zeros (the true absence of the taxa in a large sampled region) and the use of the MD model is not consistent with such data.

We propose two natural extensions of the MD model that take structural zeros into account. For each model we provide an interpretation in terms of processes and underlying assumptions.

Multinomial-Poisson model

The pollen $X_i^j$ of species $j$ accumulated at site $i$ is

$$[X_i^j \mid b^j, \gamma^j, V^j] = \mathcal{P}(b^j S_i^j) \qquad (3)$$

where $\mathcal{P}$ is the Poisson distribution and $S_i^j$ the pollen brought by dispersal at site $i$ for taxon $j$ (Eq. (2)). When the pollen is theoretically absent ($S_i^j = 0$), the Poisson distribution degenerates into a Dirac mass at 0. Thus, no pollen can appear during the accumulation process if it was not brought by dispersal.

The absolute pollen productions $b^j$ are not identifiable because the pollen data we consider are multinomial, i.e. contain only information about proportions. We demonstrate in Appendix A that $k - 1$ parameters, $b^j/b^k$, $j = 1, \ldots, k - 1$, are interpretable in terms of relative pollen productions per species. The $k$th parameter, $K = \sum_{j=1}^{k} b^j$, controls the overdispersion of the multinomial: when $K$ decreases, overdispersion increases.

This model is interpreted as follows: consider pollen grains of species $j$ that are dispersed on the ground following a Poisson process of intensity $b^j S_i^j$ (in grains m$^{-2}$) and assume that every trap has the same size, noted $\beta$ (in m$^2$). Then, trap $i$ receives $X_i^j$ pollen grains of species $j$ following a Poisson distribution with parameter $\beta b^j S_i^j$.

In this parametrisation, the trap size $\beta$ is not identifiable because the absolute pollen productions $b^j$ are not identifiable. Nevertheless, $K$ plays the role of $\beta$ and controls the overdispersion that is related to the traps’ size. When the trap size decreases ($K$ decreases), overdispersion increases, i.e. the trap represents less and less well the pollen composition brought by dispersal.
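The link between the overall intensity and overdispersion can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: the Poisson means are written as `big_k * a[j]`, following the notation of Appendix A, with hypothetical relative intensities `a`, and the comparison value `a(1 - a)/K` is the approximation stated in that appendix.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by inversion of the CDF (adequate for moderate means)."""
    if mean <= 0.0:
        return 0
    u = rng.random()
    k = 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def proportion_variance(rng, a, big_k, n_rep=5000):
    """Empirical variance of the first taxon's trapped proportion."""
    values = []
    while len(values) < n_rep:
        x = [poisson(rng, big_k * aj) for aj in a]
        total = sum(x)
        if total > 0:  # condition on a non-empty trap
            values.append(x[0] / total)
    mean = sum(values) / n_rep
    return sum((v - mean) ** 2 for v in values) / (n_rep - 1)


rng = random.Random(42)
a = [0.5, 0.1, 0.4]  # hypothetical relative intensities, summing to one
empirical = {big_k: proportion_variance(rng, a, big_k) for big_k in (10, 50, 100)}
approximate = {big_k: a[0] * (1.0 - a[0]) / big_k for big_k in (10, 50, 100)}

for big_k in (10, 50, 100):
    print(big_k, round(empirical[big_k], 5), round(approximate[big_k], 5))
```

Smaller values of `big_k` (a smaller trap) give noisier trapped proportions, which is the sense in which the trap represents the dispersed pollen composition less and less well.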

Multinomial-negative binomial model

The previous model may be too restrictive in assuming that every trap has the same size. One extension would consist in specifying a latent and varying trap size, noted $T_i$. The global model created under this assumption is not tractable due to the prior dependence it creates between the fields $X^j$, $j = 1, \ldots, k$, through the latent trap size $T_i$.

We propose instead to model $T_i^j$ as independent gamma-distributed random variables,

$$T_i^j \sim \mathcal{G}(\beta\theta^j, \theta^j)$$

with mean $\beta$ and variance $\beta/\theta^j$. These variables represent, at the same time, a trap size varying per site and taxon, and site- and taxon-specific perturbations of the accumulation process. Assuming that $[X_i^j \mid b^j, \gamma^j, V^j, T_i^j] = \mathcal{P}(T_i^j b^j S_i^j)$, the gamma distribution provides conjugacy with the Poisson distribution. The $X_i^j$ then follow negative binomial distributions that are independent per site and taxon,

$$[X_i^j \mid \theta^j, b^j, \gamma^j, V^j] = \mathcal{NB}\left(\theta^j, \frac{b^j S_i^j}{\theta^j + b^j S_i^j}\right) \qquad (4)$$

The distribution is centred on $b^j S_i^j$, with variance $b^j S_i^j + (b^j S_i^j)^2/\theta^j$ and probability of $X_i^j = 0$ equal to $(\theta^j/(\theta^j + b^j S_i^j))^{\theta^j}$. We have removed $\beta$ because it is already taken into account in the sum $K = \sum_{j=1}^{k} b^j$. Here $\theta^j$ is an extra overdispersion parameter per taxon, compared to the Poisson distribution.

For this model, we have no theoretical proof of the identifiability of $b^j$ and $\theta^j$, $j = 1, \ldots, k$. We made simulation tests (results not presented here) showing that the model becomes identifiable when overdispersion is strong ($\theta^j < 10$). For $\theta^j$ around or higher than 10, the inference algorithm diverges quickly to high values (several $\theta^j > 1000$). Without formal demonstration, this indicates that the model can be inferred for very overdispersed data and that, when the model is not identifiable, this is detectable on the posterior distributions.
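The gamma-Poisson construction above can be checked by simulation. In this minimal sketch the names `theta` (the negative binomial overdispersion parameter) and `mean` (standing for the product of production and dispersed pollen) are illustrative choices with hypothetical values, and the trap-size mean is set to one. The simulated mean, variance and probability of zero are compared with the closed-form expressions given in the text.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by inversion of the CDF (adequate for moderate means)."""
    if mean <= 0.0:
        return 0
    u = rng.random()
    k = 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def gamma_poisson(rng, theta, mean):
    """Poisson count whose intensity is scaled by a gamma 'trap size' with mean 1."""
    # random.gammavariate takes (shape, scale): shape theta and scale 1/theta
    # give E[T] = 1 and Var[T] = 1/theta.
    t = rng.gammavariate(theta, 1.0 / theta)
    return poisson(rng, t * mean)


rng = random.Random(7)
theta, mean = 2.0, 5.0  # hypothetical values
draws = [gamma_poisson(rng, theta, mean) for _ in range(20000)]

emp_mean = sum(draws) / len(draws)
emp_var = sum((d - emp_mean) ** 2 for d in draws) / (len(draws) - 1)
emp_p0 = draws.count(0) / len(draws)

nb_var = mean + mean ** 2 / theta          # variance stated in the text
nb_p0 = (theta / (theta + mean)) ** theta  # probability of a zero count

print(round(emp_mean, 2), round(emp_var, 2), round(emp_p0, 4))
print(mean, nb_var, round(nb_p0, 4))
```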

2.2.4. Priors

The specification of a Bayesian model requires the choice of a prior distribution (Gelman et al., 2004): $[b, \gamma, \sigma, m, q]$ for the MP model and $[b, \gamma, \sigma, m, q, \theta]$ for the MNB one. This distribution translates our prior knowledge on the values of the parameters. Since we have no strong beliefs about parameter values, we used poorly informative priors. They are listed and discussed in Appendix B.

3. Calculation

3.1. Bayesian inference

The global model connecting climate to pollen, the TF, is composed of two main modules. The vegetation model, LPJ-GUESS, is stochastic and defines the distribution of NPP conditional on the climate chronology $C$: $[\mathrm{NPP} \mid C]$. This distribution is implicit: it can only be simulated by running the model and cannot be evaluated for a couple of values (NPP, $C$) (see Garreta et al., 2010). The second module is the hierarchical model we described. The Bayesian calibration of the whole TF consists in obtaining a numerical approximation of the posterior

$$[X, V, b, \gamma, \sigma, m, q \mid Y^p, C^p] \propto [Y^p \mid X]\,[X \mid b, \gamma, V] \left( \int [V \mid \mathrm{NPP}, \sigma, m, q]\,[\mathrm{NPP} \mid C^p]\, d\mathrm{NPP} \right) [b, \gamma, \sigma, m, q] \qquad (5)$$

where the superscript $p$ indicates the modern dataset ($p$ for present). Performing the numerical integration involving the LPJ-GUESS distribution would be too demanding in terms of computing resources. We approximate this integral using importance sampling (Robert and Casella, 1999) by sampling the NPP following the LPJ-GUESS distribution. The posterior is then

$$[X, V, b, \gamma, \sigma, m, q \mid Y^p, C^p] \propto [Y^p \mid X]\,[X \mid b, \gamma, V]\,[V \mid \mathrm{NPP}^p, \sigma, m, q]\,[b, \gamma, \sigma, m, q] \qquad (6)$$

where $\mathrm{NPP}^p$ is the NPP simulated at each pollen site. This simplification cuts the route to the LPJ-GUESS parameter re-calibration. We discuss this point in the conclusion.

For the MNB model we use the same strategy but replace the Poisson distribution with the negative binomial and we augment the parameter vector with $\theta$.

3.2. Markov Chain Monte Carlo using computer parallelism

The model contains two sets of $k = 15$ latent fields ($V$ and $X$) modelled at $n = 1301$ spatial locations and five or six sets (MP or MNB) of $k = 15$ latent variables ($q$, $m$, $\sigma$, $b$, $\gamma$, and $\theta$ in the case of MNB). We resort to a Markov Chain Monte Carlo algorithm to perform inference for this large and non-standard model (Robert and Casella, 1999). This type of algorithm draws samples from the posterior. Based on these samples, one can then compute any summary of the posterior, e.g. mean, covariance and quantiles. We built a specific Metropolis-within-Gibbs algorithm with a burn-in phase during which we adapt the random walk proposal variance. The code is available upon request from the first author.
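A Metropolis-within-Gibbs sampler with a proposal variance adapted during burn-in can be sketched on a toy target as follows. This is only an illustration of the general technique, not the authors' C implementation: the bivariate normal target, the batch size of 100 and the target acceptance rate of 0.44 are assumptions made for the example.

```python
import math
import random


def log_target(x):
    """Toy log-density: a bivariate normal with unit variances and correlation 0.5."""
    rho = 0.5
    return -(x[0] ** 2 - 2.0 * rho * x[0] * x[1] + x[1] ** 2) / (2.0 * (1.0 - rho ** 2))


def metropolis_within_gibbs(log_density, start, n_iter, burn_in, rng, batch=100):
    """Random-walk Metropolis updates, one coordinate at a time."""
    x = list(start)
    current = log_density(x)
    scale = [1.0] * len(x)   # random-walk standard deviation per coordinate
    accepted = [0] * len(x)
    draws = []
    for it in range(1, n_iter + 1):
        for j in range(len(x)):
            proposal = list(x)
            proposal[j] += rng.gauss(0.0, scale[j])
            candidate = log_density(proposal)
            if math.log(1.0 - rng.random()) < candidate - current:
                x, current = proposal, candidate
                accepted[j] += 1
        if it <= burn_in and it % batch == 0:
            for j in range(len(x)):
                # widen the proposal when accepting too often, narrow it otherwise
                scale[j] *= math.exp(accepted[j] / batch - 0.44)
                accepted[j] = 0
        if it > burn_in:
            draws.append(list(x))  # proposal scales are frozen after burn-in
    return draws, scale


rng = random.Random(3)
draws, tuned_scale = metropolis_within_gibbs(log_target, [5.0, -5.0], 6000, 1000, rng)
mean_x0 = sum(d[0] for d in draws) / len(draws)
var_x0 = sum((d[0] - mean_x0) ** 2 for d in draws) / (len(draws) - 1)
print(len(draws), round(mean_x0, 2), round(var_x0, 2))
```

Freezing the proposal scales once burn-in ends keeps the retained part of the chain a standard, non-adaptive Markov chain.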

More than 95% of computing time is devoted to sampling from the full-conditional distribution of points from the $V^j$ fields. The full conditional distribution of a point $V_i^j$ is

$$[V_i^j \mid \ldots] \propto \left( \prod_{k=1}^{N} [X_k^j \mid \ldots, V^j] \right) [V_i^j \mid \mathrm{NPP}_i^j, q^j, m^j, \sigma^j]$$

where, in this product, $k$ indexes the sites. Such simulation is time demanding because it requires computing $S_k^j$, i.e. a convolution of the whole field $V^j$. We can parallelise the algorithm over the fields, i.e. simulate each field independently, because the full-conditional distributions of the points of field $j$ only depend on field $j$ (and related variables).

The inference algorithm is coded in C. Parallelisation of the C code on a single (multi-core) shared-memory computer is obtained by adding a few lines of OpenMP directives (http://openmp.org). The only difficulty when parallelising the code is the need for a parallel and efficient random number generator. We use the combined multiple recursive random number generators from L’Ecuyer (1999). Parallelising the computation on 8 processors allowed us to reduce computing time by a factor of 7.
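The idea of updating the fields independently, each with its own random number stream, can be sketched as follows. This is a hedged illustration in Python rather than the C/OpenMP code described above: `update_field` is a hypothetical stand-in for the real full-conditional update, and seeding one `random.Random` per field only mimics the role of the parallel generator of L’Ecuyer (1999) without offering its guarantees. Python threads are used for simplicity and do not by themselves provide the speed-up reported in the text.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def update_field(job):
    """Hypothetical stand-in for one sweep over the points of a single field."""
    field, seed = job
    rng = random.Random(seed)  # one private stream per field
    return [value + rng.gauss(0.0, 0.1) for value in field]


def sweep(fields, base_seed, workers):
    """Update every field, serially or with a pool of workers."""
    jobs = [(field, base_seed + j) for j, field in enumerate(fields)]
    if workers == 1:
        return [update_field(job) for job in jobs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_field, jobs))  # map preserves the field order


fields = [[float(j)] * 5 for j in range(15)]  # 15 toy fields of 5 points each
serial = sweep(fields, 2012, 1)
parallel = sweep(fields, 2012, 8)
print(serial == parallel)
```

Because each field owns its stream, the result does not depend on how the work is scheduled, which makes parallel runs reproducible.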

3.3. Model validation

We want to check the consistency between model structure and (hidden) data structure, often referred to as ‘goodness of fit’ or ‘model adequacy’ testing (Gelman et al., 1996). Since the model is hierarchical (with two latent levels) and contains a non-Gaussian structure, there is no classical way to formally test its adequacy. Evaluation of our model by cross-validation is hampered by the multivariate nature of pollen data and the very long computing time required for inference. We therefore use a Bayesian model ‘checking’ approach (Gelman et al., 1996; Stern and Cressie, 2000; Marshall and Spiegelhalter, 2003). Based on the comparison of posterior and unconstrained simulations, the method compares the posited structure with the posterior one without re-inferring the model.

The method is described in Appendix C.
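For readers unfamiliar with this family of checks, a plain posterior predictive p-value can be sketched on a toy Poisson model. This is not the mixed predictive procedure of Appendix C: the Gamma(1, 1) prior, the chi-square type discrepancy and the two toy datasets are all assumptions made for illustration. Conventions differ on which tail is reported; values close to 0 or to 1 both signal a misfit.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by inversion of the CDF (adequate for moderate means)."""
    if mean <= 0.0:
        return 0
    u = rng.random()
    k = 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def discrepancy(y, lam):
    """Chi-square type discrepancy between counts y and a Poisson mean lam."""
    return sum((yi - lam) ** 2 / lam for yi in y)


def posterior_predictive_pvalue(rng, y, n_draws=500):
    """Share of posterior draws whose replicate is at least as discrepant as the data."""
    n, total = len(y), sum(y)
    exceed = 0
    for _ in range(n_draws):
        # conjugate posterior of the Poisson mean under an assumed Gamma(1, 1) prior
        lam = rng.gammavariate(1.0 + total, 1.0 / (1.0 + n))
        y_rep = [poisson(rng, lam) for _ in y]
        if discrepancy(y_rep, lam) >= discrepancy(y, lam):
            exceed += 1
    return exceed / n_draws


rng = random.Random(11)
y_poisson = [poisson(rng, 4.0) for _ in range(100)]
# overdispersed counts: Poisson with an exponentially distributed mean (mean 4)
y_overdispersed = [poisson(rng, rng.gammavariate(1.0, 4.0)) for _ in range(100)]

p_poisson = posterior_predictive_pvalue(rng, y_poisson)
p_overdispersed = posterior_predictive_pvalue(rng, y_overdispersed)
print(p_poisson, p_overdispersed)
```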

3.4. Tests with simulated datasets

We made several inference tests using simulated datasets to check the correctness of the computer code, the robustness of the MCMC algorithm, the identifiability of the MNB model and the influence of vegetation priors. The datasets are simulated following the model (either MP or MNB) for $k = 3$ different species sampled at $n = 150$ points distributed following a uniform distribution on a one-dimensional space (between 0 and 40). See Fig. 3 for the MP model.

For a selected set of 3-dimensional parameter values ($r$, $b$, $\gamma$, $q$, $m$, $\sigma$), and $\theta$ in the case of the MNB model, a dataset is simulated as follows:

• $\mathrm{NPP}_i^j$ are simulated following a spatially structured Gaussian field truncated at 0 (simulations below 0 are set to 0). The spatial structure is given by a Gaussian covariance function with a scale parameter $r^j$ per species and a variance equal to 1.
• Actual vegetation $V_i^j$ is simulated following the mixture model in Eq. (1) with selected ($q$, $m$, $\sigma$) parameter values.
• The pollen dispersed at each point for each species, $S_i^j$, is computed with selected $\gamma$ parameter values.
• Accumulated pollen $X_i^j$ is simulated following either a Poisson or a NB distribution with selected $b$ (and $\theta$ for NB) parameter values.
• After normalisation, accumulated pollen proportions $p_i^j$ are computed and pollen is generated following multinomial distributions whose total counts ($N_i$) are equal to 200.

Inference is performed using only sampled pollen and simulated NPP.
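The last three steps of this procedure (dispersal, accumulation and sampling) can be sketched as below for the Poisson case. Several elements are assumptions made for illustration: the vegetation fields `v_fields` are hypothetical presence/absence patterns rather than draws from the mixture model, the dispersal step uses an unnormalised Gaussian kernel summed over the sites, which may differ from Eq. (2), and the dispersal lengths `gammas` are arbitrary.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by inversion of the CDF (adequate for moderate means)."""
    if mean <= 0.0:
        return 0
    u = rng.random()
    k = 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def disperse(positions, v, gamma):
    """Dispersed pollen at each site: Gaussian-kernel weighted sum of vegetation."""
    return [
        sum(vl * math.exp(-((xi - xl) ** 2) / (2.0 * gamma ** 2))
            for xl, vl in zip(positions, v))
        for xi in positions
    ]


def simulate_pollen(rng, positions, v_fields, gammas, b, n_grains=200):
    """Dispersal, Poisson accumulation and multinomial sampling at every site."""
    s_fields = [disperse(positions, v, g) for v, g in zip(v_fields, gammas)]
    assemblages = []
    for i in range(len(positions)):
        x = [poisson(rng, bj * s[i]) for bj, s in zip(b, s_fields)]
        total = sum(x)
        if total == 0:
            assemblages.append([0] * len(b))  # empty trap: nothing to sample
            continue
        taxa = rng.choices(range(len(b)), weights=[xj / total for xj in x], k=n_grains)
        assemblages.append([taxa.count(j) for j in range(len(b))])
    return assemblages


rng = random.Random(2012)
n_sites = 150
positions = sorted(rng.uniform(0.0, 40.0) for _ in range(n_sites))
v_fields = [                      # hypothetical vegetation for k = 3 species
    [1.0] * n_sites,
    [1.0 if x < 15.0 else 0.0 for x in positions],
    [1.0 if x > 25.0 else 0.0 for x in positions],
]
gammas = [3.0, 2.0, 1.0]          # hypothetical dispersal lengths
b = [5.0, 1.0, 4.0]               # (0.5, 0.1, 0.4) x K with K = 10
pollen = simulate_pollen(rng, positions, v_fields, gammas, b)
print(pollen[0], pollen[-1])
```

With these settings the third species is never counted at sites far from its range, which reproduces a structural zero in the simulated assemblages.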

3.4.1. Multinomial-Poisson model

We made two kinds of simulations. First, for different sets of parameters ($r^j \in \{1; 2; 3\}$, $b = (0.5, 0.1, 0.4) \times K$ and $K \in \{10; 50; 100\}$, $\gamma^j \in \{1; 2; 3\}$, $q^j \in \{0.6; 0.7; 0.8; 0.9\}$, $m^j \in \{0.5, 1, 2\} \times \mathrm{mean}(\mathrm{NPP}^j[\mathrm{NPP}^j > 0])$ and $\sigma^j = \mathrm{var}(\mathrm{NPP}^j[\mathrm{NPP}^j > 0])$) we ran the inference algorithm (for 1.5 million MCMC iterations in 5 h) and checked that the 95% Highest Posterior Regions (HPR) contained the parameters and the $V$, $X$ latent fields used for the simulation in approximately 95% of the cases. These tests indicated that the algorithm is robust even with large overdispersion from the Poisson latent field. We present here the most overdispersed case ($K = 10$) in Fig. 3.

The second kind of simulations (not presented here) consisted in using wrong informative priors for the vegetation parameters $m$ and $\sigma$ to check their effect on the inferred values. This test can be interpreted as a rough prior sensitivity analysis to assess whether the informative prior, used for the real dataset, will have a strong influence on the inferred values of all the parameters. The ‘wrong’ priors used for these tests are the ones described in Appendix B with both of their parameters multiplied by {0.3; 0.5; 2; 3}. Results indicate that these priors only influence the inferred $m$, $\sigma$ and $q$ values. Moreover, when the prior range does not include the expected value, the posterior mean is as close as possible to it, but stays in the 95% high probability region of the prior.

Fig. 3. Multinomial-Poisson model with overdispersion given by $K = \sum_j b^j = 10$. Inference for a dataset simulated over an arbitrary spatial index (x-axis). Each column is a different species and each row a different latent quantity. The black line is the expected (simulated) value. Posterior mean is the thick line in light grey and 95% Highest Posterior Region (HPR) are given by the dashed lines. First line shows the absolute vegetation abundance per species $V^j$ ($j = 1, \ldots, 3$). Second line shows $S^j$, the dispersed vegetation or dispersed pollen without reference to the relative production carried by $b$. Third line shows the accumulated pollen $X^j$ and last line the multinomial probabilities $p^j$ for each sampled site.

Table 2
Multinomial-negative binomial model. Table of the posterior mean of the parameters. $E^j$ is the mean of $\mathrm{NPP}^j$ when $\mathrm{NPP}^j > 0$. $q$, $m$ and $\sigma$ are interpreted in terms of vegetation model error. $\gamma^j$ are dispersal length parameters, $b^j$ are indirectly related to the relative pollen production of species $j$, and $\theta^j$ are related to the overdispersion of the multinomial distribution (the negative binomial parameter).

Taxa (j)        q^j (in %)   m^j/E^j   sqrt(σ^j)/E^j   γ^j (in km)   b^j      θ^j
Abies           62           0.26      1.9             45            9503     0.7
Alnus           56           0.37      1.7             97            2190     2.4
Betula          91           0.63      2.3             115           3326     2.6
Carpinus        66           0.11      1.7             172           343      0.9
Corylus         85           0.59      1.4             73            586      1.7
Fagus           66           0.32      1.6             50            136      0.6
Fraxinus        77           0.84      1.1             79            141      0.5
Picea           94           0.64      2.0             89            296      1.4
Pinus           56           0.33      1.5             93            2713     1.2
Quercus ever.   89           0.23      0.8             32            414      0.2
Quercus dec.    56           0.14      1.5             71            286      0.9
Tilia           90           0.68      0.8             71            50       0.4
Ulmus           83           0.84      1.2             171           58       0.9
Populus         73           1.03      0.5             328           33       0.04
Grass&Shrub     76           1.01      2.4             110           14,987   1.2

3.4.2. Multinomial-negative binomial model

We made the same kind of simulations as for the MP model. Robustness of the inference seems comparable to that for the MP model except for the $\theta$ parameter. We performed tests by varying the $\theta^j$ values. These tests show that for simulations whose $\theta^j$ is higher than 10, the algorithm diverges toward inferred $\theta^j$ values higher than 1000. The fact that divergence is easily detectable indicates that we can use the model in practice. Indeed, divergence indicates that $\theta$ is too high, i.e. overdispersion is not significant, and we can either fix some of the $\theta^j$ or shift to an MP model.

4. Results

For the European dataset the inference algorithm is iterated 600,000 times (12 days of computing). We retain 1 iteration out of 100 over the last 100,000 iterations to limit autocorrelation and remove the burn-in phase of the algorithm. Summaries are computed using these 1000 simulations. Convergence of the algorithm, in the sense of stationarity of the chain, is monitored visually. We show the parameter chains related to Corylus in Fig. 4.
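The thinning rule described above can be written as a short sketch. The chain is represented here by its iteration labels only, and its total length of 600,000 is an assumption made for the example; the function keeps one state every `step` iterations among the last `keep_last` ones.

```python
def thin(chain, keep_last, step):
    """Keep every `step`-th state among the last `keep_last` states of the chain."""
    tail = chain[len(chain) - keep_last:]
    return tail[step - 1::step]


iterations = range(1, 600_001)  # stand-in for stored MCMC states, labelled by iteration
kept = thin(iterations, 100_000, 100)
print(len(kept), kept[0], kept[-1])  # prints: 1000 500100 600000
```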

4.1. Model checking and selection

For the MP model, the posterior predictive p-value as well as the mixed posterior predictive p-value are equal to 1. This indicates that the model is clearly inadequate. We further investigated this inadequacy and found that it is related to (i) the lack of overdispersion in the MP model and (ii) an apparent violation of the stationarity assumption. The lack of overdispersion in the MP model is detected by comparing, in the posterior simulations, the theoretical Poisson variances ($b^j S^j$) with the ‘empirical’ variances computed using the $X^j$. For several taxa, the empirical variance is significantly larger than the theoretical one, clearly showing the lack of overdispersion in the model. The spatial stationarity assumption is checked by computing the mean discrepancy for each spatial point and plotting the points over Europe, see Fig. 5. As expected from earlier studies, the deviance is higher in the South (mainly in Spain and Greece) and Centre (France, Switzerland, Belgium and Austria) than in the North. For the southern points we interpret these high discrepancies as a lack of realism of the model (e.g. poor sensitivity to drought stress; Garreta et al., 2010).

For the MNB model, the posterior predictive p-value is equal to 0.119 and the mixed one is equal to 0.014. The model is thus considered not adequate by the second measure at a level of 5%. Further checks of overdispersion and spatial stationarity showed that both problems of the MP model have disappeared for this model. In the following, we adopt the MNB model as a robust model representing current knowledge in ecology and whose major features are in agreement with data.

4.2. Parameter values

Posterior mean values of the MNB parameters are presented in Table 2.

Parameters $q$, $m$ and $\sigma$ are related to the errors in the vegetation simulations, such as the one created by not taking into account anthropogenic disturbances. One aspect of these posteriors is that for most of the taxa (13 out of 15), $m^j$, the mean of $V_i^j > 0$ given $\mathrm{NPP}_i^j = 0$, is lower than the mean of $\mathrm{NPP}^j > 0$. This is expected: where the vegetation NPP has not been simulated, $V$ should not be more productive than in places where NPP has been simulated.

Parameters $\gamma^j$ account at the same time for pollen dispersal range and spatial correlation in the vegetation model error due, for example, to migrational process and heterogeneity in the local environment. Parameter values are thus expected to be a mixture of ranges associated to both dispersal and error structure. Posterior mean values we obtained range from 32 to 115 km for 12 of the 15 taxa. Carpinus, Ulmus and Populus respectively show ranges of 172, 171 and 328 km. The comparison of these values with those from other studies of the pollen–vegetation relation is hard because dispersal as well as processes creating the vegetation model error are highly related to the spatial and taxonomic scale of data. We found only two studies where pollen dispersal is directly inferred from pollen and large scale vegetation data. Bradshaw and Webb (1985) base their study on modern pollen collected in lakes and tree inventory. They use a linear regression to relate pollen proportions to vegetation proportions and find dispersal distances superior to 30 km for Pinus and Quercus and 4.5 km for Fagus. Williams and Jackson (2003) measure the link between vegetation as sensed by satellite imaging and pollen data. They use a classification of vegetation in terms of broadleaved and needleleaved plants, which are found to match well with pollen records when averaged over a window whose half-width is in the range 25–75 km. We then interpret the ranges found for 12 taxa as globally in agreement with what is inferred when using real vegetation instead of vegetation model simulations. Carpinus and Ulmus have to be further investigated but the Populus range of 328 km is clearly an indication of the poor predictive capacity of the vegetation model. This may be because, in some regions, Populus abundance is more related to river presence than to any climate variable included in LPJ-GUESS.

We interpret $b^j$ as having an indirect relation with relative pollen production and $\theta^j$ in terms of overdispersion of the MNB distribution. The $\theta^j$ parameters range from 0.04 to 2.6. These values indicate a large variance for the NB model. Populus is the worst predicted taxon, with the smallest $\theta^j$. The $b^j$ values have a large range of variation accounting for the large differences in productivity between taxa. The largest $b^j$ value is several orders of magnitude larger than values obtained in the case of the MP model (not presented here). We interpret this as an indication that the MNB model effectively accounts for more overdispersion than the MP model does (small $\theta^j$ values confirm this), without requiring a lot of zeros coming from the accumulation process, i.e. data do not call for a strong zero-inflation. Indeed, in the NB model the proportion of zeros coming from the accumulation process is given by $(\theta^j/(\theta^j + b^j S_i^j))^{\theta^j}$. Large $b^j$ values thus mean that this probability of zero coming from accumulation is approximately null.

Fig. 4. Multinomial-negative binomial model. Sequence of posterior simulations of the parameters of Corylus and variation of the posterior predictive deviance over the iterations (graph at the bottom left). The light grey lines show the mean (solid line) and the 95% highest probability region (dashed lines).

Fig. 5. Multinomial-Poisson model spatial discrepancy. Spatial repartition of the difference between the mean of the deviance for the posterior and reference distributions defined in the model validation (Section 3.3). (Left) Each square’s colour represents the difference for the data points covered by the square. The black dots show the points whose difference is higher than 50. Squares covering less than 10 points have been removed for their unreliability. (Right) Number of data points per square.

5. Discussion

In this paper, we proposed a climate-pollen transfer function (TF) composed of a vegetation model and a Bayesian hierarchical model (BHM). The BHM represents the vegetation model errors along with pollen production, dispersal, accumulation and sampling. For validating our model, we used a mixed posterior predictive approach that handles multivariate data without re-performing inference. The test detected that the model containing the MP component is not valid and we found that this is effectively due to the MP component. The model containing the MNB component is also found inconsistent but with a mixed posterior p-value of 0.014, close to the acceptance region we set at 0.025. For this model, problems with overdispersion and spatial stationarity disappear. This makes us think that the MNB component is not directly responsible for the misfit, which is potentially due to any level of the BHM. The various assumptions we made to obtain a tractable BHM with as much predictive power as possible are compared in Section 5.1 with those usually made in ecology when modelling the vegetation–pollen relationship. In Section 5.2 we explain the challenge of inverting such a TF for the reconstruction of palaeoclimate based on pollen data and we discuss another potential of such a TF, which is the ‘re-calibration’ of the vegetation model to include information present in the European dataset.

5.1. Pollen processes modelling

Our representation of the vegetation–pollen link, except the level V for the vegetation model error, is comparable with models used in ecology for palaeo-vegetation reconstruction. Their only difference is the spatial scale of studies, varying from a region of tens to hundreds of square kilometres for palaeoecology to an entire continent in palaeoclimatology. Models and methods used in palaeo-landscape reconstruction are presented in Prentice (1985), Sugita (1994, 2007a,b); we refer to them as ‘classical’. Paciorek and McLachlan (2009) recently proposed the first BHM approach for this application; later in the text we call it ‘PM09’. The three approaches (classical, PM09 and ours) stand on the representation of the same three main processes: pollen production, dispersal and accumulation.

5.1.1. Production

Production of pollen varies between species. This hampers the direct interpretation of pollen proportions in terms of vegetation composition (Davis, 1963). Production parameters thus form the core of all methods by linearly relating absolute vegetation quantities to absolute pollen quantities. The classical approach differs from the two others by separating the inference of production parameters from the inference of the other parameters. This potentially introduces biases and/or poor estimation of the uncertainties, which is avoided by using a BHM and performing inference in the Bayesian framework (PM09 and our approach).

5.1.2. Dispersal

Dispersal of pollen varies between species due, for example, to differences in pollen grain morphology (Prentice, 1985). In the various steps of the classical approach, dispersal is considered in different ways. For example, when estimating relative production parameters, authors adopt either a dispersal model based on particle diffusion theory or a simple distance weighting scheme (e.g. review in Duffin and Bunting, 2008). In later steps of the inference, the dominant model is Sutton’s model for particle dispersal, whose core dispersal kernel is exponential and whose parameters have physical interpretations. For example, in Sugita (2007a,b), dispersal parameters associated to each species are not inferred but provided in the form of ‘fall speed’ values per species. PM09 represent dispersal by modelling a latent, ‘actual’, vegetation on a regular grid covering the whole space, and pollen is related to the convolution of a Gaussian kernel over this grid. They use the same kernel parameter for all the species and mention experiments they performed, indicating that different kernel parameters do not significantly improve the model. We proposed to model a latent, ‘actual’, vegetation (V) only at pollen sites and we relate accumulated pollen to the convolution of a Gaussian kernel with one dispersal parameter per species. The choice of simulating vegetation at pollen sites is driven by the need to keep inference feasible without removing too much information. Indeed, pollen sites are highly clustered in regions expected to show as little human disturbance as possible. This makes the use of a regular grid inefficient because a coarse grid would lead to the loss of most information and a fine grid would be empty for most of the points, as well as being computationally unrealistic. Irregular grids such as those obtained by tessellation cannot be used since they would not be adapted in reconstruction, due to new sampling locations of sediment cores. Moreover, since the pollen present at a site is primarily dependent on the vegetation at the site, our choice is quite natural.

5.1.3. Accumulation

The accumulation process is seen as the capture of airborne pollen over the entire surface of a trap. In the classical approach, the trap size has a crucial role in relating the pollen assemblage of the trap to local or ‘background’ sources of pollen (Sugita, 2007b). In this vision, major pollen sources for a specific trap are located at a different distance, depending on the trap size. This derives from a mechanistic representation of the dispersal and accumulation processes and requires a precise estimate of the trap surface. Our model is adapted to the global scale and situations where trap surfaces are not available. We model accumulation as the capture of airborne pollen over a single, unknown, trap size (MP model) or random lake size and extra randomness per species (MNB model). Our models thus rely on hypotheses of stationarity of the trap size distribution in space, implying that only extra-variability is introduced instead of bias. PM09 adopt the same kind of approach in which trap size is not taken into account and accumulation is related to the same distance of sources for each trap. They thus stand on the same stationarity assumption as ours, which is a hard constraint to satisfy on the restricted area they consider (a forest).

5.2. Palaeoclimatology, ecology and vegetation models

Climate reconstructions based on our model correspond to the inference of (climate and vegetation) states in a state space model (Cappé et al., 2005) whose transition distribution in time is implicitly determined by the vegetation model (Garreta, 2010). Such reconstructions can theoretically be obtained from one or several sediment cores at the same time. Garreta et al. (2010) proposed a filtering inference algorithm in the case of a single sediment core. Further tests, discussed in Garreta (2010), showed that the filtering distribution is potentially a bad approximation of the smoothed distribution we require, and the method fails to provide reconstructions from several cores at the same time. This original challenge, the inference of states in spatio-temporal models partly defined through a computer code, is at the heart of many problems and is often called a data-assimilation problem. We think that, by combining the current interest in computational statistics and the needs in different fields of application, credible solutions to this problem will rapidly emerge.

From an ecological perspective, our approach provides an original check of some large scale (continental) features of the vegetation model outputs at the level of the taxa. The BHM can also be interpreted as a means of extracting information contained in pollen samples for the re-calibration of the vegetation model. Recalibration of vegetation models that have been built and tuned in many steps during tens of years of development is a major perspective which raises two main challenges. The first challenge is to find an inversion algorithm. This problem is similar to the inversion required for palaeoclimate reconstruction. In the work presented in this article, we had to ‘cut feedbacks’ to the stochastic vegetation model, i.e. to perform a single simulation of the vegetation at each site for a fixed set of vegetation model parameters. This hampers re-calibration, which requires the ability to modify the value of the vegetation model parameters. Solutions to this problem may reside in approximate inference algorithms and/or in approximating the vegetation model itself (Kennedy et al., 2008). The second challenge is to properly assess the ‘structural’ error of the vegetation model, i.e. the one which is irreducible by parameter tuning.

6. Conclusion

The inversion of a vegetation model coupled to a hierarchical Bayesian model is one of the most appropriate solutions to infer climate from pollen data. The vegetation model (here LPJ-GUESS) expresses potential vegetation as a function of the climate by taking into account factors usually assumed to be constant or negligible, such as atmospheric CO2 and competition between species. Four main processes which successively transform potential vegetation into pollen proportions in the sediments are taken into account by a hierarchical Bayesian model: pollen production, dispersal, accumulation and sampling. This is the first model able to describe all the major processes between climate and pollen samples, as suggested by Guiot et al. (2009) in their conclusions. Other processes associated to old sediments, such as diagenesis and chronology uncertainties, remain to be modelled but this paper proposes a first solution and, perhaps more importantly, a path for future research. This approach also allows information to be extracted from several proxies measured on the same core. As shown by Guiot et al. (2009), vegetation models can also model fractionation of δ13C in the plant, water absorbed by the plant and water running off, all quantities which also have their proxies in the sediment cores. Finally, as the uniformitarian principle (proposed by James Hutton) behind geology and palaeoclimatology concerns the analogy between past and present processes instead of the analogy between past and modern data, process-based transfer functions are definitely the future of palaeoclimatology and palaeoecology.

Acknowledgements

The first author would like to thank Paul Miller for support with LPJ-GUESS. The demonstration (Appendix A) has been greatly improved by Professor John Haslett (TCD, Dublin). Two anonymous reviewers greatly helped in improving the paper. This work has been funded by the European Science Foundation under the EUROCORES Programme EuroCLIMATE (project DECVEG) and by the French Centre National de la Recherche Scientifique. The first author is now funded through an EMPOWER postdoctoral fellowship from IRCSET (Ireland) and revised the manuscript at the Isaac Newton Institute (Cambridge, UK) during the programme ‘Mathematical and Statistical Approaches to Climate Modelling and Prediction’.

Appendix A. Identifiability of the multinomial-Poissonmodel

The identifiability of the $k$ parameters $b_j$ is not trivial because of the normalisation. For any site $i$, the accumulated pollen $X_i$ follows Poisson distributions centred on the $k$-dimensional mean vector $(b_1 S_i^1, \ldots, b_k S_i^k)$, where the $S_i^j$ are the spatial regressors and the $b_j$ are species-specific parameters. Normalising the $X_i^j$ gives the $k$ proportions $p_i^j$. Because of their sum-to-one constraint, the $k$ proportions define a $(k-1)$-dimensional space, and they are matched to the pollen data through the multinomial likelihood. Therefore, a priori, the $(k-1)$-dimensional space of the proportions cannot constrain the $k$ parameters $b_j$. This is demonstrated by calculating the expectation of $p^j$ in the next section:

$$E\left[p_i^j \,\middle|\, \sum_{j'=1}^{k} X_i^{j'} > 0\right] = \frac{S_i^j\, b_j}{\sum_{j'=1}^{k} b_{j'} S_i^{j'}}$$

which is centred on the same value for any vector $\alpha \times (b_1, \ldots, b_k)$ with a real $\alpha > 0$.

We show that the $k$th parameter is related to the variance of the proportions. Let us drop the site subscript $i$ and write $a_j = S^j b_j / \sum_{j'=1}^{k} b_{j'} S^{j'}$ and $K = \sum_{j'=1}^{k} b_{j'} S^{j'}$. The $a_j$ are thus $k-1$ parameters (because of the sum-to-one constraint) that are identifiable from the mean of the proportion data, and $K$ is the overall sum parameter. We demonstrate in the next section that, for a $K$ that is not too small, say $K \ge 10$,

$$\mathrm{Var}\left[p^j \,\middle|\, \sum_{j'=1}^{k} X^{j'} > 0\right] \approx \frac{a_j (1 - a_j)}{K}$$

This result implies that the parameter $K$ is identifiable from the variance of the proportion data. It is the parameter that accounts for overdispersion in the multinomial-Poisson model. When this parameter is very large the overdispersion disappears. In the Bayesian framework for inference, one can restrict the prior for $K$ to a maximum value corresponding to a numerically insignificant overdispersion, which allows the MCMC chain to converge in the absence of overdispersion.
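
The scaling argument above can be checked numerically. The following is a minimal simulation sketch rather than the authors' code: the function name `simulate_proportions` and the values chosen for the regressors `S` and the parameters `b` are hypothetical, picked only for illustration. Multiplying all $b_j$ by a common factor should leave the mean proportions unchanged and shrink their variance roughly in proportion to $1/K$.

```python
import numpy as np

def simulate_proportions(b, S, n_draws=200_000, seed=0):
    """Draw X^j ~ Poisson(b_j * S^j) for a single site and return the
    proportions p^j = X^j / sum_j X^j, keeping draws with a positive total."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(b, dtype=float) * np.asarray(S, dtype=float)
    X = rng.poisson(mean, size=(n_draws, mean.size))
    total = X.sum(axis=1)
    keep = total > 0
    return X[keep] / total[keep, None]

# Hypothetical values for one site with k = 3 taxa.
S = np.array([2.0, 1.0, 0.5])
b = np.array([3.0, 4.0, 10.0])

for scale in (1.0, 10.0):
    p = simulate_proportions(scale * b, S)
    K = float(np.sum(scale * b * S))      # overall sum parameter
    a = scale * b * S / K                 # expected proportions a_j
    print(f"scale={scale:g}  K={K:g}")
    print("  mean of p        :", p.mean(axis=0).round(3))
    print("  variance of p    :", p.var(axis=0).round(5))
    print("  a_j (1 - a_j) / K:", (a * (1 - a) / K).round(5))
```

In this sketch the two runs share the same $a_j$ but differ in $K$, so only the variance rows should change between them.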

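A.1. Mean and variance of a Poisson ratio

Let $N \sim \mathcal{P}(\lambda)$ be a discrete random variable with Poisson distribution and $\lambda \ge 0$. Let $M \sim \mathcal{P}(\mu)$ be another discrete random variable, independent of $N$, with Poisson distribution and $\mu \ge 0$. Our interest lies in $R = N/(N + M)$ when $K = N + M > 0$, and in particular in $E[R \mid K > 0]$ and $\mathrm{Var}[R \mid K > 0]$.

We note that, conditional on $K = k$, $N \sim \mathcal{B}(k, p)$, a binomial distribution with $k$ trials and $p = \lambda/(\lambda + \mu)$.

Thus $E[R \mid K = k] = p$ and $\mathrm{Var}[R \mid K = k] = (1/k)\,p(1 - p)$. From the former, $E[R \mid K > 0] = E_{K>0}[E[R \mid K]] = p$.

Since $E[R^2 \mid K] = p^2 + (1/K)\,p(1 - p)$, we have $E[R^2 \mid K > 0] = p^2 + p(1 - p)\,E_{K>0}[1/K]$, where $K \sim \mathcal{P}(\nu)$ and $\nu = \lambda + \mu$. But

$$E_{K>0}\left[\frac{1}{K}\right] = \frac{e^{-\nu}}{1 - e^{-\nu}} \left(\sum_{k>0} \frac{\nu^k}{k!\,k}\right) = \frac{e^{-\nu}}{1 - e^{-\nu}} \int_0^{\nu} \frac{e^u - 1}{u}\,du$$

which is not analytically tractable. We can obtain bounds for $\sum_{k>0} \nu^k/(k!\,k)$:

$$\begin{aligned}
\frac{1}{\nu}\left(\sum_{k=1}^{\infty} \frac{\nu^{k+1}}{(k+1)!}\right) &< \sum_{k>0} \frac{\nu^k}{k!\,k} = \frac{1}{\nu}\left(\sum_{k=1}^{\infty} \frac{\nu^{k+1}}{(k+1)!} \cdot \frac{k+1}{k}\right) \\
\frac{1}{\nu}\left(e^{\nu} - 1 - \nu\right) &< \sum_{k>0} \frac{\nu^k}{k!\,k} = \frac{1}{\nu}\left(e^{\nu} - 1 - \nu + \sum_{k=1}^{\infty} \frac{\nu^{k+1}}{(k+1)!\,k}\right) \\
\frac{1}{\nu}\left(e^{\nu} - 1 - \nu\right) &< \sum_{k>0} \frac{\nu^k}{k!\,k} < \frac{1}{\nu}\left(e^{\nu} - 1 - \nu + \frac{3}{\nu}\left(\sum_{k=1}^{\infty} \frac{\nu^{k+2}}{(k+2)!}\right)\right) \\
\frac{1}{\nu}\left(e^{\nu} - 1 - \nu\right) &< \sum_{k>0} \frac{\nu^k}{k!\,k} < \frac{1}{\nu}\left(e^{\nu} - 1 - \nu\right) + \frac{3}{\nu^2}\left(e^{\nu} - 1 - \nu - \frac{\nu^2}{2}\right)
\end{aligned} \tag{A.1}$$

These bound $E_{K>0}[1/K]$ as follows:

$$\frac{1 - e^{-\nu} - \nu e^{-\nu}}{\nu\,(1 - e^{-\nu})} < E_{K>0}\left[\frac{1}{K}\right] < \frac{1 - e^{-\nu} - \nu e^{-\nu}}{\nu\,(1 - e^{-\nu})} + \frac{3\left(1 - e^{-\nu} - \nu e^{-\nu} - \nu^2 e^{-\nu}/2\right)}{\nu^2\,(1 - e^{-\nu})}$$

hence $E_{K>0}[1/K] \approx 1/\nu$ and $\mathrm{Var}[R \mid K > 0] \approx (1/\nu)\,p(1 - p)$ for $\nu$ ‘sufficiently large’. This demonstrates that the model works like a Multinomial-Dirichlet model with one overdispersion parameter for the probabilities.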
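
The bounds derived from (A.1) can be compared with a direct numerical evaluation of the series. This is a small sketch under stated assumptions: the function names `mean_inverse_truncated_poisson` and `bounds_a1` are hypothetical, the Poisson mean is written `nu`, and the series is summed term by term, which is only suitable for moderate values of `nu` (so that `exp(-nu)` does not underflow).

```python
import math

def mean_inverse_truncated_poisson(nu, tol=1e-16):
    """E[1/K | K > 0] for K ~ Poisson(nu), summing the series term by term.
    Intended for moderate nu (exp(-nu) must not underflow)."""
    term = math.exp(-nu) * nu          # P(K = 1)
    total, k = 0.0, 1
    while k <= nu or term > tol:       # terms only decrease once k > nu
        total += term / k
        k += 1
        term *= nu / k                 # P(K = k) from P(K = k - 1)
    return total / (1.0 - math.exp(-nu))

def bounds_a1(nu):
    """Lower and upper bounds on E[1/K | K > 0] implied by (A.1)."""
    e = math.exp(-nu)
    lower = (1 - e - nu * e) / (nu * (1 - e))
    upper = lower + 3 * (1 - e - nu * e - nu**2 * e / 2) / (nu**2 * (1 - e))
    return lower, upper

for nu in (5.0, 10.0, 50.0, 100.0):
    lo, hi = bounds_a1(nu)
    exact = mean_inverse_truncated_poisson(nu)
    print(f"nu={nu:5.0f}  lower={lo:.5f}  series={exact:.5f}  "
          f"upper={hi:.5f}  1/nu={1 / nu:.5f}")
```

The summed series should fall between the two bounds, and both bounds approach $1/\nu$ as $\nu$ grows, which is the approximation used for the variance.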

Appendix B. Priors

We list here the priors selected to complete the Bayesian model.

• The parameters of pollen production per species ($b_j$, Eq. (2)) have independent gamma distributions.

$$[b_j] = \mathcal{G}(10^{-3}, 10^{-3})$$

Because no prior arises intuitively from the problem at hand, we chose the gamma distribution, which is conjugate to the Poisson distribution and allows a Gibbs sampling step to be used in the MCMC algorithm. The gamma parameters are selected to form a weakly informative prior.

• (MNB model only) The parameters $\rho_j$ of the negative binomial (Eq. (4)), i.e. the over-dispersion with respect to the Poisson distribution, have independent improper priors.

$$[\rho_j] = \mathbb{1}_{\{\rho_j > 0\}}$$

where $\mathbb{1}$ is the indicator function. Because no prior arises intuitively from the problem at hand, we chose the least informative prior we found.

• The parameters $\gamma_j$ of dispersal distance (Eq. (2)) are difficult to estimate. Indeed, no conjugacy property exists, and for every newly proposed value $\gamma_j^*$ it is necessary to compute the (pollen dispersal) kernel for every site, which is prohibitive in terms of computing time and memory. Based on Diggle et al. (2003), we propose to pre-compute $L$ different kernels associated with $L$ values $g_1, \ldots, g_L$ of the dispersal distance. The inference algorithm consists in scanning this discretised space. The prior underlying this method is a discrete uniform prior over the pre-specified values $g_1, \ldots, g_L$.

$$[\gamma_j] = \frac{1}{L} \sum_{l=1}^{L} \delta_{\gamma_j = g_l}$$

These values are chosen so that the grid fully covers the prior range of $\gamma$. Moreover, the distance between consecutive $g_l$ values must not be too large, since a coarse discretisation results in a poor approximation of the posterior density and in poor mixing of the MCMC chain. Indeed, large distances between consecutive $g_l$ make consecutive kernels very different, which leads to a high rejection rate during the Metropolis step. In practice we use a grid covering the range of possibilities for $\gamma$, with a uniform grid lag determined by the number of kernel matrices that can be computed using a reasonable amount of memory (in practice, 35 for our European dataset).

• The mixture parameters $q_j$ for the actual vegetation (Eq. (1)) are assumed to be independent and uniformly distributed over [0.5, 1].

$$[q_j] = \mathcal{U}(0.5, 1)$$

The lower bound for this distribution is based on the assumption that, over the large area considered (Europe), a vegetation model predicts the absence of vegetation better than a purely random experiment, which has a probability of 0.5 of being right.

• Finally, the scale and shape parameters $m_j$ and $\sigma_j$ (Eq. (1)) for each species are independent and follow Gaussian distributions truncated to be strictly positive.

$$[m_j] = \mathcal{N}(h_1^j, h_1^j/2) \text{ truncated to } (0; +\infty)$$

The parameter $m_j$ is the mean of the actual vegetation of taxon $j$ when it is present but has not been simulated by the model. $h_1^j$ is set equal to the mean of the simulated $NPP_j$ over Europe when present ($NPP_j > 0$).

$$[\sigma_j] = \mathcal{N}(h_2^j, h_2^j/2) \text{ truncated to } (0; +\infty)$$

The parameter $\sigma_j$ is the variance of the actual vegetation of taxon $j$. $h_2^j$ is set equal to the variance of the simulated $NPP_j$ over Europe when present ($NPP_j > 0$).
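
Two of the sampling steps mentioned in the list above, the conjugate Gibbs update of a production parameter and the Metropolis move over pre-computed dispersal kernels, can be sketched for a single taxon as follows. This is an illustrative toy under several assumptions that are not taken from the paper: the gamma prior is read as shape and rate, the kernel is a simple exponential on a one-dimensional transect, the proposal moves to a neighbouring grid value, and all names (`precompute_S`, `gibbs_update_b`, `metropolis_grid_step`) and numerical values are hypothetical.

```python
import numpy as np

def precompute_S(grid, dist, V):
    """S_grid[l, i] = sum_s exp(-dist[i, s] / g_l) / g_l * V[s]
    (toy exponential kernel, one row per pre-specified grid value g_l)."""
    return np.stack([(np.exp(-dist / g) / g) @ V for g in grid])

def poisson_loglik(X, mean):
    """Poisson log-likelihood up to an additive constant (log X! dropped)."""
    return float(np.sum(X * np.log(mean) - mean))

def gibbs_update_b(X, S, rng, shape0=1e-3, rate0=1e-3):
    """With X_i ~ Poisson(b * S_i) and b ~ Gamma(shape0, rate0), the full
    conditional of b is Gamma(shape0 + sum X, rate0 + sum S)."""
    return rng.gamma(shape0 + X.sum(), 1.0 / (rate0 + S.sum()))

def metropolis_grid_step(l, X, b, S_grid, rng):
    """Random-walk Metropolis move on the grid index. Proposals outside the
    grid are rejected; the discrete uniform prior cancels in the ratio."""
    l_new = l + int(rng.choice((-1, 1)))
    if not 0 <= l_new < S_grid.shape[0]:
        return l
    log_ratio = (poisson_loglik(X, b * S_grid[l_new])
                 - poisson_loglik(X, b * S_grid[l]))
    return l_new if np.log(rng.uniform()) < log_ratio else l

# Toy data: sites and vegetation sources on a transect (all values made up).
rng = np.random.default_rng(0)
grid = np.linspace(0.5, 5.0, 10)                 # g_1, ..., g_L with L = 10
sites = rng.uniform(0, 20, size=200)
sources = rng.uniform(0, 20, size=50)
dist = np.abs(sites[:, None] - sources[None, :])
V = rng.gamma(2.0, 1.0, size=50)
S_grid = precompute_S(grid, dist, V)             # computed once, reused below
X = rng.poisson(20.0 * S_grid[6])                # simulated with grid index 6

l, b, visits = 0, 1.0, np.zeros(len(grid), dtype=int)
for _ in range(3000):
    b = gibbs_update_b(X, S_grid[l], rng)
    l = metropolis_grid_step(l, X, b, S_grid, rng)
    visits[l] += 1
print("visit frequency per grid value:", (visits / visits.sum()).round(3))
```

The point of the design is that the costly kernel sums are evaluated once per grid value; each MCMC iteration then only indexes `S_grid`, and a finer grid makes neighbouring rows more alike, which is what keeps the rejection rate down.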

Appendix C. Model validation

C.1. Posterior predictive p-value

The main idea is to compare the distributions of statistics based on posterior simulations against the same statistics based on unconstrained simulations. When the posited structure is the same as the data structure, both statistics have the same distribution (Guttman, 1967; Rubin, 1984; Gelman et al., 1996).

At the end of the inference we have a set of $M$ posterior simulations $(X^{post,m}, V^{post,m}, b^{post,m}, \gamma^{post,m}, \sigma^{post,m}, m^{post,m}, q^{post,m})$ with $m = 1, \ldots, M$ following the posterior in Eq. (6). From the posterior simulations $X^{post,m}$ ($m = 1, \ldots, M$), a set of corresponding ‘replicate’ data $Y^{rep}$ can be simulated following the multinomial model $Y^{rep,m} \sim [Y \mid X^{post,m}]$.

The posterior predictive p-value diagnostic consists in selecting one univariate discrepancy measure or statistic $T(Y, X)$ and comparing the distribution of $T(Y, X^{post,m})$ with that of $T(Y^{rep,m}, X^{post,m})$ through the p-value

$$P\left(T(Y, X^{post,m}) < T(Y^{rep,m}, X^{post,m})\right) \tag{C.1}$$

which is approximated as the proportion of times that $T(Y, X^{post,m})$ is lower than $T(Y^{rep,m}, X^{post,m})$ for a set of $M$ posterior simulations $X^{post,m}$ ($m = 1, \ldots, M$). If the p-value is lower or higher than pre-specified bounds, e.g. 0.025 and 0.975, the posterior discrepancy is outside of its reference distribution. This indicates that the data are not consistent with the model.

For $T$ we use the deviance of the multinomial distribution. We tested other options such as multivariate Gaussian distributions and $\chi^2$ statistics, but the results are robust to the choice of discrepancy.

The posterior predictive p-value is known to be conservative, i.e. it fails to detect small to medium inconsistencies between model and data (Stern and Cressie, 2000).
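
The approximation of (C.1) by a proportion over posterior draws can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the discrepancy is a standard multinomial deviance, and the "posterior" draws in the toy example come from a conjugate Dirichlet model that merely stands in for the MCMC output of the full hierarchical model.

```python
import numpy as np

def multinomial_deviance(Y, X):
    """Deviance of counts Y (sites x taxa) under multinomial probabilities
    proportional to X, summed over sites; cells with Y = 0 contribute 0."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=1, keepdims=True)
    expected = Y.sum(axis=1, keepdims=True) * P
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(Y > 0, Y * np.log(Y / expected), 0.0)
    return 2.0 * float(terms.sum())

def posterior_predictive_pvalue(Y, X_post, rng):
    """Proportion of draws m for which T(Y, X_post[m]) < T(Y_rep[m], X_post[m]),
    with Y_rep[m] simulated from the multinomial model given X_post[m]."""
    n = np.asarray(Y).sum(axis=1)
    hits = 0
    for X in X_post:
        X = np.asarray(X, dtype=float)
        P = X / X.sum(axis=1, keepdims=True)
        Y_rep = np.stack([rng.multinomial(n_i, p_i) for n_i, p_i in zip(n, P)])
        hits += multinomial_deviance(Y, X) < multinomial_deviance(Y_rep, X)
    return hits / len(X_post)

# Toy example: 30 sites, 3 taxa, 100 grains counted per site (values made up).
rng = np.random.default_rng(1)
Y = rng.multinomial(100, [0.6, 0.3, 0.1], size=30)
# Stand-in posterior draws: Dirichlet(1 + Y_i) per site, M = 200 draws.
X_post = [np.stack([rng.dirichlet(1 + y) for y in Y]) for _ in range(200)]
print("posterior predictive p-value:", posterior_predictive_pvalue(Y, X_post, rng))
```

Only the proportions implied by each draw matter here, so the latent accumulation `X` can be passed on any scale.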

C.2. Mixed posterior predictive p-value

To reduce conservatism, we use the mixed predictive distribution from Marshall and Spiegelhalter (2003). Instead of using the reference distribution $Y^{rep} \sim [Y \mid X^{post}]$, we simulate a reference distribution from the highest level of our model in order to obtain a measure that is as little conservative as possible, i.e. we generate replicates for both the latent levels and the data.

From the posterior simulations of the parameters we simulate a replicate $(Y^{rep,m}, X^{rep,m}, V^{rep,m})$ following

$$(Y^{rep,m}, X^{rep,m}, V^{rep,m}) \sim [Y \mid X^{rep,m}]\,[X^{rep,m} \mid b^{post,m}, S^{rep,m}]\,[V^{rep,m} \mid NPP, \sigma^{post,m}, m^{post,m}, q^{post,m}]$$

We compute the p-value as previously, using the same discrepancy.
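
The difference from the previous diagnostic is that the latent accumulation is drawn afresh from the parameters instead of being reused from the posterior. The sketch below illustrates only the last two levels of that simulation for the multinomial-Poisson case and rests on assumptions that are not from the paper: the regressors `S` are taken as given (the vegetation and dispersal levels are not simulated), sites whose replicate total is zero are simply redrawn, and the name `simulate_mixed_replicate` is hypothetical.

```python
import numpy as np

def simulate_mixed_replicate(b, S, n, rng, max_tries=100):
    """One replicate (Y_rep, X_rep): X_rep[i, j] ~ Poisson(b[j] * S[i, j]) is
    drawn afresh, then Y_rep[i] ~ Multinomial(n[i], X_rep[i] / sum_j X_rep[i, j]).
    Sites with a zero total are redrawn (an assumption of this sketch)."""
    mean = np.asarray(b, dtype=float)[None, :] * np.asarray(S, dtype=float)
    X_rep = rng.poisson(mean)
    tries = 0
    while (X_rep.sum(axis=1) == 0).any():
        if tries == max_tries:
            raise RuntimeError("some sites keep a zero pollen total")
        empty = X_rep.sum(axis=1) == 0
        X_rep[empty] = rng.poisson(mean[empty])
        tries += 1
    P = X_rep / X_rep.sum(axis=1, keepdims=True)
    Y_rep = np.stack([rng.multinomial(n_i, p_i) for n_i, p_i in zip(n, P)])
    return Y_rep, X_rep

# Toy example: 3 sites, 3 taxa; zeros in S act as structural zeros.
rng = np.random.default_rng(2)
b = np.array([2.0, 5.0, 1.0])                    # one posterior draw of b
S = np.array([[1.0, 0.0, 3.0],
              [0.5, 2.0, 0.0],
              [4.0, 1.0, 1.0]])
n = np.array([100, 50, 200])                     # grains counted per site
Y_rep, X_rep = simulate_mixed_replicate(b, S, n, rng)
print(Y_rep)
```

A taxon with a zero regressor at a site has a zero Poisson mean, hence a null multinomial probability, so it never appears in the replicate count for that site.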

References

Anderson, N.J., Bugmann, H., Dearing, J.A., Gaillard, M.J., 2006. Linking palaeoenvironmental data and models to understand the past and to predict the future. Trends in Ecology & Evolution 21 (12).

Bartlein, P.J., Harrison, S.P., Brewer, S., Connor, S., Davis, B.A.S., Gajewski, K., Guiot, J., Harrison-Prentice, T.I., Henderson, A., Peyron, O., Prentice, I.C., Scholze, M., Seppä, H., Shuman, B., Sugita, S., Thompson, R.S., Viau, A.E., Williams, J., Wu, H., 2011. Pollen-based continental climate reconstructions at 6 and 21 ka: a global synthesis. Climate Dynamics 37, 775–802.

Bartlein, P.J., Prentice, I.C., Webb III, T., 1986. Climatic response surfaces from pollen data for some eastern North American taxa. Journal of Biogeography 13, 35–57.

Birks, H.J.B., Heiri, O., Seppä, H., Bjune, A.E., 2010. Strengths and weaknesses of quantitative climate reconstructions based on late-Quaternary biological proxies. The Open Ecology Journal 3, 68–110.

Bordon, A., 2008. Dynamique de la végétation et variations climatiques dans les Balkans au cours du dernier cycle climatique à partir des séquences polliniques des lacs Maliq et Ochrid (Albanie). PhD thesis, Université de Franche-Comté.

Bradshaw, R.H.W., Webb III, T., 1985. Relationships between contemporary pollen and vegetation data from Wisconsin and Michigan, USA. Ecology 66 (3), 721–737.

Cappé, O., Moulines, E., Rydén, T., 2005. Inference in Hidden Markov Models. Springer Texts in Statistics. Springer, New York.

Cressie, N., 1991. Statistics for Spatial Data. John Wiley and Sons, New York.

Davis, M.B., 1963. On the theory of pollen analysis. American Journal of Science 261, 897–912.

Diggle, P.J., Ribeiro Jr., P.J., Christensen, O.F., 2003. An introduction to model-based geostatistics. In: Moller, J. (Ed.), Spatial Statistics and Computational Methods, Lecture Notes in Statistics. Springer, New York.

Duffin, K.I., Bunting, M.J., 2008. Relative pollen productivity and fall speed estimates for southern African savanna taxa. Vegetation History and Archaeobotany 17 (5), 507–525.

Garreta, V., 2010. Bayesian approach of pollen-based palaeoclimate reconstructions: toward the modelling of ecological processes. PhD thesis, Université Aix-Marseille III, http://tel.archives-ouvertes.fr/tel-00495890/en/.

Garreta, V., Miller, P.A., Guiot, J., Hély, C., Brewer, S., Sykes, M.T., Litt, T., 2010. A method for climate and vegetation reconstruction through the inversion of a dynamic vegetation model. Climate Dynamics 35, 371–389.

Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis. Texts in Statistical Science, Second edition. Chapman & Hall/CRC, Boca Raton.

Gelman, A., Meng, X.L., Stern, H.S., 1996. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6, 733–807.

Guiot, J., De Vernal, A., 2007. Transfer functions: methods for quantitative paleoceanography based on microfossils. In: Hillaire-Marcel, C., De Vernal, A. (Eds.), Developments in Marine Geology, volume 1. Elsevier, Amsterdam, chapter C.

Guiot, J., Torre, F., Jolly, D., Peyron, O., Borreux, J.J., Cheddadi, R., 2000. Inverse vegetation modeling by Monte Carlo sampling to reconstruct palaeoclimate under changed precipitation seasonality and CO2 conditions: application to glacial climate in Mediterranean region. Ecological Modelling 1 (127), 119–140.

Guiot, J., Wu, H.B., Garreta, V., Hatté, C., Magny, M., 2009. A few prospective ideas on climate reconstruction: from a statistical single proxy approach towards a multi-proxy approach. Climate of the Past 5, 571–583.

Guttman, I., 1967. The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society, Series B 29, 83–100.

Haslett, J., Whiley, M., Bhattacharya, S., Salter Townshend, M., Wilson, S., Allen, J.R.M., Huntley, B., Mitchell, F.J.G., 2006. Bayesian paleoclimate reconstruction. Journal of the Royal Statistical Society, Series A 169 (3), 395–438.

Hatté, C., Guiot, J., 2005. Paleoprecipitation reconstruction by inverse modelling using the isotopic signal of loess matter: application to the Nußloch loess sequence (Rhine Valley, Germany). Climate Dynamics 25 (2-3), 315–327.

Higdon, D., 1998. A process-convolution approach to modelling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics 5, 173–190.

Hughes, M.K., Ammann, C.M., 2009. The future of the past-an earth system framework for high resolution paleoclimatology: editorial essay. Climatic Change 94, 247–259.

Hughes, M.K., Guiot, J., Ammann, C., 2010. An emerging paradigm: process-based reconstructions. PAGES News 18 (2), 87–89.

Jackson, S.T., Williams, J.W., 2004. Modern analogs in Quaternary paleoecology: here today, gone yesterday, gone tomorrow? Annual Review of Earth and Planetary Sciences 32, 495–537.

Jones, P.D., Briffa, K.R., Osborn, T.J., Lough, J.M., van Ommen, T.D., Vinther, B.M., Luterbacher, J., Wahl, E.R., Zwiers, F.W., Mann, M.E., Schmidt, G.A., Ammann, C.M., Buckley, C.M., Cobb, K.M., Esper, J., Goosse, H., Graham, N., Jansen, E., Kiefer, T., Kull, C., Küttel, M., Mosley-Thompson, E., Overpeck, J.T., Riedwyl, N., Schulz, M., Tudhope, A.W., Villalba, R., Wanner, H., Wolff, E., Xoplaki, E., 2009. High-resolution palaeoclimatology of the last millennium: a review of current status and future prospects. The Holocene 19, 3–49.

Kennedy, M., Anderson, C., O’Hagan, A., Lomas, M., Woodward, I., Gosling, P., 2008. Quantifying uncertainty in the biospheric carbon flux for England and Wales. Journal of the Royal Statistical Society, Series A 171 (1), 109–135.

Kennedy, M., O’Hagan, A., 2001. Bayesian calibration of computer models. Journal of the Royal Statistical Society, Series B 63, 425–464.

LaDeau, S.L., Clark, J.S., 2006. Pollen production by Pinus taeda growing in elevated atmospheric CO2. Functional Ecology 20, 541–547.

L’Ecuyer, P., 1999. Good parameters and implementations for combined multiple recursive random number generators. Operations Research 47 (1), 159–164.

Marshall, E.C., Spiegelhalter, D.J., 2003. Approximate cross-validatory predictive checks in disease mapping models. Statistics in Medicine 22, 1649–1660.

Martin, T.G., Wintle, B.A., Rhodes, J.R., Kuhnert, P.M., Field, S.A., Low-Choy, S.J., Tyre, A.J., Possingham, H.P., 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters 8 (11), 1235–1246.

Miller, P.A., Giesecke, T., Hickler, T., Bradshaw, R.H.W., Smith, B., Seppä, H., Valdes, P.J., Sykes, M.T., 2008. Exploring climatic and biotic controls on Holocene vegetation change in Fennoscandia. Journal of Ecology 96 (2), 247–259.

Mosimann, J.E., 1965. Statistical methods for the pollen analyst: multinomial and negative binomial techniques. In: Kummel, B., Raup, D. (Eds.), Handbook of Paleontological Techniques. Freeman, San Francisco, pp. 636–673.

New, M., Lister, D., Hulme, M., Makin, I., 2002. A high-resolution data set of surface climate over global land areas. Climate Research 21.

Paciorek, C.J., McLachlan, J.S., 2009. Mapping ancient forests: Bayesian inference for spatio-temporal trends in forest composition. Journal of the American Statistical Association 104, 608–622.

Parsons, R.W., Prentice, I.C., 1981. Statistical approaches to R-values and the pollen–vegetational relationship. Review of Palaeobotany and Palynology 32, 127–152.

Prentice, I.C., 1985. Pollen representation, source area and basin size: toward a unified theory of pollen analysis. Quaternary Research 23, 76–86.

Robert, C.P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer-Verlag, New York.

Rousseau, D., Hatté, C., Guiot, J., Duzer, D., Schevin, P., Kukla, G., 2006. Reconstruction of the Grande Pile Eemian using inverse modelling of biomes and δ13C. Quaternary Science Reviews 25, 2808–2819.

Rubin, D.B., 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12, 1151–1172.

Smith, B., Prentice, I.C., Sykes, M.T., 2001. Representation of vegetation dynamics in modelling of terrestrial ecosystems: comparing two contrasting approaches within European climate space. Global Ecology & Biogeography 10, 621–637.

Stern, H.S., Cressie, N., 2000. Posterior predictive model checks for disease mapping models. Statistics in Medicine 19, 2377–2397.

Sugita, S., 1994. Pollen representation of vegetation in Quaternary sediments: theory and method in patchy vegetation. Journal of Ecology 82, 881–897.

Sugita, S., 2007a. Theory of quantitative reconstruction of vegetation I: pollen from large sites REVEALS regional vegetation composition. The Holocene 17 (2), 229–241.

Sugita, S., 2007b. Theory of quantitative reconstruction of vegetation II: all you need is LOVE. The Holocene 17 (2), 243–257.

Tang, G., Bartlein, P.J., 2008. Simulating the climatic effects on vegetation: approaches, issues and challenges. Progress in Physical Geography 32 (5), 543–556.

Tauber, H., 1965. Differential pollen dispersion and the interpretation of pollen diagrams. Danmarks Geol. Undersøgelse II 89.

Tolwinski-Ward, S.E., Evans, M.N., Hughes, M.K., Anchukaitis, K.J., 2011. An efficient forward model of the climate controls on interannual variation in tree-ring width. Climate Dynamics 36, 2419–2439.

Vasko, K., Toivonen, H.T., Korhola, A., 2000. A Bayesian multinomial Gaussian response model for organism-based environmental reconstruction. Journal of Paleolimnology 24 (2), 243–250.

von Post, L., 1916. Om skogsträdpollen i sydsvenska torfmosslagerföljder. Geologiska föreningens Stockholm förhandlingar 38, 384–390.

Webb, T., 1974. Corresponding distributions of modern pollen and vegetation in Lower Michigan. Ecology 55 (17-18).

Webb, T.J., Bryson, R.A., 1972. Late- and Postglacial climatic change in the northern Midwest, USA: Quantitative estimates derived from fossil pollen spectra by multivariate statistical analysis. Quaternary Research 2, 70–115.

Williams, J.W., Jackson, S.T., 2003. Palynological and AVHRR observations of modern vegetational gradients in eastern North America. The Holocene 13 (4), 485–497.

Wramneby, A., Smith, B., Zaehle, S., Sykes, M.T., 2008. Parameter uncertainties in the modelling of vegetation dynamics-effects on tree community structure and ecosystem functioning in European forest biomes. Ecological Modelling 216, 277–290.

Zaehle, S., Sitch, S., Smith, B., Hatterman, F., 2005. Effects of parameter uncertainties on the modeling of terrestrial biosphere dynamics. Global Biogeochemical Cycles, 19 (16p.), http://dx.doi.org/10.1029/2004GB002395.