Geostatistical analysis of Sycamore (Acer pseudoplatanus) in Flanders (Belgium)


  • Faculty of Sciences

    Geostatistical analysis of the regeneration of

    Sycamore (Acer pseudoplatanus) in Flanders

    (Belgium)

    by

    ir. Thierry Onkelinx

    Promoters:

    Prof. Dr. ir. M. Van Meirvenne, Department of Soil Management

    Prof. Dr. ir. K. Verheyen, Department of Forest and Water Management

    Dr. D. Bauwens, Research Institute for Nature and Forest

    Master dissertation submitted to obtain the degree of

    Master of Statistical Data Analysis

    Academic year 2008-2009

  • Preface

    This thesis is the final piece of my education as a master in statistical data analysis.

    The master course revealed to me how fascinating the world of statistics can be. The

    thesis allowed me to explore three of my favourite research topics: forestry, geographical

    information science and statistics.

    First of all I would like to express my gratitude to ir. Paul Quataert of the Research

    Institute for Nature and Forest (INBO). He gave me the necessary facilities to combine

    my full-time job with the master course for four years. Furthermore, he encourages our

    team to keep up-to-date with the current evolutions in statistics.

    This thesis would not have been feasible without the dendrometrical data. Therefore my thanks go

    out to dr. ir. Martine Waterinckx, Bart Roelandt and ir. Wout Damiaans (all Nature and

    Forestry Agency, ANB) for kindly providing the data of the national forest inventory and

    the forest management plans. ir. Kris Vandekerkhove, ir. Luc De Keersmaeker and Peter

    Van de Kerkhove (all INBO) kindly provided the data of the forest reserves. All these data

    are confidential to the extent that we can only distribute the results of our study but not

    the data themselves.

    I could not have finalised this thesis without the input of my promoters: prof. dr. ir.

    Marc Van Meirvenne (UGent), prof. dr. ir. Kris Verheyen (UGent) and dr. Dirk Bauwens

    (INBO). They were willing to guide me through my thesis based on my first rough ideas

    on the topic. Their invaluable comments helped me to clearly define the scope of this

    thesis. Special thanks go to dr. Dirk Bauwens for expertly proof-reading this thesis.

    And last but not least I owe many thanks to Ester, my future wife. She took care of

    many things so I could spend enough time on my thesis and the courses.

    ir. Thierry Onkelinx, June 2009

  • Admission for circulating the work

    The author and the promoters give permission to consult this master dissertation and to

    copy it or parts of it for personal use. Any other use falls under the restrictions of the

    copyright, in particular concerning the obligation to mention explicitly the source when

    using results of this master dissertation.

    ir. Thierry Onkelinx, June 2009

  • Contents

    Preface

    Table of contents

    1 Abstract

    2 Introduction

    2.1 Goals

    2.2 The data sets

    2.2.1 Sampling technique

    2.2.2 Measure for success of regeneration

    3 Modelling and predicting ecological data

    3.1 Analysing spatially auto-correlated data

    3.1.1 Auto-covariate models

    3.1.2 Generalised least squares regression

    3.1.3 Autoregressive models

    3.1.4 Spatial generalised linear mixed models (GLMM)

    3.1.5 Spatial generalised estimating equations (GEE)

    3.2 Regression models for count data

    3.3 Assessing the impact of capturing spatial auto-correlation

    3.3.1 Selected methods

    3.3.2 Comparing model parameters

    3.3.3 Assessing the quality of the predictions

    3.4 Parametric spatial bootstrap

    4 Material and methods

    4.1 Creating a data set

    4.2 Building the models

    4.2.1 Tested variables

    4.2.2 Model selection

    4.3 Bootstrapping the model parameters

    4.4 Cross-validation

    4.4.1 Basics

    4.4.2 Working around some problems

    5 Results

    5.1 Influence on model parameters

    5.1.1 Models assuming Gaussian data

    5.1.2 Models assuming Poisson data

    5.1.3 Models assuming binomial data

    5.2 Influence on cross-validation of predictions

    5.2.1 Models assuming Gaussian data

    5.2.2 Models assuming count data

    5.2.3 Models assuming binomial data

    6 Discussion and conclusions

    6.1 Implications on modeling ecological data

    6.2 Summary

    Bibliography

    A Exploratory data analysis

    A.1 Natural regeneration of sycamore

    A.2 Regions

    A.3 Geomorphology

    A.4 Forest management

    A.5 Soil

    B Overview of the models

    B.1 Gaussian models

    B.2 Poisson models

    B.3 Logistic models

    C Glossary and abbreviations

    C.1 Glossary

    C.2 Abbreviations

    C.3 R packages

  • Chapter 1

    Abstract

    Autocorrelation is a very general statistical property of ecological variables observed across

    geographic space (Legendre, 1993). Spatial autocorrelation implies that measurements at

    locations close to each other exhibit more similar values than those taken at sites that are

    further apart (Dormann et al., 2007). Spatial autocorrelation, which comes either from

    the physical forcing of environmental variables or from community processes, presents

    a problem for statistical testing. Indeed, autocorrelated data violate the assumption of

    independence that is made by most standard statistical procedures (Legendre, 1993). Violating

    the assumption of independent and identically distributed (i.i.d.) residuals may bias parameter

    estimates and can increase type I error rates (Bini et al., 2009; Dormann et al., 2007).

    Nevertheless, many authors still use the basic statistical models and tests that assume i.i.d.

    residuals.
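
    As an illustration of what "more similar values at nearby locations" means in practice, the
    following minimal sketch computes Moran's I, a common index of spatial autocorrelation. It is
    not taken from the thesis: the function names, the six sites on a line and the binary
    neighbour weights are all assumptions chosen for brevity.

```python
def morans_i(values, weights):
    """Moran's I for `values`, given a symmetric spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Hypothetical layout: n sites on a line, weight 1 for adjacent sites, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
smooth = [1, 2, 3, 4, 5, 6]        # neighbours resemble each other
alternating = [1, 6, 1, 6, 1, 6]   # neighbours differ as much as possible

print(morans_i(smooth, w))        # positive: similar values cluster in space
print(morans_i(alternating, w))   # negative: neighbouring values are dissimilar
```

    A value near zero would indicate no spatial pattern; residuals that still show a clearly
    positive value are the situation in which the i.i.d. assumption is doubtful.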

    We here investigate how incorporating the spatial structure of the data in the statistical

    model affects both the parameter estimates and the model predictions. To this end we

    compare a basic method (assuming i.i.d. residuals) with four methods that deal with the

    spatial structure in the data: auto-covariates (AC), generalised least squares (GLS), a

    simultaneous autoregressive model (SAR) and a conditional autoregressive model (CAR).
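
    Of the four spatial methods, the auto-covariate approach is the simplest to sketch: an extra
    explanatory variable is built from the response values observed at neighbouring sites. The
    example below is only a rough illustration under stated assumptions; the inverse-distance
    weighting, the neighbourhood radius and the function name are hypothetical and are not
    necessarily the choices made in the thesis.

```python
import math


def auto_covariate(coords, response, radius):
    """Inverse-distance weighted mean of the response at neighbouring sites.

    Assumptions (not from the thesis): neighbours are all other sites within
    `radius`, weights are 1/distance, and isolated sites get the value 0.0.
    """
    ac = []
    for i, (xi, yi) in enumerate(coords):
        num = den = 0.0
        for j, (xj, yj) in enumerate(coords):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if 0 < d <= radius:
                num += response[j] / d
                den += 1.0 / d
        ac.append(num / den if den > 0 else 0.0)
    return ac


# Hypothetical plots: three close together on a line, one isolated.
coords = [(0, 0), (1, 0), (2, 0), (10, 10)]
presence = [1, 0, 1, 1]
print(auto_covariate(coords, presence, radius=1.5))
```

    The resulting vector would then be added as an ordinary covariate to the basic model.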

    Our case study is a fairly large data set of sycamore (Acer pseudoplatanus) regeneration

    from Flanders (northern part of Belgium). We model the presence-absence data

    (binomial), the number of saplings (Poisson) and the log-transformed number of saplings

    (Gaussian). The explanatory variables are derived from the dendrometrical data or from

    available GIS layers.
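
    The three response variables can all be derived from the same sapling counts. The sketch
    below shows one way to do so; the function name is hypothetical, and the use of log(count + 1)
    to cope with zero counts is an assumption, since the exact transformation is not stated here.

```python
import math


def derive_responses(sapling_counts):
    """Return (presence, counts, log_counts) for the three model families."""
    presence = [1 if c > 0 else 0 for c in sapling_counts]    # binomial response
    counts = list(sapling_counts)                             # Poisson response
    # Assumption: add 1 before taking the log so that zero counts stay defined.
    log_counts = [math.log(c + 1) for c in sapling_counts]    # Gaussian response
    return presence, counts, log_counts


presence, counts, log_counts = derive_responses([0, 3, 12])
print(presence)
print(counts)
print(log_counts)
```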

    A spatial parametric bootstrap procedure is used to quantify the distribution of the

    model parameter estimates. These distributions show both bias and differences in variance.

    Bias and increased variability occur mainly in the parameter estimates of explanatory

    variables with a spatial link. The other explanatory variables seldom exhibit bias. The

    effect on the variance depends on the method. Adding auto-covariates has little effect on

    the variances, whereas GLS and SAR result in model parameters with smaller variances for

    the non-spatial explanatory variables. CAR results in extremely unstable model parameters.
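
    The general idea of a parametric bootstrap can be sketched as follows: fit the model,
    simulate new responses from the fitted model, refit, and summarise the refitted parameters.
    The example is deliberately simplified and is not the procedure of the thesis: it uses a
    plain (non-spatial) Gaussian linear regression with independent errors, whereas a spatial
    parametric bootstrap would simulate spatially correlated errors. All names and data are
    hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (n - 2)) ** 0.5


def parametric_bootstrap(x, y, n_boot=500, seed=1):
    """Return (fitted slope, bootstrap bias, bootstrap variance) of the slope."""
    rng = random.Random(seed)
    a, b, sd = fit_line(x, y)
    slopes = []
    for _ in range(n_boot):
        # Simulate from the fitted model; errors are independent here (assumption).
        y_sim = [a + b * xi + rng.gauss(0, sd) for xi in x]
        slopes.append(fit_line(x, y_sim)[1])
    bias = statistics.mean(slopes) - b
    return b, bias, statistics.variance(slopes)


x = list(range(10))
y = [1.1, 1.4, 2.2, 2.4, 3.1, 3.4, 4.2, 4.4, 5.1, 5.4]  # made-up data
slope, bias, variance = parametric_bootstrap(x, y)
print(slope, bias, variance)
```

    Comparing the bias and variance obtained under different fitting methods is the kind of
    summary the paragraph above reports for each explanatory variable.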

    The predictions are evalu