Intro to Geostats


  • 7/30/2019 Intro to Geostats


    Introduction to Geostatistics and Some Preliminaries

    Lecture 1:

    1. Introductory remarks.
    2. The history of geostatistics.
    3. The statistical programming environment R.

    Lecture 2:

    1. Interpolation methods.

    Lecture 3:

    1. Basic statistics.
    2. Exploratory data analysis.

    Lecture 4:

    1. Spatial processes.
    2. Spatial correlation.
    3. Evaluation of the first course day.

    Forskningscenter Foulum 1


    Spatial Processes

    Random functions

    Suppose that we have a region of interest, $D \subset \mathbb{R}^2$.

    We might be interested in modelling the yield of spring barley or the amount of plant-available nitrogen in the soil.

    At points $x_i$, $i = 1, 2, \ldots, n$, in $D$ we measure the quantity of interest.

    Let $z(x_i)$ denote the value of the measurement at point $x_i$. We may view $z(x_i)$ as a realisation of a random variable $Z(x_i)$ living at the point $x_i$.

    In geostatistics the set of $z(x_i)$ is called a regionalized variable.

    The set of random variables

    $\{Z(x_1), Z(x_2), \ldots, Z(x_n)\}$

    constitutes a random function, also called a random process or a stochastic process.


    Spatial distribution

    There is in general a double infinity involved in random functions.

    We have an infinite number of points in the field or region D.

    At each point we may have a random variable which may take an infinity of values.

    To define a spatial distribution consistently we need the following.

    A random function is described by its finite-dimensional distributions.

    For any number $k$ and any points $x_1, x_2, \ldots, x_k$, the set of random variables

    $\{Z(x_1), Z(x_2), \ldots, Z(x_k)\}$

    has a joint distribution.

    Some call this a spatial distribution.

    The finite-dimensional distributions by themselves do not specify a valid probability distribution over the whole region, which consists of an uncountable number of points.


    Defining a spatial distribution on the finite-dimensional distributions alone makes it impossible to calculate, e.g.,

    $P\left( \sup_{x \in D} Z(x) < z_0 \right)$.

    This problem is overcome by adding a further assumption on the random function.

    The assumption is that for a distribution over an uncountable number of points there exists an equivalent distribution over a countable number of points.

    From this equivalent distribution a consistent probability distribution may be defined in terms of the finite-dimensional distributions.

    Interested readers may consult Doob (1953) [1].

    [1] Doob, J. L. (1953). Stochastic Processes, Wiley & Sons, New York, Sec. 2.2 (reprinted 1990).


    Stationarity

    Stationarity means that the distribution of the random process has certain attributes that are the same everywhere in the field D.

    Strict stationarity

    If, for any number $k$ and any points $x_1, x_2, \ldots, x_k$, the finite-dimensional distributions remain the same under an arbitrary translation $h$ of the points, i.e.

    $P(Z(x_1) < z_1, \ldots, Z(x_k) < z_k) = P(Z(x_1 + h) < z_1, \ldots, Z(x_k + h) < z_k)$,

    then the process is said to be strictly stationary.

    The first two moments of a strictly stationary process, when they exist, are invariant under translation.

    Physically, this means that the phenomenon we measure is homogeneous in space.

    The process repeats itself, so to speak, throughout the whole space of interest.


    Second order stationarity

    Consider the following moments of the random process.

    The first moment, i.e. the mean of the process,

    $E(Z(x))$.

    The second central moment between variables, i.e. the covariance between the random variables $Z(x)$ and $Z(x + h)$,

    $\operatorname{cov}\left(Z(x), Z(x + h)\right) = E\left[\left(Z(x) - E(Z(x))\right)\left(Z(x + h) - E(Z(x + h))\right)\right]$.

    If the mean is constant,

    $E(Z(x)) = m$,

    and

    $\operatorname{cov}\left(Z(x), Z(x + h)\right) = C(h)$


    is a function, $C(h)$, which depends only on the separation $h$, then the process is second-order stationary (or weakly stationary, or wide-sense stationary).

    A random field whose function $C(h)$ depends only on the length $|h|$ of the separation vector and not on its orientation is said to be isotropic.

    Such a field is invariant under both translation and rotation, that is, homogeneous.

    Intrinsic stationarity

    A milder hypothesis is to assume that for every vector $h$ the increment $Z(x + h) - Z(x)$ is second-order stationary. Then $Z(x)$ is called an intrinsic random function and is characterized by

    $E\left[Z(x + h) - Z(x)\right] = 0$,

    $\operatorname{var}\left[Z(x + h) - Z(x)\right] = 2\gamma(h)$,

    for a function $\gamma(h)$ which depends only on the separation vector $h$.

    The mean of the increments is zero.

    The quantity $\gamma(h) = \tfrac{1}{2}\operatorname{var}\left[Z(x + h) - Z(x)\right]$ is called the (semi-)variance at lag $h$.
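As an illustration of the semivariance at lag h, the following minimal sketch estimates it from data by averaging squared increments. It assumes measurements on a regular 1-D transect with unit spacing; the estimator (the usual method-of-moments form), the function name `empirical_semivariogram` and the data values are assumptions made for this example and are not part of the lecture.

```python
def empirical_semivariogram(z, max_lag):
    """Estimate gamma(h) for h = 1..max_lag on a regular 1-D transect.

    gamma(h) is half the average squared increment (z[i + h] - z[i])**2,
    which relies on the increments having mean zero.
    """
    gamma = {}
    for h in range(1, max_lag + 1):
        squared_increments = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
        gamma[h] = sum(squared_increments) / (2 * len(squared_increments))
    return gamma


# Hypothetical measurements at five equally spaced points.
z = [3.0, 5.0, 4.0, 6.0, 5.0]
gamma = empirical_semivariogram(z, max_lag=2)
print(gamma)
```

With so few points the estimates are only meant to show the arithmetic; in practice many more pairs per lag are needed.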


    Ergodicity

    Ergodicity is closely related to stationarity. A process is said to be ergodic if the moments of the random process on a finite region approach the moments of the random process on the whole space when the bounds of the region expand towards infinity.

    This may be characterized by the mean of the random process. By definition, if

    $\lim_{V \to \infty} \frac{1}{|V|} \int_V Z(x)\,dx = m$,

    where $m = E(Z(x))$ and $|V|$ is the area of the region $V$, then the process is ergodic.

    This simply means that if we sample the process on a finite region, then the sample is representative of the process on the whole space.

    Ergodicity means that we can make sensible statistical inferences about the process, since the chosen estimators are meaningful.
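The idea that spatial averages over growing regions approach the mean m can be sketched numerically. The example below assumes a simple first-order autoregressive process on a regular 1-D grid as a stand-in for a stationary random function; the model, the parameter values (`m`, `phi`, `n`) and the region sizes are hypothetical choices for illustration only.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

m = 5.0      # true mean of the process (hypothetical value)
phi = 0.8    # dependence between neighbouring grid points (hypothetical)
n = 100_000  # number of grid points in the largest region

# Simulate Z on a regular 1-D grid, started at the mean:
# Z(x + 1) - m = phi * (Z(x) - m) + independent Gaussian noise.
z = [m]
for _ in range(n - 1):
    z.append(m + phi * (z[-1] - m) + random.gauss(0.0, 1.0))

# Spatial averages over growing regions V = {0, ..., size - 1}.
sizes = (100, 1_000, 10_000, 100_000)
window_means = {}
running_sum = 0.0
for count, value in enumerate(z, start=1):
    running_sum += value
    if count in sizes:
        window_means[count] = running_sum / count

for size, mean in window_means.items():
    print(size, round(mean, 3))
```

The averages over the larger regions should lie closer to m than those over the smaller ones, which is what the limit in the definition expresses.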

    In practice, ergodicity is never a problem; see Chilès and Delfiner (1999) [2] and the references therein.

    [2] Chilès, J.-P. and Delfiner, P. (1999). Geostatistics: Modeling Spatial Uncertainty, John Wiley & Sons, New York, Sec. 1.1.6.


    Spatial Correlation

    Covariance function

    Suppose that we have a second order stationary process, then

    $E(Z(x)) = m$

    and

    $C(h) = E\left[\left(Z(x) - E(Z(x))\right)\left(Z(x + h) - E(Z(x + h))\right)\right]$,

    where the function C(h) is a function of the lag h only.

    It is called the covariance function (by some the autocovariance function).

    It describes the dependence between the values of the random function $Z(x)$ at any two positions separated by a lag $h$.

    We have that $C(h) = C(-h)$ and $|C(h)| \le C(0)$ for all $h$.

    Also, $C(h) \to 0$ as $|h| \to \infty$.
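These properties can be checked on a concrete covariance function. The sketch below assumes an isotropic exponential model, C(h) = sill * exp(-|h| / range_param), which is a common choice but is not introduced in the lecture; the function name and parameter values are hypothetical.

```python
import math


def exponential_covariance(h, sill=2.0, range_param=10.0):
    """Isotropic exponential covariance for a 2-D separation vector h.

    Assumed model: C(h) = sill * exp(-|h| / range_param), where |h| is the
    Euclidean length of h, so only the distance matters.
    """
    distance = math.hypot(h[0], h[1])
    return sill * math.exp(-distance / range_param)


c_zero = exponential_covariance((0.0, 0.0))       # C(0), the variance
c_h = exponential_covariance((3.0, 4.0))          # C(h)
c_minus_h = exponential_covariance((-3.0, -4.0))  # C(-h)
c_far = exponential_covariance((3000.0, 4000.0))  # C(h) for a very long lag

print(c_h == c_minus_h)     # symmetry
print(abs(c_h) <= c_zero)   # bounded by the variance
print(c_far)                # practically zero
```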


    Later we shall see that $C(h)$ must be positive definite in order to be a valid covariance function.

    Correlogram

    The covariance function depends on the scale on which the process is measured.

    Let $C(0)$ be the covariance at lag 0. $C(0)$ is the variance, say $\sigma^2$, of the process.

    Define

    $c(h) = \frac{C(h)}{C(0)} = \frac{1}{\sigma^2} C(h)$.

    The function $c(h)$ is called the correlation function.

    It does not depend on the scale on which the process is measured.

    Furthermore,

    $|c(h)| \le 1$.
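The scale independence of the correlation function can be illustrated with a small sketch. It assumes measurements on a regular 1-D transect and uses a simple empirical covariance with divisor n; the estimator, the function names and the data values are assumptions for this example. The same hypothetical measurements are expressed in kilograms and in grams.

```python
def autocovariance(z, h):
    """Empirical covariance at lag h on a regular 1-D transect (divisor n)."""
    n = len(z)
    mean = sum(z) / n
    return sum((z[i] - mean) * (z[i + h] - mean) for i in range(n - h)) / n


def correlogram(z, max_lag):
    """Empirical correlation function c(h) = C(h) / C(0) for h = 0..max_lag."""
    c_zero = autocovariance(z, 0)
    return [autocovariance(z, h) / c_zero for h in range(max_lag + 1)]


# Hypothetical measurements in kg, and the same measurements in g.
z_kg = [12.0, 15.0, 14.0, 10.0, 9.0, 13.0, 16.0, 15.0]
z_g = [1000.0 * value for value in z_kg]

corr_kg = correlogram(z_kg, max_lag=3)
corr_g = correlogram(z_g, max_lag=3)

print(autocovariance(z_kg, 1), autocovariance(z_g, 1))  # differ by a factor 1000**2
print(corr_kg)
print(corr_g)  # same as corr_kg up to rounding
```

Changing the unit multiplies the covariance by the square of the conversion factor, while the ratio C(h) / C(0) is unaffected.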


    Variogram

    Now suppose that Z(x) is an intrinsic random function.

    Then,

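    $E\left[Z(x + h) - Z(x)\right] = 0$

    and

    $2\gamma(h) = \operatorname{var}\left[Z(x + h) - Z(x)\right]$.

    The function $\gamma(h)$ is called the (semi-)variogram.

    The variogram shows how the dissimilarity between $Z(x)$ and $Z(x + h)$ evolves with separation $h$.

    We have $\gamma(h) = \gamma(-h)$, $\gamma(h) \ge 0$, and $\gamma(0) = 0$.

    Later we shall see that $-\gamma(h)$ must be conditionally positive definite (equivalently, $\gamma(h)$ must be conditionally negative definite) in order for $\gamma(h)$ to be a valid variogram.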


    The relation between covariance function and variogram

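    A second-order stationary process (SRF) is also an intrinsically stationary process (IRF).

    The reverse statement is not true. The standard example is Brownian motion (the integral of white noise, not white noise itself), which is intrinsically stationary but not second-order stationary: the variance of its increments depends only on the lag, while the variance of the process itself grows without bound.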
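The Brownian motion example can be sketched with a discrete random walk on a 1-D grid, i.e. a cumulative sum of independent Gaussian steps. The number of realisations, the transect length and the chosen positions are hypothetical. The sketch compares the variance of the process at two positions with the variance of increments of the same lag taken at two different places.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

n_paths = 2000  # number of independent realisations (hypothetical)
n_steps = 200   # number of grid steps per realisation (hypothetical)


def variance(values):
    """Sample variance with divisor n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Each path is a random walk: Z(0) = 0, Z(x + 1) = Z(x) + N(0, 1) step.
paths = []
for _ in range(n_paths):
    z = [0.0]
    for _ in range(n_steps):
        z.append(z[-1] + random.gauss(0.0, 1.0))
    paths.append(z)

# The variance of Z(x) depends on x, so the walk is not second-order stationary.
var_at_50 = variance([p[50] for p in paths])
var_at_200 = variance([p[200] for p in paths])

# The variance of increments at lag 50 is about the same wherever it is taken.
var_incr_early = variance([p[60] - p[10] for p in paths])
var_incr_late = variance([p[200] - p[150] for p in paths])

print(round(var_at_50, 1), round(var_at_200, 1))
print(round(var_incr_early, 1), round(var_incr_late, 1))
```

For unit-variance steps the theoretical values are 50 and 200 for the process variances and 50 for both increment variances, so the simulated figures should be close to these.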

    Suppose that $Z(x)$ is a second-order stationary random function. Then the variogram is linked to the covariance function by the relation

    $\gamma(h) = C(0) - C(h)$.

    The variogram is thus bounded above by $2C(0)$.

    Conversely, if the variogram is bounded above, then $\gamma(h)$ is of the form above for a stationary $C(h)$, and the IRF differs from an SRF only by a constant.
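The relation and the bound follow from expanding the variance of the increment, using only the definitions of the variogram and the covariance function given earlier:

```latex
\begin{align*}
2\gamma(h) &= \operatorname{var}\left[Z(x + h) - Z(x)\right] \\
           &= \operatorname{var} Z(x + h) + \operatorname{var} Z(x)
              - 2\operatorname{cov}\left(Z(x), Z(x + h)\right) \\
           &= C(0) + C(0) - 2C(h),
\end{align*}
\[
  \gamma(h) = C(0) - C(h), \qquad
  -C(0) \le C(h) \le C(0) \;\Longrightarrow\; 0 \le \gamma(h) \le 2C(0).
\]
```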

    The variogram has a larger degree of generality than the covariance function.

    See Gneiting et al. (2000) [3] for more information on the similarity between variograms and covariance functions.

    [3] Gneiting, T., Sasvari, Z. and Schlather, M. (2000). Analogies and correspondences between variograms and covariance functions. Advances in Applied Probability, 33, 617-630.


    End of lecture 4.
