Multivariate Geostatistics
Winter Term 2018/19
Stochastics
About Probability and Statistics
Mathematics
Fields of Mathematics
Analysis, Linear Algebra, Functional Analysis, Differential Equations, Integral Transforms, ...
Stochastics
Probability
Mathematical Statistics
Applied Statistics
Data Analysis, Data Mining
Mathematics – Analysis, Linear Algebra
Mapping
Mappings (“Abbildungen”) are a central issue in almost all mathematical disciplines.
Analysis: real functions, continuous functions, continuously differentiable functions, integrable functions, etc.
f : R → R+, f(x) = x²
f : [−π/2, π/2] → [−1, 1], f(ω) = cos^{2κ}(ω), κ ∈ N
Linear Algebra: linear mapping, linear map provided by a matrix, matrix associated with a linear map, homomorphism, isomorphism, etc.
A : R² → R², w = ( cos ω  −sin ω ; sin ω  cos ω ) v
Mathematics – Probability
Mapping
Probability: measurable mapping, random variable, real random variable
Z : (Ω, A, P) → (R, B)
P(Z(ω) ∈ B) := P(ω ∈ Z⁻¹(B)) for all B ∈ B
where (Ω, A, P) denotes a probability space and (R¹, B) the real measurable space.
A random variable Z is completely defined by its probability law, i.e. its distribution. The distribution tells the probability with which the random variable Z realizes values z = Z(ω). The values Z(ω), ω ∈ Ω, are called realizations.
Mathematics – Probability
Random variable
In contrast to e.g. analysis or linear algebra, there is usually no explicit formula or rule (“Abbildungsvorschrift”) for a random variable Z : ω ↦ z specifying how to assign z's to ω's.
A random variable is given by its distribution; what can be knownof a random variable is its distribution.
If the distribution is known, then the random variable is known andall its properties can be deduced from the distribution.
Probability
Mathematical probability assumes the probability law to be known; it develops a theory of how to describe random events by random variables and investigates their properties.
Probability
For a real random variable Z : (Ω, A, P) → (R, B), the distribution may be given in terms of the distribution function F
F(z) := P(Z(ω) ∈ (−∞, z]), z ∈ R
If the distribution function F can be represented as the integral
F(z) = ∫_{−∞}^{z} f(x) dx,
then f is called the probability density function, and in this case the distribution may also be represented by its probability density function.
For instance, the exponential law with parameter λ is given by
P(Z ≤ z) = F(z) = 1 − exp(−λz) = ∫_{0}^{z} λ exp(−λx) dx, z ≥ 0,
with density
f(x) = λ exp(−λx), x ≥ 0, λ > 0
Probability
Expectation, variance
Major properties of a real random variable are given in terms of “moments” or “central moments” of the distribution. The two most prominent moments are the expectation of a real random variable
EZ = ∫_{−∞}^{∞} z dF(z) = ∫_{−∞}^{∞} z f(z) dz = µ
and the variance of a real random variable
VarZ = E(Z − EZ)² = ∫_{−∞}^{∞} (z − µ)² f(z) dz = σ².
For instance, the exponential law is a one–parameter distribution, and its parameter determines both moments:
EZ = 1/λ,  VarZ = 1/λ².
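These two formulas are easy to check numerically; a minimal sketch with NumPy, where the rate λ = 2 and the sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                             # arbitrary rate parameter
z = rng.exponential(scale=1.0 / lam, size=1_000_000)  # draws from Exp(lam)

# Empirical moments should approach EZ = 1/lam and VarZ = 1/lam**2
print(z.mean())   # close to 0.5
print(z.var())    # close to 0.25
```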
Probability
Cauchy distribution
In the same way as there are probability laws for which a probability density function does not exist, there are distributions for which moments do not exist. For instance, the Cauchy distribution
f(z) = (1/π) · λ / (λ² + (z − µ)²), λ > 0, µ ∈ R,
does not have an expectation nor a variance.
However, its median and mode are given by µ.
Probability
Covariance
The covariance of two real random variables Z1, Z2
Cov(Z1, Z2) = E[(Z1 − EZ1)(Z2 − EZ2)]
is a measure for the extent of a linear relationship between Z1 and Z2.
Independence of random variables
Two real random variables are called independent if their joint probability law is the product of the two individual (“marginal”) probability laws.
Probability
It holds
E(aZ + b) = a EZ + b  (a, b ∈ R)
Var(aZ + b) = a² VarZ
E(Z1 ± Z2) = EZ1 ± EZ2
Var(Z1 ± Z2) = VarZ1 + VarZ2 ± 2 Cov(Z1, Z2)
with Cov(Z1, Z2) = E[(Z1 − EZ1)(Z2 − EZ2)] = E(Z1Z2) − EZ1 EZ2
Probability
Uncorrelatedness vs. independence
Two random variables Z1,Z2 are called uncorrelated, if
Cov(Z1,Z2) = 0
For uncorrelated random variables Z1,Z2 it holds
E(Z1Z2) = EZ1 EZ2
Var(Z1 ± Z2) = VarZ1 + VarZ2
If two random variables are stochastically independent, then they are also uncorrelated. The converse is not generally true.
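A standard illustration of this last point: for a standard normal Z1 and Z2 = Z1², the two variables are clearly dependent, yet their covariance vanishes because E(Z1³) = 0. A minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
z1 = rng.standard_normal(1_000_000)
z2 = z1**2                     # fully determined by z1, hence dependent

# Cov(Z1, Z2) = E(Z1**3) - E(Z1) E(Z1**2) = 0 for a standard normal Z1
cov = np.mean(z1 * z2) - z1.mean() * z2.mean()
print(cov)                     # close to 0, although z2 is a function of z1
```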
Mathematical Statistics
Mathematical statistics develops methods to determine the parameters of a distribution from a mathematical sample, and it develops statistical tests to check hypotheses.
Mathematical Statistics
Mathematical sample
Mathematical statistics initially models a sequence of n real observed univariate data z1, z2, . . . , zn ∈ R¹ as independent realizations of a real random variable Z.
In the multivariate case, zi ∈ R^m, i = 1, . . . , n, and Z denotes a random vector.
Mathematical Statistics
Mathematical sample
The model of mathematical statistics may be generalized along the following example.
Throwing a proper die n times and recording the result each time is equivalent to simultaneously throwing n dice once and recording all outcomes if
(i) the dice cup is large enough that the dice do not interfere, and
(ii) the n dice are identical copies of the initial one.
Mathematical Statistics
Mathematical sample
Mathematical statistics initially models a sequence of n realobserved univariate data z1, z2, . . . , zn ∈ R1 as independentrealizations of a real random variable Z .
The link of mathematical statistics and probability is established byemploying a twofold model of the data as follows
Z  →  z1,  z2,  . . . ,  zn
       ↑    ↑            ↑
      Z1,  Z2,  . . . ,  Zn
where Z1, Z2, . . . , Zn is a sequence of independent identically distributed (iid) random variables.
The sequence of data is modeled as the n–fold realization of a unique random variable Z and as a unique realization of an iid sequence of n random variables Z1, Z2, . . . , Zn. The set Z1, . . . , Zn is called a mathematical sample.
Mathematical Statistics
Mathematical sample
According to the second model of mathematical statistics we find the following results. Let
M := (1/n) ∑_{i=1}^n Zi
Then
EM = E((1/n) ∑_{i=1}^n Zi) = (1/n) ∑_{i=1}^n EZi = EZ = µ
VarM = Var((1/n) ∑_{i=1}^n Zi) = (1/n²) ∑_{i=1}^n VarZi = (1/n) VarZ = σ²/n
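The shrinking variance of the sample mean can be checked by simulation; a minimal sketch with NumPy, where the normal distribution and the values µ = 1, σ = 2, n = 25 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 200_000
# Each row is one mathematical sample Z_1, ..., Z_n (iid)
samples = rng.normal(loc=1.0, scale=2.0, size=(reps, n))   # mu = 1, sigma = 2
m = samples.mean(axis=1)       # one realization of M per row

print(m.mean())   # close to mu = 1
print(m.var())    # close to sigma**2 / n = 4 / 25 = 0.16
```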
Applied Mathematical Statistics – Data Analysis
Applied statistics attempts to find the probability law of therandom variable from which the observed data are thought of asrealizations and applies statistical tests to real world data inpractice.
Applied Mathematical Statistics – Data Analysis
Statistics
Descriptive mathematical statistics attempts to condense the information conveyed by the data into a few numbers describing and characterizing the set of data. In this way, empirical mean, empirical variance, empirical covariance, etc. may be seen as empirical parameters of the set of data.
z̄ = (1/n) ∑_{i=1}^n zi
s² = 1/(n−1) ∑_{i=1}^n (zi − z̄)²,  s = √s²
s12 = 1/(n−1) ∑_{i=1}^n (z1i − z̄1)(z2i − z̄2)
r = [1/(n−1) ∑_{i=1}^n (z1i − z̄1)(z2i − z̄2)] / (s1 s2)
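These empirical parameters map directly onto NumPy; a minimal sketch with a small made-up bivariate data set:

```python
import numpy as np

z1 = np.array([2.0, 3.5, 1.0, 4.0, 2.5])   # made-up bivariate data set
z2 = np.array([1.0, 3.0, 0.5, 3.5, 2.5])

zbar = z1.mean()                            # empirical mean
s2 = z1.var(ddof=1)                         # empirical variance, 1/(n-1) convention
s12 = np.cov(z1, z2, ddof=1)[0, 1]          # empirical covariance
r = np.corrcoef(z1, z2)[0, 1]               # empirical correlation s12/(s1*s2)
```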
Applied Mathematical Statistics – Data Analysis
Statistics
Inferential mathematical statistics attempts to infer the probability law of the random variable from the sequence of the data and its descriptive parameters, and to test hypotheses by means of statistical tests.
At best, the empirical parameters of the data turn out to be reasonable estimates of the parameters of the probability law and to provide insight.
Applied Statistics
Stochastic model        Real world data
Random variables        Data
Expectation             Arithmetic mean
Variance                Empirical variance
Covariance              Empirical covariance
Covariance matrix       Matrix of empirical covariances
Mathematical Statistics
The fundamental modeling assumption of applied statistics
The elements of the sample are assumed to be independent and identical repetitions of the same experiment or observation; otherwise, classical statistics does not apply.
Geostatistics
Geostatistics is a generalization of classical statistics for georeferenced, spatially and stochastically dependent random variables. Thus, the fundamental modeling assumption of classical statistics is violated by definition.
The fundamental modeling assumption of geostatistics is homogeneity (stationarity), which is a mathematical expression for a conservation law, e.g. the increments of any two random variables are not spatially dependent.
References (1)
Armstrong, M., 1998, Linear Geostatistics: Springer
Armstrong, M. and Dowd, P.A., 1994, Geostatistical Simulations: Kluwer Academic Publishers
Armstrong, M., Galli, A.G., Le Loc'h, G., Geffroy, F., Eschard, R., 2003, Plurigaussian Simulations in Geosciences: Springer
Chilès, J.-P., Delfiner, P., 1999, Geostatistics: Modeling Spatial Uncertainty: Wiley
Christakos, G., 1992, Random Field Models in Earth Sciences: Academic Press
Christakos, G., 2000, Modern Spatiotemporal Geostatistics: Oxford University Press
Cressie, N.A.C., 1993, Statistics for Spatial Data, Revised Edition: Wiley
David, M., 1977, Geostatistical Ore Reserve Estimation: Elsevier
References (2)
Deutsch, C.V., Journel, A.G., 1998, Geostatistical Software Library and User's Guide, Second Edition: Oxford University Press
Goovaerts, P., 1997, Geostatistics for Natural Resources Evaluation: Oxford University Press
Houlding, S.W., 2000, Practical Geostatistics: Springer
Isaaks, E.H., Srivastava, R.M., 1989, An Introduction to Applied Geostatistics: Oxford University Press
Journel, A.G., 1989, Fundamentals of Geostatistics in Five Lessons: American Geophysical Union
Journel, A.G., Huijbregts, C., 1978, Mining Geostatistics: Academic Press
Kitanidis, P.K., 1997, Introduction to Geostatistics: Applications in Hydrogeology: Cambridge University Press
Krige, D., 1978, Lognormal - de Wijsian Geostatistics for Ore Evaluation: South African Institute of Mining and Metallurgy
References (3)
Lantuéjoul, C., 2002, Geostatistical Simulation: Models and Algorithms: Springer
Mallet, J.-L., 2002, Geomodeling: Cambridge University Press
Matheron, G., 1971, The Theory of Regionalized Variables and Its Applications: Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, No. 5
Matheron, G., 1989, Estimating and Choosing: Springer
Müller, W.G., 1998, Collecting Spatial Data: Physica-Verlag
Myers, J., 1997, Geostatistical Error Management: Quantifying Uncertainty for Environmental Sampling and Mapping: Van Nostrand Reinhold
Olea, R.A., 1991, Geostatistical Glossary and Multilingual Dictionary: Oxford University Press
Olea, R.A., 1999, Geostatistics for Engineers and Earth Scientists: Kluwer Academic Publishers
References (4)
Pawlowsky–Glahn, V., Olea, R.A., 2004, Geostatistical Analysis of Compositional Data: Oxford University Press
Rivoirard, J., Simmonds, J., Foote, K.G., Fernandes, P., Bez, N., 2000, Geostatistics for Estimating Fish Abundance: Blackwell Science
Stein, M.L., 1999, Interpolation of Spatial Data: Some Theory for Kriging: Springer
Wackernagel, H., 1998, Multivariate Geostatistics (2nd completely revised edition): Springer
References: Geostatistics in the www
https://wiki.52north.org/AI_GEOSTATS/WebHome
http://www.iamg.org/
Geostatistics in a nutshell
The problem and prerequisites of its resolution
Experiencing spatially induced correlation
Applying spatial correlation to prediction I:Heuristic models
Applying spatial correlation to prediction II:Stochastic model – “Kriging”
Data based descriptive models of spatial correlation: The experimental semi–variogram
Stochastic modeling with random functions: The semi–variogram
Fundamental assumption of geostatistics – Homogeneity
Modeling the semi–variogram
Best linear unbiased estimator (BLUE)
Kriging systems
Practice of ordinary kriging
Stochastic simulation of random functions
The problem
Inventory
symbol                  description
x1, . . . , xn ∈ D      sites of sampling
z(x1), . . . , z(xn)    data (scalar or vectorial)
Let x0 ∈ D with z(x0) unknown; define the linear combination
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
with initially unknown coefficients (“weights”) wℓ(x0), ℓ = 1, . . . , n.
The problem
Linear combination
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
Problem
What are the prerequisites that z∗(x0) would be a reasonable predictor of z(x0)?
What is a reasonable way to determine the weights wℓ(x0), ℓ = 1, . . . , n, in such a way that
– z∗(x0) is a good predictor of z(x0),
– it is the “best” predictor – with respect to what criterion?
Conceptually,
given x0, which wℓ(x0) ≠ 0, i.e., which z(xℓ) enter the prediction z∗(x0)?
if wℓ(x0) ≠ 0, how to determine it?
The prerequisites
Tendency of preservation
The general possibility of a reasonable predictor z∗(x0) requires some kind of “tendency of preservation” of the properties being sampled.
Such a tendency could mathematically be captured with terms like continuity or continuous differentiability, i.e., with some measure of smoothness.
In probability or statistics it would be termed spatially induced similarity, correlation, dependence.
The prerequisites
Counter–example
Throwing dice at xℓ, ℓ = 1, . . . , n.
“The next observation is as surprising as each previous one.”
Contradiction to the fundamental assumption of statistics
Approaching the problem stochastically, a new kind of statistics is required, as classical statistics depends on the fundamental assumption of independent identically distributed random variables, i.e., on independent repetitions of identical sampling.
Experiencing spatially induced correlation
Scientists' and engineers' experience suggests, e.g., that the ore contents of the specimens in a sample from a homogeneous ore deposit
– are the more similar the closer their respective sampling locations are, independently of their actual values;
– are no longer similar at all if the distance between their respective locations is larger than a specific distance characteristic for the ore deposit.
This experience generalizes to many other spatial phenomena.
Applying spatial correlation
Some kind of spatially induced “similarity”, “continuity”, “correlation” is a prerequisite of any reasonable prediction.
According to common experience, the extent of this spatially induced “similarity”, “continuity”, “correlation” is a function of distance.
Turning this experience constructive, the linear ansatz may be rewritten as
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ) = ∑_{ℓ=1}^n w(x0 − xℓ) z(xℓ)
with a decreasing weight function w, radially symmetric with respect to the origin.
Applying spatial correlation to prediction
Heuristic models (1)
Inverse distance weighting
z∗(x0) = ∑_{ℓ=1}^n w(x0 − xℓ) z(xℓ)
with
wℓ(x0) = w(x0 − xℓ) = (1/‖x0 − xℓ‖) / ∑_{k=1}^n (1/‖x0 − xk‖)
Other choices?
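Inverse distance weighting can be sketched in a few lines of Python; the sites and data below are made up, and the tie-breaking rule at sampled sites is an implementation choice, not part of the slide:

```python
import numpy as np

def idw_predict(x0, xs, zs, eps=1e-12):
    """Inverse distance weighting: z*(x0) = sum_l w_l(x0) z(x_l)."""
    d = np.linalg.norm(xs - x0, axis=1)
    if d.min() < eps:                  # x0 coincides with a sampling site
        return zs[d.argmin()]
    w = (1.0 / d) / np.sum(1.0 / d)    # normalized inverse-distance weights
    return float(np.sum(w * zs))

xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # made-up sampling sites
zs = np.array([1.0, 3.0, 5.0])
# All three sites are equidistant from (0.5, 0.5), so this is the plain mean
print(idw_predict(np.array([0.5, 0.5]), xs, zs))
```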
Applying spatial correlation to prediction
Heuristic models (2)
How to choose appropriate weight functions – naive view
Weighted mean of the data
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
with the wℓ(x0) read as weights and the z(xℓ) as data.
Applying spatial correlation to prediction
Heuristic models (3)
How to choose an appropriate weight function – dual view
Linear combination of radially symmetric basis functions
z∗(x) = ∑_{ℓ=1}^n wℓ(x) z(xℓ) = ∑_{ℓ=1}^n w(x − xℓ) z(xℓ)
with the w(x − xℓ) read as basis functions and the z(xℓ) as weights.
What can be said about the smoothness of z∗(x) in terms of continuity, continuous differentiability, etc.?
Applying spatial correlation to prediction
Stochastic model (1)
Apply data analysis, i.e., descriptive statistics, to derive a description of the spatially induced correlation.
Applying spatial correlation to prediction
Stochastic model (2)
h–scatter plot
An h–scatter plot is a scatter plot of all pairs of measurements z(xℓ), z(xℓ + h) of the same attribute z at locations separated by the vector h.
Note that h is a vector such that xℓ, xℓ + h ∈ D.
An h–scatter plot visualizes spatial variability or continuity, respectively. It is very helpful to identify extreme values.
Applying spatial correlation to prediction
Stochastic model (3)
h–scatter plot
For most natural phenomena it is generally expected that the spatial variability increases, i.e., spatial continuity decreases, as the length ‖h‖ of h increases.
Thus, for increasing ‖h‖ the points cluster worse around the first bisector in the (z(x), z(x + h))–plane.
This behaviour may be different for different directions h, which is referred to as anisotropy.
An h–scatter plot may be summarized by the mean of the squared orthogonal distances of the points to the first bisector. Thus, in terms of mechanics, it is the moment of inertia with respect to the first bisector.
Applying spatial correlation to prediction
Stochastic model (4)
Sample semi–variogram
γ(h) := 1/(2N(h)) ∑_{ℓ=1}^{N(h)} [z(xℓ) − z(xℓ + h)]²
where N(h) is the number of pairs separated by h, and the differences z(xℓ) − z(xℓ + h) are referred to as h–increments of z.
Pythagoras' theorem helps to see that
(1/2)[z(xℓ) − z(xℓ + h)]² = cos²(π/4) [z(xℓ) − z(xℓ + h)]²
actually is the squared orthogonal distance of the point with coordinates (z(xℓ), z(xℓ + h)) to the first bisector.
Applying spatial correlation to prediction
Stochastic model (5)
Sample semi–variogram
The sample semi–variogram is a data–driven figure describing the increasing dissimilarity of observations at any two sites with increasing distance.
From the plot of a sample semivariogram we may read off the sill, the range, and the nugget effect, and summarize it in these three terms.
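The sample semi–variogram can be sketched directly from its definition; the regular 1-D transect and the smooth test data below are made-up choices for illustration:

```python
import numpy as np

def sample_semivariogram(x, z, lags, tol):
    """gamma(h) = 1/(2 N(h)) * sum over pairs with |x_l - x_k| ~ h."""
    d = np.abs(x[:, None] - x[None, :])            # pairwise distances (1-D sites)
    dz2 = (z[:, None] - z[None, :])**2             # squared h-increments
    gamma = []
    for h in lags:
        mask = np.triu(np.abs(d - h) <= tol, k=1)  # each pair counted once
        gamma.append(0.5 * dz2[mask].mean())
    return np.array(gamma)

x = np.arange(10, dtype=float)                     # regular 1-D transect
z = np.sin(x / 3.0)                                # smooth made-up data
g = sample_semivariogram(x, z, lags=[1.0, 2.0, 3.0], tol=0.1)
```

For smooth data like this, dissimilarity grows with the lag, so the estimated values increase with h.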
Applying spatial correlation to prediction
Stochastic model (6)
Descriptive statistics vs. mathematical statistics
What is the counterpart of the semi–variogram in terms of probability and mathematical statistics?
Measuring the dissimilarity of z(xℓ) compared to z(xℓ + h), or the variability of the increments z(xℓ) − z(xℓ + h), its counterpart should be a variance.
Being a mean, the semi–variogram is reminiscent of an expectation.
Variance of increments
Var(Z(x) − Z(x + h)) = E[Z(x) − Z(x + h)]² − (E[Z(x) − Z(x + h)])²
where, under the homogeneity assumption, the second term ≡ 0.
The stochastic model: Random functions (1)
Random function
The set of spatially indexed random variables (Z (x), x ∈ D) iscalled a random function (RF).
Inventory of the model
symbol                  description
x1, . . . , xn ∈ D      sites of sampling
Z(x1), . . . , Z(xn)    random variables authorized for the sampling sites
z(x1), . . . , z(xn)    data z(xℓ), interpreted as single realizations of Z(xℓ)
Z(x0)                   random variable authorized for location x0
z(x0)                   unknown, to be estimated
Z∗(x0)                  random variable, estimator of the random variable Z(x0)
z∗(x0)                  realisation of Z∗(x0), estimate of z(x0)
The stochastic model: Random functions (2)
The novel model provided by a random function interprets the data zℓ, ℓ = 1, . . . , n, supported by the n specimens at the sampling locations xℓ ∈ D, ℓ = 1, . . . , n, as a unique discrete realisation z(xℓ), xℓ ∈ D, ℓ = 1, . . . , n, of a unique spatial random function Z(x), x ∈ D.
The random function is also referred to as a regionalized random variable.
The stochastic model: Random functions (3)
Estimator – estimate
Let x0 ∈ D; define the estimator Z∗(x0) as a linear combination of random variables, i.e.,
Z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) Z(xℓ)
with initially unknown coefficients (“weights”) λℓ(x0), ℓ = 1, . . . , n.
Problem rephrased
What is a reasonable way to determine the weights λℓ(x0), ℓ = 1, . . . , n, in such a way that
– Z∗(x0) is a good estimator of Z(x0),
– it is the “best” estimator – with respect to what criterion?
The stochastic model: Random functions (4)
Estimator – estimate
If
Z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) Z(xℓ)
is a good estimator of Z(x0), then its realisation
z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) z(xℓ)
with the same weights should be a good estimate of z(x0).
Random functions – Moments (1)
Any marginal one–point distribution function is given by
F(x; z) := P(Z(x) ≤ z)
The mean of the RF is the expected value function
m(x) = EZ(x)
and its variance is the variance function
VarZ(x) = EZ²(x) − m²(x)
of the random function (Z(x), x ∈ D).
Moments (2)
The centered 2–point covariance is the covariance of the random variables Z(x1) and Z(x2)
Cov(Z(x1), Z(x2)) = E[Z(x1) − m(x1)][Z(x2) − m(x2)] = E(Z(x1)Z(x2)) − m(x1)m(x2)
It is assumed that both moments exist, i.e. that they are finite. Then Z(x) is called a second–order random function: it has a finite variance and its covariance exists everywhere.
A covariance function is an even and positive definite function.
Moments (3)
The two–point variogram is defined as
Var(Z(x1) − Z(x2)) = 2γ(x1, x2)
The variogram is an even and non–negative function. More importantly, −γ is a conditionally positive definite function.
Moments (4)
It holds
2γ(xℓ, xk) = Var(Z(xℓ) − Z(xk)) = Var(Z(xℓ)) + Var(Z(xk)) − 2 Cov(Z(xℓ), Z(xk))
If
Var(Z(xℓ)) = Var(Z(xk)) =: C(0)  and  Cov(Z(xℓ), Z(xk)) =: C(xℓ, xk)
then
γ(xℓ, xk) = C(0) − C(xℓ, xk)
where C(0) corresponds to the sill of the two–point semivariogram.
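The relation γ = C(0) − C can be illustrated numerically; the sketch below assumes an exponential covariance model C(h) = exp(−3|h|/a) with unit sill and an arbitrary range a = 10:

```python
import numpy as np

# Assumed exponential covariance model with unit sill and range a = 10
a = 10.0
C = lambda h: np.exp(-3.0 * np.abs(h) / a)
gamma = lambda h: C(0.0) - C(h)           # semivariogram implied by C

h = np.linspace(0.0, 30.0, 7)
print(gamma(h))   # rises from 0 towards the sill C(0) = 1
```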
Statistics of random functions (1)
The problem of geostats
Only one discrete realisation z(xℓ), ℓ = 1, . . . , n, of the random function Z(x) sampled at locations xℓ ∈ D exists, i.e., just for a few random variables Z(xℓ) of the random function Z(x), only a single realization z(xℓ) has been sampled.
The geostatistical solution
A random function is furnished with “pleasant” properties such that the model permits deriving statistics based on a single discrete realization.
The solution will be provided by an appropriate generalization of the assumption of independent identically distributed random variables inherent to classical statistics.
Statistics of random functions (2)
Homogeneity
The required “pleasant” property essentially consists in the modeling assumption that the increments z(xℓ + h) − z(xℓ) and z(xk + h) − z(xk) are realizations of a unique random variable ∆(h) := Z(x + h) − Z(x) representing increments independently of the involved locations.
Note that it is not assumed that ∆(h1) and ∆(h2) are independent.
The arithmetic mean of observed increments δ(h) = z(x + h) − z(x) provides a reasonable estimate of the expectation E(∆(h)).
Homogeneity of random functions (1)
Strong homogeneity
A random function is called strongly (strictly) stationary (better: homogeneous) if all finite–dimensional joint distributions are translation–invariant, i.e.
P(Z(x1) ≤ z1, . . . , Z(xk) ≤ zk) = P(Z(x1 + h) ≤ z1, . . . , Z(xk + h) ≤ zk)
Homogeneity of random functions (2)
Strong stationarity (homogeneity) implies that its moments are invariant under translation (if they exist), i.e.
EZ(x) = m
Cov(Z(x), Z(x + h)) = E[Z(x) − m][Z(x + h) − m] = C(h)
Var(Z(x + h) − Z(x)) = E[Z(x + h) − Z(x)]²
Thus, the mean is constant and the covariance function depends only on the lag h.
Homogeneity of random functions (3)
Second–order homogeneity
A random function is called a second–order (weakly) stationary (better: homogeneous) SRF if
EZ(x) = m
Cov(Z(x), Z(x + h)) = C(h)
If C is a function of |h| only, then the SRF is isotropic.
Second–order stationarity (homogeneity) implies
E(Z(x + h) − Z(x)) = 0
Var(Z(x + h) − Z(x)) = 2(C(0) − C(h))
Homogeneity of random functions (4)
Intrinsic homogeneity
A random function is called an intrinsically stationary (better: homogeneous) IRF if the increment variable ∆(x, h) = Z(x + h) − Z(x) is an SRF with respect to x ∈ D, i.e.
E(Z(x + h) − Z(x)) = aᵀh
Var(Z(x + h) − Z(x)) = 2γ(h)
Thus, the expectation of the increments is a linear function of the lag h (linear drift), and its variance is given by the variogram (function). If γ is a function of |h| only, then the IRF is isotropic.
If the linear drift is zero, i.e. if the mean is constant, then the intrinsic model is of the form
E(Z(x + h) − Z(x)) = 0,  E(Z(x + h) − Z(x))² = 2γ(h)
Bounded variograms and stationarity
An SRF is also an IRF and therefore has a variogram. In this case
γ(h) = C(0) − C(h)
Thus the variogram of an SRF is bounded by C(0).
Conversely, if the variogram of an IRF is bounded, then γ is of the form given above with a stationary covariance C(h).
Sill of a variogram and sample variance
The theoretical variance of Z(x) is equal to C(0) if Z is an SRF, or does not exist if Z is a nonstationary IRF.
In the case of an SRF, the expectation of the sample variance is always smaller than the theoretical variance. Thus, the sample variance is a biased estimate of the theoretical variance C(0).
The variogram, unlike the covariance, does not require knowledge of the mean, which in practice is not known; the variogram is unaffected by this problem because it automatically filters the mean.
Sample variogram
The sample variogram (experimental, empirical variogram) isdefined as
γ(h) = 1/(2N(h)) ∑_{xℓ−xk≈h} (z(xℓ) − z(xk))²
where N(h) denotes the total number of pairs of points separated by the lag h.
Variances (1)
What is the variance of an arbitrary linear combination ∑_ℓ λℓ Z(xℓ)?
Let Z∗ be a finite linear combination of random variables Z(xℓ) of a random function (Z(x), x ∈ D), i.e.,
Z∗ = ∑_ℓ λℓ Z(xℓ),  λℓ ∈ R
then
Var(Z∗) = Var(∑_ℓ λℓ Z(xℓ)) = Cov(∑_ℓ λℓ Z(xℓ), ∑_k λk Z(xk)) = ∑_ℓ ∑_k λℓ λk Cov(Z(xℓ), Z(xk))
which, for an SRF, equals ∑_ℓ ∑_k λℓ λk C(xℓ − xk).
Variances (2)
If Z is an SRF, then
0 ≤ Var(∑_{ℓ=1}^n λℓ Z(xℓ)) = ∑_{ℓ=1}^n ∑_{k=1}^n λℓ λk C(xℓ − xk)
A function C with this property is called positive definite.
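Positive definiteness guarantees that every such quadratic form is a valid (non-negative) variance; a quick numerical check, assuming an exponential covariance model with unit sill and an arbitrary range a = 5 on arbitrary 1-D sites:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 10.0, size=8)            # arbitrary 1-D sampling sites
# Assumed exponential covariance model C(h) = exp(-3|h|/a), range a = 5
C = np.exp(-3.0 * np.abs(xs[:, None] - xs[None, :]) / 5.0)

# Var(sum_l lam_l Z(x_l)) = lam^T C lam must be >= 0 for arbitrary weights
lams = rng.standard_normal((1000, 8))
quad = np.einsum('ij,jk,ik->i', lams, C, lams)  # one quadratic form per row
print(quad.min())                               # non-negative
```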
Variances (3)
Assuming intrinsic stationarity (homogeneity), the first two moments of the increments, in particular the semivariogram, exist. However, the covariance function of the random function itself may not exist. If Z is an IRF, then only linear combinations which can be represented as linear combinations of increments have a finite variance, i.e.,
∑_{ℓ=1}^n λℓ Z(xℓ) = ∑_{ℓ=1}^n λℓ (Z(xℓ) − Z(x0))  if  ∑_{ℓ=1}^n λℓ = 0
Variances (4)
If Z is an IRF and ∑ λℓ = 0, then
0 ≤ Var(∑_{ℓ=1}^n λℓ Z(xℓ)) = −∑_{ℓ=1}^n ∑_{k=1}^n λℓ λk γ(xℓ − xk)
Thus, −γ is a conditionally positive definite function.
In the case of ∑ λℓ = 0, the covariance function C may formally be replaced by the negative semivariogram −γ.
Variances (5)
Let
Z∗(x0) = ∑_{ℓ=1}^n λℓ Z(xℓ)
then the variance
Var(∑_{ℓ=1}^n λℓ Z(xℓ) − Z(x0))
can be represented in terms of the variogram if
∑_{ℓ=1}^n λℓ = 1
Variogram models (1)
Pure nugget–effect
γ(h) := 0 if h = 0; 1 otherwise
Spherical semivariogram
γ(h) := 1.5 (h/a) − 0.5 (h/a)³ if h < a; 1 otherwise
Exponential semivariogram
γ(h) := 1 − exp(−3h/a)
Gaussian semivariogram
γ(h) := 1 − exp(−3h²/a²)
Variogram models (2)
Power variogram
γ(h) := h^ω,  0 < ω < 2
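The model semivariograms above translate directly into code; a minimal sketch with unit sill, where the factor 3 in the exponential and Gaussian models follows the practical-range convention used on the slides:

```python
import numpy as np

def spherical(h, a):
    """Spherical model: 1.5 h/a - 0.5 (h/a)**3 below the range a, 1 beyond."""
    h = np.abs(np.asarray(h, dtype=float))
    return np.where(h < a, 1.5 * h / a - 0.5 * (h / a)**3, 1.0)

def exponential(h, a):
    return 1.0 - np.exp(-3.0 * np.abs(h) / a)   # ~95% of the sill at h = a

def gaussian(h, a):
    return 1.0 - np.exp(-3.0 * h**2 / a**2)

def power(h, omega):
    return np.abs(h)**omega                     # unbounded: no sill, 0 < omega < 2
```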
Variogram models and covariance functions
For bounded model semivariograms (with a sill), referred to as “transition models”, the corresponding model covariance is provided by
c(h) := 1 − γ(h)
For unbounded model semivariogram functions a corresponding covariance function does not exist.
Kriging (1)
Named in honour of Danie Krige, kriging is the genuine method of geostatistics for spatial prediction.
It is a stochastics-based method of spatial prediction (estimation) employing the spatial structure, i.e., the spatial correlation, as captured in the semivariogram.
Based on n georeferenced data of an attribute z(xℓ), ℓ = 1, . . . , n, the value of this attribute z shall be predicted for any location x0 ∈ D by a linear combination of the data, i.e.
z∗(x0) := ∑_{ℓ=1}^n λℓ(x0) z(xℓ)
where the weights depend on the spatial correlation of the data.
Kriging (2)
Notation
The random function (Z(x), x ∈ D).
The set of points S ⊂ D where Z(x) has been sampled: S = {x ∈ D | z(x) known}. Usually S is finite and consists of n points.
The data z(xℓ), ℓ = 1, . . . , n.
The mean values m(xℓ) = mℓ.
The covariances Cov(Z(xℓ), Z(xk)) = σℓk.
Change of support: variability depends on the material support and must therefore be considered, e.g. point kriging, block kriging, ... .
Neighborhoods: local vs. global approach
Kriging (3)
Linear approach of kriging
Z*(x_0) = ∑_{ℓ=1}^n λ_ℓ(x_0) Z(x_ℓ) + λ_0(x_0)

or for short

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

Eventually we shall see that kriging is of the form

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ
Kriging (4)
A proper estimator should be unbiased
E[Z*(x_0)] = E[Z(x_0)],  i.e.  E[Z*(x_0) − Z(x_0)] = 0
and its associated estimation variance

σ²_E(x_0) = Var[Z*(x_0) − Z(x_0)] → min

should be as small as possible.
Note

Var[Z*(x_0) − Z(x_0)] = E[(Z*(x_0) − Z(x_0))²] − E²[Z*(x_0) − Z(x_0)]

i.e. the variance equals the mean square error minus the squared bias.
Kriging (5)
These two requirements are the characteristics of kriging:
BLUE, BLUP – best linear unbiased estimator, best linear unbiased predictor
and lead to the problem of quadratic programming (optimization)
σ²_E(x_0) → min
subject to unbiasedness
E[Z ∗(x0)− Z (x0)] = 0
Kriging (6)
Simple kriging (SK) refers to kriging in the case of a constant known mean and a known covariance function and thus to second-order stationarity.

Ordinary kriging (OK) refers to the case of a constant unknown mean and a known variogram and thus to intrinsic stationarity.

Universal kriging (UK) refers to the case of an unknown mean of known type and a known variogram.
Simple Kriging (1)
Assuming a known constant mean m: m(x) = m

Linear approach

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

The constant λ_0 and the weights λ_ℓ are determined so as to minimize the error Z* − Z_0, characterized by its expected mean square E(Z* − Z_0)².

The mean square error (mse) is

E(Z* − Z_0)² = Var(Z* − Z_0) + E²(Z* − Z_0)

where the first summand is the variance term and the second the bias term.

To make the bias term vanish, it is necessary that

λ_0 = m_0 − ∑_ℓ λ_ℓ m_ℓ
Simple Kriging (2)
Then

Z* = m_0 + ∑_{ℓ=1}^n λ_ℓ (Z_ℓ − m_ℓ)

with Y_ℓ := Z_ℓ − m_ℓ and

Y* = ∑_{ℓ=1}^n λ_ℓ Y_ℓ

This amounts to predicting the zero-mean variable Y(x) = Z(x) − m(x) by the linear estimator Y* = ∑_ℓ λ_ℓ Y_ℓ, and adding the mean afterwards

Z* = Y* + m_0

Therefore, the case of a known mean is equivalent to the case of a zero mean and λ_0 = 0.
Simple Kriging (3)
The mse, which is now a variance, is

Var(Z* − Z_0) = ∑_ℓ ∑_k λ_ℓ λ_k Cov(Z_ℓ, Z_k) − 2 ∑_ℓ λ_ℓ Cov(Z_ℓ, Z_0) + Var(Z_0)
Setting all partial derivatives to 0

∂/∂λ_ℓ E(Z* − Z_0)² = 2 ∑_k λ_k Cov(Z_ℓ, Z_k) − 2 Cov(Z_ℓ, Z_0) = 0
As the mse is a convex function due to the positive definiteness of the covariance, the weights are finally given as the solutions of the simple kriging system

∑_k λ_k^SK Cov(Z_ℓ, Z_k) = Cov(Z_ℓ, Z_0),  ℓ = 1, …, n
These equations provide the best linear unbiased predictor (blup).
Simple Kriging (4)
With

∑_k λ_k^SK C(x_ℓ − x_k) = C(x_ℓ − x_0)

the estimation variance, called the kriging variance, associated with Z* is

σ²_SK(x_0) = E(Z* − Z_0)² = Var(Z_0) − ∑_ℓ λ_ℓ^SK Cov(Z_ℓ, Z_0)
           = C(0) − ∑_ℓ λ_ℓ^SK C(x_ℓ − x_0)
Note that the kriging variance depends neither on the random variables Z(x_ℓ) nor on the data z(x_ℓ)!
Simple Kriging (5)
More explicitly, the simple kriging system reads

⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞ ⎛ λ_1^SK(x_0) ⎞   ⎛ C(x_1 − x_0) ⎞
⎜      ⋮        ⋱       ⋮     ⎟ ⎜      ⋮      ⎟ = ⎜      ⋮      ⎟
⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠ ⎝ λ_n^SK(x_0) ⎠   ⎝ C(x_n − x_0) ⎠

or in matrix notation

K_SK λ_SK(x_0) = k_SK

Its solution is given by

λ_SK(x_0) = K_SK^{−1} k_SK

and the kriging variance is accordingly

σ²_SK = C(0) − λ_SK^T(x_0) k_SK
      = C(0) − k_SK^T K_SK^{−1} k_SK
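In code, this matrix form translates directly into one linear solve. A sketch assuming a Gaussian transition covariance C(h) = exp(−3h²/a²) with sill 1; the sample locations, data, and function names are illustrative:

```python
import numpy as np

def cov(h, a=1.0):
    """Gaussian transition covariance with sill 1: C(h) = 1 - gamma(h)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

def simple_kriging(x, z, m, x0, a=1.0):
    """SK prediction z*(x0) and kriging variance sigma^2_SK(x0).

    x : (n,) sample locations, z : (n,) data, m : known constant mean.
    """
    x = np.asarray(x, float)
    z = np.asarray(z, float)
    K = cov(np.abs(x[:, None] - x[None, :]), a)   # K_SK
    k = cov(np.abs(x - x0), a)                    # k_SK
    lam = np.linalg.solve(K, k)                   # lambda_SK = K_SK^{-1} k_SK
    zstar = m + lam @ (z - m)                     # predict residuals, add mean back
    var = cov(0.0, a) - lam @ k                   # C(0) - k_SK^T K_SK^{-1} k_SK
    return zstar, var

x = np.array([0.0, 1.0, 2.5])
z = np.array([1.2, 0.7, 1.5])
zstar, var = simple_kriging(x, z, m=1.0, x0=1.4, a=2.0)
```

At a sampled location the weights reduce to a unit vector, reproducing the interpolation property z*(x_ℓ) = z(x_ℓ) with zero kriging variance.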
Simple Kriging (6)
A solution of the kriging system

⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞ ⎛ λ_1^SK(x_0) ⎞   ⎛ C(x_1 − x_0) ⎞
⎜      ⋮        ⋱       ⋮     ⎟ ⎜      ⋮      ⎟ = ⎜      ⋮      ⎟
⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠ ⎝ λ_n^SK(x_0) ⎠   ⎝ C(x_n − x_0) ⎠

in matrix notation

K_SK λ_SK(x_0) = k_SK

exists and is unique, and the kriging variance is positive, if K_SK = (C(x_i − x_j)) is positive definite, i.e. in practice if

1. x_i ≠ x_j for i ≠ j,

2. the covariance function has been modeled by authorized mathematical model functions.
Simple Kriging (9)
Properties
Interpolation
z*(x_ℓ) = z(x_ℓ),  σ²_SK(x_ℓ) = 0

Smoothing

Var Z* = ∑_ℓ ∑_k λ_ℓ^SK λ_k^SK Cov(Z_ℓ, Z_k) = ∑_ℓ λ_ℓ^SK Cov(Z_ℓ, Z_0),

Var Z* = Var Z_0 − σ²_SK
Simple Kriging – Dual Kriging (1)
z*(x) = ∑_ℓ λ_ℓ^SK(x) z(x_ℓ)

      = (z(x_1), …, z(x_n)) ⎛ λ_1^SK(x) ⎞
                            ⎜     ⋮     ⎟
                            ⎝ λ_n^SK(x) ⎠

      = (z(x_1), …, z(x_n)) ⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞⁻¹ ⎛ C(x_1 − x) ⎞
                            ⎜      ⋮        ⋱       ⋮     ⎟   ⎜     ⋮     ⎟
                            ⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠   ⎝ C(x_n − x) ⎠

      = (b_1(x_1, …, x_n), …, b_n(x_1, …, x_n)) ⎛ C(x_1 − x) ⎞
                                                ⎜     ⋮     ⎟
                                                ⎝ C(x_n − x) ⎠

      = ∑_ℓ b_ℓ C(x_ℓ − x)

or in matrix notation

z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK(x) = ∑_ℓ b_ℓ C(x_ℓ − x) .
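The dual weights b = K_SK^{−1} z depend only on the sampling configuration and the data, not on the prediction location, so they can be computed once and reused for every x. A sketch under the same illustrative Gaussian-covariance assumption as before:

```python
import numpy as np

def cov(h, a=2.0):
    """Gaussian transition covariance with sill 1 (illustrative choice)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

x = np.array([0.0, 1.0, 2.5])           # sampling locations (hypothetical)
z = np.array([0.4, -0.2, 0.9])          # data, treated as zero-mean residuals

K = cov(np.abs(x[:, None] - x[None, :]))  # K_SK
b = np.linalg.solve(K, z)                 # dual weights: K_SK b = z

def zstar(x0):
    # z*(x) = sum_l b_l C(x_l - x): a superposition of shifted covariances
    return b @ cov(np.abs(x - x0))
```

Since K_SK b = z by construction, evaluating zstar at any sampling location reproduces the datum there, which is exactly the interpolation characterization below.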
Simple Kriging – Dual Kriging (2)
Reading
z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK = ∑_ℓ b_ℓ C(x_ℓ − x)

as a superposition of shifted covariance functions Cov(Z_ℓ, Z) = C(x_ℓ − x) centered at the sampling locations x_ℓ, the covariance function C determines the "smoothness", i.e. continuity and regularity, of z*.
If the covariance is parabolic near the origin, then z* is differentiable;

if it is linear near the origin, then z* is continuous but with cusps at the data points;

if the covariance has a discontinuity at the origin, then there will be isolated jumps at the data points.
Simple Kriging – Dual Kriging (3)
Rewriting

z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK = ∑_ℓ b_ℓ C(x_ℓ − x)

and postulating interpolation

z*(x_k) = ∑_ℓ b_ℓ C(x_ℓ − x_k) = z(x_k),  k = 1, …, n,

or in matrix notation

K_SK b = z,

the kriging estimator can be characterized as the solution of the interpolation problem.
Ordinary Kriging (1)
Assuming an unknown constant mean m: m(x) = m

Ordinary kriging is the simplest case where the random function Z(x) is decomposed according to

Z(x) = m(x) + Y(x)

into a sum of a deterministic function m, called the drift, describing the systematic behaviour, and a zero-mean random function Y(x), called the residuals, capturing the erratic fluctuations.
Ordinary Kriging (2)
Linear approach

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

minimizing the mse

E(Z* − Z_0)² = Var(Z* − Z_0) + E²(Z* − Z_0)

            = Var(Z* − Z_0) + [λ_0 + (∑_ℓ λ_ℓ − 1) m]²

where the first summand is the variance term and the second the bias term.
Ordinary Kriging (3)
To make the bias term

[λ_0 + (∑_ℓ λ_ℓ − 1) m]²

vanish, it is necessary that

λ_0 = 0,  ∑_ℓ λ_ℓ = 1

Then

E[Z* − Z_0] = ∑_ℓ λ_ℓ m − m = (∑_ℓ λ_ℓ − 1) m = 0

as ∑_ℓ λ_ℓ = 1.
Ordinary Kriging (4)
Subject to the condition ∑_ℓ λ_ℓ = 1 the mse to be minimized is

Var(Z* − Z_0) = −∑_ℓ ∑_k λ_ℓ λ_k γ(Z_ℓ, Z_k) + 2 ∑_ℓ λ_ℓ γ(Z_ℓ, Z_0)
Applying the method of Lagrangian multipliers to solve the constrained minimization problem leads to considering the Lagrangian function

L(λ_1, …, λ_n; µ_OK) = Var(Z* − Z_0) + 2 µ_OK (∑_ℓ λ_ℓ − 1)

Setting all its partial derivatives to zero

∂L/∂λ_ℓ = 2 ∑_k λ_k γ(Z_ℓ, Z_k) − 2 γ(Z_ℓ, Z_0) + 2 µ_OK = 0

∂L/∂µ_OK = 2 (∑_ℓ λ_ℓ − 1) = 0

leads to the ordinary kriging system
Ordinary Kriging (5)
Ordinary kriging system

∑_k λ_k^OK γ(Z_ℓ, Z_k) + µ_OK = γ(Z_ℓ, Z_0),  ℓ = 1, …, n

∑_ℓ λ_ℓ^OK = 1

With the ordinary kriging weights λ_OK the estimation variance, called the kriging variance, associated with Z* is

σ²_OK(x_0) = E(Z* − Z_0)² = ∑_ℓ λ_ℓ^OK γ(Z_ℓ, Z_0) + µ_OK
Note that the kriging variance depends neither on the random variables Z(x_ℓ) nor on the data z(x_ℓ)!
Ordinary Kriging (6)
OK in terms of covariances reads

∑_k λ_k^OK Cov(Z_ℓ, Z_k) + µ_OK = Cov(Z_ℓ, Z_0),  ℓ = 1, …, n

∑_ℓ λ_ℓ^OK = 1

More explicitly, the OK weights are determined by

⎛ λ_OK(x_0) ⎞   ⎛ (C(x_ℓ − x_k))  1 ⎞⁻¹ ⎛ (C(x_ℓ − x_0)) ⎞
⎝ µ_OK(x_0) ⎠ = ⎝       1^T       0 ⎠   ⎝       1        ⎠

Then, with the ordinary kriging weights λ_OK the kriging variance associated with Z* is

σ²_OK(x_0) = E(Z* − Z_0)² = Var(Z_0) − ∑_ℓ λ_ℓ^OK Cov(Z_ℓ, Z_0) − µ_OK
           = C(0) − ∑_ℓ λ_ℓ^OK C(x_ℓ − x_0) − µ_OK
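The augmented system above can be solved in a single linear solve as well. A sketch with an illustrative Gaussian transition covariance and hypothetical one-dimensional sample values:

```python
import numpy as np

def cov(h, a=2.0):
    """Gaussian transition covariance with sill 1 (illustrative choice)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

def ordinary_kriging(x, z, x0):
    """OK prediction, kriging variance, and weights via the augmented system."""
    x = np.asarray(x, float)
    z = np.asarray(z, float)
    n = len(x)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(np.abs(x[:, None] - x[None, :]))  # covariance block
    A[n, n] = 0.0                                     # bordered by 1's and a 0
    rhs = np.ones(n + 1)
    rhs[:n] = cov(np.abs(x - x0))
    sol = np.linalg.solve(A, rhs)
    lam, mu = sol[:n], sol[n]                         # weights and multiplier
    zstar = lam @ z
    var = cov(0.0) - lam @ rhs[:n] - mu               # C(0) - sum lam C - mu
    return zstar, var, lam

x = np.array([0.0, 1.0, 2.5])
z = np.array([1.2, 0.7, 1.5])
zstar, var, lam = ordinary_kriging(x, z, 1.4)
```

The last row of the bordered matrix enforces the unbiasedness constraint ∑_ℓ λ_ℓ = 1, and at a sampled location the solution again reproduces the datum with zero variance.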
Ordinary Kriging (7)
In the case of an SRF, OK is equivalent to optimum estimation of the unknown mean (by kriging!) followed by SK of the residuals from the (optimum) mean estimate as if the mean were known, i.e. perfectly estimated.

Had the mean been estimated in a different (conventional) way, the above statement would not be true.

In the case of an IRF, the SK estimator is not defined. Moreover, since an IRF is defined by increments, the unknown mean of its random variables is indeterminate and cannot be estimated.
Ordinary Kriging (8)
In practice, the semivariogram is modeled (variography); for numerics it is replaced by a pseudo covariance function

A − γ(h)

where A denotes a sufficiently large real number such that

A − γ(h) ≥ 0

for all numerically relevant lags h.
In this way, the numerical performance of software can beimproved.
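Because of the constraint ∑_ℓ λ_ℓ = 1, the constant A cancels out of the OK system, so the resulting weights do not depend on the particular choice of A. A quick numerical check for the unbounded power variogram γ(h) = h (the sample locations and the values of A are illustrative):

```python
import numpy as np

def ok_weights(x, x0, gamma, A):
    """OK weights computed with the pseudo covariance A - gamma(h)."""
    x = np.asarray(x, float)
    n = len(x)
    M = np.ones((n + 1, n + 1))
    M[n, n] = 0.0
    M[:n, :n] = A - gamma(np.abs(x[:, None] - x[None, :]))
    rhs = np.ones(n + 1)
    rhs[:n] = A - gamma(np.abs(x - x0))
    return np.linalg.solve(M, rhs)[:n]

gamma = lambda h: h                     # power variogram with omega = 1, no sill
x = np.array([0.0, 1.0, 3.0])

w1 = ok_weights(x, 1.7, gamma, A=10.0)
w2 = ok_weights(x, 1.7, gamma, A=1000.0)
```

Both weight vectors agree (up to floating-point round-off) and sum to 1, even though no genuine covariance function exists for this unbounded variogram.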