Multivariate Geostatistics
Winter Term 2018/19
Stochastics
About Probability and Statistics
Mathematics
Fields of Mathematics
Analysis, Linear Algebra, Functional Analysis, Differential Equations, Integral Transforms, ...
Stochastics
Probability
Mathematical Statistics
Applied Statistics
Data Analysis, Data Mining
Mathematics – Analysis, Linear Algebra
Mapping
Mappings (“Abbildungen”) are a central issue in almost all mathematical disciplines.
Analysis: real functions, continuous functions, continuously differentiable functions, integrable functions, etc.
f : R → R+, f(x) = x²
f : [−π/2, π/2] → [−1, 1], f(ω) = cos^{2κ}(ω), κ ∈ N
Linear Algebra: linear mapping, linear map provided by a matrix, matrix associated with a linear map, homomorphism, isomorphism, etc.
A : R² → R², w = ( cos ω  −sin ω ; sin ω  cos ω ) v
Mathematics – Probability
Mapping
Probability: measurable mapping, random variable, real random variable
Z : (Ω, A, P) → (R, B)
P(Z(ω) ∈ B) := P(ω ∈ Z⁻¹(B)) for all B ∈ B
where (Ω, A, P) denotes a probability space and (R¹, B) the real measurable space.
A random variable Z is completely defined by its probability law, i.e. its distribution. The distribution tells the probability with which the random variable Z realizes values z = Z(ω). The values Z(ω), ω ∈ Ω, are called realizations.
Mathematics – Probability
Random variable
In contrast to e.g. analysis or linear algebra, there is usually no explicit formula or rule (“Abbildungsvorschrift”) for a random variable Z : ω ↦ z specifying how to assign z's to ω's.
A random variable is given by its distribution; what can be knownof a random variable is its distribution.
If the distribution is known, then the random variable is known andall its properties can be deduced from the distribution.
Probability
Mathematical probability assumes the probability law to be known; it develops a theory of how to describe random events by random variables and investigates their properties.
Probability
For a real random variable Z : (Ω, A, P) → (R, B), the distribution may be given in terms of the distribution function F
F(z) := P(Z(ω) ∈ (−∞, z]), z ∈ R
If the distribution function F can be represented as the integral
F(z) = ∫_{−∞}^{z} f(x) dx,
then f is called the probability density function, and in this case the distribution may also be represented by its probability density function.
For instance, the exponential law with parameter λ is given by
P(Z ≤ z) = F(z) = 1 − exp(−λz) = ∫_{0}^{z} λ exp(−λx) dx, z ≥ 0,
with density
f(x) = λ exp(−λx), x ≥ 0, λ > 0
Probability
Expectation, variance
Major properties of a real random variable are given in terms of “moments” or “central moments” of the distribution. The two most prominent moments are the expectation of a real random variable
EZ = ∫_{−∞}^{∞} z dF(z) = ∫_{−∞}^{∞} z f(z) dz = µ
and the variance of a real random variable
VarZ = E(Z − EZ)² = ∫_{−∞}^{∞} (z − µ)² f(z) dz = σ².
For instance, the exponential law is a one–parameter distribution, and its parameter determines both moments:
EZ = 1/λ,  VarZ = 1/λ².
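These two formulas are easy to check numerically; a minimal sketch with NumPy, where the rate λ = 2 and the sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                             # arbitrary rate parameter
z = rng.exponential(scale=1.0 / lam, size=1_000_000)  # draws from Exp(lam)

# Empirical moments should approach EZ = 1/lam and VarZ = 1/lam**2
print(z.mean())   # close to 0.5
print(z.var())    # close to 0.25
```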
Probability
Cauchy distribution
In the same way as there are probability laws for which a probability density function does not exist, there are distributions for which moments do not exist. For instance, the Cauchy distribution
f(z) = (1/π) · λ / (λ² + (z − µ)²), λ > 0, µ ∈ R,
does not have an expectation nor a variance.
However, its median and mode are given by µ.
Probability
Covariance
The covariance of two real random variables Z1, Z2
Cov(Z1, Z2) = E[(Z1 − EZ1)(Z2 − EZ2)]
is a measure for the extent of a linear relationship between Z1 and Z2.
Independence of random variables
Two real random variables are called independent if their joint probability law is the product of the two individual (“marginal”) probability laws.
Probability
It holds
E(aZ + b) = a EZ + b  (a, b ∈ R)
Var(aZ + b) = a² VarZ
E(Z1 ± Z2) = EZ1 ± EZ2
Var(Z1 ± Z2) = VarZ1 + VarZ2 ± 2 Cov(Z1, Z2)
with Cov(Z1, Z2) = E[(Z1 − EZ1)(Z2 − EZ2)] = E(Z1Z2) − EZ1 EZ2
Probability
Uncorrelatedness vs. independence
Two random variables Z1,Z2 are called uncorrelated, if
Cov(Z1,Z2) = 0
For uncorrelated random variables Z1,Z2 it holds
E(Z1Z2) = EZ1 EZ2
Var(Z1 ± Z2) = VarZ1 + VarZ2
If two random variables are stochastically independent, then they are also uncorrelated. The converse is not generally true.
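A standard illustration of this last point: for a standard normal Z1 and Z2 = Z1², the two variables are clearly dependent, yet their covariance vanishes because E(Z1³) = 0. A minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
z1 = rng.standard_normal(1_000_000)
z2 = z1**2                     # fully determined by z1, hence dependent

# Cov(Z1, Z2) = E(Z1**3) - E(Z1) E(Z1**2) = 0 for a standard normal Z1
cov = np.mean(z1 * z2) - z1.mean() * z2.mean()
print(cov)                     # close to 0, although z2 is a function of z1
```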
Mathematical Statistics
Mathematical statistics develops methods to determine the parameters of a distribution from a mathematical sample, and it develops statistical tests to check hypotheses.
Mathematical Statistics
Mathematical sample
Mathematical statistics initially models a sequence of n real observed univariate data z1, z2, . . . , zn ∈ R¹ as independent realizations of a real random variable Z.
In the multivariate case, zi ∈ R^m, i = 1, . . . , n, and Z denotes a random vector.
Mathematical Statistics
Mathematical sample
The model of mathematical statistics may be generalized along the following example.
Throwing a proper die n times and recording the result each time is equivalent to simultaneously throwing n dice once and recording all outcomes if
(i) the dice cup is large enough that the dice do not interfere, and
(ii) the n dice are identical copies of the initial one.
Mathematical Statistics
Mathematical sample
Mathematical statistics initially models a sequence of n realobserved univariate data z1, z2, . . . , zn ∈ R1 as independentrealizations of a real random variable Z .
The link of mathematical statistics and probability is established byemploying a twofold model of the data as follows
Z  →  z1,  z2,  . . . ,  zn
       ↑    ↑            ↑
      Z1,  Z2,  . . . ,  Zn
where Z1, Z2, . . . , Zn is a sequence of independent identically distributed (iid) random variables.
The sequence of data is modeled as the n–fold realization of a unique random variable Z and as a unique realization of an iid sequence of n random variables Z1, Z2, . . . , Zn. The set Z1, . . . , Zn is called a mathematical sample.
Mathematical Statistics
Mathematical sample
According to the second model of mathematical statistics we find the following results. Let
M := (1/n) ∑_{i=1}^n Zi
Then
EM = E((1/n) ∑_{i=1}^n Zi) = (1/n) ∑_{i=1}^n EZi = EZ = µ
VarM = Var((1/n) ∑_{i=1}^n Zi) = (1/n²) ∑_{i=1}^n VarZi = (1/n) VarZ = σ²/n
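The shrinking variance of the sample mean can be checked by simulation; a minimal sketch with NumPy, where the normal distribution and the values µ = 1, σ = 2, n = 25 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 200_000
# Each row is one mathematical sample Z_1, ..., Z_n (iid)
samples = rng.normal(loc=1.0, scale=2.0, size=(reps, n))   # mu = 1, sigma = 2
m = samples.mean(axis=1)       # one realization of M per row

print(m.mean())   # close to mu = 1
print(m.var())    # close to sigma**2 / n = 4 / 25 = 0.16
```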
Applied Mathematical Statistics – Data Analysis
Applied statistics attempts to find the probability law of therandom variable from which the observed data are thought of asrealizations and applies statistical tests to real world data inpractice.
Applied Mathematical Statistics – Data Analysis
Statistics
Descriptive mathematical statistics attempts to condense the information conveyed by the data into a few numbers describing and characterizing the set of data. In this way, empirical mean, empirical variance, empirical covariance, etc. may be seen as empirical parameters of the set of data.
z̄ = (1/n) ∑_{i=1}^n zi
s² = 1/(n−1) ∑_{i=1}^n (zi − z̄)²,  s = √s²
s12 = 1/(n−1) ∑_{i=1}^n (z1i − z̄1)(z2i − z̄2)
r = [1/(n−1) ∑_{i=1}^n (z1i − z̄1)(z2i − z̄2)] / (s1 s2)
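These empirical parameters map directly onto NumPy; a minimal sketch with a small made-up bivariate data set:

```python
import numpy as np

z1 = np.array([2.0, 3.5, 1.0, 4.0, 2.5])   # made-up bivariate data set
z2 = np.array([1.0, 3.0, 0.5, 3.5, 2.5])

zbar = z1.mean()                            # empirical mean
s2 = z1.var(ddof=1)                         # empirical variance, 1/(n-1) convention
s12 = np.cov(z1, z2, ddof=1)[0, 1]          # empirical covariance
r = np.corrcoef(z1, z2)[0, 1]               # empirical correlation s12/(s1*s2)
```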
Applied Mathematical Statistics – Data Analysis
Statistics
Inferential mathematical statistics attempts to infer the probability law of the random variable from the sequence of the data and its descriptive parameters, and to test hypotheses by means of statistical tests.
At best, the empirical parameters of the data turn out to be reasonable estimates of the parameters of the probability law and to provide insight.
Applied Statistics
Stochastic model        Real world data
Random variables        Data
Expectation             Arithmetic mean
Variance                Empirical variance
Covariance              Empirical covariance
Covariance matrix       Matrix of empirical covariances
Mathematical Statistics
The fundamental modeling assumption of applied statistics
The elements of the sample are assumed to be independent and identical repetitions of the same experiment or observation; otherwise, classical statistics does not apply.
Geostatistics
Geostatistics is a generalization of classical statistics for georeferenced, spatially and stochastically dependent random variables. Thus, the fundamental modeling assumption of classical statistics is violated by definition.
The fundamental modeling assumption of geostatistics is homogeneity (stationarity), which is a mathematical expression for a conservation law, e.g. the increments of any two random variables are not spatially dependent.
References (1)
Armstrong, M., 1998, Linear Geostatistics: Springer
Armstrong, M. and Dowd, P.A., 1994, Geostatistical Simulations: Kluwer Academic Publishers
Armstrong, M., Galli, A.G., Le Loc'h, G., Geffroy, F., Eschard, R., 2003, Plurigaussian Simulations in Geosciences: Springer
Chilès, J.-P., Delfiner, P., 1999, Geostatistics: Modeling Spatial Uncertainty: Wiley
Christakos, G., 1992, Random Field Models in Earth Sciences: Academic Press
Christakos, G., 2000, Modern Spatiotemporal Geostatistics: Oxford University Press
Cressie, N.A.C., 1993, Statistics for Spatial Data, Revised Edition: Wiley
David, M., 1977, Geostatistical Ore Reserve Estimation: Elsevier
References (2)
Deutsch, C.V., Journel, A.G., 1998, Geostatistical Software Library and User's Guide, Second Edition: Oxford University Press
Goovaerts, P., 1997, Geostatistics for Natural Resources Evaluation: Oxford University Press
Houlding, S.W., 2000, Practical Geostatistics: Springer
Isaaks, E.H., Srivastava, R.M., 1989, An Introduction to Applied Geostatistics: Oxford University Press
Journel, A.G., 1989, Fundamentals of Geostatistics in Five Lessons: American Geophysical Union
Journel, A.G., Huijbregts, C., 1978, Mining Geostatistics: Academic Press
Kitanidis, P.K., 1997, Introduction to Geostatistics: Applications in Hydrogeology: Cambridge University Press
Krige, D., 1978, Lognormal - de Wijsian Geostatistics for Ore Evaluation: South African Institute of Mining and Metallurgy
References (3)
Lantuéjoul, C., 2002, Geostatistical Simulation: Models and Algorithms: Springer
Mallet, J.-L., 2002, Geomodeling: Cambridge University Press
Matheron, G., 1971, The Theory of Regionalized Variables and Its Applications: Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, No. 5
Matheron, G., 1989, Estimating and Choosing: Springer
Müller, W.G., 1998, Collecting Spatial Data: Physica-Verlag
Myers, J., 1997, Geostatistical Error Management: Quantifying Uncertainty for Environmental Sampling and Mapping: Van Nostrand Reinhold
Olea, R.A., 1991, Geostatistical Glossary and Multilingual Dictionary: Oxford University Press
Olea, R.A., 1999, Geostatistics for Engineers and Earth Scientists: Kluwer Academic Publishers
References (4)
Pawlowsky–Glahn, V., Olea, R.A., 2004, Geostatistical Analysis of Compositional Data: Oxford University Press
Rivoirard, J., Simmonds, J., Foote, K.G., Fernandes, P., Bez, N., 2000, Geostatistics for Estimating Fish Abundance: Blackwell Science
Stein, M.L., 1999, Interpolation of Spatial Data: Some Theory for Kriging: Springer
Wackernagel, H., 1998, Multivariate Geostatistics (2nd completely revised edition): Springer
References: Geostatistics in the www
https://wiki.52north.org/AI_GEOSTATS/WebHome
http://www.iamg.org/
Geostatistics in a nutshell
The problem and prerequisites of its resolution
Experiencing spatially induced correlation
Applying spatial correlation to prediction I:Heuristic models
Applying spatial correlation to prediction II:Stochastic model – “Kriging”
Data based descriptive models of spatial correlation: The experimental semi–variogram
Stochastic modeling with random functions: The semi–variogram
Fundamental assumption of geostatistics – Homogeneity
Modeling the semi–variogram
Best linear unbiased estimator (BLUE)
Kriging systems
Practice of ordinary kriging
Stochastic simulation of random functions
The problem
Inventory
symbol                  description
x1, . . . , xn ∈ D      sites of sampling
z(x1), . . . , z(xn)    data (scalar or vectorial)
Let x0 ∈ D with z(x0) unknown; define the linear combination
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
with initially unknown coefficients (“weights”) wℓ(x0), ℓ = 1, . . . , n.
The problem
Linear combination
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
Problem
What are the prerequisites that z∗(x0) would be a reasonable predictor of z(x0)?
What is a reasonable way to determine the weights wℓ(x0), ℓ = 1, . . . , n, in such a way that
– z∗(x0) is a good predictor of z(x0),
– it is the “best” predictor – with respect to what criterion?
Conceptually,
given x0, which wℓ(x0) ≠ 0, i.e., which z(xℓ) enter the prediction z∗(x0)?
if wℓ(x0) ≠ 0, how to determine it?
The prerequisites
Tendency of preservation
The general possibility of a reasonable predictor z∗(x0) requires some kind of “tendency of preservation” of the properties being sampled.
Such a tendency could mathematically be captured with terms like continuity or continuous differentiability, i.e., with some measure of smoothness.
In probability or statistics it would be termed spatially induced similarity, correlation, dependence.
The prerequisites
Counter–example
Throwing dice at xℓ, ℓ = 1, . . . , n.
“The next observation is as surprising as each previous one.”
Contradiction to the fundamental assumption of statistics
Approaching the problem stochastically, a new kind of statistics is required, as classical statistics depends on the fundamental assumption of independent identically distributed random variables, i.e., on independent repetitions of identical sampling.
Experiencing spatially induced correlation
Scientists' and engineers' experience suggests, e.g., that the ore contents of the specimens in a sample from a homogeneous ore deposit
– are the more similar the closer their respective sampling locations are, independently of their actual values;
– are no longer similar at all if the distance between their respective locations is larger than a specific distance characteristic for the ore deposit.
This experience generalizes to many other spatial phenomena.
Applying spatial correlation
Some kind of spatially induced “similarity”, “continuity”, “correlation” is a prerequisite of any reasonable prediction.
According to common experience, the extent of this spatially induced “similarity”, “continuity”, “correlation” is a function of distance.
Turning this experience constructive, the linear ansatz may be rewritten as
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ) = ∑_{ℓ=1}^n w(x0 − xℓ) z(xℓ)
with a decreasing weight function w, radially symmetric with respect to the origin.
Applying spatial correlation to prediction
Heuristic models (1)
Inverse distance weighting
z∗(x0) = ∑_{ℓ=1}^n w(x0 − xℓ) z(xℓ)
with
wℓ(x0) = w(x0 − xℓ) = (1/‖x0 − xℓ‖) / ∑_{k=1}^n (1/‖x0 − xk‖)
Other choices?
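Inverse distance weighting can be sketched in a few lines of Python; the sites and data below are made up, and the tie-breaking rule at sampled sites is an implementation choice, not part of the slide:

```python
import numpy as np

def idw_predict(x0, xs, zs, eps=1e-12):
    """Inverse distance weighting: z*(x0) = sum_l w_l(x0) z(x_l)."""
    d = np.linalg.norm(xs - x0, axis=1)
    if d.min() < eps:                  # x0 coincides with a sampling site
        return zs[d.argmin()]
    w = (1.0 / d) / np.sum(1.0 / d)    # normalized inverse-distance weights
    return float(np.sum(w * zs))

xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # made-up sampling sites
zs = np.array([1.0, 3.0, 5.0])
# All three sites are equidistant from (0.5, 0.5), so this is the plain mean
print(idw_predict(np.array([0.5, 0.5]), xs, zs))
```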
Applying spatial correlation to prediction
Heuristic models (2)
How to choose appropriate weight functions – naive view
Weighted mean of the data
z∗(x0) = ∑_{ℓ=1}^n wℓ(x0) z(xℓ)
with the wℓ(x0) read as weights and the z(xℓ) as data.
Applying spatial correlation to prediction
Heuristic models (3)
How to choose an appropriate weight function – dual view
Linear combination of radially symmetric basis functions
z∗(x) = ∑_{ℓ=1}^n wℓ(x) z(xℓ) = ∑_{ℓ=1}^n w(x − xℓ) z(xℓ)
with the w(x − xℓ) read as basis functions and the z(xℓ) as weights.
What can be said about the smoothness of z∗(x) in terms of continuity, continuous differentiability, etc.?
Applying spatial correlation to prediction
Stochastic model (1)
Apply data analysis, i.e., descriptive statistics, to derive a description of the spatially induced correlation.
Applying spatial correlation to prediction
Stochastic model (2)
h–scatter plot
An h–scatter plot is a scatter plot of all pairs of measurements z(xℓ), z(xℓ + h) of the same attribute z at locations separated by the vector h.
Note that h is a vector such that xℓ, xℓ + h ∈ D.
An h–scatter plot visualizes spatial variability or continuity, respectively. It is very helpful to identify extreme values.
Applying spatial correlation to prediction
Stochastic model (3)
h–scatter plot
For most natural phenomena it is generally expected that the spatial variability increases, i.e., spatial continuity decreases, as the length ‖h‖ of h increases.
Thus, for increasing ‖h‖ the points cluster worse around the first bisector in the (z(x), z(x + h))–plane.
This behaviour may be different for different directions h, which is referred to as anisotropy.
An h–scatter plot may be summarized by the mean of the squared orthogonal distances of the points to the first bisector. Thus, in terms of mechanics, it is the moment of inertia with respect to the first bisector.
Applying spatial correlation to prediction
Stochastic model (4)
Sample semi–variogram
γ(h) := 1/(2N(h)) ∑_{ℓ=1}^{N(h)} [z(xℓ) − z(xℓ + h)]²
where N(h) is the number of pairs separated by h, and the differences z(xℓ) − z(xℓ + h) are referred to as h–increments of z.
Pythagoras' theorem helps to see that
(1/2)[z(xℓ) − z(xℓ + h)]² = cos²(π/4) [z(xℓ) − z(xℓ + h)]²
actually is the squared orthogonal distance of the point with coordinates (z(xℓ), z(xℓ + h)) to the first bisector.
Applying spatial correlation to prediction
Stochastic model (5)
Sample semi–variogram
The sample semi–variogram is a data–driven figure describing the increasing dissimilarity of observations at any two sites with increasing distance.
From the plot of a sample semivariogram we may read off the sill, the range, and the nugget effect, and summarize it in these three terms.
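The sample semi–variogram can be sketched directly from its definition; the regular 1-D transect and the smooth test data below are made-up choices for illustration:

```python
import numpy as np

def sample_semivariogram(x, z, lags, tol):
    """gamma(h) = 1/(2 N(h)) * sum over pairs with |x_l - x_k| ~ h."""
    d = np.abs(x[:, None] - x[None, :])            # pairwise distances (1-D sites)
    dz2 = (z[:, None] - z[None, :])**2             # squared h-increments
    gamma = []
    for h in lags:
        mask = np.triu(np.abs(d - h) <= tol, k=1)  # each pair counted once
        gamma.append(0.5 * dz2[mask].mean())
    return np.array(gamma)

x = np.arange(10, dtype=float)                     # regular 1-D transect
z = np.sin(x / 3.0)                                # smooth made-up data
g = sample_semivariogram(x, z, lags=[1.0, 2.0, 3.0], tol=0.1)
```

For smooth data like this, dissimilarity grows with the lag, so the estimated values increase with h.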
Applying spatial correlation to prediction
Stochastic model (6)
Descriptive statistics vs. mathematical statistics
What is the counterpart of the semi–variogram in terms of probability and mathematical statistics?
Measuring the dissimilarity of z(xℓ) compared to z(xℓ + h), or the variability of the increments z(xℓ) − z(xℓ + h), its counterpart should be a variance.
Being a mean, the semi–variogram is reminiscent of an expectation.
Variance of increments
Var(Z(x) − Z(x + h)) = E[Z(x) − Z(x + h)]² − (E[Z(x) − Z(x + h)])²
where, under the homogeneity assumption, the second term ≡ 0.
The stochastic model: Random functions (1)
Random function
The set of spatially indexed random variables (Z (x), x ∈ D) iscalled a random function (RF).
Inventory of the model
symbol                  description
x1, . . . , xn ∈ D      sites of sampling
Z(x1), . . . , Z(xn)    random variables authorized for the sampling sites
z(x1), . . . , z(xn)    data z(xℓ), interpreted as single realizations of Z(xℓ)
Z(x0)                   random variable authorized for location x0
z(x0)                   unknown, to be estimated
Z∗(x0)                  random variable, estimator of the random variable Z(x0)
z∗(x0)                  realisation of Z∗(x0), estimate of z(x0)
The stochastic model: Random functions (2)
The novel model provided by a random function interprets the data zℓ, ℓ = 1, . . . , n, supported by the n specimens at the sampling locations xℓ ∈ D, ℓ = 1, . . . , n, as a unique discrete realisation z(xℓ), xℓ ∈ D, ℓ = 1, . . . , n, of a unique spatial random function Z(x), x ∈ D.
The random function is also referred to as a regionalized random variable.
The stochastic model: Random functions (3)
Estimator – estimate
Let x0 ∈ D; define the estimator Z∗(x0) as a linear combination of random variables, i.e.,
Z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) Z(xℓ)
with initially unknown coefficients (“weights”) λℓ(x0), ℓ = 1, . . . , n.
Problem rephrased
What is a reasonable way to determine the weights λℓ(x0), ℓ = 1, . . . , n, in such a way that
– Z∗(x0) is a good estimator of Z(x0),
– it is the “best” estimator – with respect to what criterion?
The stochastic model: Random functions (4)
Estimator – estimate
If
Z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) Z(xℓ)
is a good estimator of Z(x0), then its realisation
z∗(x0) = ∑_{ℓ=1}^n λℓ(x0) z(xℓ)
with the same weights should be a good estimate of z(x0).
Random functions – Moments (1)
Any marginal one–point distribution function is given by
F(x; z) := P(Z(x) ≤ z)
The mean of the RF is the expected value function
m(x) = EZ(x)
and its variance is the variance function
VarZ(x) = EZ²(x) − m²(x)
of the random function (Z(x), x ∈ D).
Moments (2)
The centered 2–point covariance is the covariance of the random variables Z(x1) and Z(x2)
Cov(Z(x1), Z(x2)) = E[Z(x1) − m(x1)][Z(x2) − m(x2)] = E(Z(x1)Z(x2)) − m(x1)m(x2)
It is assumed that both moments exist, i.e. that they are finite. Then Z(x) is called a second–order random function: it has a finite variance and its covariance exists everywhere.
A covariance function is an even and positive definite function.
Moments (3)
The two–point variogram is defined as
Var(Z(x1) − Z(x2)) = 2γ(x1, x2)
The variogram is an even and non–negative function. More importantly, −γ is a conditionally positive definite function.
Moments (4)
It holds
2γ(xℓ, xk) = Var(Z(xℓ) − Z(xk)) = Var(Z(xℓ)) + Var(Z(xk)) − 2 Cov(Z(xℓ), Z(xk))
If
Var(Z(xℓ)) = Var(Z(xk)) =: C(0)  and  Cov(Z(xℓ), Z(xk)) =: C(xℓ, xk)
then
γ(xℓ, xk) = C(0) − C(xℓ, xk)
where C(0) corresponds to the sill of the two–point semivariogram.
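The relation γ = C(0) − C can be illustrated numerically; the sketch below assumes an exponential covariance model C(h) = exp(−3|h|/a) with unit sill and an arbitrary range a = 10:

```python
import numpy as np

# Assumed exponential covariance model with unit sill and range a = 10
a = 10.0
C = lambda h: np.exp(-3.0 * np.abs(h) / a)
gamma = lambda h: C(0.0) - C(h)           # semivariogram implied by C

h = np.linspace(0.0, 30.0, 7)
print(gamma(h))   # rises from 0 towards the sill C(0) = 1
```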
Statistics of random functions (1)
The problem of geostats
Only one discrete realisation z(xℓ), ℓ = 1, . . . , n, of the random function Z(x) sampled at locations xℓ ∈ D exists, i.e., just for a few random variables Z(xℓ) of the random function Z(x), only a single realization z(xℓ) has been sampled.
The geostatistical solution
A random function is furnished with “pleasant” properties such that the model permits deriving statistics based on a single discrete realization.
The solution will be provided by an appropriate generalization of the assumption of independent identically distributed random variables inherent to classical statistics.
Statistics of random functions (2)
Homogeneity
The required “pleasant” property essentially consists in the modeling assumption that the increments z(xℓ + h) − z(xℓ) and z(xk + h) − z(xk) are realizations of a unique random variable ∆(h) := Z(x + h) − Z(x) representing increments independently of the involved locations.
Note that it is not assumed that ∆(h1) and ∆(h2) are independent.
The arithmetic mean of observed increments δ(h) = z(x + h) − z(x) provides a reasonable estimate of the expectation E(∆(h)).
Homogeneity of random functions (1)
Strong homogeneity
A random function is called strongly (strictly) stationary (better: homogeneous) if all finite–dimensional joint distributions are translation–invariant, i.e.
P(Z(x1) ≤ z1, . . . , Z(xk) ≤ zk) = P(Z(x1 + h) ≤ z1, . . . , Z(xk + h) ≤ zk)
Homogeneity of random functions (2)
Strong stationarity (homogeneity) implies that its moments are invariant under translation (if they exist), i.e.
EZ(x) = m
Cov(Z(x), Z(x + h)) = E[Z(x) − m][Z(x + h) − m] = C(h)
Var(Z(x + h) − Z(x)) = E[Z(x + h) − Z(x)]²
Thus, the mean is constant and the covariance function depends only on the lag h.
Homogeneity of random functions (3)
Second–order homogeneity
A random function is called a second–order (weakly) stationary (better: homogeneous) SRF if
EZ(x) = m
Cov(Z(x), Z(x + h)) = C(h)
If C is a function of |h| only, then the SRF is isotropic.
Second–order stationarity (homogeneity) implies
E(Z(x + h) − Z(x)) = 0
Var(Z(x + h) − Z(x)) = 2(C(0) − C(h))
Homogeneity of random functions (4)
Intrinsic homogeneity
A random function is called an intrinsically stationary (better: homogeneous) IRF if the increment variable ∆(x, h) = Z(x + h) − Z(x) is an SRF with respect to x ∈ D, i.e.
E(Z(x + h) − Z(x)) = aᵀh
Var(Z(x + h) − Z(x)) = 2γ(h)
Thus, the expectation of the increments is a linear function of the lag h (linear drift), and its variance is given by the variogram (function). If γ is a function of |h| only, then the IRF is isotropic.
If the linear drift is zero, i.e. if the mean is constant, then the intrinsic model is of the form
E(Z(x + h) − Z(x)) = 0,  E(Z(x + h) − Z(x))² = 2γ(h)
Bounded variograms and stationarity
An SRF is also an IRF and therefore has a variogram. In this case
γ(h) = C(0) − C(h)
Thus the variogram of an SRF is bounded by C(0).
Conversely, if the variogram of an IRF is bounded, then γ is of the form given above with a stationary covariance C(h).
Sill of a variogram and sample variance
The theoretical variance of Z(x) is equal to C(0) if Z is an SRF, or does not exist if Z is a nonstationary IRF.
In the case of an SRF, the expectation of the sample variance is always smaller than the theoretical variance. Thus, the sample variance is a biased estimate of the theoretical variance C(0).
The variogram, unlike the covariance, does not require knowledge of the mean, which in practice is not known; the variogram is unaffected by this problem because it automatically filters the mean.
Sample variogram
The sample variogram (experimental, empirical variogram) isdefined as
γ(h) = 1/(2N(h)) ∑_{xℓ−xk≈h} (z(xℓ) − z(xk))²
where N(h) denotes the total number of pairs of points separated by the lag h.
Variances (1)
What is the variance of an arbitrary linear combination ∑_ℓ λℓ Z(xℓ)?
Let Z∗ be a finite linear combination of random variables Z(xℓ) of a random function (Z(x), x ∈ D), i.e.,
Z∗ = ∑_ℓ λℓ Z(xℓ),  λℓ ∈ R
then
Var(Z∗) = Var(∑_ℓ λℓ Z(xℓ)) = Cov(∑_ℓ λℓ Z(xℓ), ∑_k λk Z(xk)) = ∑_ℓ ∑_k λℓ λk Cov(Z(xℓ), Z(xk))
which, for an SRF, equals ∑_ℓ ∑_k λℓ λk C(xℓ − xk).
Variances (2)
If Z is an SRF, then
0 ≤ Var(∑_{ℓ=1}^n λℓ Z(xℓ)) = ∑_{ℓ=1}^n ∑_{k=1}^n λℓ λk C(xℓ − xk)
A function C with this property is called positive definite.
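Positive definiteness guarantees that every such quadratic form is a valid (non-negative) variance; a quick numerical check, assuming an exponential covariance model with unit sill and an arbitrary range a = 5 on arbitrary 1-D sites:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 10.0, size=8)            # arbitrary 1-D sampling sites
# Assumed exponential covariance model C(h) = exp(-3|h|/a), range a = 5
C = np.exp(-3.0 * np.abs(xs[:, None] - xs[None, :]) / 5.0)

# Var(sum_l lam_l Z(x_l)) = lam^T C lam must be >= 0 for arbitrary weights
lams = rng.standard_normal((1000, 8))
quad = np.einsum('ij,jk,ik->i', lams, C, lams)  # one quadratic form per row
print(quad.min())                               # non-negative
```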
Variances (3)
Assuming intrinsic stationarity (homogeneity), the first two moments of the increments, in particular the semivariogram, exist. However, the covariance function of the random function itself may not exist. If Z is an IRF, then only linear combinations which can be represented as linear combinations of increments have a finite variance, i.e.,
∑_{ℓ=1}^n λℓ Z(xℓ) = ∑_{ℓ=1}^n λℓ (Z(xℓ) − Z(x0))  if  ∑_{ℓ=1}^n λℓ = 0
Variances (4)
If Z is an IRF and ∑ λℓ = 0, then
0 ≤ Var(∑_{ℓ=1}^n λℓ Z(xℓ)) = −∑_{ℓ=1}^n ∑_{k=1}^n λℓ λk γ(xℓ − xk)
Thus, −γ is a conditionally positive definite function.
In the case of ∑ λℓ = 0, the covariance function C may formally be replaced by the negative semivariogram −γ.
Variances (5)
Let
Z∗(x0) = ∑_{ℓ=1}^n λℓ Z(xℓ)
then the variance
Var(∑_{ℓ=1}^n λℓ Z(xℓ) − Z(x0))
can be represented in terms of the variogram if
∑_{ℓ=1}^n λℓ = 1
Variogram models (1)
Pure nugget–effect
γ(h) := 0 if h = 0; 1 otherwise
Spherical semivariogram
γ(h) := 1.5 (h/a) − 0.5 (h/a)³ if h < a; 1 otherwise
Exponential semivariogram
γ(h) := 1 − exp(−3h/a)
Gaussian semivariogram
γ(h) := 1 − exp(−3h²/a²)
Variogram models (2)
Power variogram
γ(h) := h^ω,  0 < ω < 2
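The model semivariograms above translate directly into code; a minimal sketch with unit sill, where the factor 3 in the exponential and Gaussian models follows the practical-range convention used on the slides:

```python
import numpy as np

def spherical(h, a):
    """Spherical model: 1.5 h/a - 0.5 (h/a)**3 below the range a, 1 beyond."""
    h = np.abs(np.asarray(h, dtype=float))
    return np.where(h < a, 1.5 * h / a - 0.5 * (h / a)**3, 1.0)

def exponential(h, a):
    return 1.0 - np.exp(-3.0 * np.abs(h) / a)   # ~95% of the sill at h = a

def gaussian(h, a):
    return 1.0 - np.exp(-3.0 * h**2 / a**2)

def power(h, omega):
    return np.abs(h)**omega                     # unbounded: no sill, 0 < omega < 2
```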
Variogram models and covariance functions
For bounded model semivariograms (with a sill), referred to as “transition models”, the corresponding model covariance is provided by
c(h) := 1 − γ(h)
For unbounded model semivariogram functions a corresponding covariance function does not exist.
Kriging (1)
Named in honour of Danie Krige, kriging is the genuine method of geostatistics for spatial prediction.
It is a stochastics-based method of spatial prediction (estimation) employing the spatial structure, i.e., the spatial correlation, as captured in the semivariogram.
Based on n georeferenced data of an attribute z(xℓ), ℓ = 1, . . . , n, the value of this attribute z shall be predicted for any location x0 ∈ D by a linear combination of the data, i.e.
z∗(x0) := ∑_{ℓ=1}^n λℓ(x0) z(xℓ)
where the weights depend on the spatial correlation of the data.
Kriging (2)
Notation
The random function (Z(x), x ∈ D).
The set of points S ⊂ D where Z(x) has been sampled: S = {x ∈ D | z(x) known}. Usually S is finite and consists of n points.
The data z(xℓ), ℓ = 1, . . . , n.
The mean values m(xℓ) = mℓ.
The covariances Cov(Z(xℓ), Z(xk)) = σℓk.
Change of support: variability depends on the material support and must therefore be considered, e.g. point kriging, block kriging, ... .
Neighborhoods: local vs. global approach
Kriging (3)
Linear approach of kriging
Z*(x_0) = ∑_{ℓ=1}^n λ_ℓ(x_0) Z(x_ℓ) + λ_0(x_0)

or for short

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

Eventually we shall see that kriging is of the form

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ
Kriging (4)
A proper estimator should be unbiased
E[Z*(x_0)] = E[Z(x_0)],  i.e.  E[Z*(x_0) − Z(x_0)] = 0
and its associated estimation variance

σ²_E(x_0) = Var[Z*(x_0) − Z(x_0)] → min

should be as small as possible.
Note

Var[Z*(x_0) − Z(x_0)] = E[(Z*(x_0) − Z(x_0))²] − E²[Z*(x_0) − Z(x_0)]

i.e. the variance equals the mean square error minus the squared bias.
Kriging (5)
These two requirements are the characteristics of kriging:
BLUE, BLUP – best linear unbiased estimator, best linear unbiased predictor
and lead to the problem of quadratic programming (optimization)
σ²_E(x_0) → min
subject to unbiasedness
E[Z ∗(x0)− Z (x0)] = 0
Kriging (6)
Simple kriging (SK) refers to kriging in the case of a constant known mean and a known covariance function and thus to second-order stationarity.

Ordinary kriging (OK) refers to the case of a constant unknown mean and a known variogram and thus to intrinsic stationarity.

Universal kriging (UK) refers to the case of an unknown mean of known type and a known variogram.
Simple Kriging (1)
Assuming a known constant mean m: m(x) = m

Linear approach

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

The constant λ_0 and the weights λ_ℓ are determined so as to minimize the error Z* − Z_0, characterized by its expected mean square E(Z* − Z_0)².

The mean square error (mse) is

E(Z* − Z_0)² = Var(Z* − Z_0) + E²(Z* − Z_0)

where the first summand is the variance term and the second the bias term.

To make the bias term vanish, it is necessary that

λ_0 = m_0 − ∑_ℓ λ_ℓ m_ℓ
Simple Kriging (2)
Then

Z* = m_0 + ∑_{ℓ=1}^n λ_ℓ (Z_ℓ − m_ℓ)

with Y_ℓ := Z_ℓ − m_ℓ and

Y* = ∑_{ℓ=1}^n λ_ℓ Y_ℓ

This amounts to predicting the zero-mean variable Y(x) = Z(x) − m(x) by the linear estimator Y* = ∑_ℓ λ_ℓ Y_ℓ, and adding the mean afterwards

Z* = Y* + m_0

Therefore, the case of a known mean is equivalent to the case of a zero mean and λ_0 = 0.
Simple Kriging (3)
The mse, which is now a variance, is

Var(Z* − Z_0) = ∑_ℓ ∑_k λ_ℓ λ_k Cov(Z_ℓ, Z_k) − 2 ∑_ℓ λ_ℓ Cov(Z_ℓ, Z_0) + Var(Z_0)
Setting all partial derivatives to 0

∂/∂λ_ℓ E(Z* − Z_0)² = 2 ∑_k λ_k Cov(Z_ℓ, Z_k) − 2 Cov(Z_ℓ, Z_0) = 0
As the mse is a convex function due to the positive definiteness of the covariance, the weights are finally given as the solutions of the simple kriging system

∑_k λ_k^SK Cov(Z_ℓ, Z_k) = Cov(Z_ℓ, Z_0),  ℓ = 1, …, n
These equations provide the best linear unbiased predictor (blup).
Simple Kriging (4)
With

∑_k λ_k^SK C(x_ℓ − x_k) = C(x_ℓ − x_0)

the estimation variance, called the kriging variance, associated with Z* is

σ²_SK(x_0) = E(Z* − Z_0)² = Var(Z_0) − ∑_ℓ λ_ℓ^SK Cov(Z_ℓ, Z_0)
           = C(0) − ∑_ℓ λ_ℓ^SK C(x_ℓ − x_0)
Note that the kriging variance depends neither on the random variables Z(x_ℓ) nor on the data z(x_ℓ)!
Simple Kriging (5)
More explicitly, the simple kriging system reads

⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞ ⎛ λ_1^SK(x_0) ⎞   ⎛ C(x_1 − x_0) ⎞
⎜      ⋮        ⋱       ⋮     ⎟ ⎜      ⋮      ⎟ = ⎜      ⋮      ⎟
⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠ ⎝ λ_n^SK(x_0) ⎠   ⎝ C(x_n − x_0) ⎠

or in matrix notation

K_SK λ_SK(x_0) = k_SK

Its solution is given by

λ_SK(x_0) = K_SK^{−1} k_SK

and the kriging variance is accordingly

σ²_SK = C(0) − λ_SK^T(x_0) k_SK
      = C(0) − k_SK^T K_SK^{−1} k_SK
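In code, this matrix form translates directly into one linear solve. A sketch assuming a Gaussian transition covariance C(h) = exp(−3h²/a²) with sill 1; the sample locations, data, and function names are illustrative:

```python
import numpy as np

def cov(h, a=1.0):
    """Gaussian transition covariance with sill 1: C(h) = 1 - gamma(h)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

def simple_kriging(x, z, m, x0, a=1.0):
    """SK prediction z*(x0) and kriging variance sigma^2_SK(x0).

    x : (n,) sample locations, z : (n,) data, m : known constant mean.
    """
    x = np.asarray(x, float)
    z = np.asarray(z, float)
    K = cov(np.abs(x[:, None] - x[None, :]), a)   # K_SK
    k = cov(np.abs(x - x0), a)                    # k_SK
    lam = np.linalg.solve(K, k)                   # lambda_SK = K_SK^{-1} k_SK
    zstar = m + lam @ (z - m)                     # predict residuals, add mean back
    var = cov(0.0, a) - lam @ k                   # C(0) - k_SK^T K_SK^{-1} k_SK
    return zstar, var

x = np.array([0.0, 1.0, 2.5])
z = np.array([1.2, 0.7, 1.5])
zstar, var = simple_kriging(x, z, m=1.0, x0=1.4, a=2.0)
```

At a sampled location the weights reduce to a unit vector, reproducing the interpolation property z*(x_ℓ) = z(x_ℓ) with zero kriging variance.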
Simple Kriging (6)
A solution of the kriging system

⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞ ⎛ λ_1^SK(x_0) ⎞   ⎛ C(x_1 − x_0) ⎞
⎜      ⋮        ⋱       ⋮     ⎟ ⎜      ⋮      ⎟ = ⎜      ⋮      ⎟
⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠ ⎝ λ_n^SK(x_0) ⎠   ⎝ C(x_n − x_0) ⎠

in matrix notation

K_SK λ_SK(x_0) = k_SK

exists and is unique, and the kriging variance is positive, if K_SK = (C(x_i − x_j)) is positive definite, i.e. in practice if

1. x_i ≠ x_j for i ≠ j,

2. the covariance function has been modeled by authorized mathematical model functions.
Simple Kriging (9)
Properties
Interpolation
z*(x_ℓ) = z(x_ℓ),  σ²_SK(x_ℓ) = 0

Smoothing

Var Z* = ∑_ℓ ∑_k λ_ℓ^SK λ_k^SK Cov(Z_ℓ, Z_k) = ∑_ℓ λ_ℓ^SK Cov(Z_ℓ, Z_0),

Var Z* = Var Z_0 − σ²_SK
Simple Kriging – Dual Kriging (1)
z*(x) = ∑_ℓ λ_ℓ^SK(x) z(x_ℓ)

      = (z(x_1), …, z(x_n)) ⎛ λ_1^SK(x) ⎞
                            ⎜     ⋮     ⎟
                            ⎝ λ_n^SK(x) ⎠

      = (z(x_1), …, z(x_n)) ⎛ C(x_1 − x_1) ⋯ C(x_1 − x_n) ⎞⁻¹ ⎛ C(x_1 − x) ⎞
                            ⎜      ⋮        ⋱       ⋮     ⎟   ⎜     ⋮     ⎟
                            ⎝ C(x_n − x_1) ⋯ C(x_n − x_n) ⎠   ⎝ C(x_n − x) ⎠

      = (b_1(x_1, …, x_n), …, b_n(x_1, …, x_n)) ⎛ C(x_1 − x) ⎞
                                                ⎜     ⋮     ⎟
                                                ⎝ C(x_n − x) ⎠

      = ∑_ℓ b_ℓ C(x_ℓ − x)

or in matrix notation

z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK(x) = ∑_ℓ b_ℓ C(x_ℓ − x) .
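The dual weights b = K_SK^{−1} z depend only on the sampling configuration and the data, not on the prediction location, so they can be computed once and reused for every x. A sketch under the same illustrative Gaussian-covariance assumption as before:

```python
import numpy as np

def cov(h, a=2.0):
    """Gaussian transition covariance with sill 1 (illustrative choice)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

x = np.array([0.0, 1.0, 2.5])           # sampling locations (hypothetical)
z = np.array([0.4, -0.2, 0.9])          # data, treated as zero-mean residuals

K = cov(np.abs(x[:, None] - x[None, :]))  # K_SK
b = np.linalg.solve(K, z)                 # dual weights: K_SK b = z

def zstar(x0):
    # z*(x) = sum_l b_l C(x_l - x): a superposition of shifted covariances
    return b @ cov(np.abs(x - x0))
```

Since K_SK b = z by construction, evaluating zstar at any sampling location reproduces the datum there, which is exactly the interpolation characterization below.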
Simple Kriging – Dual Kriging (2)
Reading
z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK = ∑_ℓ b_ℓ C(x_ℓ − x)

as a superposition of shifted covariance functions Cov(Z_ℓ, Z) = C(x_ℓ − x) centered at the sampling locations x_ℓ, the covariance function C determines the "smoothness", i.e. continuity and regularity, of z*.
If the covariance is parabolic near the origin, then z* is differentiable;

if it is linear near the origin, then z* is continuous but with cusps at the data points;

if the covariance has a discontinuity at the origin, then there will be isolated jumps at the data points.
Simple Kriging – Dual Kriging (3)
Rewriting

z*(x) = z^T λ_SK(x) = z^T K_SK^{−1} k_SK = ∑_ℓ b_ℓ C(x_ℓ − x)

and postulating interpolation

z*(x_k) = ∑_ℓ b_ℓ C(x_ℓ − x_k) = z(x_k),  k = 1, …, n,

or in matrix notation

K_SK b = z,

the kriging estimator can be characterized as the solution of the interpolation problem.
Ordinary Kriging (1)
Assuming an unknown constant mean m: m(x) = m

Ordinary kriging is the simplest case where the random function Z(x) is decomposed according to

Z(x) = m(x) + Y(x)

into a sum of a deterministic function m, called the drift, describing the systematic behaviour, and a zero-mean random function Y(x), called the residuals, capturing the erratic fluctuations.
Ordinary Kriging (2)
Linear approach

Z* = ∑_{ℓ=1}^n λ_ℓ Z_ℓ + λ_0

minimizing the mse

E(Z* − Z_0)² = Var(Z* − Z_0) + E²(Z* − Z_0)

            = Var(Z* − Z_0) + [λ_0 + (∑_ℓ λ_ℓ − 1) m]²

where the first summand is the variance term and the second the bias term.
Ordinary Kriging (3)
To make the bias term

[λ_0 + (∑_ℓ λ_ℓ − 1) m]²

vanish, it is necessary that

λ_0 = 0,  ∑_ℓ λ_ℓ = 1

Then

E[Z* − Z_0] = ∑_ℓ λ_ℓ m − m = (∑_ℓ λ_ℓ − 1) m = 0

as ∑_ℓ λ_ℓ = 1.
Ordinary Kriging (4)
Subject to the condition ∑_ℓ λ_ℓ = 1 the mse to be minimized is

Var(Z* − Z_0) = −∑_ℓ ∑_k λ_ℓ λ_k γ(Z_ℓ, Z_k) + 2 ∑_ℓ λ_ℓ γ(Z_ℓ, Z_0)
Applying the method of Lagrangian multipliers to solve the constrained minimization problem leads to considering the Lagrangian function

L(λ_1, …, λ_n; µ_OK) = Var(Z* − Z_0) + 2 µ_OK (∑_ℓ λ_ℓ − 1)

Setting all its partial derivatives to zero

∂L/∂λ_ℓ = 2 ∑_k λ_k γ(Z_ℓ, Z_k) − 2 γ(Z_ℓ, Z_0) + 2 µ_OK = 0

∂L/∂µ_OK = 2 (∑_ℓ λ_ℓ − 1) = 0

leads to the ordinary kriging system
Ordinary Kriging (5)
Ordinary kriging system

∑_k λ_k^OK γ(Z_ℓ, Z_k) + µ_OK = γ(Z_ℓ, Z_0),  ℓ = 1, …, n

∑_ℓ λ_ℓ^OK = 1

With the ordinary kriging weights λ_OK the estimation variance, called the kriging variance, associated with Z* is

σ²_OK(x_0) = E(Z* − Z_0)² = ∑_ℓ λ_ℓ^OK γ(Z_ℓ, Z_0) + µ_OK
Note that the kriging variance depends neither on the random variables Z(x_ℓ) nor on the data z(x_ℓ)!
Ordinary Kriging (6)
OK in terms of covariances reads

∑_k λ_k^OK Cov(Z_ℓ, Z_k) + µ_OK = Cov(Z_ℓ, Z_0),  ℓ = 1, …, n

∑_ℓ λ_ℓ^OK = 1

More explicitly, the OK weights are determined by

⎛ λ_OK(x_0) ⎞   ⎛ (C(x_ℓ − x_k))  1 ⎞⁻¹ ⎛ (C(x_ℓ − x_0)) ⎞
⎝ µ_OK(x_0) ⎠ = ⎝       1^T       0 ⎠   ⎝       1        ⎠

Then, with the ordinary kriging weights λ_OK the kriging variance associated with Z* is

σ²_OK(x_0) = E(Z* − Z_0)² = Var(Z_0) − ∑_ℓ λ_ℓ^OK Cov(Z_ℓ, Z_0) − µ_OK
           = C(0) − ∑_ℓ λ_ℓ^OK C(x_ℓ − x_0) − µ_OK
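The augmented system above can be solved in a single linear solve as well. A sketch with an illustrative Gaussian transition covariance and hypothetical one-dimensional sample values:

```python
import numpy as np

def cov(h, a=2.0):
    """Gaussian transition covariance with sill 1 (illustrative choice)."""
    return np.exp(-3.0 * np.asarray(h, float)**2 / a**2)

def ordinary_kriging(x, z, x0):
    """OK prediction, kriging variance, and weights via the augmented system."""
    x = np.asarray(x, float)
    z = np.asarray(z, float)
    n = len(x)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(np.abs(x[:, None] - x[None, :]))  # covariance block
    A[n, n] = 0.0                                     # bordered by 1's and a 0
    rhs = np.ones(n + 1)
    rhs[:n] = cov(np.abs(x - x0))
    sol = np.linalg.solve(A, rhs)
    lam, mu = sol[:n], sol[n]                         # weights and multiplier
    zstar = lam @ z
    var = cov(0.0) - lam @ rhs[:n] - mu               # C(0) - sum lam C - mu
    return zstar, var, lam

x = np.array([0.0, 1.0, 2.5])
z = np.array([1.2, 0.7, 1.5])
zstar, var, lam = ordinary_kriging(x, z, 1.4)
```

The last row of the bordered matrix enforces the unbiasedness constraint ∑_ℓ λ_ℓ = 1, and at a sampled location the solution again reproduces the datum with zero variance.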
Ordinary Kriging (7)
In the case of an SRF, OK is equivalent to optimum estimation of the unknown mean (by kriging!) followed by SK of the residuals from the (optimum) mean estimate as if the mean were known, i.e. perfectly estimated.

Had the mean been estimated in a different (conventional) way, the above statement would not be true.

In the case of an IRF, the SK estimator is not defined. Moreover, since an IRF is defined by increments, the unknown mean of its random variables is indeterminate and cannot be estimated.
Ordinary Kriging (8)
In practice, the semivariogram is modeled (variography); for numerics it is replaced by a pseudo covariance function

A − γ(h)

where A denotes a sufficiently large real number such that

A − γ(h) ≥ 0

for all numerically relevant lags h.
In this way, the numerical performance of software can beimproved.
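Because of the constraint ∑_ℓ λ_ℓ = 1, the constant A cancels out of the OK system, so the resulting weights do not depend on the particular choice of A. A quick numerical check for the unbounded power variogram γ(h) = h (the sample locations and the values of A are illustrative):

```python
import numpy as np

def ok_weights(x, x0, gamma, A):
    """OK weights computed with the pseudo covariance A - gamma(h)."""
    x = np.asarray(x, float)
    n = len(x)
    M = np.ones((n + 1, n + 1))
    M[n, n] = 0.0
    M[:n, :n] = A - gamma(np.abs(x[:, None] - x[None, :]))
    rhs = np.ones(n + 1)
    rhs[:n] = A - gamma(np.abs(x - x0))
    return np.linalg.solve(M, rhs)[:n]

gamma = lambda h: h                     # power variogram with omega = 1, no sill
x = np.array([0.0, 1.0, 3.0])

w1 = ok_weights(x, 1.7, gamma, A=10.0)
w2 = ok_weights(x, 1.7, gamma, A=1000.0)
```

Both weight vectors agree (up to floating-point round-off) and sum to 1, even though no genuine covariance function exists for this unbounded variogram.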