LECTURE 05: GEOSTATISTICS ANALYSIS

2/24/2020

"Everything is related to everything else, but near things are more related than distant things" (Tobler's Law)

http://smtom.lecture.ub.ac.id/
https://syukur16tom.wordpress.com/

LECTURE OUTCOMES

After completing this lecture and mastering the lecture materials, students should be able to:
1. describe the basic concept of geostatistical methods of interpolation;
2. describe the flexibility of geostatistics;
3. describe the regionalized variable in geostatistics;
4. describe structural analysis, the variogram, variogram models and autocorrelation in geostatistics;
5. calculate the estimate of the value of an attribute z (factor) using ordinary kriging.

GEOSTATISTICS ANALYSIS · 2020-02-05



LECTURE OUTLINE

1. INTRODUCTION
   • Data and Geostatistics
   • Geostatistical Flexibility

2. GEOSTATISTICS BASES
   • Regionalized Variable
   • Structural Analysis
   • Variogram
   • Variogram Models
   • Autocorrelation

3. ORDINARY KRIGING
   • Weight Estimation
   • Basic Kriging

1. INTRODUCTION

1. Data and Geostatistics

   1. When data are sparse, the choice of method and the assumptions made about the underlying variation that has been sampled can be critical if one is to avoid misleading results.

   2. Most interpolation techniques give similar results when data are abundant.

   3. Geostatistical methods provide ways to deal with the limitations of deterministic interpolation methods, and ensure that the prediction of attribute values at unvisited points is optimal in terms of the assumptions made.

   4. Geostatistical methods of interpolation, popularly known as kriging, attempt to optimize interpolation.


2. Geostatistical Flexibility

   1. Geostatistical methods offer great flexibility for interpolation. They provide:
      • ways to interpolate to areas or volumes larger than the support (block kriging),
      • methods for interpolating binary data (indicator kriging), and
      • methods for incorporating soft information about trends (universal kriging) or stratification (stratified kriging).

   2. All these methods of interpolation yield smoothly varying surfaces accompanied by an estimation variance surface.

   3. Geostatistics can also combine soft information with conditional simulation, which is useful for computing data for raster-based environmental models.

   4. Finally, the information in the variogram can be used to help optimize sampling schemes for mapping from point data.

2. GEOSTATISTICS BASES

1. Regionalized Variable

   1. Regionalized variables can be defined as:
      • variables distributed in space (or time),
      • descriptions of phenomena with geographical distribution,
      • values taken according to spatial location (e.g. elevation of the ground surface).

   2. The term regionalized variable was coined by Matheron (1963, 1965) to emphasize two apparently contradictory aspects of these types of variables:
      • a random aspect, which accounts for local irregularities, and
      • a structured aspect, which reflects large-scale tendencies.

   3. The common statistical models, including trend surfaces, put all the randomness into the error term, while all the structure is put into the deterministic term.


SOV              df    SS      MS       Fcal    P       Fcrit 5%   Fcrit 1%
Replication       3    0.54    0.1785   3.083   0.057   3.863      6.992
Elevation (E)     3    0.83    0.2754   4.756   0.016   3.863
Error (a)         9    0.52    0.0579
Nitrogen (N)      3    0.33    0.1102   1.725   0.193   3.863      6.992
E x N             9    1.87    0.2081   3.258   0.011   2.153      2.946
Error (b)        36    2.30    0.0639
Total            63    6.39

Table. An example of ANOVA (analysis of variance) for a split-plot design. SOV = source of variation.

   4. Unfortunately, this is not realistic for geological phenomena.

   5. A better way of representing reality is to introduce randomness in terms of fluctuations around a fixed surface, which Matheron called the "drift" to avoid any confusion with the term "trend".

   6. A regionalized variable is intermediate between a truly random variable and a completely deterministic variable in that it varies in a continuous manner from one location to the next.
      • Therefore, points that are near each other have a certain degree of spatial correlation, but points that are widely separated are statistically independent (Davis, 1986).
      • Unknown values can therefore be estimated from data taken at specific locations that can be sampled.


[Figure: Regionalized variable Zi. Sample locations Z1, Z2, Z3, Z4, ..., Zi, ..., Zn of a function Z in domain D (a set of space-dependent values, $\{z(i) : i \in D\}$), with a histogram of the samples zi and a pair Z(x), Z(x+h) related through Cov(Z(x), Z(x+h)).]

Variables are spatially correlated. Therefore, Z(x+h) can be estimated from Z(x) by using a regression model.

** This assumption holds true with a recognized increase in error relative to other least-squares models.
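The idea that Z(x+h) can be estimated from Z(x) by regression can be sketched as follows. This is only a minimal illustration: the transect values and the helper name `lag_regression` are hypothetical, the samples are assumed to be regularly spaced, and ordinary least squares is assumed as the regression model.

```python
def lag_regression(z, lag=1):
    """Least-squares fit of z(x+h) = intercept + slope * z(x) on lagged pairs.

    z is a regularly spaced series and lag (>= 1) is h in sample steps.
    """
    x = z[:-lag]   # values at x
    y = z[lag:]    # values at x + h
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical transect, made up for the illustration.
transect = [2.0, 2.4, 3.1, 3.9, 4.2, 4.0, 3.3, 2.8, 2.5, 2.9]
slope, intercept = lag_regression(transect, lag=1)

# Estimate of the value one step beyond the last observation.
predicted_next = intercept + slope * transect[-1]
print(round(slope, 3), round(intercept, 3), round(predicted_next, 3))
```

A slope close to 1 would indicate strong similarity between neighbouring values; a slope near 0 would indicate that Z(x) carries little information about Z(x+h) at that lag.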

2. Structural Analysis

   1. Fluctuations are not "errors" but rather fully fledged features of the phenomenon, with a structure of their own.

   2. The identification of these structures, hence the name structural analysis, is the first task in a geostatistical study.

   3. Regionalized variable theory assumes that the spatial variation of any variable can be expressed as the sum of three major components (Fig. 6.1):
      (a) a structural component, having a constant mean or trend;
      (b) a random, but spatially correlated component, known as the variation of the regionalized variable; and
      (c) a spatially uncorrelated random noise or residual error term.


Fig. 6.1. Regionalized variable theory divides complex spatial variation (z plotted against x) into:
(i) spatially correlated, but irregular ('random') variation,
(ii) average behaviour such as differences in mean levels (upper) or a trend (lower), and
(iii) random, uncorrelated local variation caused by measurement error and short-range spatial variation.

   4. Let x be a position in 1, 2, or 3 dimensions; then the value of a random variable Z at x is given by

      $$Z(x) = m(x) + \varepsilon'(x) + \varepsilon'' \qquad (6.1)$$

      where
      • m(x) is a deterministic function describing the 'structural' component of Z at x,
      • $\varepsilon'(x)$ is the term denoting the stochastic, locally varying but spatially dependent residuals from m(x) (the regionalized variable), and
      • $\varepsilon''$ is a residual, spatially independent Gaussian noise term having zero mean and variance $\sigma^2$.

      Note the use of the capital letter to indicate that Z is a random function and not a measured attribute z.

   5. The first step is to decide on a suitable function for m(x).


   6. Where no trend or drift is present, m(x) equals the mean value in the sampling area, and the average or expected difference between any two places x and x + h, separated by a distance vector h, will be zero:

      $$E[Z(x) - Z(x+h)] = 0 \qquad (6.2)$$

      where Z(x) and Z(x + h) are the values of the random variable Z at locations x and x + h.

   7. Also, it is assumed that the variance of differences depends only on the distance between sites, h, so that

      $$E[\{Z(x) - Z(x+h)\}^2] = E[\{\varepsilon'(x) - \varepsilon'(x+h)\}^2] = 2\gamma(h) \qquad (6.3)$$

      where $\gamma(h)$ is known as the semivariance.

   8. The two conditions, stationarity of difference and variance of differences, define the requirements for the intrinsic hypothesis of regionalized variable theory.
      • This means that once structural effects have been accounted for, the remaining variation is homogeneous in its variation, so that differences between sites are merely a function of the distance between them.

   9. So, for a given distance h, the variance of the random component of Z(x) is described by the semivariance:

      $$\mathrm{var}[\varepsilon'(x) - \varepsilon'(x+h)] = 2\gamma(h) \qquad (6.4)$$
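The three-component decomposition of Eq. 6.1 can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a prescribed procedure: a 1-D transect with a constant mean m(x), a spatially correlated component built as a moving average of white noise (one simple way to obtain correlation, chosen here for brevity), and independent Gaussian noise. All names and parameter values are hypothetical.

```python
import random


def simulate_transect(n=200, mean=10.0, window=9, noise_sd=0.3, seed=42):
    """Simulate Z(x) = m(x) + eps'(x) + eps'' on a regularly spaced transect."""
    rng = random.Random(seed)
    # Structural component m(x): a constant mean (no trend or drift).
    m = [mean] * n
    # Spatially correlated component eps'(x): moving average of white noise,
    # so neighbouring positions share most of their random inputs.
    white = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    eps1 = [sum(white[i:i + window]) / window for i in range(n)]
    # Spatially independent component eps'': zero-mean Gaussian noise.
    eps2 = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    z = [m[i] + eps1[i] + eps2[i] for i in range(n)]
    return m, eps1, eps2, z


def semivariance(z, lag):
    """Half the mean squared difference between values `lag` steps apart."""
    pairs = len(z) - lag
    return sum((z[i] - z[i + lag]) ** 2 for i in range(pairs)) / (2 * pairs)


m, eps1, eps2, z = simulate_transect()
for lag in (1, 5, 20):
    print(lag, round(semivariance(z, lag), 3))
```

With a correlated component of this kind, the semivariance would be expected to be smaller at short lags than at lags beyond the averaging window, which is the behaviour the intrinsic hypothesis describes as depending only on h.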

3. Variogram

   1. The variogram is defined as the variance of the difference between field values at two locations across realizations of the field (Cressie, 1993).


   2. The terms variogram and semivariogram are used interchangeably for $2\gamma(h)$; Bachmaier & Backes (2008) proposed that $\gamma(h)$, also known as the semivariance, be called the variogram.

   3. The spatial correlation of an attribute is quantified by the semivariogram, which is a plot of semivariance against separation distance (lag) h.

   4. The term semivariance is used to express the rate of change of a regionalized variable along a specific orientation (Davis, 2002).

   5. A variogram consists of two parts:
      • an experimental variogram, and
      • a model variogram.

   6. If the conditions specified by the intrinsic hypothesis are fulfilled, the semivariance can be estimated from sample data.

   7. Suppose that the value to be interpolated is referred to as z. Several formulas can be used to compute the semivariance, but it is typically computed as one half of the mean squared difference in z, as follows:

      $$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2 \qquad (6.5)$$

      where n is the number of pairs of sample points of observations of the values of attribute z separated by distance h.

   8. A plot of $\hat{\gamma}(h)$ against h is known as the experimental variogram.

   9. The experimental variogram is the first step towards a quantitative description of the regionalized variation.

   10. The variogram provides useful information for interpolation, optimizing sampling and determining spatial patterns.

   11. To do this, however, we must first fit a theoretical model to the experimental variogram.

Box 6.1. Example of computing the first and second moments of a simple series

Series z: 1, 3, 6, 5, 3, 1, 2, 3

Mean = (1+3+6+5+3+1+2+3)/8 = 24/8 = 3.0
Variance = [(1-3)^2+(3-3)^2+(6-3)^2+(5-3)^2+(3-3)^2+(1-3)^2+(2-3)^2+(3-3)^2]/8 = 22/8 = 2.75
Covariance (1) = [(1-3)*(3-3)+(3-3)*(6-3)+(6-3)*(5-3)+(5-3)*(3-3)+(3-3)*(1-3)+(1-3)*(2-3)+(2-3)*(3-3)]/7 = 8/7 = 1.14
Semivariance (1) = [(1-3)^2+(3-6)^2+(6-5)^2+(5-3)^2+(3-1)^2+(1-2)^2+(2-3)^2]/(2*7) = 24/14 = 1.714

[Figure: the series z plotted against distance.]
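The calculations in Box 6.1 can be checked with a few lines of code. The sketch below uses only the eight values of the series; the function names are chosen for the illustration, and the divisors follow the box (n for the variance, the number of pairs for the lag-1 covariance, and twice the number of pairs for the semivariance, so the lag-1 sum of squared differences of 24 is divided by 14).

```python
def mean(z):
    return sum(z) / len(z)


def variance(z):
    """Second moment about the mean, divided by n as in Box 6.1."""
    m = mean(z)
    return sum((v - m) ** 2 for v in z) / len(z)


def covariance(z, lag):
    """Mean product of deviations for values `lag` steps apart."""
    m = mean(z)
    pairs = len(z) - lag
    return sum((z[i] - m) * (z[i + lag] - m) for i in range(pairs)) / pairs


def semivariance(z, lag):
    """Half the mean squared difference for values `lag` steps apart."""
    pairs = len(z) - lag
    return sum((z[i] - z[i + lag]) ** 2 for i in range(pairs)) / (2 * pairs)


series = [1, 3, 6, 5, 3, 1, 2, 3]
print(mean(series))                        # 3.0
print(variance(series))                    # 2.75
print(round(covariance(series, 1), 2))     # 1.14
print(round(semivariance(series, 1), 3))   # 1.714
```

Calling `semivariance(series, lag)` for lag = 1, 2, 3, ... and plotting the results against the lag would give the experimental variogram of the series.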


4. Variogram Models

   1. A typical experimental variogram of data from a not too smoothly varying attribute, such as a soil property, is shown in the following figure (Fig. 6.2).

Fig. 6.2. An example of a simple transitional variogram, $\gamma(h)$ plotted against lag h, with range a, nugget C0, and sill (C0 + C1).

   2. There are several variogram models, as follows:
      1. Spherical model
      2. Exponential model
      3. Gaussian model
      4. Linear model

   3. The spherical model, shown below, is used when the nugget variance is important but not too large, and there is a clear range and sill:

      $$\gamma(h) = c_0 + c_1 \left\{ \frac{3h}{2a} - \frac{1}{2} \left( \frac{h}{a} \right)^3 \right\} \quad \text{for } 0 \le h \le a$$

      $$\gamma(h) = c_0 + c_1 \quad \text{for } h > a \qquad (6.6)$$

      where a is the range, h is the lag, c0 is the nugget variance, and c0 + c1 equals the sill. This model often fits observed variograms well.


   4. The exponential model, shown below, is a good choice if there is a clear nugget and sill, but only a gradual approach to the range:

      $$\gamma(h) = c_0 + c_1 \left\{ 1 - \exp\left( -\frac{h}{a} \right) \right\} \qquad (6.7)$$

   5. The Gaussian model, shown below, is a curve having an inflection; it can be used if the variation is very smooth and the nugget variance is very small compared to the spatially dependent random variation $\varepsilon'(x)$:

      $$\gamma(h) = c_0 + c_1 \left\{ 1 - \exp\left( -\frac{h^2}{a^2} \right) \right\} \qquad (6.8)$$

   6. All these models are known as transitive variograms because the spatial correlation structure varies with h.

   7. The linear model, one of the non-transitive variograms, has no sill within the area sampled and may be written as follows:

      $$\gamma(h) = c_0 + bh \qquad (6.9)$$

      where b is the slope of the line. A linear variogram typifies attributes which vary at all scales, such as simple Brownian motion.

   8. Variogram estimation and modelling are extremely important for structural analysis and for interpolation.

   9. Variogram models cannot be haphazardly chosen functions, as they must obey certain mathematical constraints.
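The four variogram models (Eqs. 6.6 to 6.9) translate directly into small functions. The sketch below is illustrative only: the function names are chosen for the example, and the parameter values (nugget 2.5, partial sill 7.5, range 10, slope 0.75) are assumptions used to produce a comparison table, not values fitted to any data.

```python
import math


def spherical(h, c0, c1, a):
    """Spherical model: reaches the sill c0 + c1 exactly at the range a."""
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def exponential(h, c0, c1, a):
    """Exponential model: approaches the sill c0 + c1 only gradually."""
    return c0 + c1 * (1.0 - math.exp(-h / a))


def gaussian(h, c0, c1, a):
    """Gaussian model: very flat near the origin, with an inflection."""
    return c0 + c1 * (1.0 - math.exp(-(h * h) / (a * a)))


def linear(h, c0, b):
    """Linear model: no sill, semivariance grows with slope b."""
    return c0 + b * h


# Illustrative parameter values (assumptions for this example).
c0, c1, a, b = 2.5, 7.5, 10.0, 0.75
print("h  spherical  exponential  gaussian  linear")
for h in (0.0, 2.5, 5.0, 10.0, 20.0):
    print(h,
          round(spherical(h, c0, c1, a), 3),
          round(exponential(h, c0, c1, a), 3),
          round(gaussian(h, c0, c1, a), 3),
          round(linear(h, c0, b), 3))
```

Comparing the columns at short lags shows the practical difference between the models: the Gaussian curve stays close to the nugget longest, the exponential curve rises fastest at first, and only the linear model keeps increasing beyond the range.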


Fig. 6.3. Examples of the most commonly used variogram models: (a) spherical; (b) exponential; (c) linear; and (d) Gaussian.

The variograms reproduce the spatial structure of simulated random fields.


5. Autocorrelation

   1. Autocorrelation literally means that a variable is correlated with itself.
      • The simplest definition of autocorrelation states that pairs of subjects that are close to each other are more likely to have values that are more similar, and pairs of subjects far apart from each other are more likely to have values that are less similar.

   2. The First Law of Geography, formulated by Waldo Tobler and also called Tobler's law (1970, Economic Geography), states that

      "everything is related to everything else, but near things are more related than distant things".

      "Things" closer together (in both space and time) are more alike than things far apart.

   3. Geostatistical analysis can be used only if a spatial correlation exists.
      • No spatial correlation (Fig. A)
      • Spatially correlated (Fig. B)


   4. Autocorrelation can be divided into three classes, as follows:
      • Positive spatial autocorrelation occurs when features that are similar in location are also similar in attributes.
      • Negative spatial autocorrelation occurs when features that are close together in space are dissimilar in attributes.
      • Zero autocorrelation occurs when attributes are independent of location.
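The difference between positive and negative autocorrelation can be made concrete with a sample autocorrelation coefficient along a transect. This is a minimal 1-D sketch, assuming regularly spaced values and the common sample estimator (lagged cross-products divided by the total sum of squares); the two series and the function name are hypothetical.

```python
def lag_autocorrelation(z, lag=1):
    """Sample autocorrelation of a regularly spaced series at the given lag."""
    n = len(z)
    m = sum(z) / n
    num = sum((z[i] - m) * (z[i + lag] - m) for i in range(n - lag))
    den = sum((v - m) ** 2 for v in z)
    return num / den


# Neighbouring values are similar: positive autocorrelation.
smooth = [1, 2, 3, 4, 5, 6]
# Neighbouring values are dissimilar: negative autocorrelation.
alternating = [1, 5, 1, 5, 1, 5]

print(lag_autocorrelation(smooth))                  # 0.5
print(round(lag_autocorrelation(alternating), 3))   # -0.833
```

A coefficient near zero would correspond to the third class, in which attribute values are independent of location.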

3. ORDINARY KRIGING

1. Weight Estimation

   1. Given that the spatially dependent random variations are not swamped by uncorrelated noise, the fitted variogram can be used to determine the weights $\lambda_i$ needed for local interpolation.

   2. The procedure is similar to that used in weighted moving average interpolation, except that now the weights are derived from a geostatistical analysis of the data rather than from a general, and possibly inappropriate, model.

   3. The estimate $\hat{z}(x_0)$ of the 'true' value z(x0) is given by:

      $$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i) \quad \text{with} \quad \sum_{i=1}^{n} \lambda_i = 1$$


      The weights $\lambda_i$ are chosen so that the estimate $\hat{z}(x_0)$ is unbiased, and so that the estimation variance $\sigma_e^2$ is less than for any other linear combination of the observed values.

   4. The minimum variance of $[\hat{z}(x_0) - z(x_0)]$, the prediction error or 'kriging variance', is given by:

      $$\sigma_e^2 = \sum_{i=1}^{n} \lambda_i \, \gamma(x_i, x_0) + \phi$$

      and is obtained when

      $$\sum_{i=1}^{n} \lambda_i \, \gamma(x_i, x_j) + \phi = \gamma(x_j, x_0) \quad \text{for all } j$$

   5. The quantity $\gamma(x_i, x_j)$ is the semivariance of z between the sampling points xi and xj; $\gamma(x_i, x_0)$ is the semivariance between the sampling point xi and the unvisited point x0. Both these quantities are obtained from the fitted variogram. The quantity $\phi$ is a Lagrange multiplier required for the minimization.

   6. The method is known as ordinary kriging: it is an exact interpolator in the sense that when the equations given above are used, the interpolated values, or best local average, will coincide with the values at the data points.

   7. The interpolated values can then be converted to a contour map using the techniques already described.

   8. Similarly, the estimation error $\sigma_e^2$, known as the kriging variance, can also be mapped to give valuable information about the reliability of the interpolated values over the area of interest.

      • Often the kriging variance is mapped as the kriging standard deviation (or kriging error), because this has the same units as the predictions.
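Once the weights and the Lagrange multiplier are known, the kriging estimate, the kriging variance and the kriging standard deviation follow from simple sums. The sketch below only illustrates those final formulas: the weights, observed values, semivariances and multiplier are hypothetical numbers chosen for the example, not the solution of a real kriging system.

```python
import math


def kriging_estimate(weights, values):
    """Weighted sum of the observations; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("ordinary kriging weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


def kriging_variance(weights, gamma_to_target, lagrange):
    """Sum of weight times semivariance to the target, plus the multiplier."""
    return sum(w * g for w, g in zip(weights, gamma_to_target)) + lagrange


# Hypothetical inputs for illustration only.
weights = [0.5, 0.3, 0.2]            # lambda_i
values = [10.0, 12.0, 15.0]          # z(x_i)
gamma_to_target = [2.0, 3.0, 4.0]    # gamma(x_i, x_0)
phi = 0.1                            # Lagrange multiplier

estimate = kriging_estimate(weights, values)
variance = kriging_variance(weights, gamma_to_target, phi)
std_dev = math.sqrt(variance)        # same units as the predictions
print(round(estimate, 3), round(variance, 3), round(std_dev, 3))
```

Mapping `std_dev` rather than `variance` is what item 8 describes: the standard deviation can be read directly against the predicted values because both share the same units.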

2. Basic Kriging

   1. As a first step, kriging is limited here to two sample observations for ease of analysis (Fig.).

   2. The experimental semivariogram of the study area is assumed to follow a spherical model with the parameter values C0 = 2.5, C1 = 7.5 and a = 10.0.

   3. The spherical model with these parameter values is shown below:

      $$\gamma(h) = 2.5 + 7.5 \left\{ \frac{3h}{2 \times 10} - \frac{1}{2} \left( \frac{h}{10} \right)^3 \right\}$$

Fig. The position of the sampling points (1 & 2) and the unsampled point (?).


   4. Distance between points. The first step in estimating z(x0) at the coordinates x = 1.5 and y = 4.0, given z(x1) = 4 and z(x2) = 6, is to set up the matrix of points shown below.

      Table 1. The matrix of points, coordinates and values

      No.   x     y     z
      1     1     3     4
      2     3     7     6
      3     1.5   4     ?

      The distance between sampling points can be easily calculated from the coordinates of the points under study, as shown in the following matrix (numbers 1 and 2 denote the sampled points).

      Table 2. The matrix of distances between sampled points

      No.   1       2
      1     0.000   4.472
      2     4.472   0.000

      For example, the distance between points 1 and 2 is [(3-1)^2 + (7-3)^2]^0.5 = [2^2 + 4^2]^0.5 = 20^0.5 = 4.472.


   5. Distance vector. With the same approach as above, the vector of distances between the sampled points (i = 1 and 2) and the unsampled point (point 3, denoted ? or 0) can be calculated as shown in the following table.

      i     Distance to point 0
      1     1.12     [(1.5-1)^2 + (4-3)^2]^0.5 = [0.5^2 + 1^2]^0.5
      2     3.35     [(3-1.5)^2 + (7-4)^2]^0.5 = [1.5^2 + 3^2]^0.5
      3     1.00     (unit entry for the constraint on the weights)

   6. Matrix A and b. Substituting the distances in the above tables into the variogram model gives the semivariances (matrices A and b).

      Matrix A (semivariances between the sampled points)

      i     1       2       3
      1     2.500   7.196   1.000
      2     7.196   2.500   1.000
      3     1.000   1.000   0.000

      Matrix b (semivariances between the sampled points and the unsampled point)

      i     b
      1     3.753
      2     6.132
      3     1.000

      Note: The additional row and column of ones (with a zero in the corner of A) ensure that the sum of the weights is 1.

   7. Matrix inversion, [A]^-1. The weights $\lambda_i$ are obtained from the product of the inverse matrix and the vector b, [A]^-1 x [b], so the inverse of matrix A is required. Because [A] x [A]^-1 = I (the identity matrix), the components of [A]^-1 can be calculated step by step, one column at a time. For instance, the first column (a11, a21, a31) satisfies:

      (2.500 x a11) + (7.196 x a21) + (1.000 x a31) = 1
      (7.196 x a11) + (2.500 x a21) + (1.000 x a31) = 0
      (1.000 x a11) + (1.000 x a21) + (0.000 x a31) = 0

      This procedure, which is rarely carried out by hand for a large matrix because of its difficulty, results in the matrix [A]^-1 as follows:

      i     1         2         3
      1     -0.1065   0.1065    0.5000
      2     0.1065    -0.1065   0.5000
      3     0.5000    0.5000    -4.8480

   8. Weights $\lambda_i$. The weights are then obtained from the product of the inverse matrix and the vector b. The multiplication is executed row by row of matrix [A]^-1 with the column of matrix b, as follows:

      (-0.1065 x 3.753) + (0.1065 x 6.132) + (0.5000 x 1.000) = 0.7534
      (0.1065 x 3.753) + (-0.1065 x 6.132) + (0.5000 x 1.000) = 0.2466
      (0.5000 x 3.753) + (0.5000 x 6.132) + (-4.8480 x 1.000) = 0.0945

      The first two values are the weights, $\lambda_1$ = 0.7534 and $\lambda_2$ = 0.2466, which sum to 1; the third value is the Lagrange multiplier, $\phi$ = 0.0945.

   9. Estimation of the z(x0) value. The value of z(x0) is obtained as the sum of the products of the observed values z(xi), presented previously in Table 1, and the weights $\lambda_i$, as follows:

      z(x0) = (0.7534 x 4) + (0.2466 x 6) = 4.49
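The whole worked example, from coordinates to estimate, can be reproduced in a short script. The sketch below makes several assumptions that should be read as such: it uses the standard ordinary kriging system, in which the corner entry of A is zero so that the weights sum to one; it follows the worked example in placing the nugget (2.5) on the diagonal of A; and it solves the system by Gaussian elimination rather than by forming the inverse explicitly. The function names are chosen for the illustration.

```python
import math


def spherical(h, c0=2.5, c1=7.5, a=10.0):
    """Spherical variogram with the parameters of the worked example."""
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Sampled points ((x, y), z) and the unsampled target point from Table 1.
samples = [((1.0, 3.0), 4.0), ((3.0, 7.0), 6.0)]
target = (1.5, 4.0)
n = len(samples)

# Matrix A: semivariances between sampled points, bordered by ones,
# with a zero in the corner (assumed standard form of the system).
A = [[spherical(math.dist(pi, pj)) for pj, _ in samples] + [1.0]
     for pi, _ in samples]
A.append([1.0] * n + [0.0])

# Vector b: semivariances between each sampled point and the target.
b = [spherical(math.dist(p, target)) for p, _ in samples] + [1.0]

solution = solve(A, b)
weights, phi = solution[:n], solution[n]
estimate = sum(w * z for w, (_, z) in zip(weights, samples))
variance = sum(w * g for w, g in zip(weights, b[:n])) + phi

print([round(w, 4) for w in weights], round(phi, 4))
print(round(estimate, 2), round(variance, 3))
```

Solving the system directly avoids the hand inversion of A, which becomes impractical as the number of sample points grows; the same script works for more points by extending `samples`.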

APPLE TREES

MALANG THI

THI Malang 2050

THI: Thermal humidity index