Practical Geostatistics 2000-2 Spatial Statistics


  • 7/16/2019 Practical Geostatistics 2000-2 Spatial Statistics

    1/138

    p | 1

    Practical Geostatistics

    2000-2

    Spatial Statistics



    TABLE OF CONTENTS

    Part 1 The Spatial Aspect ...................................................................................... 7

Spatial Relationships .............................................................................................................................................. 9

    Including Location as well as Value ................................................................................................................... 9

    Spatial Relationships ............................................................................................................................................ 11

    Inverse Distance Estimation ................................................................................................................ 13

    Inverse Distance Estimation .............................................................................................................................. 13

    Worked Examples .................................................................................................................................... 19

    Worked Examples .................................................................................................................................................. 19

    Coal Project, Calorific Values ............................................................................................................................. 19

    Iron Ore Project....................................................................................................................................................... 20

    Wolfcamp Aquifer .................................................................................................................................................. 21

    Scallops Caught ....................................................................................................................................................... 21

    Part 2 The Semi-Variogram ............................................................................... 25

The Experimental Semi-Variogram ..................................................................................................... 27

    The Semi-Variogram ............................................................................................................................................. 27

    The Experimental Semi-Variogram ................................................................................................................ 31

    Irregular Sampling ................................................................................................................................................. 33

    Cautionary Notes .................................................................................................................................................... 34

    Modelling the Semi-Variogram Function ........................................................................................ 35

    Modelling of the Semi-Variogram Function ................................................................................................ 35

    The Linear Model.................................................................................................................................................... 35

    The Generalised Linear Model .......................................................................................................................... 36

    The Spherical Model .............................................................................................................................................. 36

    The Exponential Model ........................................................................................................................................ 37

    The Gaussian Model .............................................................................................................................................. 38

    The Hole Effect Model .......................................................................................................................................... 38

Paddington Mix Model ......................................................................................................................................... 39

    Judging How Well the Model Fits the Data .................................................................................................. 40

    Equivalence to Covariance Function .............................................................................................................. 41

    The Nugget Effect ................................................................................................................................................... 41

    Worked Examples .................................................................................................................................... 42

    Worked Examples .................................................................................................................................................. 42

    Silver Example ......................................................................................................................................................... 43

    Coal Project, Calorific Values ............................................................................................................................. 44

Wolfcamp Aquifer .................................................................................................................................................. 46

    Part 3 Estimation and Kriging .......................................................................... 53


Introduction .............................................................................................................................................. 55

    Estimation and Kriging ........................................................................................................................................ 55

    Estimation Error ....................................................................................................................................... 58

    Estimation Error ..................................................................................................................................................... 58

One Sample Estimation ........................................................................................................................................ 59

    Another Single Sample ......................................................................................................................................... 62

    Two Sample Estimation ....................................................................................................................................... 64

    Another Two Sample Estimation ..................................................................................................................... 69

    Three Sample Estimator ...................................................................................................................................... 71

Choosing the Optimal Weights ............................................................................................................. 73

    Choosing the Optimal Weights ......................................................................................................................... 73

    Three Sample Estimation .................................................................................................................................... 75

The General Form for the 'Optimal' Estimator .......................................................................................... 79

    Confidence Levels and Degrees of Freedom ............................................................................................... 80

    Simple Kriging ......................................................................................................................................................... 81

Ordinary Kriging ...................................................................................................................................... 82

    Ordinary Kriging ..................................................................................................................................................... 82

    'Optimal' Unbiassed Estimator ......................................................................................................................... 84

    Alternate Form: Matrices .................................................................................................................................... 86

    Alternate Form: Covariance ............................................................................................................................... 86

Three Sample Estimation .................................................................................................................................... 87

    Cross-Validation ...................................................................................................................................... 88

    Cross Validation ...................................................................................................................................................... 88

    Cross Cross Validation.......................................................................................................................................... 93

    Worked Examples .................................................................................................................................... 94

    Worked Examples .................................................................................................................................................. 94

    Coal Project, Calorific Values ............................................................................................................................. 94

    Iron Ore Example ................................................................................................................................................... 96

Wolfcamp, Residuals from Quadratic Surface .............................................................................................. 97

    Part 4 Areas and Volumes ................................................................................101

    The Impact on the Distribution .........................................................................................................103

    Areas and Volumes ............................................................................................................................................. 103

    The Impact on the Distribution ..................................................................................................................... 104

    Iron Ore, Normal Example ............................................................................................................................... 106

    Geevor Tin Mine, Lognormal(ish) Example.............................................................................................. 109

    The Impact on Kriging ..........................................................................................................................112

    The Impact on Kriging ....................................................................................................................................... 112

    The Use of Auxiliary Functions ...................................................................................................................... 114


    Iron Ore Example, Page 95 .............................................................................................................................. 118

    Wolfcamp Aquifer, Quadratic Residuals .................................................................................................... 118

    Part 5 Other Kriging Approaches ................................................................................ 121

    Universal Kriging ....................................................................................................................................123

    Other Kriging ......................................................................................................................................................... 123

    Universal Kriging ................................................................................................................................................. 124

    Wolfcamp Aquifer ............................................................................................................................................... 127

    Lognormal Kriging .................................................................................................................................128

    Lognormal Kriging .............................................................................................................................................. 128

    The Lognormal Transformation .................................................................................................................... 130

    Geevor Tin Mine, Grades .................................................................................................................................. 131

    SA Gold Mine ......................................................................................................................................................... 132

Indicator and Rank Uniform Kriging ................................................................................................ 133

    Indicator Kriging .................................................................................................................................................. 133

    Rank Uniform Kriging ........................................................................................................................................ 136

    Summary of Part 6 .............................................................................................................................................. 138



    Part 1

    The Spatial Aspect



    Spatial Relationships

    Including Location as well as Value

Apart from the last couple of applications of Least Squares regression in Part 5 of the previous course, all of our discussions so far have considered only the measured values at each sample location. We have concentrated on assessing the 'global' qualities of our variables and on estimating population parameters, be they means and standard deviations or correlations and relationships. However, our original problem, as defined in the Introduction, was to produce 'maps' of the values at unsampled locations. That is, to estimate unknown values at locations which have not been sampled.

We will use a set of data published in the original Practical Geostatistics (Clark (1979)) which was a simulation based on an actual iron ore project (Iron Ore). The values are the average quantity of iron (%Fe) in borehole samples taken through the whole intersection of the 'economic mineralisation'.

Several boreholes have been drilled, all at the same angle, on a regular 100 foot grid. The resulting values are shown in the borehole layout. Notice that some of the boreholes have not (yet?) been drilled. Within the terms of our original question, we might ask 'what would be the value at the indicated location where no borehole has been drilled?'.

    In our statistical analyses in the previous course we have, tacitly, assumed that the measured values are drawn randomly from some idealised population of all possible samples. It was that population in which we were interested. No reference was made to location. We have to amend our basic assumptions if we redefine our problem to refer to a particular potential measurement rather than the whole population. The first two assumptions are retained:

sample values are measured precisely and are reproducible;

    sample values are measured accurately and represent the true value at that location.

Just to remind ourselves, the other two essential assumptions were:

    these samples constitute a very small part of a large homogeneous population of all possible samples;

    these samples were gathered randomly and independently from that large population.


For our redefined problem we need to replace these last two assumptions with the following:

    The samples are collected from a physically continuous, homogeneous population of all possible samples. In simpler terms, the phenomenon we have measured at the sample locations also exists at all the unsampled locations within the study area with no sudden changes in characteristic. For example, if we were dealing with a coal seam, we assume that the coal seam is present at all potential drilling sites within the study area and that there are no faults, washouts or burnt areas within that area. As another example, if we are counting weeds in a field, we assume that there are no areas where, say, the farmer has spread extra weedkiller and no significant changes in soil type which might affect weed growth.

The values at unsampled locations are related to the values at the sampled locations. If there is no relationship between samples and unsampled values, then we are back to our 'random' concept and the best estimate for an unknown value (T) would be the average of the population. Our 'worst case scenario', then, is a random phenomenon, which would give us an estimated value of:

    T* = µ* = ḡ

    where:

    i. T denotes the unknown value at the unsampled location;

    ii. T* is any estimate of that unknown value;

    iii. µ* is our standard notation for the estimate of the true average of the population, and

    iv. ḡ is the simple arithmetic average of our sample values, 36.4 %Fe.

Effectively, T is just a g we haven't measured yet. If we want to find confidence limits for T, and g has a Normal distribution, then we can say that T comes from a Normal distribution with mean µ and standard deviation σ. We can state that we are 90% confident that:

    µ - 1.6449σ ≤ T ≤ µ + 1.6449σ


Of course, we don't have µ and σ, so we would have to substitute ḡ and s as estimates. In Part 2 of the previous course we have seen that replacing σ by s means that the 1.6449 (from Table 2, above left) has to be replaced by the relevant value from Table 4 (above right) of Student's t distribution. Our estimate was produced from 47 samples, so (presumably) the degrees of freedom associated with this estimate are 46. Checking Table 4 (above right), we find that:

    t = 1.6787

    the above expression becomes:

    ḡ - 1.6787s ≤ T ≤ ḡ + 1.6787s

    That is, we can be 90% certain that the true value at the unsampled location lies between 30.2 %Fe and 42.6 %Fe. Of course, the whole data set only ranges between 28 and 44 %Fe, so this is a pretty safe statement.
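As a quick numerical check, the interval can be reproduced in a few lines of Python. Two assumptions to flag: the t value of 1.6787 is the tabulated Student's t for 46 degrees of freedom (90% two-sided), and the sample standard deviation of about 3.7 %Fe is inferred from the quoted interval rather than stated in this extract.

```python
# Sketch: 90% confidence interval for the unknown value T, treating it as
# one more random draw from the population of all possible samples.
g_bar = 36.4    # arithmetic mean of the 47 iron ore samples (%Fe)
s = 3.7         # sample standard deviation (%Fe); inferred, not quoted in the text
t_46 = 1.6787   # Student's t, 46 degrees of freedom, 90% two-sided

lower = g_bar - t_46 * s
upper = g_bar + t_46 * s
print(f"90% confident: {lower:.1f} %Fe <= T <= {upper:.1f} %Fe")  # 30.2 to 42.6
```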

In summary, then, if the values are completely random, our unknown value at the unsampled location is simply another number drawn at random from the population of all possible samples. If there is some relationship between sample values and unsampled values which depends on location, we ought to be able to do better than simply using the overall mean and getting very wide confidence intervals.

    Spatial Relationships

In order to produce an estimate for the value at a specified location which is better than a random guess, we need to assume that there is some sort of relationship between values in the area which depends on the location of the samples. There are many different ways to approach this, the major factor being what kind of relationship we are willing to accept. All mapping packages are based on this assumption, with some packages (such as Surfer) offering several different algorithms to produce estimated grid values. As an example of the sort of assumption available, a bicubic spline mapping method assumes that we are trying to map a smooth continuous surface which needs to be 'differentiable' at every point.

    Basically, all mapping methods assume that the 'unknown' value tends to be related to sample values which are close to it. We tend to assume that if the locations are close together then the values will be close together. Conceptually,


we assume that an estimator put together from neighbouring samples will be more useful than one which includes more distant sampling.

In this book we tackle the estimation methods grouped under the title of 'geostatistics'. What differentiates geostatistical estimation from other mapping methods is, simply, the form of the relationship which is assumed to be present between values at different locations in the study area. In the next three parts of this course, we will see how 'kriging' methods evolve directly from this basic concept of spatial relationship.

Looking at the plot of the borehole values in our example data set, we can see that there is some sort of 'continuity' in the values. Most of the samples in the middle of the area are in the mid 30s, shading to the 40s towards the northwest and the 20s towards the southeast. The point we have indicated where we want to estimate the value is towards the northwest corner. On the basis of apparent continuity in values, we would expect this borehole to have a value in the high 30s or low 40s. If we had selected a point in the south for our attention, we would expect a value in the low 30s or high 20s.

Our estimator for the unknown value, T, will be some sort of combination of the neighbouring sample values. Let us consider the simplest of all combinations: the linear combination or weighted average. We take the local samples where we have values and combine those sample values with weighting factors influenced by how close they are to the unsampled location. That is:

    T* = w1g1 + w2g2 + ... + wmgm = Σ wigi

    where m is the number of samples we want to include in the estimation and the gi the values of those samples. The wi are the 'weights' which are attached to each sample.

Using the above example, let us home in on the area around the unsampled location of interest.

    For this illustration, we will consider the seven samples surrounding the unsampled grid node. The estimate for the unknown value becomes:

    T* = w1g1 + w2g2 + ... + w7g7

    where the wi should be chosen according to how close each sample is to T. Samples 1, 3, 5 and 7 are 141 feet from the unsampled location. Samples 2, 4 and 6 are 100 feet from T. Intuitively, weights 2, 4 and 6 should be greater than weights 1, 3, 5 and 7. The only question is, how much greater?

It is interesting that we have no direct measure of how 'close' two locations are. We can measure the distance between them but not the closeness. We will,


therefore, have to assume that closeness is some inverse function of distance. For example, we could suppose that:

    wi = 1/di

    that is, the weighting is inversely proportional to the distance from the unsampled location. For our example, this would produce a weight of 0.01 for samples 2, 4 and 6 and a weight of 0.007071 for the diagonal samples. The resulting estimator would be:

    T* = Σ (1/di)gi

This does not make a lot of sense. We wanted a value in %Fe and we have a value in %Fe per foot. We expected an estimate between 37 and 44 %Fe and we have an estimate of just over 2.0. Obviously we are doing something wrong here. The problem is in the distance units. We need to remove the units of distance to get weights which are 'pure numbers'. We also need to do this in a way which gives us a sensible answer.

    The simplest way to do this is to choose a set of weights which add up to 1. At the moment our weights add up to 0.051213. If all of the samples had a value of 37 %Fe, our estimator would be 0.051213 of 37 %Fe. If all our samples had a value of 37 %Fe, surely our estimator should be 37 %Fe? The only way to guarantee this is to choose weights which add up to 1. To calculate weights which add up to 1 is pretty straightforward: find out what they do add up to and then divide each one by that number. Our estimator for the unsampled location is 39.6 %Fe, if we use 'inverse distance' estimation.
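The whole recipe (raw 1/d weights, then division by their sum) is easy to mechanise. Below is a minimal Python sketch; the function name is ours, and the sample values fed to it are hypothetical stand-ins since the borehole grades themselves are not listed in this extract.

```python
def idw_estimate(values, distances):
    """Inverse distance estimate: raw weights 1/d, normalised to sum to 1."""
    raw = [1.0 / d for d in distances]      # raw weights in 1/feet, e.g. 0.01 at 100 ft
    total = sum(raw)                        # the 0.051213-style total in the text
    return sum((w / total) * g for w, g in zip(raw, values))

# Hypothetical neighbourhood: three samples at 100 ft, three on the grid diagonal.
dists = [100.0, 141.42, 100.0, 141.42, 100.0, 141.42]
grades = [40.0, 38.0, 41.0, 36.0, 39.0, 37.0]   # made-up %Fe values
print(round(idw_estimate(grades, dists), 1))

# Sanity check from the text: if every sample were 37 %Fe, the estimate
# must come out at exactly 37 %Fe, because the weights sum to 1.
assert abs(idw_estimate([37.0] * 6, dists) - 37.0) < 1e-9
```

The normalisation step is exactly the "find out what they add up to and divide" rule from the text; without it the answer would carry units of %Fe per foot.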

    Inverse Distance Estimation

    Inverse Distance Estimation

We have produced an estimator for unsampled locations which is a weighted average of the neighbouring sample values. This estimator is based on the assumptions that:

    the values form a continuous 'surface' across the whole area;

    the relationship between values depends on the distance between their locations.


In some cases, we might want to extend that last assumption to include direction. For example, sediments in a river would be expected to be more continuous downstream than across the river bed.

The general form of the estimator is:

    T* = Σ wigi

    where

    wi ∝ f(di)

    where f in this context denotes a function of the distance value, d. The concepts and calculation of this estimator are intuitively attractive and very straightforward. However, in practice, use of such an estimator gives rise to more questions than are answered.

In the following sections, we pose some of these questions and, in the next few sections, we will attempt to answer them. The order of the questions is not an indication of their relative importance.

1. What function of distance should we use in any given application? In the illustration above, we have used the function 1/d to produce our estimator. We could have easily used any one of the following functions:

    1/d², 1/d³, e^(-d), R - d

    or any other function which was an inverse function of distance. The higher the power of the function, the more weight will be given to closer samples.
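To see how the power shifts the balance, compare the weight of a sample at 100 feet with one on the grid diagonal (about 141 feet on a 100 foot grid). The distances echo the grid example; the little sketch itself is ours.

```python
# Ratio of weights, near sample (100 ft) versus diagonal sample (141.42 ft),
# for inverse distance functions 1/d, 1/d^2 and 1/d^3.
near, far = 100.0, 141.42

for p in (1, 2, 3):
    ratio = (1.0 / near ** p) / (1.0 / far ** p)
    print(f"1/d^{p}: near sample gets {ratio:.2f}x the weight of the far one")
```

The ratios come out at roughly 1.41, 2.00 and 2.83: each extra power multiplies the preference for the closer sample by about the square root of two for this geometry.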

2. How do we handle different continuity in different directions? We have said above that there may be phenomena which have different relationships in different directions. There may be physical controls on how values were produced. This is known in geostatistics as 'anisotropy'. In pollution studies, we may have flow directions or plume shapes to deal with. In fishing, the offshore direction may have a different level of predictability from the longshore direction.


3. How many samples should we include in the estimation? In the previous sections on statistical analyses, we have seen that the more samples we use the better our estimator becomes. Is this true in inverse distance type estimation? Well, no. The more samples we include the thinner we have to spread the available weight. Remember that the weights have to add up to one, so if you include more samples the weight for those has to come off the closer samples. Of course, you can compensate for this by changing the power of the function you use.

4. How do we compensate for irregularly spaced or highly clustered sampling? In the simple example above, we have a grid of samples. In other applications (see, for example, the Wolfcamp data) we have significant irregularities and clustering in sample locations. It is natural for a cattle rancher to sink a new water well into an aquifer where he has had good pressure in the past. It is almost inevitable in mining that a geologist will schedule more sampling in the rich areas than in the poor ones. It is difficult to obtain a budget from a project manager to sink holes in the 'waste' purely to balance your statistics.

5. How far should we go to include samples in our estimation process? This is not the same question as in 3 above. This is a question about the continuity of the phenomenon we are studying. For example, if you have a coal seam, you might believe that there is a relationship between quality of coal at samples more than a kilometre apart. If you have a gold reef, on the other hand, you will be lucky if there is any relationship more than 100 metres away. Rainfall is a pretty continuous phenomenon, especially over oceans, but not at the same scale over mountainous terrain.

6. Should we honour the sample values? Presumably we are not (in the real world) going to do all these calculations by hand. If we wish to produce a map, we usually lay a grid of nodes over the area and estimate the value at each grid node. The contours are then produced on the basis of the gridded values. A computer program or spreadsheet application can be used to produce the grid node estimates. However, at some of those nodes we will already know the value because we will have a sample there. What happens when d becomes zero? None of the 1/d type functions can be calculated if d is zero. The exponential and R - d type functions can be, but will not give all of the weight to the sample at that point. We have to make a special case for the calculation at the sample locations. So we have to answer this question. On the basis of our fundamental assumptions (precise and accurate sampling) the answer has to be yes. We return to this problem in the next few sections and see what a profound impact it can have on our results and our perception of our confidence in the final estimates.
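One common way of making that special case, sketched here in Python (the function name and the tolerance are our own illustrative choices): if the grid node falls on a sample location, hand back the sample value itself instead of trying to evaluate 1/d.

```python
def idw_honouring_data(values, distances, power=1.0, tol=1e-9):
    """Inverse distance estimate that honours the data: if the estimation
    point coincides with a sample (d of essentially zero), return that
    sample's value outright rather than dividing by d."""
    for g, d in zip(values, distances):
        if d < tol:                     # grid node sits on a sample location
            return g
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return sum((w / total) * g for w, g in zip(raw, values))

# At a sample location the estimate is the sample value itself.
print(idw_honouring_data([34.0, 38.0], [0.0, 100.0]))   # 34.0
```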

7. How reliable is the estimate when we have it? In the previous sections, we have seen how we can produce confidence intervals for estimators. Can we do the same here? The estimator is a linear combination of the sample values. If the values come from a Normal distribution, then so does the linear


combination. We can work out the mean and standard deviation of that combination:

    E(T*) = E(Σ wigi) = Σ wiE(gi) = µ

    provided that the weights sum to 1 and all of the samples come from the same Normal population. The variance of T* is given by:

    Var(T*) = E[(Σ wigi - µ)²]

    expanding out the square of the bracket produces a table of terms as follows:

    (Σ wigi)² - 2µ(Σ wigi) + µ²

    multiplying through:

    ΣΣ wiwjgigj - 2µΣ wigi + µ²

    This horrendous list of terms has to be averaged over the whole population. We can simplify this a bit by remembering that the average of all the gs is µ and that the weights add up to 1. This means that:

    E(Σ wigi) = Σ wiµ = µ

    and

    E(-2µΣ wigi) = -2µ²

    so that

    Var(T*) = E(ΣΣ wiwjgigj) - 2µ² + µ²


    and all of the terms with in them boil down to a single -2. The remainder ofthe huge expression is then a list of all of the possible cross-products:

    for which the population average must be found. Using another algebraic trick, we could show that the sum of all the terms {wi wj} is also 1. Try it if you do not believe us. If this is true then

    μ² = μ² ΣiΣj wi wj

    so that the general expression above becomes:

    σ²(T*) = ΣiΣj wi wj E(gi gj) − μ² ΣiΣj wi wj

    or, with a very little jiggery pokery:

    σ²(T*) = ΣiΣj wi wj [E(gi gj) − μ²]

    but we know from the previous course that:

    E(gi gj) − μ²

    is the covariance between gi and gj, so all of these terms must be covariances between each pair of samples, σij. When i = j, of course, the covariance is equal to the variance of the g values, σ²g. So we can write the variance of the estimator T* as:

    σ²(T*) = ΣiΣj wi wj σij
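The double sum is mechanical to evaluate once the covariances are known. A small illustrative sketch (the covariance matrix here is an assumed input, not something derived in the text):

```python
def estimation_variance(weights, cov):
    """Variance of the weighted average: sum over i, j of w_i * w_j * sigma_ij.

    cov[i][j] is the covariance between samples i and j; the diagonal
    cov[i][i] holds the variance of the g values.
    """
    m = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(m) for j in range(m))
```

With equal weights 1/m and a diagonal covariance matrix (independent samples), this collapses to the familiar variance-over-m result.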

    You may be wondering why this is so much more complicated than in the previous course, where we found the confidence intervals for the arithmetic mean using σ/√m. Two reasons:

    a. all of the weights were equal at (in this case) 1/m;


    b. the sample values were assumed to be random and independent, so that all of those covariance terms were zero.

    Under these circumstances, the above expression would reduce to:

    σ²(T*) = σ²g Σ wi² = σ²g / m

    which is the result we had in the previous course for estimating μ from the sample mean, ḡ.

    In this case, we definitely do not want the sample values to be uncorrelated, so the covariance terms must be non-zero. If we want to evaluate how reliable our estimate is (to put confidence limits round it) we are going to have to be able to calculate the covariances between pairs of samples. We have assumed that the relationship between the samples is a function of distance. We now know what kind of relationship it is that interests us: the covariance. All of the above algebraic gymnastics have served to explain to us that the covariance between two samples a given distance apart should depend only on that distance (and possibly the direction). This sounds remarkably like Krige's basic assumption for his weighted average template approach, which we discussed in Part 5 of the previous course. We will return to this in the next section.

    8. Why is our final map too smooth? A weighted average estimator cannot produce values which are larger than the largest single sample value or smaller than the smallest sample value. We know from the previous course that making averages of sample values reduces the standard deviation of the answers. This means that weighted average estimates must, by definition, have a smaller range of possible values than the actual individual values from the population. This is another instance of what Krige called the 'regression effect' back in the 1950s. For mapping purposes, we get the opposite effect to that found when planning mining blocks: a weighted average will tend to under-estimate high values and over-estimate low values. We return to this in Part 5; simulation is one way of quantifying just how smooth the predicted maps are.

    9. What happens if our sample data is not Normal? We have seen in Part 3 of the previous course that using arithmetic mean calculations with highly skewed data can produce seriously erroneous results. Is it, therefore, at all sensible to use a weighted average of sample values if the data is from a highly skewed distribution, whether positively or negatively skewed? We will discuss this problem at greater length in Part 6.

    10. What happens if there is a strong trend in the values? In this context, a 'trend' or 'drift' is taken to mean a consistent change in the expected value of the phenomenon as we move across the study area. This is a trend as discussed in the previous section. A weighted average estimator will only be effective in the presence of trend if:


    a. the data is taken on a regular spacing in all directions, and

    b. the form of the trend is a simple increase or decrease in one particular direction.

    If the change in values is more complex, perhaps peaking or forming troughs in values, a weighted average will smooth out the 'dips' and 'humps' and leave us with an even smoother map than that discussed in point 8 above. Coping with trend will be discussed (to some extent) in the next section and in Part 6.

    11. How do we estimate average values over areas or volumes? This point is only made here to suggest its relevance to our original problem. In mining applications, in particular, values are generally required for mining blocks or stoping areas. Rarely is a mine planned on the basis of 'point' values. Average values over an area or volume can be produced by estimating a grid of point values within the area and averaging the resulting estimates. We will see in Part 5 that there are simpler and quicker methods to obtain a direct estimate for the average over a volume or area.
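The grid-of-points route to a block average can be sketched as follows. This is illustrative Python, not the book's software; the inverse-distance-squared point estimator and the 4 x 4 block discretisation are arbitrary choices for the example:

```python
def idw2_estimate(samples, point):
    """Inverse-distance-squared point estimate; samples are (x, y, value)."""
    px, py = point
    num = den = 0.0
    for x, y, g in samples:
        d2 = (x - px) ** 2 + (y - py) ** 2
        num += g / d2
        den += 1.0 / d2
    return num / den

def block_estimate(samples, x0, y0, size, n=4):
    """Average point estimates over an n x n grid of points discretising a
    square block with lower-left corner (x0, y0) and the given side length."""
    step = size / n
    grid = [(x0 + (i + 0.5) * step, y0 + (j + 0.5) * step)
            for i in range(n) for j in range(n)]
    return sum(idw2_estimate(samples, p) for p in grid) / len(grid)
```

As the text notes, Part 5 introduces direct block estimation methods that avoid this brute-force discretisation.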

    Worked Examples


    This section contains worked examples using the following datasets:

    Coal Project Iron Ore Wolfcamp Scallops

    Coal Project, Calorific Values

    If we are to consider location, the first thing we need to do is to get an idea of the layout of the samples in two or three dimensions. The simplest way to do this is to draw a 'post plot' of the sample data. This is simply a map showing the locations of the samples and their measured values. Post plots can be labelled, as in the figure here, or coloured or shaded by value.

    For this illustration we wish to estimate the value at a grid point which has not been drilled. We have 5 points in the immediate neighbourhood. We might want to include the sample 300 metres to the east and that 300 metres to the north to fill in the gaps.

    Let us try both ways. We will use the simple weighting function of inverse distance squared. Our calculation table would be as follows for the 5 sample case, numbering samples clockwise from North:

    so that our estimator for the unsampled location would be:

    Using seven samples, we would obtain:

    giving us a slightly lower estimator at 24.944 MJ.

    The impact of changing the search radius is illustrated in Figures 2 and 3 below. Both mapping exercises use simple inverse distance weighting. Figure 2 uses a search radius of around 390 metres, which was evaluated on the basis of getting an average of 20 samples within the search circle. Figure 3 was produced using a search radius of 250 metres, which basically takes the first 'ring' of samples around each unsampled grid point.
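The arithmetic behind such a calculation table can be sketched in a few lines of Python. The coordinates and calorific values below are made-up stand-ins for illustration, not the Coal Project's actual data:

```python
def idw2_estimate(samples, target):
    """Inverse-distance-squared estimate at target from (x, y, value) samples."""
    tx, ty = target
    num = den = 0.0
    for x, y, g in samples:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        num += g / d2          # value weighted by 1 / d^2
        den += 1.0 / d2
    return num / den

# Illustrative five-sample neighbourhood on a 100 m grid (values in MJ)
neighbours = [(0, 100, 24.0), (100, 0, 26.0), (0, -100, 25.5),
              (-100, 0, 24.5), (100, 100, 25.0)]
print(round(idw2_estimate(neighbours, (0, 0)), 3))
```

Adding two more distant samples to the list, as in the seven-sample case, simply adds two more (lightly weighted) rows to the sum.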

    Iron Ore Project

    For this example we will use inverse distance methods to map the values over the area of the Iron Ore project data. We have chosen to estimate a grid of points every 10 metres. In the two examples shown, we use simple inverse distance squared with various search distances. The area of interest is 400 metres square, containing 50 samples. To obtain an average of 20 samples for each inverse distance estimation, we need a search radius of around 140 metres.

    This map is intuitively unappealing. It looks ridiculous. The overlapping of the search 'circles' can be seen clearly as we move across and down the grid of estimated points.

    A solution to this would be to widen the search radius. Figure 5 shows the result when we use a search radius of 250 metres. The 'moon crater' effect seems to have disappeared, but so has most of the variation in the data. We can see clearly where the algorithm is struggling to honour individual data points which are quite inconsistent with the draconian weighted average in the same area.

    From the sublime to the even more ridiculous, we reduced the search radius to 100 metres (for reasons which will become plain in Part 2). Figure 6 shows the disastrous result of that exercise.

    It would seem, basically, that inverse distance squared is not the most effective way to map this particular set of sample data. We could spend many happy hours trying different distance functions and search radii.

    Wolfcamp Aquifer

    In complete contrast to the above example, we try inverse distance with the Wolfcamp data and get almost the same map no matter what function or what search radius we choose. Figure 7 shows the map obtained with inverse distance squared and a search radius of 58 miles. Lengthening the radius to 75 miles or shortening it to 35 miles produces only minor changes in the contours. We can make it look pretty rough with a search radius of 25 miles and an average of 5 samples per estimated point!

    Scallops Caught

    A rough post plot of Scallops samples shows the layout of the whole data set. This data is obviously irregularly spaced, possibly because of the difficulty in sampling fishing beds on a regular grid. For an inverse distance type estimator, we will 'home in' on a section in the centre of the sampled area.


    The point of interest in this illustration is at longitude 72.6, latitude 39.8.Distances are calculated by Pythagoras' theorem.

    The estimated value at the unsampled location is almost 1,053 scallops in total. Notice that over half of this value comes from a single sample, number 6. This sample is the closest, but it also has a very high value, at almost twice that of the next highest sample value in this area (number 3). The sample is weighted at almost 0.3, but the contribution to the estimate is almost 60%. Sample 2, at roughly half the weight, contributes less than 4% of the final value. Samples 3 and 4 together have roughly the same weight as sample 6, but contribute only 30% of the estimated value.

    The reason for the apparent inequity in sample contribution is simply that the distribution of the sample values is highly skewed. We have seen in the previous course that taking arithmetic means of highly skewed data produces absurd estimates for the population mean. The smaller the number of samples, the worse the effect becomes. Estimating a lognormal-ish value from a weighted average of highly skewed sample values has to be tantamount to senselessness.

    Another problem is illustrated in this case: that of 'anisotropy'. If we do the complete mapping exercise, we can see both the smearing due to the skewed nature of the data and the impact of direction on scallop growth. Two examples are illustrated below:

    Figure 10 (above) shows inverse distance results with an isotropic searchradius designed to pick up an average of 20 samples;

    Figure 11 (above) shows inverse distance with the same search radius in the northwest direction and one-quarter of the distance in the northeast direction. Weights are scaled by the relative anisotropy.
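One common way to implement such directional weighting (a sketch of the general idea, not the algorithm of any particular package) is to stretch separation distances by the anisotropy ratio before applying the distance function:

```python
import math

def anisotropic_distance(dx, dy, angle_deg, ratio):
    """Effective distance after rotating into the anisotropy axes and
    stretching the minor axis.

    angle_deg: direction of maximum continuity (degrees from the x axis);
    ratio: major range / minor range, >= 1.
    """
    a = math.radians(angle_deg)
    major = dx * math.cos(a) + dy * math.sin(a)
    minor = -dx * math.sin(a) + dy * math.cos(a)
    return math.hypot(major, minor * ratio)
```

Feeding this effective distance into the inverse-distance weighting makes a sample across the minor axis count as if it were further away, which is exactly the check suggested in the cautionary note below: an anisotropic search alone does not achieve this.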

  • 7/16/2019 Practical Geostatistics 2000-2 Spatial Statistics

    23/138

    p | 23

    Note of Caution

    On a note of caution, many mapping packages offer algorithms with anisotropic search ellipses. Make sure that the package actually changes the weighting factors with direction too. Some packages use an anisotropic search but still weight a sample, say, 50 m away the same in all directions.


    Part 2

    The Semi-Variogram


    The Experimental Semi-Variogram

    The Semi-Variogram

    In Part 1, we looked at the problem of estimating an unknown value at a particular location within a study area. We laid out a set of essential assumptions for producing such an estimate and chose a weighted average method of estimation. To recap, if our unknown value is denoted by T, then our estimator T* is expressed as:

    T* = Σ wi gi

    where:

    gi are the values of the samples included in the estimation;
    wi are the weights given to each of the samples;
    m is the number of samples included, and

    Σ wi = 1

    to ensure that the resulting estimator is unbiased. We assume that the relationship between known and unknown values depends on the distance between their locations and, possibly, the direction between them. We used a simple simulated example to illustrate how an 'inverse distance' estimator is produced and discussed the questions which arose when such an estimator was considered. We produced a list of 11 such questions, all of which are important but some more immediately relevant.

    In most of the previous course, we have produced estimates for unknown population parameters of one kind or another. In almost all of those cases we have been able to quantify how 'reliable' the estimate is as a reflection of what is actually going on in the population. We have seen that it is not enough to produce an estimate; we must also provide confidence levels to attach to the estimation process. If we can answer question 7 in our list, 'how reliable is the estimate when we have it?', then we could answer many of the questions posed in Part 1. For example, we could find the confidence limits for simple inverse distance and compare them to those for inverse distance squared. Presumably, the better estimation method would be the one which gives the 'narrowest' confidence intervals.

    We have seen in Part 1 that the production of confidence intervals is a non-trivial problem when we have relationships between the known and the unknown values. The standard deviation for the estimation error, or 'standard error' as it is often called, is a function of all of the cross-covariance values between every pair of samples. Our problem is twofold:

    how do we estimate the covariance between a single pair of samples?, and

    how do we estimate the covariance between a known sample and the unknown value?

    Let us simplify the situation a little and look at the simulated example (Iron Ore) we used in Part 1. We need to find an estimate for the value at the unsampled location on the grid. Zooming in on the problem, we would probably use the closest seven samples to produce such an estimate.

    Our estimator would, therefore, become:

    T* = w1 g1 + w2 g2 + ... + w7 g7

    where the weights would be calculated using some inverse function of distance. The 'reliability' can be measured quite simply as the difference between the estimated value and the actual value, T. This is the same approach we used in finding confidence levels for the estimation of the 'global' population mean. We can define our 'error of estimation' as:

    ε = T* − T

    Now, our weights add up to 1, so we could rewrite this as:

    ε = Σ wi gi − T Σ wi

    which we could also write as:

    ε = Σ wi (gi − T)

    If we rephrase this in words, the logic goes something like this:

    the error we make in the estimation is the difference between the estimator and the actual value;

    this is the difference between a weighted average of the samples and the actual value;

    this is the weighted average of the individual differences between each sample and the unknown value.

    In other words, the error on a weighted average is simply the weighted average of the individual errors. If we can quantify one of these simple differences, we can bag the lot.

    Let us consider just the first sample and the unsampled location. The difference between these two is:

    g1 − T

    Of course, we do not know what this value is, since we do not know what value T has, so we cannot calculate it. This is where the statistical training comes in handy. We have made an assumption that the relationship between g1 and T depends on the distance of 141 feet (and possibly the direction, northeast/southwest) between them. If this is so, then we should look at our available information to find other pairs of known values this distance apart (in this direction). If our assumption is correct, these pairs should have the same kind of relationship as the one pair in which we are interested. For this data set, we have 31 pairs of samples 141 feet apart in a northeast/southwest direction. In statistical terms, we have 31 samples from the population 'pairs of samples 141 feet apart, NE/SW' and we can calculate the difference for each pair.

    From these samples we could calculate the average difference and estimate the standard deviation of the differences:

    d̄ = (1/N141,NE) Σ (gi − gj)

    where N141,NE is the number of pairs found at a distance of 141 feet in the NE direction and (gi − gj) is the difference in value between the two samples found in each pair. We would estimate the variance of the differences using:

    s²diff = 1/(N141,NE − 1) Σ (gi − gj − d̄)²

    If our original samples came from a distribution with mean μ and standard deviation σ, then d̄ is an estimate for the average difference:

    μdiff = E(gi − gj) = μ − μ = 0


    That is, if our samples all come from the same underlying population, the true (population) average difference between any two samples is zero, by definition. μdiff will only be non-zero if the 'expected' value of the samples changes from place to place in the study area. For example, if there were a trend or 'drift' in values over the area, the expected value would change and the mean difference would not (necessarily) be zero. On the assumption of no trend, d̄ is an estimate of zero. Seems a bit of a waste of time to actually calculate an estimate for zero!

    If we accept the assumption of no trend for the moment, then

    μdiff = 0

    and

    s²diff = (1/N141,NE) Σ (gi − gj)²

    which is simply the average of the squares of the differences between the sample values. Notice that we do not lose the usual 1 degree of freedom, because we are not estimating the mean from the samples. We will see later in this part of the course (cf. Wolfcamp) what happens when our assumption of 'no trend' is wrong.
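The calculation just described is a one-liner in practice. A minimal sketch, assuming the pairs for the chosen distance and direction have already been collected:

```python
def diff_variance(pairs):
    """Variance of pair differences under the no-trend assumption: the plain
    average of squared differences, with no degree of freedom lost."""
    return sum((gi - gj) ** 2 for gi, gj in pairs) / len(pairs)
```

For example, the two pairs (40, 37) and (38, 41) give (9 + 9) / 2 = 9.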

    For the distance and direction in question, we would get s²diff = 7.72 (%Fe)², so that the estimated standard deviation of the differences is sdiff = 2.778 %Fe.


    To summarise: the difference between two values which have locations 141 feet apart in a northeast/southwest direction comes from a population of similar differences with a true mean of zero (in the absence of trend) and an estimated standard deviation of 2.778 %Fe. If our original samples come from a single Normal distribution, the differences will also be Normal, with these parameters. We could state with 95% confidence that:

    g1 − t0.025,31 × 2.778 ≤ T ≤ g1 + t0.025,31 × 2.778

    where t0.025,31 is the Student t value read from Table 4 with 31 degrees of freedom. Now g1 = 40, so

    40 − 5.7 ≤ T ≤ 40 + 5.7

    and our 95% confidence interval for the true value at the unsampled location would be between 34.3 and 45.7 %Fe.
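The interval arithmetic can be checked directly; the t value here is the tabled critical value for 31 degrees of freedom, and the other numbers follow the worked example:

```python
t_31 = 2.04                  # Student t, 95% two-sided, 31 degrees of freedom
g1, s_diff = 40.0, 2.778     # closest sample value and sd of differences (%Fe)
lo, hi = g1 - t_31 * s_diff, g1 + t_31 * s_diff
print(round(lo, 1), round(hi, 1))   # 34.3 45.7
```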

    The Experimental Semi-Variogram

    We have seen, now, how we can begin to answer the question posed. To find the difference between the weighted average estimator and the actual value T, we need to look at all the individual samples and repeat this calculation for each one of them. Of course, we can reduce this task slightly. The relationship between T and sample 1 should be the same as that between T and sample 5. If direction is not a factor, both of these should be the same as the pairs {T, g3} and {T, g7}. Similarly, pairs {T, g2} and {T, g6} should match one another, and {T, g4} will be similar to these if direction is not a factor.

    Once we have all of these values, we can begin to estimate any missing point on the grid. Of course, as we pointed out in Part 1, we do not just want to estimate points actually on the grid; we want to estimate values all over the study area. Using this approach, how would we get a standard deviation for, say, 150 feet? And what would we do in a situation where our original samples were not on a grid and we would have trouble finding many pairs of samples at a specified distance in a specified direction?

    We need to generalise this process somehow so that we can produce a routine 'algorithm' for the calculations. Let us restate the situation in more general terms. Let h denote a specified distance and direction. For that h, we can find all of the possible pairs of samples. Assuming a true mean difference of zero, we can estimate the variance of the differences as:

    s²(h) = (1/Nh) Σ (gi − gj)²

    where Nh is the number of pairs of samples found a distance h apart.


    We repeat this calculation for as many different values of h as the sample data will support. The results can be tabulated or displayed in a graph. Before we do that, however, a little historical background. This type of approach was investigated by many different workers in widely different fields of application, from Gandin in Russia studying meteorology (Gandin (1963)) to Matérn in Sweden applying similar methods to forestry problems (Matérn (1960)). There is a good paper by Noel Cressie in Mathematical Geology (Cressie (1993)) which discusses the origins of the techniques which we will cover in the rest of this book.

    The particular work, notation and nomenclature which we will follow was laid out by Georges Matheron in his seminal work The Theory of Regionalised Variables in the early 1960s (Matheron (1965)). In his work, he suggests the above calculation but with a slight modification. He defines the quantity he is interested in as one-half of the variance of the differences and uses a different symbol for the result:

    γ(h) = (1/2Nh) Σ (gi − gj)²

    We will see in the next section one of the reasons this was proposed. Generally, using half the variance instead of the whole variance simplifies the mathematics a little. It is (intuitively) more pleasing to have some terms which are '2x' than to have lots of terms which are '0.5x'.
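For samples on a regular one-dimensional line, the experimental calculation can be sketched as follows (illustrative Python, not the book's software):

```python
def semivariance(values, lag):
    """Experimental semi-variance at a given lag (in grid steps) for values
    sampled on a regular line: half the mean squared pair difference."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))
```

Repeating this for lag = 1, 2, 3, ... gives the points of the semi-variogram graph; a two-dimensional directional version loops over row and column offsets instead of a single index shift.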

    This 'semi-variance' is calculated for each direction and each distance and theresults are tabulated for our example in Table 1.

    In Part 1, we saw that the values of the samples in this simplistic example vary much faster in the north/south direction than they do in the east/west direction. We now have a quantitative measure of that in a semi-variance of 5.35 (%Fe)² north/south and only 1.46 (%Fe)² in the east/west direction. This is an indication of how 'anisotropic' the continuity is. In the process of trying to answer question 7, we seem to have come up with the beginnings of an answer to question 2.

    This is a very small and regular example and already we have a fairly complex table of results to interpret. Matheron suggested that the easiest way to interpret the results was to plot them as a graph of the semi-variance versus the distance between the samples. Directions are indicated by different symbols or different colours. Because we assume precise and accurate sampling (sic), we have an added point on our graph at zero on both axes. That is, there is no difference between two samples at the same location. Since this is a graph of the semi-variance it is generally referred to as a 'semi-variogram'. In recent times, with the spread of geostatistics, authors increasingly refer to the graph as a 'variogram'. We find this confusing, not to say sloppy, and will use the full form, semi-variogram, throughout this book.

    The semi-variogram graph is a picture of the relationship (difference) between sample values versus the distance between their locations. This is, effectively, an approximation to the distance function based on the sample data. Once again, in an attempt to answer question 7 in Part 1, we have come up with a way of answering question 1: 'what function of distance should we use in any given application?'. It would seem sensible, given the foundations which we have built in the previous course, to assess the function of distance by considering the relationships which exist between the samples that we do have. If we can (reliably) produce a distance function from the available data, we will have some basis for applying that to situations where we do not have the other sample in the pair (the {T, g1} problem).

    This calculated or 'experimental' semi-variogram is an illustration of the relationships which exist amongst the sample values. The graph will verify for us whether there actually is a relationship with distance. If our basic assumption is incorrect, then the graph will be a scatter of points more or less around a horizontal line. There are few things more illogical than the person who insists 'we could not get a good semi-variogram graph, so we used an inverse distance weighting method'. If there is no distance relationship, how can you weight by a distance function?

    Irregular Sampling

    The calculation of a semi-variogram differs slightly if sampling has been carried out on an irregular or highly clustered basis. In these cases (see, for instance, the Wolfcamp or Scallops data sets) specifying an exact distance and direction will give very few pairs of samples for any given point on the semi-variogram graph. In this situation, we take the same approach as we do with a histogram rather than a 'bar chart': we group the sample pairs into intervals and use the average value within the interval.

    For example, in the Wolfcamp data, which is discussed in detail later in this part, we choose an interval of 5 miles with a tolerance of 2.5 miles. That is, any pair of samples which is between 2.5 miles and 7.5 miles apart gets amalgamated into a single interval. The next point on the graph would be the average squared difference between all pairs of samples 7.5 to 12.5 miles apart, and so on. The semi-variance can then be plotted against the average distance between the pairs included in that interval. It is often necessary to experiment with interval widths and numbers of intervals, particularly if your data is very irregularly spaced.

    Cautionary Notes

    Remember that the semi-variogram graph is an illustration of the variance of differences in sample values. This graph will only be stable if the standard deviation is a sensible measure for the variability of the values. As an example, consider the lognormal distribution. If your sample values come from a lognormal, calculating the variance of the values from 'raw' untransformed data values is tantamount to outright stupidity. The variance on a lognormal is a measure of skewness, not of variability. If you want to measure variability or continuity, you must do it with some transformation which will produce a better behaved base distribution: a logarithm or a rank uniform transform, for example.

    Unless you are absolutely certain that there is no anisotropy in your data, always calculate directional semi-variograms. You can combine them later if you need to. Isobel was once presented with an 'omni-directional' experimental semi-variogram for data with two spatial co-ordinates and variation through time. That is, the 'distance' between the samples was a function of X, Y and time. When she asked how many metres were equivalent to a minute, she was met with complete incomprehension.

    Remember that one of our basic assumptions is physical continuity of the phenomenon being measured. In geology, contain your area within fault blocks and relatively homogeneous mineralisations. In environmental studies, check that there are no factors which affect the spread of the substance of interest. As an extreme case, think of measuring air temperature in an area containing the Grand Canyon or Victoria Falls.

    If you take the time to stop and 'listen' to your experimental semi-variograms, you can often pick up inconsistencies in your data or structural factors in your values. Too many people applying geostatistics want to surge on to the next section and let the computer do the scut work. Your semi-variogram is a picture of your data spatially and will give you a lot of information about relationships you may not even have thought about.

    Remember that each point on your semi-variogram graph is an estimate of one-half of the variance and depends heavily on how many pairs of samples were available for its calculation. You might want to consider imposing a minimum number of pairs of samples before a point is included on the graph.


    Modelling the Semi-Variogram Function

    Modelling of the Semi-Variogram Function

    When we looked at classical statistics, we drew a histogram or probability plot of our data and we then proposed some theoretical function for the distribution of values within the whole population. We now have the equivalent of a spatial histogram for the sample data and need to theorise about what this graph would look like if we had the whole population of all possible pairs of values over the whole study area.

    There are many possible models for the idealised 'population' semi-variogram, γ(h). As with probability distributions, there are mathematical restrictions on the models which can be applied, mostly designed to ensure that we do not end up with answers involving, say, negative variances. You can invent your own semi-variogram models if you wish, but remember the restrictions.

    We will present here a set of the most commonly used semi-variogram models. This is not an exhaustive set, but you will find all of these in the software associated with this book (Practical Geostatistics 2000 software). There are other models in general use which are not included here, such as the de Wijsian model favoured by some South African gold mining companies (Krige (1979)).

    If we stick to the documented models, we should have few problems with the mathematical constraints. Remember that the major purpose for fitting a model is to give us an algebraic formula for the relationship between values at specified distances. This will be equivalent to the 'distance function' discussed in Part 1 and will allow us to produce weighting factors for our samples based on the actual relationship between their values and that of the unsampled location.

    The Linear Model

    This is the simplest model for a semi-variogram graph, being a straight line with a positive slope and a positive (or zero) intercept with the semi-variogram axis. The formula and shape for this model are shown below:

    γ(h) = C0 + p·h   for h > 0, with γ(0) = 0

    where γ represents the value on the semi-variogram axis and h the distance between the two points of interest. The parameter p represents the slope of the line and C0 the intercept on the γ axis. This intercept is common to many semi-variogram models and has been dubbed the 'nugget effect' or 'nugget variance'. It reflects the difference between samples which are very close together but not in exactly the same position. This has been interpreted in various ways, but is generally accepted to be due either to sampling errors or to the inherent variability of the mineralisation.

    The Generalised Linear Model

    This is a generalisation of the Linear Model for a semi-variogram graph, being a line with a positive slope and a positive (or zero) intercept with the semi-variogram axis. The 'generalisation' lies in the fact that the distance values are raised to a specified power rather than being linear. The formula and shape for this model are shown below:

    γ(h) = C0 + p·h^α   for h > 0, with γ(0) = 0

    where γ is again the value on the semi-variogram axis and h the distance between the two points of interest. Added to the parameter p representing the slope of the line and C0 the nugget effect on the γ axis, we have introduced α for the power to which distance is raised. For mathematical reasons, this power can only take values in the range 0 < α < 2. The accompanying diagram shows two generalised linear models with α = 0.5 and α = 1.5 for illustration.

    The Spherical Model

    This is a model first proposed by Matheron and represents the non-overlap oftwo spheres of influence. The formula is a cubic one since it represents volumes,

    and relies on two parameters: the range of influence (radius of the sphere) andthe sill (plateau) which the graph reaches at the range. In addition to these, theremay be a positive intercepton the axis the 'nugget effect' described above. Theformula and shape for this model are shown below:


    where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the range of influence of the semi-variogram. We generally interpret the range of influence as that distance beyond which pairs of sample values are unrelated. C is the sill of the Spherical component and C0 the nugget effect on the γ axis. You will note that the final height of the semi-variogram model is C0 + C.
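The formula image did not survive extraction; the conventional statement of the Spherical model with these parameters is:

```latex
\gamma(h) =
\begin{cases}
C_0 + C\left(\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right) & 0 < h \le a\\[6pt]
C_0 + C & h > a
\end{cases}
```

with γ(0) = 0.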

    Unlike the previous two models, there are modifications which you can make to the standard Spherical model. There are often cases where the semi-variogram graph reaches a definite 'sill' but does not quite match the shape of a single Spherical model. In this case you may mix Spherical components with different ranges of influence and/or sill values in order to achieve the correct shape. Remember that, if you have (say) three component Sphericals, the final height of the graph will be C0 + C1 + C2 + C3.

    The formula for such a model is simply the combination of ordinary Spherical models, remembering to stop each one as it reaches its range of influence:
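For the three-component case mentioned above, this combination can be written (a standard construction; the original formula image is missing):

```latex
\gamma(h) = C_0 + \sum_{k=1}^{3} C_k\,\mathrm{Sph}\!\left(\frac{h}{a_k}\right),
\qquad
\mathrm{Sph}(u) =
\begin{cases}
\dfrac{3u}{2} - \dfrac{u^3}{2} & u \le 1\\[4pt]
1 & u > 1
\end{cases}
```

Each component rises over its own range a_k and is then held at its sill C_k.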

    The Exponential Model

    This is a model developed to represent the notion of exponential decay of 'influence' between two samples. It relies on two major parameters: the range of influence (a scaling parameter) and the sill (plateau) which the graph tends towards at large distances. There is also a possible 'nugget effect'. The formula and shape for the Exponential model are shown below:


    where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the so-called 'range of influence' of the semi-variogram, C the sill of the Exponential component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0 + C. You may also note that, although parameter a is referred to as the range of influence, it is not possible to interpret it in the same way as for the Spherical model. This distance a is not the distance at which samples become 'independent' of one another. In fact, the Exponential model reaches about two-thirds of its height at a distance a, and must go to four or five times this distance to come close to its asymptotic 'sill'.
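Under the scaling convention that matches the 'two-thirds at distance a' remark (some texts instead scale as exp(-3h/a) so that a becomes a practical range), the model is:

```latex
\gamma(h) = C_0 + C\left(1 - e^{-h/a}\right), \qquad h > 0
```

since 1 - e^{-1} ≈ 0.632.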

    The Gaussian Model

    This model represents phenomena which are extremely continuous or similar at short distances. Although illustrated with a nugget effect, this is almost an oxymoron. This sort of model occurs in topographic applications or where samples are very large compared to the spatial continuity of the values being measured.

    The formula for this curve is similar to that for a cumulative Normal distribution (hence the name 'Gaussian'); it does not imply that the sample values must be Normal:

    where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the so-called 'range of influence' of the semi-variogram, C the sill of the Gaussian component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0 + C. You may also note that, although parameter a is referred to as the range of influence, it is not possible to interpret it in the same way as for the Spherical model. This distance a is not the distance at which samples become 'independent' of one another. In fact, the Gaussian model reaches about two-thirds of its height at a distance a, and must go to four or five times this distance to come close to its asymptotic 'sill'.
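On the same convention as the Exponential model above (again, some texts scale the exponent by 3 so that a is a practical range), the Gaussian model reads:

```latex
\gamma(h) = C_0 + C\left(1 - e^{-h^2/a^2}\right)
```

Its parabolic behaviour near h = 0 is what gives the extreme short-range continuity described above.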

    The Hole Effect Model

    This is a model developed to represent a cyclic or periodic relationship between two samples. It relies on two major parameters: the cycle distance (a full cycle of the periodicity) and the sill (plateau) which the graph tends to oscillate around. There is also a possible 'nugget effect'. In many cases, a fourth parameter is added to these three, known as a 'damping' or decay parameter. Without this parameter the cyclic effect would continue on to infinity. In practical circumstances, the


    relationship usually tends to trail off and this may be reflected by including the damping parameter. The formula for the Hole Effect model with damping is:

    where γ(h) is the semi-variogram and h the distance between the two points of interest. One parameter is the cycle interval (distance) and another the so-called 'decay' or damping parameter of the semi-variogram; C is the sill of the Hole Effect component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0 + C. The damping on the model is inverse to the magnitude of the damping parameter: the bigger it is, the less the damping effect. It is also scaled by the distance h, and so will be relative to the scale at which we calculate the graph.
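The original formula image is missing. Writing the cycle interval as ω and the damping parameter as δ (symbol choices ours, not necessarily the book's), one common damped Hole Effect form consistent with this description is:

```latex
\gamma(h) = C_0 + C\left(1 - e^{-h/\delta}\,\cos\frac{2\pi h}{\omega}\right)
```

Note that a larger δ gives weaker damping, matching the remark above, and the exponent is scaled by h.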

    Note of Caution: The hole effect model is not mathematically stable and can lead to some weird results if used without care.

    Paddington Mix Model

    This model is included mostly as an illustration of how you can reflect the geology or structure of the measurements by combining components of various different shapes. In this case we have a fairly continuous phenomenon which has a weaker cyclic component present.

    The model was first used (by us) in an Australian gold deposit which was 'shear enhanced'. That is, the gold was present throughout the mining area but values were higher close to a quartz shear. Since shears in rock tend to occur with great regularity, the obvious Spherical structure for gold grade was modified by a 'ripple' effect due to the presence of shears. The shape of the graph changed according to whether the direction was parallel to or across the direction of the quartz shears.


    Other applications for this type of model include:

    potholes in platinum reefs; diamonds occurring on the sea-bed (genuine ripples); plant or tree yields where fields have been trenched; and occurrences of species which show 'competition' effects.

    where the full formula would be:

    where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the range of influence of the semi-variogram and C the sill of the Spherical component. A further parameter is the cycle interval (distance), another the so-called 'decay' or damping parameter of the semi-variogram, Chef the sill of the Hole Effect component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0 + C + Chef.
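Using the same reconstructed symbols as for the damped Hole Effect above (ω for the cycle interval, δ for damping; choices ours), the combination would read:

```latex
\gamma(h) = C_0 + C\,\mathrm{Sph}\!\left(\frac{h}{a}\right)
          + C_{hef}\left(1 - e^{-h/\delta}\,\cos\frac{2\pi h}{\omega}\right)
```

where Sph(·) is the standard Spherical shape, held at 1 beyond the range a, so the asymptotic height is C0 + C + Chef as stated.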

    Judging How Well the Model Fits the Data

    This is a tough one. There have been many attempts to develop automatic model fitting techniques, least squares methods and other confidence or sensitivity studies over the last 35 years. Noel Cressie came up with a very nice 'goodness of fit' statistic in the late 1980s which goes a long way to measuring how well the model fits the data (Cressie (1993)).

    We (your present authors) are still a little conservative on this matter and prefer a combination of statistical and visual assessment.

    The Cressie goodness of fit statistic is calculated as follows. For each point on the semi-variogram graph, calculate:

    and sum the terms. This is analogous to, but not the same as, the χ² goodness of fit test, even allowing for the weighting by the number of pairs. This statistic allows bigger deviations between the experimental and model semi-variogram as the model becomes higher. It demands closer fitting at the lower levels (usually the lower distances) than at higher ones. It also demands better fitting where you have more pairs of samples in a point. The fit is directly weighted by the number of pairs of samples.

    In the software, you will find that this statistic has been modified slightly. The actual magnitude of the statistic depends on the total number of pairs of samples found during the calculation. Now, not all samples are paired the same number of times in all different directions. This means that a Cressie statistic of a certain size in one direction is not necessarily equivalent to the same value in another direction. To adjust for this, we suggest a modification to remove the scaling by total number of pairs: simply divide through by the total number of pairs of samples in that particular semi-variogram fit.
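The calculation just described can be sketched in code. This follows Cressie's weighted criterion as commonly stated, with pair counts as weights and deviations taken relative to the model height; the function names and exact form are our reconstruction, not the book's:

```python
def cressie_statistic(gamma_exp, gamma_model, n_pairs):
    """Weighted goodness-of-fit between an experimental semi-variogram
    and a fitted model. Each lag contributes the squared relative
    deviation, weighted by its number of sample pairs, so larger
    absolute deviations are tolerated where the model is higher."""
    total = 0.0
    for ge, gm, n in zip(gamma_exp, gamma_model, n_pairs):
        total += n * ((ge - gm) / gm) ** 2
    return total


def modified_cressie(gamma_exp, gamma_model, n_pairs):
    """Scale-free version: divide by the total pair count so that
    values are comparable between directions with different numbers
    of pairs."""
    return cressie_statistic(gamma_exp, gamma_model, n_pairs) / sum(n_pairs)
```

The modified version makes a statistic computed in (say) the east/west direction directly comparable with one from north/south.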

    Equivalence to Covariance Function

    Matheron showed that, if the semi-variogram model has a sill or final asymptote, then the final height of the semi-variogram is theoretically equal to the population variance of measured values. That is, as h → ∞, γ(h) → σ². If the semi-variogram has a sill, this tends to support the assumption that the samples come from a 'stationary' distribution with a fixed mean and standard deviation. In this case, some workers prefer to use the covariance function rather than the semi-variogram function. The relationship between the two is:

    where cov(h) is used here as shorthand for the covariance between sample values at distance (and direction) h, and is estimated by:
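The two formulas referred to did not survive extraction; a standard statement of them (the book's own notation may differ slightly) is:

```latex
\mathrm{cov}(h) = \left(C_0 + C\right) - \gamma(h),
\qquad
\widehat{\mathrm{cov}}(h) = \frac{1}{N_h}\sum_{i=1}^{N_h}
\left(g_i - \bar{g}\right)\left(g_{i+h} - \bar{g}\right)
```

where N_h is the number of pairs at separation h and ḡ is the sample mean used in place of the population average.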

    There does not seem to be any indication in the geostatistical literature that N_h - 1 might be more appropriate here, since we have to use the sample mean or some other estimator to estimate the population average. This shortcoming in the literature has been pointed out rather forcefully by Dr. Jan Merks in some moderately intemperate articles (Merks (1992-1994)).

    The Nugget Effect

    The nugget effect or discontinuity at short distances in the semi-variogram is another cause of much dissension in the geostatistical world. If we stand by our initial assumptions that sample values are measured precisely and accurately (or, if you prefer, that they are reproducible and representative), then the semi-variogram model must go to zero at zero distance. That is, γ(0) = 0.

    The interpretation of the nugget effect then becomes one of the physical nature of the phenomenon being measured. The term nugget effect (or nugget variance) was coined on the basis of the interpretation of gold mineralisation. No matter how close the samples get, there will be large differences in value between the samples because of the 'nuggety' occurrence of the gold. We have to get down to the scale of a gold nugget to have values which are the same. Any further apart than the size of a nugget and one sample is inside a nugget whilst the other is outside, and the values are very different.

    If we accept that the semi-variogram takes the value zero at zero distance, then the nugget effect represents the difference between two samples right next to one another: contiguous quadrats, two halves of a borehole core, fish swimming together and so on. If we do not accept 'zero at zero' then what we are basically saying is that we do not believe the data values. If the nugget effect is treated as an intercept on the γ axis, so that γ(h) → C0 as h → 0, this implies that two samples taken at the same location could have a variance between them of 2C0. We will see in Part 3 the impact that these two different assumptions have on our analysis.

    Embarrassing question to ask your software vendor:

    What does your package do with the semi-variogram at zero distance? Some software allows the user to fit a semi-variogram model but then uses the covariance function for the estimation process. You can lay good odds that, if your package does this, the semi-variogram model does not go through zero at zero distance.

    Worked Examples

    This section contains worked examples using:


    Silver Example
    Coal Project
    Wolfcamp

    Silver Example

    We illustrate some of the problems with fitting semi-variogram models with a simple example which appeared in Practical Geostatistics 1979 (Clark (1979)). A tunnel was driven horizontally into a base metal sulphide deposit in Southern Africa. Samples were chipped from the walls of the drive every one metre. The drive was 400 metres long. The calculation of the semi-variogram is extremely simple and the results are listed in Table 1.

    The graph of this experimental semi-variogram is shown in Figure 1. We see what looks like an ideal semi-variogram shape for the first 70 metres or so. This graph starts at zero, rises gradually, slows down and levels off to a 'sill'. After about 80 metres, it surges into a parabolic rise. This last part of the graph is an indication of a trend in the values on the larger scale. We have calculated the semi-variogram on the assumption of no trend in the values. That is:

    where we assumed that the average difference, diff, is zero. An exactly equivalent formula would be:

    By assuming a zero mean we have failed to subtract a positive quantity from the semi-variogram calculation. If the mean is not zero, the graph shows the actual semi-variogram plus this squared component: a parabola. Thus, parabolic behaviour in a semi-variogram is an instant diagnostic of trend or drift in the values. This is one reason why we are not allowed to fit generalised linear models with powers of 2 or higher.

    In this case, the trend only becomes apparent after 75 metres. If we restrict our interpretation to within this distance, we should be safe enough assuming no trend.
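The 'extremely simple' calculation described for the drive samples can be sketched as follows, for regularly spaced 1-D data; the function name and interface are ours:

```python
def semivariogram_1d(values, max_lag):
    """Experimental semi-variogram for samples on a regular 1-D grid,
    e.g. chip samples taken every metre along a drive.
    For each lag h: gamma(h) = sum((g[i] - g[i+h])**2) / (2 * n_pairs),
    which assumes the mean pair difference is zero (no trend)."""
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = [(values[i] - values[i + h]) ** 2
                 for i in range(len(values) - h)]
        gammas.append(sum(diffs) / (2 * len(diffs)))
    return gammas
```

For the drive example, values would be the 400 chip samples and max_lag would be kept within the 70-75 metres over which the no-trend assumption holds.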


    Figures 2, 3 and 4 show fitted models of the exponential, spherical and paddington mix variety. In all three cases there are some desirable and some not so desirable characteristics. Of the three, the exponential model gives the best Cressie statistic, but is possibly the least pleasing visually. This example highlights one of the problems with using a goodness of fit statistic: it depends on which points we choose to include in the calculation. If we only look at the first 30 metres, we would judge the fits quite differently:

    Coal Project, Calorific Values

    In Part 1 we showed a post plot of the calorific values from the Coal Project data set. This is a straightforward set of Normally distributed data on a regular 150 metre grid with some gaps. The calculation of the experimental semi-variogram is straightforward and we have used it for classroom exercises very successfully. We find that if each student calculates the point for a specified distance and direction, the construction of the graph becomes a team effort in which every student can contribute. It also gives a better intuitive idea of the calculation process.

    The experimental semi-variograms for the four main points of the compass (north/south, east/west and the two diagonal directions) are shown in Table 2 and in Figure 5. It is clear from this graph that there is no significant difference between the directional semi-variograms. When judging the difference between directional semi-variograms, you should bear in mind that each point is an estimate of (one-half of) a variance. In ideal circumstances, you should be able to take each point and put a confidence interval around it, as we did in Part 2 of the previous course for variances and standard deviations. Also remember that, when you compare two different estimates for a particular semi-variance (say, the east/west and the north/south at the same distance), this is the equivalent of an F ratio test for variances.

    The combined 'omni-directional' experimental semi-variogram is very well behaved and exhibits a slight curve upwards with no sign of a sill on the graph. This is an ideal candidate for a generalised linear semi-variogram model. Fitting a


    generalised linear is very similar to fitting a straight line, provided you can take logarithms. For our first estimates, we want a line which goes through, say,

    Now, our generalised linear model is:

    where C0 is the nugget effect, p the slope of the curve and α the power for the distance. Because of mathematical restrictions, α must have a value between zero and 2. From our experimental points, we need:

    In our case we have an apparent zero nugget effect, but we will leave it in here for generality. If we rewrite this as:

    then a log transform of both sides would give:

    We estimate our nugget effect in this case to be zero. This reduces the equations to:

    Solving for α and loge p we find:

    These values can be substituted back into the model equation to give us a 'model' value for each experimental point on the semi-variogram graph. The Cressie goodness of fit statistic can be calculated and the visual fit between model and data assessed. The parameters will need to be adjusted to get a 'best fit' model.
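The two-point log-transform solution described above can be sketched in code (a minimal version, assuming a zero nugget effect as in this example; names ours):

```python
import math

def fit_power_model(h1, g1, h2, g2):
    """Fit gamma(h) = p * h**alpha through two chosen experimental
    points (h1, g1) and (h2, g2), assuming zero nugget effect.
    Taking logs gives log(gamma) = log(p) + alpha * log(h), two
    linear equations solved directly for alpha and log(p)."""
    alpha = math.log(g2 / g1) / math.log(h2 / h1)
    p = math.exp(math.log(g1) - alpha * math.log(h1))
    return alpha, p
```

A sensible fit should return α below 2; anything at 2 or above signals trend rather than a valid generalised linear model.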

    Of course, those of you with fancy statistical packages will be able to use all of the experimental points in a weighted least squares solution to minimise the Cressie statistic. After a few adjustments, our best model was found to have:

    The final Cressie calculation is shown in Table 3 and the fitted model, along with the experimental semi-variogram, in Figure 6.

    Wolfcamp Aquifer

    The Wolfcamp data set has been seen to exhibit several undesirable traits as far as classical statistical analysis is concerned. In particular, the samples are highly clustered spatially and there is a significant downward trend in the values from southwest to northeast. A post plot confirms this visually.

    Before we can calculate a semi-variogram, we have to choose the distance intervals between points on the graph and how many of those intervals we will want to see. One of the simplest (if not exactly the quickest) ways to assess the distance between the sample locations is to look at a 'nearest neighbour' distribution. For each sample, we find the nearest sample to that location. The distance is recorded and the process repeated for all of the sample points in turn. A histogram can be constructed of the nearest neighbour distances. If the sample locations are totally random, this histogram would follow a negative exponential form. If the data were on a strict grid, all of the nearest neighbour distances would be identical. This is also a useful timing exercise, since the nearest neighbour analysis takes exactly twice as long as the maximum calculation time for a semi-variogram graph. A histogram of the results is shown in Figure 8. There is a clear mode at around 3 miles and a slow tail off into the larger distances for the more isolated points.
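The nearest neighbour procedure just described can be sketched as a brute-force loop over all sample locations (quadratic in the number of samples, which matches the timing remark above; names ours):

```python
import math

def nearest_neighbour_distances(points):
    """For each (x, y) sample location, find the distance to its
    nearest neighbour among the other locations. The histogram of
    these distances guides the choice of semi-variogram lag interval."""
    out = []
    for i, (xi, yi) in enumerate(points):
        best = min(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        out.append(best)
    return out
```

For the Wolfcamp locations, the mode of this distribution (around 3 miles) suggests the minimum feasible lag interval.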

    Calculating the Semi-variogram


    From the above nearest neighbour analysis, we see that the minimum feasible interval for a semi-variogram calculation is around 2.5-3 miles. However, we must balance this 'natural' sampling interval against the need to acquire reasonable estimates for the semi-variance, that is, a reasonable number of pairs of samples for each point on the graph. It is often necessary to experiment with the data to achieve the optimum compromise between number of points on the graph and reliability of each point. The maximum number of intervals should be chosen with this in mind. A general rule of thumb is to choose the maximum interval at around half the geographical extent of the study area. This ensures that we do not run out of pairs of samples.

    When a significant trend is present in the sample values, this shows up in the semi-variogram calculation as a 'parabolic' component. It is particularly noticeable if you construct semi-variograms in different directions.

    For example, if we calculate semi-variograms at 5 mile intervals in four major directions (north/south, east/west, northeast/southwest and northwest/southeast) before taking out the trend, we get Figures 9 and 10. These are just two different ways of displaying the same graphs. Note that the shape of the semi-variograms in the different directions exactly reflects the shape of the trend in the values. In the northwest/southeast direction, the semi-variogram is comparatively low with relatively small differences between the sample values. In the northeast/southwest direction, the differences between the sample values get larger and larger as the square of the distance. This is the diagnostic for a trend in the values: values change significantly more in one direction than in another. To complete the picture and ensure a consistent diagnosis, we find that the east/west and north/south semi-variograms lie neatly between the direction of maximum difference and that of minimum difference.

    These semi-variograms tell us that there is a trend and that the contours probably run northwest/southeast. They do not tell us whether the values are rising or falling to the northeast; the trend surface analysis tells us that. The semi-variogram also does not tell us what form the trend takes. In this case we would expect it to be a fairly low order, since one direction (northwest/southeast) appears to have no trend at all.

    Trend Surface Analysis

    The calculated coefficients for each term in the three trend surfaces are listed in Figure 11.

    X represents the first co-ordinate (left-right on maps), and Y the second (bottom-top). It is difficult to judge from the coefficients listed in Figure 11 just which of these surfaces might 'best' describe the Wolfcamp data. The residual variation is the difference between what the trend says is there (the equation) and what the actual value was. We usually choose a surface which makes the residual variation as small as possible. A traditional method used by geologists since the mid 1950s is simply to calculate the sum of the squared residuals and compare this as a percentage of the original variation.

    A more formal method used by statisticians to judge the suitability of a Least Squares fit is the Analysis of Variance. This analysis requires that the residuals should be independent of one another. However, we feel that the Analysis of Variance is another way of getting an intuitive 'feel' for which surface (if any) may be best. This produces the results to the right:

    The final column is the important one. Under statistical assumptions of Normality and independence, the statistics shown in this last column would follow the F distribution. Reference tables 5(a) and 5(b) (below) of F distribution statistics at various levels of 'significance' are given in this book. The first item in the last column in the table is a value of 338.76. This statistic compares the variation on the original set of sample data with that left after fitting a linear (planar) surface. Looked at very simplistically, we might say that fitting a linear trend surface has reduced the variation amongst the sample values by a factor of over 300. In other words, a simple surface described by two coefficients 'explains' a significant proportion of the variability in the original sample values. These two coefficients are the ones shown in the equation above multiplying the terms X and Y in the linear equation.
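Every F ratio in such an analysis-of-variance table follows one pattern: a reduction in sum of squares per extra coefficient, divided by the residual mean square. A minimal sketch (names and interface ours, not the book's):

```python
def anova_f(ss_reduction, df_extra, ss_residual, df_residual):
    """F ratio for a trend-surface ANOVA line: the variation
    'explained' by df_extra additional coefficients (ss_reduction),
    per coefficient, compared against the remaining residual
    variation per degree of freedom."""
    return (ss_reduction / df_extra) / (ss_residual / df_residual)
```

For a surface compared against 'no trend', ss_reduction is the total sum of squares minus that surface's residual sum of squares; for comparing nested surfaces (quadratic against linear, say), it is the drop in residual sum of squares between the two fits.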

    Now, when you look at the next division in the table there are two figures in the right hand column. The first compares the fitting of a quadratic surface, with its five coefficients, with assuming that there is no trend. In this case the figure is 165.94, indicating that a very significant proportion of the original variation can be 'explained' by a quadratic trend surface. This figure is lower than the linear F statistic, partly because we have had to calculate more coefficients. However, it might be because the quadratic is not a lot better than the linear surface. To compare these directly we produce the second figure. This is a measurement of how much more variation is explained by the quadratic than was already explained by the linear. In short, the amount of extra variation explained by the inclusion of three extra coefficients. The F statistic calculated for this Wolfcamp data was 6.37. In our simplistic interpretation we can interpret this statistic as comparing:

    the va