
Spatially Balanced Sampling of Natural Resources

Don L. STEVENS Jr. and Anthony R. OLSEN

The spatial distribution of a natural resource is an important consideration in designing an efficient survey or monitoring program for the resource. Generally, sample sites that are spatially balanced, that is, more or less evenly dispersed over the extent of the resource, are more efficient than simple random sampling. We review a unified strategy for selecting spatially balanced probability samples of natural resources. The technique is based on creating a function that maps two-dimensional space into one-dimensional space, thereby defining an ordered spatial address. We use a restricted randomization to randomly order the addresses, so that systematic sampling along the randomly ordered linear structure results in a spatially well-balanced random sample. Variable inclusion probability, proportional to an arbitrary positive ancillary variable, is easily accommodated. The basic technique selects points in a two-dimensional continuum but is also applicable to sampling finite populations or one-dimensional continua embedded in two-dimensional space. An extension of the basic technique gives a way to order the sample points so that any set of consecutively numbered points is in itself a spatially well-balanced sample. This latter property is extremely useful in adjusting the sample for the frame imperfections common in environmental sampling.

KEY WORDS: Environmental sampling; Imperfect sampling frame; Monitoring; Non-response; Spatial sampling; Survey design; Systematic sampling.

1. INTRODUCTION

Environmental studies invariably involve populations distributed over space. Traditionally, such studies tended to focus on relatively small and well-delimited systems. However, some of the environmental issues that we face today, such as global warming, long-range transport of atmospheric pollutants, or habitat alteration, are not localized. Understanding and quantifying the extent of symptoms of widespread concern requires large-scale study efforts, which in turn need environmental sampling techniques and methodology that are formulated to address regional, continental, and global environmental issues. Stehman and Overton (1994) gave an overview of some statistical issues associated with environmental sampling and monitoring, and Gilbert (1987) gave an extensive discussion of sampling methods for monitoring environmental pollution.

Several generic situations arise when sampling environmental resources spread over large spatial extents. Many resource populations may be represented as collections of points, lines, or areas, that is, as zero-, one-, or two-dimensional objects. For sampling purposes, the major distinctions occur between finite (pointlike, zero-dimensional), linear (one-dimensional), and areal (two-dimensional) populations. Finite populations are those with discrete, identifiably distinct units that occupy fixed locations within a bounded area. Examples are studies of the basal area of trees within a forest and the eutrophication status of lakes within the United States. Treating the lakes as points in a two-dimensional domain is appropriate if the purpose of the sample is to determine an attribute of each sampled lake and estimate characteristics of the lake population. The point associated with a lake could be any uniquely defined location in the lake, for example, the lake centroid. Linear resources are populations, such as streams or rivers, that are present only on a linear network within a bounded area. Attributes are defined at all points of a stream or river network, for example, water chemistry. Linear resources are often sampled as finite populations by breaking them into discrete units, say by taking fixed-length intervals beginning at the mouth, headwaters, or domain boundary. The division into units is often arbitrary, because the resource does not have well-defined natural units. Such a discretization ignores the essential nature of linear resources as one-dimensional continua embedded in two-dimensional space. Conceptualizing them as linear networks and sampling at points along the network retains the continuous nature of such populations. From this viewpoint, the population is an uncountably infinite collection of points. An areal resource is a continuous population that is present everywhere within a bounded area. Areal resources extend over large regions in a more or less continuous and connected fashion, although they may comprise disconnected polygons. As for a linear resource, an areal resource does not have distinct natural units and is viewed as an infinite point set; for example, all forested land in the United States, the Puget Sound estuary, and large wetlands such as salt marshes or the Everglades fall into this category.

Don L. Stevens, Jr. is Senior Research Associate Professor, Department of Statistics, Oregon State University, Corvallis, OR 97331-4501 (E-mail: stevens@stat.orst.edu). Anthony R. Olsen is Mathematical Statistician, U.S. Environmental Protection Agency, NHEERL Western Ecology Division, Corvallis, OR 97333. We appreciate the willingness of the Indiana Department of Environmental Management to let us use their state monitoring program as an example; special thanks go to Stacey Sobat, who provided the biological data. Comments by the Associate Editor and referees helped improve the clarity of the manuscript. The research described in this article was funded by the U.S. Environmental Protection Agency. This document was prepared at the EPA National Health and Environmental Effects Research Laboratory, Western Ecology Division, in Corvallis, Oregon, through contract 68-C6-0005 to Dynamac International, Inc. and through cooperative agreement CR82-9096-01 to Oregon State University; it was subjected to Agency review and approved for publication. The conclusions and opinions are solely those of the authors and are not necessarily the views of the Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

A consideration that frequently arises in designing an environmental resource sample is that some population elements are perceived to be more important than others. For example, in sampling lakes, one might wish to select large lakes with a greater probability than small lakes, because large lakes are less numerous than small or because they contribute disproportionately to total surface area, total water volume, or total recreational usage. For a second example, one might wish to increase the sampling rate for lakes in an arid region of the population domain to get enough samples to reliably describe lake characteristics for the region. These two examples illustrate two very different scenarios for which variable probability sampling might be required. In the first, the probability varies elementwise and depends on an attribute (in this case, size) of the element. In the second case, the probability varies on a geographical region basis but may be the same for every element within the region. Moreover, the two scenarios can occur in combination, so that we have a need to conform to both elementwise and regionwise variation in inclusion probability.

© 2004 American Statistical Association. Journal of the American Statistical Association, March 2004, Vol. 99, No. 465, Theory and Methods. DOI 10.1198/016214504000000250

A practical complication frequently encountered in environmental sampling is the difficulty of obtaining an accurate sampling frame. In many instances, available sampling frames include a substantial portion of nontarget elements. For example, we could use the National Hydrography Dataset (NHD), available from the U.S. Geological Survey (USGS), as a sample frame for perennial streams (USGS 1999). Although attributes within NHD can be used to identify a subset of NHD that more closely matches the target population of perennial streams, the subset still includes many ephemeral or intermittent streams or long-dry channels, especially in the more arid sections of the western United States. Another problem is that much of the resource we might like to sample is inaccessible because of physical location, safety, or lack of access permission from the landowner. In some cases, it is possible to lose 50% or more of potential sample points because of lack of access. That is, significant nonresponse can be an issue in environmental surveys. Both problems result in fewer samples being collected than planned. If estimates of the percentage of the sampling frame that is nontarget or the percentage of inaccessible sites are available, then a common practice is to increase the planned sample size. Our experience is that such estimates are not available or, at best, are poorly known.

Some of the attributes of resource populations that influence sampling design are spatial pattern in the measured or observed response, uneven spatial distributions of the population, and difficulty in obtaining an adequate frame. Spatial pattern in the response arises because nearby units interact with one another and tend to be influenced by the same set of natural and anthropogenic factors. For example, neighboring trees in a forest interact by competing for energy and nutrients and are influenced by the same set of physical and meteorological conditions, the same level of air- or water-borne pollutants, and the same set of landscape disturbances. The pattern in the response may show up either as a gradient or as a mosaic. A number of studies have concluded that regularly spaced design points (e.g., systematic designs) are optimal for a variety of reasonable spatial correlation functions (see, e.g., Cochran 1946; Quenouille 1949; Das 1950; Matérn 1960; Dalenius, Hájek, and Zubrzycki 1961; Bellhouse 1977; Iachan 1985).

The concept that some degree of spatial regularity should be used for sampling for environmental populations is well established. Accordingly, there are numerous paradigms for incorporating the spatial aspect of an environmental population into a sample. Area sampling partitions the domain of the population into polygons, which can be treated either as strata or as population units themselves. Systematic sampling using a regular grid is often applied (Bickford et al. 1963; Messer et al. 1986; Hazard and Law 1989), as are several variants that perturb the strict alignment (Olea 1984). Spatial stratification is also frequently used, with regular polygons, natural boundaries, political boundaries, or arbitrary tessellations as strata. Maximal stratification, that is, one or two points per stratum, has been viewed as the most efficient. To this end, Munholland and Borkowski (1996) used a Latin square with a single additional independent sample to achieve a spatially balanced sample. Breidt (1995) used a Markov process to generate a one-unit-per-stratum spatially distributed sample. Both of these techniques select cells in a regular grid. Another approach is to use space to order a list frame of the (finite) population and then use the order of the list to structure the sample, say by defining strata as successive segments of the ordered list or by systematic random sampling. For example, Saalfeld (1991) drew on graph theory to define a tree that leads to a spatially articulated list frame, and the National Agricultural Statistics Service has used serpentine strips (Cotter and Nealon 1987) to order primary sample units within a state. A related idea that originated in geography is the general balanced ternary (GBT) spatial addressing scheme (Gibson and Lucas 1982). The concept behind a GBT address is related to the concept of space-filling curves, such as first constructed by Peano (1890), or the Hilbert curve (Simmons 1963). Stevens and Olsen (1999) used a similar concept, recursive partitioning, together with hierarchical randomization, to distribute sample points through space and time. Wolter and Harter (1990) used a construction similar to Peano's to construct a "Peano key" to maintain the spatial dispersion of a sample as the underlying population experiences births or deaths. Saalfeld (1991) also used the Peano key to maintain spatial dispersion of a sample.

The foregoing cited methods all do reasonably well at getting a spatially balanced sample under favorable circumstances but have difficulties with some aspect of environmental populations. For example, spatial stratification can be applied to finite, linear, and areal populations. However, defining strata for finite or linear populations with variable probability and substantial variation in spatial density can be difficult: maximal efficiency is obtained for one or two samples per stratum. To do so, we need some means to split the population into spatially contiguous strata. We could simply adopt equal-sized strata that tile the population domain, which usually results in a variable number of samples per stratum and noninteger expected sample sizes. (We illustrate this approach in Sec. 3.) Alternatively, we could try to develop unequal-area strata with the same, or nearly so, expected number of samples (total inclusion probability) in each stratum. For a small finite population, the strata could be developed by inspection. For a large population, say the 21,000 lakes in the northeastern United States, an automated stratification procedure is necessary. Developing such a procedure is a nontrivial task. Small sample sizes per stratum are good for efficiency but cause the greatest loss of efficiency in the presence of nonresponse. Suppose we have a two-sample-per-stratum design with a moderate rate of nonresponse, say 25%. We are almost certain to lose both samples from some strata. If we replace both sample points, we double the inclusion probability for those strata. Moreover, it is quite possible that the replacement points will also be nonresponse points, so we end up tripling or quadrupling the inclusion probability. The result is an unintentional imbalance in inclusion probability: those strata with high nonresponse get less weight in the analysis. Deviation from the intended inclusion probability that introduces more variation in the weight of the sample points results in loss of efficiency. Similar arguments can be made for the other methods for achieving spatial balance.
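The "almost certain" claim about losing both samples can be checked with a short calculation. The following is only an illustrative sketch: it assumes that nonresponse occurs independently from point to point at rate $r = 0.25$, and it uses a hypothetical design with $K = 50$ strata; neither assumption comes from the text.

$$P(\text{both points lost in a given stratum}) = r^2 = 0.25^2 = 0.0625,$$

$$P(\text{at least one of the } K \text{ strata loses both points}) = 1 - (1 - r^2)^K = 1 - 0.9375^{50} \approx 0.96.$$

Under these assumptions, the expected number of strata that lose both points is $K r^2 \approx 3.1$, so even a moderate design would typically need several complete stratum replacements.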

Sampling the gamut of natural resources requires a technique that can select a spatially balanced sample of finite, linear, and areal resources with patterned and possibly periodic responses, using arbitrarily variable inclusion probability, with imperfect frame information, in the presence of substantial nonresponse. In the design discussed here, we generalize the concept of spatial stratification to create a very powerful and flexible technique for selecting a spatially well-distributed probability sample that works under all of the preceding circumstances. The technique is based on creating a function that maps two-dimensional space into one-dimensional space, thereby defining an ordered spatial address. We use a restricted randomization, called hierarchical randomization (HR) (Stevens and Olsen 2000), to randomly order the address and then apply a transformation that induces an equiprobable linear structure. Systematic sampling along the randomly ordered linear structure is analogous to sampling a random tessellation of two-dimensional space and results in a spatially well-balanced random sample. We call the resulting design a generalized random-tessellation stratified (GRTS) design. We develop the design in a general setting that applies to finite, linear, and areal resources and that accommodates arbitrary inclusion probability functions. A particularly favorable feature is that we can dynamically add points to the sample as we discover nontarget or inaccessible points, at the same time maintaining a spatially well-balanced sample. Features of the design are demonstrated with a simulation study and are illustrated with an application to rivers and streams in Indiana.

2. GENERALIZED RANDOM-TESSELLATION STRATIFIED DESIGN

Before presenting the theoretical development of the GRTS design, we give a heuristic overview of the process. Assume that the sample frame consists of N points located within a geographic region. Assign each point a unit length and place each point in some order (say, randomly) on a line. The line has length N units. Select a systematic sample of size n from the line by dividing the line into intervals of length N/n, randomly selecting a starting point in the interval (0, N/n], say k, and then taking every (k + iN/n)th point for i = 1, ..., n − 1. If the point occurs within one of the units, then that unit is selected (Brewer and Hanif 1983). For a linear resource, use the actual length of the units to construct the line. For an areal resource, randomly place a systematic grid over the region, randomly select a point in each grid cell, and then proceed as in the point case. A GRTS sample results when a process termed hierarchical randomization is used to place the points on the line. First, randomly place a 2 × 2 square grid over the region and place the cells in random order in a line. For each cell, repeat the same process, randomly ordering the subcells within each original cell. This second step results in 16 cells in a line. Continue the process until at most one population point occurs in a cell. Use the random order of the cells to place the points on the line. This hierarchical randomization process maps two-dimensional space into one-dimensional space while preserving spatial relationships as much as possible. The combination of hierarchical randomization to create the line and systematic sampling with a random start results in a spatially balanced equal probability sample. Unequal probability sampling is implemented by giving each point a length proportional to its inclusion probability.
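The systematic-selection step of this overview can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `systematic_sample` and its arguments are hypothetical, the units are assumed to be already placed on the line in their (randomized) order, and the line is rescaled to total length n so that the sampling interval is 1 rather than N/n. The sketch also assumes that no rescaled unit is longer than 1, so that no unit can be selected twice.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_sample(lengths, n, rng=random):
    """Select n units by systematic sampling along a line of unit segments.

    lengths[i] is the length given to unit i (all 1.0 for equal probability,
    or proportional to the desired inclusion probability otherwise).
    Returns the indices of the selected units, in line order.
    """
    total = sum(lengths)
    # Rescale so the whole line has length n; the sampling interval is then 1.
    scaled = [n * w / total for w in lengths]
    if max(scaled) > 1.0:
        raise ValueError("a unit is longer than the sampling interval")
    ends = list(accumulate(scaled))      # right endpoint of each unit
    start = 1.0 - rng.random()           # random start in (0, 1]
    picks = []
    for i in range(n):
        point = start + i                # i-th systematic point on the line
        idx = bisect_left(ends, point)   # unit i covers (ends[i-1], ends[i]]
        picks.append(min(idx, len(ends) - 1))  # guard against round-off at the end
    return picks


# Equal-probability example: 10 units, sample of 5.
print(systematic_sample([1.0] * 10, 5, random.Random(0)))
```

In a GRTS sample the order of the units along the line would come from hierarchical randomization; here the order is simply the order of the input list.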

Stevens (1997) derived inclusion and joint inclusion functions for several grid-based designs that were precursors to GRTS designs and share some of their properties. The designs are all generalizations of the random-tessellation stratified (RTS) design (Dalenius et al. 1961; Olea 1984; Overton and Stehman 1993). The RTS design selects random points in space via a two-step process. First, a regular tessellation coherent with a regular grid is randomly located over the domain to be sampled, and second, a random point is selected within each random tessellation cell. The RTS design is a variation on a systematic design that avoids the alignment problems that can occur with a completely regular systematic design. Like a systematic design, an RTS design does not allow variable probability spatial sampling. Stevens (1997) introduced the multiple-density, nested, random-tessellation stratified (MD-NRTS) design to provide for variable spatial sampling intensity. The geometric concept underlying the MD-NRTS was the notion of coherent intensification of a grid, that is, adding points to a regular grid in such a way as to result in a denser regular grid with similarly shaped but smaller tessellation cells. We have since extended the same notion by generalizing to a process that creates a potentially infinite series of nested, coherent grids. In the limit, the process results in a function that maps two-dimensional space into one-dimensional space.

We can cover finite, linear, and areal populations with the same development if we work in the context of general measure and integration theory. Let $R$ be the domain of the population we wish to sample, that is, the set of points occupied by elements of the population. We require that $R$ be a bounded subset of $\mathbb{R}^2$. Thus $R$ can be enclosed in a bounded square, so that by scaling and translation we can define a 1–1 map from $R$ into $(0, 1/2] \times (0, 1/2]$, the lower left quadrant of the unit square. (We map to the lower left quadrant so that we can add a random offset to the image of $R$ and stay within the unit square. The random offset guarantees that the points from any pair can end up in different quadrants.) Clearly, every point in the image is associated with a unique point in $R$, and vice versa, so henceforth we identify $R$ with its image in the unit square.

2.1 Random Quadrant-Recursive Maps

The heart of the GRTS sample selection method is a function $f$ that maps the unit square $I^2 = (0, 1] \times (0, 1]$ onto the unit interval $I = (0, 1]$. To be useful in achieving a spatially balanced sample, $f$ must preserve some proximity relationships, so we need to impose some restrictions on the class of functions to be considered. Mark (1990), in studying discrete two- to one-dimensional maps, defined a property called quadrant recursive, which required that subquadrants be mapped onto sets of adjacent points. To define the continuous analog, let

$$Q^n_{jk} = \left(\frac{j}{2^n}, \frac{j+1}{2^n}\right] \times \left(\frac{k}{2^n}, \frac{k+1}{2^n}\right], \qquad j, k = 0, 1, \ldots, 2^n - 1,$$

and let

$$J^n_m = \left(\frac{m}{4^n}, \frac{m+1}{4^n}\right], \qquad m = 0, 1, \ldots, 4^n - 1.$$

A function $f: I^2 \to I$ is quadrant recursive if, for all $n \ge 0$, there is some $m \in \{0, 1, \ldots, 4^n - 1\}$ such that $f(Q^n_{jk}) = J^n_m$.


We can view a quadrant-recursive function as being defined by the limit of successive intensifications of a grid covering the unit square, where a grid cell is divided into four subcells, each of which is subsequently divided into four sub-subcells, and so on. If we carried this recursion to the limit and paired grid points with an address based on the order in which the divisions were carried out, where each digit of the address represented a step in the subdivision, then we would obtain a quadrant-recursive function. For example, suppose we begin with a point at $(1, 1)$ and replace it with four points $p_3 = (1, 1)$, $p_2 = (1, 1/2)$, $p_1 = (1/2, 1)$, and $p_0 = (1/2, 1/2)$. The next step of the recursion replaces each of the first four points $p_0, \ldots, p_3$ with $p_i - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^2$. Thus the point $p_2 = (1, 1/2)$ is replaced with the four points $p_{23} = (1, 1/2)$, $p_{22} = (1, 1/4)$, $p_{21} = (3/4, 1/2)$, and $p_{20} = (3/4, 1/4)$. The $n$th step replaces each of the $4^n$ points $p_{i_1 i_2 \cdots i_n}$ with $p_{i_1 i_2 \cdots i_n} - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^{n+1}$.

A spatially referenced address can be constructed following the pattern of the partitioning, with each new partition adding a digit position to the address. Thus, in the preceding example, the four points in the first group are assigned the addresses 3, 2, 1, and 0, where 3 is the original point at $(1, 1)$. The successor points to point 2 get the addresses 23, 22, 21, and 20, and so forth. The addresses induce a linear ordering of the subquadrants. Moreover, if we carry the process to the limit and treat the resulting address as digits in a base-4 fraction [e.g., treat $22103\cdots$ as the base-4 number $(.22103\cdots)_4$], then the correspondence between grid point and address is a quadrant-recursive function.

Recursive partitioning generates a nested hierarchy of grid cells. The derived addressing has the property that all successor cells of a cell have consecutive addresses. Thus a path from cell to cell following the recursive partitioning address order will connect all successor cells of cell 0 before reaching any successor of cell 2 (Fig. 1).

A 1–1 continuous mapping of $I^2$ onto $I$ is not possible, so quadrant-recursive functions are not continuous. However, they do have the property that all points in a quadrant are mapped onto an interval, all points in any one of the four subquadrants of a quadrant are mapped onto an interval, and so on, ad infinitum. This property tends to preserve proximity relationships; that is, if $s$ is "close to" $t$, then $f(s)$ should "tend to be close to" $f(t)$. In Appendix A we make this statement more precise by showing that if the origin is located at random and $s$ is chosen at random from $I^2$, then $\lim_{|\delta| \to 0} E[|f(s) - f(s + \delta)|] = 0$. Intuitively, two elements that are close together will tend to fall in the same randomly located cell of a size that decreases as the distance between points decreases. Because the two elements are covered by the same cell, their addresses match to the level of that cell, and thus, in expectation, their addresses will be close.

A fundamental 1–1 quadrant-recursive map is defined by digit interweaving. Let $s = (x, y)$ be a point in $I^2$. Each of the coordinates has an expansion as a binary fraction of the form $x = .x_1 x_2 x_3 \cdots$, $y = .y_1 y_2 y_3 \cdots$, where each $x_i$ and $y_i$ is either 0 or 1. Define $f_0(s)$ by alternating successive digits of $x$ and $y$, that is, $f_0(s) = .x_1 y_1 x_2 y_2 \cdots$. Clearly, $f_0$ would be 1–1 except for different expansions of the same number. For example, $.1$ and $.011111\cdots$, where the 1s continue indefinitely, are two representations of the number $1/2$. If we always use the binary representation with an infinite number of 1s, then $f_0$ is 1–1. Moreover, every point in $I$ is the image of a point in $I^2$, which is obtained by "digit splitting." That is, if $t = .t_1 t_2 t_3 \cdots$ is in $I$, then $s = f_0^{-1}(t) = (.t_1 t_3 t_5 \cdots, .t_2 t_4 t_6 \cdots)$ is the preimage of $t$. Both $f_0$ and $f_0^{-1}$ are 1–1 if we always use the representation with an infinite number of 1s (Hausdorff 1957, p. 45). To show that $f_0$ is quadrant recursive, note that for $s \in Q^n_{jk}$ the first $n$ binary digits of each coordinate are fixed, so the first $2n$ binary digits of $f_0(s)$ are fixed and $f_0(s) \in J^n_m$, where $m$ is defined by those $2n$ digits. Conversely, the preimages of every $t \in J^n_m$ share the same first $n$ binary digits in each coordinate and so must be in the same $Q^n_{jk}$.

Figure 1. First Four Levels of a Quadrant-Recursive Partitioning of the Unit Square. The address associated with the cross-hatched cell is .213.

Figure 1 shows the first four levels of the recursive partitioning of the unit square. The address of the cross-hatched subquadrant is, as a base-4 fraction, $(.213)_4$, and the associated grid point is at $(3/4, 1/2)$, the upper right corner of the subquadrant. Following the convention of having an infinite number of 1s in the expansion, we have $(3/4, 1/2) = (.11, .1)_2 = (.1011111\cdots, .0111111\cdots)_2$. Digit interweaving gives the image $(.10011111\cdots)_2 = (.2133333\cdots)_4$, of which the first three digits are the subquadrant address. If we carried the recursive partitioning to the limit, every point in the subquadrant would be assigned an address beginning with $(.213)_4$.
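The worked example above can be reproduced with a small finite-precision sketch of digit interweaving. The function name `quadrant_address` is hypothetical. The sketch relies on one observation: for a coordinate $x \in (0, 1]$, the first $n$ binary digits of the expansion with infinitely many 1s are the binary digits of $\lceil x 2^n \rceil - 1$, the index of the half-open cell $(j/2^n, (j+1)/2^n]$ that contains $x$. Exact rational inputs (`fractions.Fraction`) are used to avoid floating-point round-off at cell boundaries.

```python
from fractions import Fraction
from math import ceil


def quadrant_address(x, y, levels):
    """First `levels` base-4 digits of the digit-interweaving map f0(x, y).

    x and y must lie in (0, 1]. Each base-4 digit is 2*x_i + y_i, where
    x_i and y_i are the i-th binary digits of x and y under the convention
    that expansions carry infinitely many 1s.
    """
    if not (0 < x <= 1 and 0 < y <= 1):
        raise ValueError("x and y must lie in (0, 1]")
    scale = 2 ** levels
    j = ceil(x * scale) - 1   # column index of the cell (j/2^n, (j+1)/2^n]
    k = ceil(y * scale) - 1   # row index of the cell (k/2^n, (k+1)/2^n]
    digits = []
    for level in range(levels - 1, -1, -1):
        x_bit = (j >> level) & 1
        y_bit = (k >> level) & 1
        digits.append(2 * x_bit + y_bit)   # interweave: x digit first, then y
    return digits


# The grid point (3/4, 1/2) from the example in the text:
print(quadrant_address(Fraction(3, 4), Fraction(1, 2), 3))  # [2, 1, 3]
```

Asking for more levels at the same point appends further 3s, matching the expansion $(.2133333\cdots)_4$ given in the text.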

The class of all quadrant-recursive functions can be generated from the function $f_0$, which is defined by digit interweaving, by permuting the order in which subquadrants $Q^n_{jk}$ are paired with the intervals $J^n_m$. For example, for $n = 1$, $f_0(Q^1_{jk}) = J^1_{2j+k}$. We obtain a different quadrant-recursive function by permuting the subscripts $\{0, 1, 2, 3\}$ of the image intervals. Thus, under the permutation $\tau = \{2, 1, 3, 0\}$, we get a function such that $f_\tau(Q^1_{jk}) = J^1_{\tau(2j+k)}$, so that $f_\tau(Q^1_{00}) = J^1_2$, $f_\tau(Q^1_{01}) = J^1_1$, $f_\tau(Q^1_{10}) = J^1_3$, and $f_\tau(Q^1_{11}) = J^1_0$. To see that the class of all quadrant-recursive functions is generated by such permutations, express each number in $I$ as a base-4 number, that is, as $t = .t_1 t_2 t_3 \cdots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p: I \to I$ is a hierarchical permutation if $h_p(t) = .p_1(t_1)\, p_{t_1, 2}(t_2)\, p_{t_1 t_2, 3}(t_3) \cdots$, where $p_{t_1 t_2 \cdots t_{n-1}, n}(\cdot)$ is a permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$. Again, we ensure that $h_p$ is 1–1 by always using the expansion with an infinite number of nonzero digits. Any quadrant-recursive function can be expressed as the composition of $f_0$ with some hierarchical permutation $h_p$, because the associations $f(Q^n_{jk}) = J^n_m$ determine the series of permutations, and the permutations define the associations.

If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function and call the process of applying $h_p(\cdot)$ hierarchical randomization.
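A hierarchical randomization function can be sketched for finite-length base-4 addresses as follows. This is an illustrative sketch rather than the authors' code: the class name `HierarchicalRandomization` is hypothetical, addresses are truncated to a finite list of digits, and the permutations are drawn lazily and cached so that every address sharing a prefix sees the same permutation at that level.

```python
import random


class HierarchicalRandomization:
    """Apply an independent random permutation of {0, 1, 2, 3} at every node
    of the quadrant hierarchy, indexed by the original digits t1 ... t(n-1)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.perms = {}   # prefix of original digits -> permutation

    def _perm(self, prefix):
        # Draw the permutation for this prefix once, then reuse it.
        if prefix not in self.perms:
            p = [0, 1, 2, 3]
            self.rng.shuffle(p)
            self.perms[prefix] = p
        return self.perms[prefix]

    def apply(self, digits):
        """Map the base-4 digits [t1, t2, ...] to the digits of h_p(t)."""
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])        # original digits before position n
            out.append(self._perm(prefix)[d])
        return out


hr = HierarchicalRandomization(random.Random(1))
print(hr.apply([2, 1, 3]))   # randomized address of the cell .213
```

Because each level uses a permutation, addresses of a fixed length are mapped 1–1, and two addresses that share their first k digits still share their first k digits afterward, which is the nesting that the quadrant-recursive property requires.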

2.2 Sample Selection With Probability Proportional to Arbitrary Intensity Function

We assume that the design specifications define a desired sample intensity function $\pi(s)$, that is, the number of samples per unit measure of the population. For example, if the population were a stream network, $\pi(s)$ might specify the number of samples per kilometer of stream at $s$. For a discrete population, $\pi(s)$ has the usual finite-population-sampling interpretation as the target inclusion probability of the population unit located at $s$. We call $\pi(s)$ an intensity function because we have not yet introduced a probability measure. In Appendix B we develop the details of a sample selection method that yields an inclusion-probability function equal to $\pi(s)$. The concept behind the method is the composition of a hierarchical randomization function with a function that assigns to every interval in $f(R)$ a weight equal to the total of the intensity function of its preimage in $R$. In effect, we stretch the image interval via a distribution function $F$ so that its total length is equal to the sample size $M$. We pick $M$ points by taking a systematic sample with a unit separation along the stretched image, and we map these points back into the domain $R$ via the inverse function to get the sample of the population. We show in Appendix B that this procedure does indeed give a sample with an inclusion-probability function equal to the intensity function $\pi(s)$.

The technique of randomly mapping two-dimensional space to a line segment, systematically sampling from the range of the distribution function, and then mapping back to the population elements always produces a sample with the desired first-order inclusion-probability function as long as $f$ is 1–1 and measurable. We required that $f$ be quadrant recursive and claim that this is sufficient to give a spatially balanced sample. This claim follows from the fact that the map $f^{-1} \circ (F/M) \circ f$ transforms the unequal intensity surface defined by $\pi$ into an equiprobable surface. The quadrant-recursive property of $f$ guarantees that the sample is evenly spread over the equiprobable surface (in the sense that each subquadrant receives its expected number of samples) to the resolution determined by the sample size $M$.
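For a finite population, the first-order property can be checked numerically. The sketch below is an assumption-laden illustration, not the Appendix B construction: the function names are hypothetical, the units are assumed to be already arranged in their randomized address order, each target inclusion probability is at most 1, and the probabilities sum to an integer sample size M. Instead of simulating random starts, it sweeps the start over an even grid on (0, 1) and records how often each unit is hit.

```python
from bisect import bisect_left
from itertools import accumulate


def selected_units(pi, start):
    """Units hit by the systematic points start, start + 1, ... along the
    stretched line, where unit i occupies an interval of length pi[i]."""
    ends = list(accumulate(pi))          # right endpoints on the stretched line
    m = round(ends[-1])                  # sample size M = total intensity
    last = len(pi) - 1
    return [min(bisect_left(ends, start + i), last) for i in range(m)]


def inclusion_frequencies(pi, n_starts=1000):
    """Fraction of evenly spaced starts in (0, 1) that select each unit."""
    counts = [0] * len(pi)
    for k in range(n_starts):
        start = (k + 0.5) / n_starts
        for idx in selected_units(pi, start):
            counts[idx] += 1
    return [c / n_starts for c in counts]


# Five units with unequal target inclusion probabilities summing to M = 2.
target = [0.2, 0.8, 0.5, 0.25, 0.25]
print(inclusion_frequencies(target))
```

The selection frequencies reproduce the target values regardless of how the units are ordered along the line, which is why the ordering can be chosen purely to achieve spatial balance.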

2.3 Reverse Hierarchical Ordering

The sample points selected by mapping the systematic pointsalong 0 M] back to the population domain will be ordered ina way that follows the quadrant-recursiveness of f temperedby an allowance for unequal probability selection Thus the rst quarter of the points all will come from the same ldquoquad-rantrdquo of the equiprobabledomain and all will be approximatelyneighbors in the original populationdomain It follows that fourpoints one picked from each quarter of the sample points or-dered by the systematic selection will be a spatially balancedsample Because the random permutations that de ne the hier-archical randomization are selected independently of one an-other it makes no difference from a distributional standpointwhether we pick the points systematically from each quarteror make random selections from each quarter Therefore welose no randomness by picking the points that occupy positionsthat correspond to being at the beginningone-quarter one-halfand three-quarters of the way through the ordered list of samplepoints

Within each quarter of the list the points are again quadrant-recursively ordered so points picked at the beginning one-quarter one-half and three-quarters of the way througheach quarter of the list will be spread out over the correspond-ing quadrant and so on down through the sequence of sub-quadrants We can utilize these properties by reordering thesystematically selected list so that at any point in the reorderedlist the samples up to that point are well spread out over thepopulationdomain

The order is most convenientlyexpressed in terms of a base-4fraction where the fraction expresses the relative position inthe systematically ordered list Thus the rst four points cor-respond to the fractions 0 1 2 34 D 0 1=4 1=2 3=410Stepping down a subquadrant level corresponds to addinga digit position to the base-4 fraction which we ll in sucha way as to spread the sequence of points over the populationdomain The pattern for the rst 16 points is shown in Table 1Note that the order corresponds to the ranking obtained by re-versing the sequence of base-4 digits and treating the reversedsequence as a base-4 fraction

We can continue this same pattern of adding digit positions through as many positions as necessary to order the entire sample. The resulting order is called reverse hierarchical order. It remains to show that reverse hierarchical order does indeed give a spatially well-balanced sample for any $m \le M$. Clearly, this is the case for $m = 4^k$, because the reduced sample can be viewed as a sample selected from a complete GRTS design. Stevens (1997) derived an analytic expression for the pairwise inclusion density for some special intermediate cases. Here we investigate the spatial balance properties using simulation.

Table 1. Generation of Reverse Hierarchical Order

Order   Base 4   Reverse base 4
  1       00          00
  2       10          01
  3       20          02
  4       30          03
  5       01          10
  6       11          11
  7       21          12
  8       31          13
  9       02          20
 10       12          21
 11       22          22
 12       32          23
 13       03          30
 14       13          31
 15       23          32
 16       33          33
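The digit-reversal rule behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `reverse_hierarchical_order` is hypothetical, positions are zero-based, and for sample sizes that are not powers of 4 the sketch simply reverses the base-4 digits of the integer position, which is an assumption rather than something the table specifies.

```python
def reverse_hierarchical_order(n_points, base=4):
    """Return zero-based positions of a systematically ordered list,
    rearranged by reversing the base-4 digits of each position.

    Hypothetical helper: the name and the zero-based convention are
    assumptions made for this illustration.
    """
    # Number of base-4 digits needed to address every position.
    digits = 1
    while base ** digits < n_points:
        digits += 1

    def reversed_address(position):
        # Read the digits least-significant first and rebuild the number
        # with them in the opposite order.
        reversed_value = 0
        for _ in range(digits):
            reversed_value = reversed_value * base + position % base
            position //= base
        return reversed_value

    # Ranking by the reversed address gives the reverse hierarchical order.
    return sorted(range(n_points), key=reversed_address)


if __name__ == "__main__":
    print(reverse_hierarchical_order(16))
```

For 16 points, the first four entries are positions 0, 4, 8, and 12, that is, the points at the beginning, one-quarter, one-half, and three-quarters of the way through the systematic list.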


3 SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance, or regularity, of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should, and can, be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1000 points in the unit square, with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides of .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probability for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
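As a small worked illustration of the scaling step, the following Python sketch rescales the relative weights so that they sum to the sample size. The function name `inclusion_probabilities` is hypothetical; the weights and the sample size of 50 are those described in the text.

```python
def inclusion_probabilities(weights, sample_size):
    """Scale positive relative weights so that they sum to the sample size.

    Hypothetical helper written for illustration only.
    """
    total = sum(weights)
    return [sample_size * w / total for w in weights]


# 750 units with weight 1, 200 with weight 2, and 50 with weight 4.
weights = [1] * 750 + [2] * 200 + [4] * 50
pi = inclusion_probabilities(weights, sample_size=50)

if __name__ == "__main__":
    # Smallest and largest unit-level inclusion probabilities.
    print(round(min(pi), 3), round(max(pi), 3))
```

With these weights, a unit of weight 1 has inclusion probability 50/1350, or about .037, which matches the smallest expected cell sample size reported above (a cell holding a single weight-1 unit).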

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) "sample" function with "prob" set to the element inclusion probability.
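The cell-count summary can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the code used for the study: the function names are hypothetical, points are assumed to lie in the unit square, and the variance is taken with divisor equal to the number of replicates.

```python
from collections import Counter


def cell_counts(points, cells_per_side=10):
    """Count the sample points falling in each cell of a regular grid
    that tiles the unit square (cells indexed by (row, column))."""
    counts = Counter()
    for x, y in points:
        # Points on the upper boundary are assigned to the last cell.
        col = min(int(x * cells_per_side), cells_per_side - 1)
        row = min(int(y * cells_per_side), cells_per_side - 1)
        counts[(row, col)] += 1
    return counts


def count_variance_by_cell(replicates, cells_per_side=10):
    """Variance, over replicate samples, of the achieved count in each cell."""
    per_replicate = [cell_counts(r, cells_per_side) for r in replicates]
    cells = set()
    for counts in per_replicate:
        cells.update(counts)
    variances = {}
    for cell in cells:
        values = [counts[cell] for counts in per_replicate]  # missing cell -> 0
        mean = sum(values) / len(values)
        variances[cell] = sum((v - mean) ** 2 for v in values) / len(values)
    return variances
```

Plotting these per-cell variances against the expected cell sample sizes gives the kind of comparison summarized in Figure 3.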

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03), with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case, 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an "extra" sample point. For this step, we again used the S-PLUS "sample" function, this time with "prob" set to the fractional part of the expected sample size. The second step in the sample selection was

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.

268 Journal of the American Statistical Association March 2004

to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the "sample" function, this time with "prob" set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.
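The two-stage selection can be sketched in Python as below. This is a rough illustration under stated assumptions: the function names and the data layout (a dictionary mapping each stratum to a list of `(unit_id, inclusion_probability)` pairs) are hypothetical, and the weighted draws without replacement are done by successive sampling, which only mimics the weighted "sample" call described above. The sketch guarantees the total sample size; it makes no claim about the exact inclusion probabilities achieved.

```python
import math
import random


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k items one at a time, each draw weighted by the remaining weights."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(pick))
        weights.pop(pick)
    return chosen


def two_stage_stratified_sample(strata, rng):
    """strata: dict stratum_id -> list of (unit_id, inclusion_probability)."""
    expected = {h: sum(p for _, p in units) for h, units in strata.items()}
    fractional = {h: e - math.floor(e) for h, e in expected.items()}
    # Stage 1: the fractional parts sum to an integer; that many strata
    # receive one "extra" sample point.
    n_extra = round(sum(fractional.values()))
    extra = set(weighted_sample_without_replacement(
        list(fractional), list(fractional.values()), n_extra, rng))
    # Stage 2: sample units within each stratum.
    sample = []
    for h, units in strata.items():
        n_h = math.floor(expected[h]) + (1 if h in extra else 0)
        ids = [u for u, _ in units]
        probs = [p for _, p in units]
        sample.extend(weighted_sample_without_replacement(ids, probs, n_h, rng))
    return sample
```

A call such as `two_stage_stratified_sample(strata, random.Random(1))` returns a list of unit identifiers whose length equals the sum of the inclusion probabilities.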

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi "polygons" are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
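For a finite population, the Voronoi-based statistic can be computed directly by assigning each population unit to its nearest sample point. The Python sketch below is illustrative only: the function names are hypothetical, Euclidean distance is assumed, ties go to the first nearest sample point, and the variance is computed for a single sample with divisor equal to the sample size.

```python
def voronoi_inclusion_totals(population, probs, sample_idx):
    """Sum the inclusion probabilities of the population units closest to
    each sample point.

    population: list of (x, y) coordinates
    probs:      inclusion probability of each population unit
    sample_idx: indices (into population) of the sampled units
    """
    totals = {i: 0.0 for i in sample_idx}
    for (x, y), p in zip(population, probs):
        nearest = min(
            sample_idx,
            key=lambda i: (population[i][0] - x) ** 2 + (population[i][1] - y) ** 2,
        )
        totals[nearest] += p
    return [totals[i] for i in sample_idx]


def regularity_statistic(totals):
    """Variance of the Voronoi inclusion totals for one sample."""
    mean = sum(totals) / len(totals)
    return sum((v - mean) ** 2 for v in totals) / len(totals)
```

Because the inclusion probabilities sum to the sample size, the totals average to 1 for any sample; a perfectly regular sample has every total equal to 1 and a statistic of 0.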

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess (symbols: N, generalized random tessellation stratified sampling; ×, independent random sampling; ¦, spatially stratified sampling).

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases, it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size; that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning: maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

For our extensive population study, we selected 1000 samples of size $M = 256$ from the unit square using the GRTS


Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case, the holes comprise 20% of the domain area. Figure 5 shows the

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dash-dot line, constant polygon size).

placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

In every scenario, the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios; even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4 STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
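The Horvitz–Thompson total is simple to compute once the responses and first-order inclusion probabilities of the sampled points are in hand. The following Python fragment is a minimal sketch; the function name `horvitz_thompson_total` and the example numbers are invented for illustration.

```python
def horvitz_thompson_total(z, pi):
    """Estimate a population total by weighting each sampled response
    by the inverse of its inclusion probability (or density)."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    return sum(z_i / pi_i for z_i, pi_i in zip(z, pi))


if __name__ == "__main__":
    # Three sampled responses with made-up inclusion probabilities.
    print(horvitz_thompson_total([2.0, 4.0, 6.0], [0.5, 0.25, 0.5]))
```

Only first-order inclusion probabilities enter the point estimate; the pairwise inclusion probabilities discussed above matter only for variance estimation.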

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,$$

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed to approximate the joint inclusion probabilities.
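The arithmetic of the local neighborhood estimator can be sketched as follows. This Python fragment only evaluates the double sum for neighborhoods and weights supplied by the caller; how the neighborhoods $D(s_i)$ and weights $w_{ij}$ are actually constructed is not covered here, and the function name and data layout are assumptions made for illustration.

```python
def local_neighborhood_variance(z, pi, neighborhoods, weights):
    """Evaluate the neighborhood-based variance expression.

    z, pi:         responses and inclusion probabilities of the sample points
    neighborhoods: dict i -> list of indices j in the neighborhood of point i
    weights:       dict i -> dict j -> w_ij (supplied by the caller; the
                   construction of these weights is outside this sketch)
    """
    # Inverse-probability-weighted responses z(s_j) / pi(s_j).
    y = [z_j / pi_j for z_j, pi_j in zip(z, pi)]
    total = 0.0
    for i, members in neighborhoods.items():
        # Weighted local mean over the neighborhood of point i.
        local_mean = sum(weights[i][k] * y[k] for k in members)
        # Weighted squared deviations from that local mean.
        total += sum(weights[i][j] * (y[j] - local_mean) ** 2 for j in members)
    return total
```

When the weighted responses are constant within every neighborhood and the weights in each neighborhood sum to 1, the expression evaluates to 0, as one would expect of a contrast-based estimator.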

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

                        Mean estimated domain area
Target sample size   Exponential   Constant    Linear
        25            .8000979     .7969819   .8010589
        50            .7995775     .7979406   .8005739
       100            .7994983     .7980543   .8002237
       150            .7994777     .7997587   .7995685

$\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi_k^*(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi_k^*$, the expected number of samples in $R$ is

$$M_k = \int_R \pi_k^*(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi_k^*(s)\,d\phi(s).$$

We cannot compute $M_k$, because the boundary of $R$ is unknown, but an estimate is

$$\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi_k^*(s_i)}{\pi_k^*(s_i)} = \sum_i I_R(s_i).$$

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi_{\tilde{k}}^*$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi_{\tilde{k}}^*(s_i)$.
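The inverse sampling calculation can be sketched in Python as below. This is an illustration under stated assumptions: the function name is hypothetical, the inputs are assumed to be listed in reverse hierarchical order for all $M^*$ selected points, and the sketch takes the smallest $k$ at which the count of in-target points reaches $M$, a choice the text does not spell out.

```python
def conditional_area_estimate(in_target, pi_star, target_size):
    """Estimate the extent of the target domain R by inverse sampling.

    in_target:   list of booleans, True if the point fell in R, given in
                 reverse hierarchical order for all M* selected points
    pi_star:     inclusion density pi*(s_i) at each point, same order
    target_size: desired number M of sample points in R
    Returns (k, estimate), where k is the number of points used.
    """
    m_star = len(in_target)
    hits = 0
    for k, flag in enumerate(in_target, start=1):
        hits += bool(flag)
        if hits == target_size:
            # Conditional inclusion density: pi*_k(s) = (k / M*) pi*(s).
            scale = k / m_star
            estimate = sum(
                1.0 / (scale * p)
                for f, p in zip(in_target[:k], pi_star[:k])
                if f
            )
            return k, estimate
    raise ValueError("fewer than target_size points fell in the target domain")
```

For example, with four points selected at density 4 on a unit-area enclosing set, of which the first, third, and fourth fall in $R$, a target of two points stops at $k = 3$ and gives an area estimate of 2/3.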

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case, the true area of $R$ is .8, so that the estimator using $\pi_{\tilde{k}}^*$ is either unbiased or nearly so.

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that, for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case, the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification, and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5 EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order,

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                   Strahler order category length (km)
Basin          Total length (km)       1           2          3+
E. Fork White      6802.385        3833.335    2189.494     779.556
Great Miami        2270.018        1501.711     621.039     147.268
L. Wabash          7601.418        4632.484    1331.228    1637.706
U. Illinois        5606.329        4559.123     500.188     547.018

20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White         60            60            5         55         9              35              34
Great Miami           40            40           12         28         5              19              19
L. Wabash            128            91           11         80         9              71              39
U. Illinois          128            85            8         77         5              72              41


Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.


Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.


Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.


Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.


Table 5. Population Estimates With IRS and Local Variance Estimates

                 Indicator                       IRS        Local      Difference
Subpopulation    score       N sites    Mean     std err    std err    (%)
L. Wabash        IBI           39       36.1      2.1        1.4        −56.8
U. Illinois      IBI           41       32.5      1.7        1.3        −44.8
E. Fork White    IBI           32       35.1      1.3        1.2        −22.3
Great Miami      IBI           19       40.8      2.7        2.2        −33.5
L. Wabash        QHEI          39       55.6      2.3        1.6        −52.2
U. Illinois      QHEI          41       43.3      2.1        1.6        −39.5
E. Fork White    QHEI          34       54.3      2.1        1.5        −45.9
Great Miami      QHEI          19       67.8      2.2        1.9        −26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6 DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
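Splitting a reverse hierarchically ordered sample into interpenetrating subsamples or panels amounts to taking consecutive slices of the ordered list. A minimal Python sketch follows; the function name `consecutive_subsamples` is hypothetical, and the site list is assumed to be already in reverse hierarchical order.

```python
def consecutive_subsamples(ordered_sites, subsample_size):
    """Split a list of sites, already in reverse hierarchical order,
    into consecutive subsets of the given size."""
    if subsample_size <= 0:
        raise ValueError("subsample_size must be positive")
    return [
        ordered_sites[start:start + subsample_size]
        for start in range(0, len(ordered_sites), subsample_size)
    ]


if __name__ == "__main__":
    # Ten panels of 10 sites from an ordered sample of 100 site labels.
    panels = consecutive_subsamples(list(range(100)), 10)
    print(len(panels), panels[0])
```

Each slice can then serve as one interpenetrating subsample or as one panel in a rotating design.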

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 277

APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E[\,|f(s) - f(s+\delta)|\,] = 0$.

Proof. If, for some $n > 0$, $s$ and $s+\delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s+\delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s+\delta)| \le 1/4^n$. The probability that $s$ and $s+\delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to

$$\frac{|Q^n(0) \cap Q^n(\delta)|}{|Q^n(0)|} = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y,$$

where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s+\delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n}, \\[4pt] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
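The same-cell probability used in the proof can be checked numerically. The following is a rough Monte Carlo sketch, not part of the original argument; it assumes half-open cells of side $1/2^n$ aligned with the unit square (equivalent in probability to a randomly located grid when $s$ is uniform), and the function names are hypothetical.

```python
import random
from math import floor


def same_cell_fraction(n, dx, dy, trials=200_000, seed=0):
    """Estimate P(s and s + delta fall in the same level-n grid cell)."""
    rng = random.Random(seed)
    cells = 2 ** n  # number of cells per axis at level n
    hits = 0
    for _ in range(trials):
        sx, sy = rng.random(), rng.random()
        same_col = floor(sx * cells) == floor((sx + dx) * cells)
        same_row = floor(sy * cells) == floor((sy + dy) * cells)
        hits += same_col and same_row
    return hits / trials


def closed_form(n, dx, dy):
    """1 - 2^n (dx + dy) + 4^n dx dy, valid for dx, dy <= 1/2^n."""
    return 1 - 2 ** n * (dx + dy) + 4 ** n * dx * dy


print(same_cell_fraction(2, 0.05, 0.1), closed_form(2, 0.05, 0.1))
```

The closed form factors as $(1 - 2^n\delta_x)(1 - 2^n\delta_y)$, which is why the estimate should approach it as the number of trials grows.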

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

$$F(x) = \tilde{F}(x_i) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M-1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$ and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), “Some Optimal Designs for Sampling in Two Dimensions,” Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), “An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey,” Journal of Forestry, 61, 826–833.

278 Journal of the American Statistical Association March 2004

Breidt, F. J. (1995), “Markov Chain Designs for One-per-Stratum Sampling,” Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), “Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations,” The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), “An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe,” Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), “Area Frame Design for Agricultural Surveys,” U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), “On Plane Sampling and Related Geometrical Problems,” in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), “Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling,” Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), “Spatial Data Processing Using Balanced Ternary,” in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA6003-89065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), “Water-Quality Modeling With EPA River Reach File System,” Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), “Plane Sampling,” Statistics and Probability Letters, 50, 151–159.

IDEM (2000), “Indiana Water Quality Report 2000,” Report IDEM34020012000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), “S-PLUS 6 for Windows Language Reference,” Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), “Biological Integrity: A Long Neglected Aspect of Water Resource Management,” Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), “Neighbor-Based Properties of Some Orderings of Two-Dimensional Space,” Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-6004-86026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), “Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats,” Biometrics, 52, 125–136.

Olea, R. A. (1984), “Sampling Design Optimization for Spatial Functions,” Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), “Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid,” Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), “Sampling on Successive Occasions With Partial Replacement of Units,” Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), “Sur Une Courbe Qui Remplit Toute Une Aire Plane,” Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), “Problems in Plane Sampling,” The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), “Construction of Spatially Articulated List Frames for Household Surveys,” in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), “On the Estimate of the Variance in Sampling With Varying Probabilities,” Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), “Environmental Sampling and Monitoring,” in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), “Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations,” Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), “Spatially Restricted Surveys Over Time for Aquatic Resources,” Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), “Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation,” in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), “Variance Estimation for Spatially Balanced Samples of Environmental Resources,” Environmetrics, 14, 593–610.

Strahler, A. N. (1957), “Quantitative Analysis of Watershed Geomorphology,” Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), “Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns,” in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), “The National Hydrography Dataset,” Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), “Sample Maintenance Based on Peano Keys,” in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), “Selection Without Replacement From Within Strata With Probability Proportional to Size,” Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


within the region. Moreover, the two scenarios can occur in combination, so that we have a need to conform to both elementwise and regionwise variation in inclusion probability.

A practical complication frequently encountered in environmental sampling is the difficulty of obtaining an accurate sampling frame. In many instances, available sampling frames include a substantial portion of nontarget elements. For example, we could use the National Hydrography Dataset (NHD), available from the U.S. Geological Survey (USGS), as a sample frame for perennial streams (USGS 1999). Although attributes within NHD can be used to identify a subset of NHD that more closely matches the target population of perennial streams, the subset still includes many ephemeral or intermittent streams or long-dry channels, especially in the more arid sections of the western United States. Another problem is that much of the resource we might like to sample is inaccessible because of physical location, safety, or lack of access permission from the landowner. In some cases, it is possible to lose 50% or more of potential sample points because of lack of access. That is, significant nonresponse can be an issue in environmental surveys. Both problems result in fewer samples being collected than planned. If estimates of the percentage of the sampling frame that is nontarget or the percentage of inaccessible sites are available, then a common practice is to increase the planned sample size. Our experience is that such estimates are not available or at best are poorly known.

Some of the attributes of resource populations that influence sampling design are spatial pattern in the measured or observed response, uneven spatial distributions of the population, and difficulty in obtaining an adequate frame. Spatial pattern in the response arises because nearby units interact with one another and tend to be influenced by the same set of natural and anthropogenic factors. For example, neighboring trees in a forest interact by competing for energy and nutrients and are influenced by the same set of physical and meteorological conditions, the same level of air- or water-borne pollutants, and the same set of landscape disturbances. The pattern in the response may show up either as a gradient or as a mosaic. A number of studies have concluded that regularly spaced design points (e.g., systematic designs) are optimal for a variety of reasonable spatial correlation functions (see, e.g., Cochran 1946; Quenouille 1949; Das 1950; Matérn 1960; Dalenius, Hájek, and Zubrzycki 1961; Bellhouse 1977; Iachan 1985).

The concept that some degree of spatial regularity should be used for sampling for environmental populations is well established. Accordingly, there are numerous paradigms for incorporating the spatial aspect of an environmental population into a sample. Area sampling partitions the domain of the population into polygons, which can be treated either as strata or as population units themselves. Systematic sampling using a regular grid is often applied (Bickford et al. 1963; Messer et al. 1986; Hazard and Law 1989), as are several variants that perturb the strict alignment (Olea 1984). Spatial stratification is also frequently used, with regular polygons, natural boundaries, political boundaries, or arbitrary tessellations as strata. Maximal stratification, that is, one or two points per stratum, has been viewed as the most efficient. To this end, Munholland and Borkowski (1996) used a Latin square with a single additional independent sample to achieve a spatially balanced sample. Breidt (1995) used a Markov process to generate a one-unit-per-stratum, spatially distributed sample. Both of these techniques select cells in a regular grid. Another approach is to use space to order a list frame of the (finite) population and then use the order of the list to structure the sample, say by defining strata as successive segments of the ordered list or by systematic random sampling. For example, Saalfeld (1991) drew on graph theory to define a tree that leads to a spatially articulated list frame, and the National Agricultural Statistics Service has used serpentine strips (Cotter and Nealon 1987) to order primary sample units within a state. A related idea that originated in geography is the general balanced ternary (GBT) spatial addressing scheme (Gibson and Lucas 1982). The concept behind a GBT address is related to the concept of space-filling curves, such as first constructed by Peano (1890), or the Hilbert curve (Simmons 1963). Stevens and Olsen (1999) used a similar concept, recursive partitioning together with hierarchical randomization, to distribute sample points through space and time. Wolter and Harter (1990) used a construction similar to Peano's to construct a “Peano key” to maintain the spatial dispersion of a sample as the underlying population experiences births or deaths. Saalfeld (1991) also used the Peano key to maintain spatial dispersion of a sample.

The foregoing cited methods all do reasonably well at getting a spatially balanced sample under favorable circumstances, but have difficulties with some aspect of environmental populations. For example, spatial stratification can be applied to finite, linear, and areal populations. However, defining strata for finite or linear populations with variable probability and substantial variation in spatial density can be difficult; maximal efficiency is obtained for one or two samples per stratum. To do so, we need some means to split the population into spatially contiguous strata. We could simply adopt equal-sized strata that tile the population domain, which usually results in a variable number of samples per stratum and noninteger expected sample sizes. (We illustrate this approach in Sec. 3.) Alternatively, we could try to develop unequal-area strata with the same, or nearly so, expected number of samples (total inclusion probability) in each stratum. For a small finite population, the strata could be developed by inspection. For a large population, say the 21,000 lakes in the northeastern United States, an automated stratification procedure is necessary. Developing such a procedure is a nontrivial task. Small sample sizes per stratum are good for efficiency but cause the greatest loss of efficiency in the presence of nonresponse. Suppose we have a two-sample-per-stratum design with a moderate rate of nonresponse, say 25%. We are almost certain to lose both samples from some strata. If we replace both sample points, we double the inclusion probability for those strata. Moreover, it is quite possible that the replacement points will also be nonresponse points, so we end up tripling or quadrupling the inclusion probability. The result is an unintentional imbalance in inclusion probability: those strata with high nonresponse get less weight in the analysis. Deviation from the intended inclusion probability that introduces more variation in the weight of the sample points results in loss of efficiency. Similar arguments can be made for the other methods for achieving spatial balance.
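The claim that some strata will almost certainly lose both samples can be illustrated with a short calculation. The sketch below assumes that nonresponse occurs independently at each sample point with the same rate, which is an assumption made here for illustration only; the function name and the choice of 50 strata are likewise hypothetical.

```python
def prob_some_stratum_loses_all(n_strata, per_stratum=2, nonresponse=0.25):
    """P(at least one stratum loses every sample point), assuming
    independent nonresponse with a common rate at every point."""
    p_all_lost = nonresponse ** per_stratum      # one stratum loses all its points
    return 1.0 - (1.0 - p_all_lost) ** n_strata  # at least one of n_strata does


# With 2 points per stratum and 25% nonresponse, one stratum loses both
# points with probability 0.0625; over 50 strata the chance that this
# happens somewhere is much higher.
print(prob_some_stratum_loses_all(1), prob_some_stratum_loses_all(50))
```

Under these assumptions the probability grows quickly with the number of strata, which is consistent with the qualitative statement in the text.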

Sampling the gamut of natural resources requires a technique that can select a spatially balanced sample of finite, linear, and areal resources with patterned and possibly periodic responses, using arbitrarily variable inclusion probability, with imperfect frame information, in the presence of substantial nonresponse. In the design discussed here, we generalize the concept of spatial stratification to create a very powerful and flexible technique for selecting a spatially well-distributed probability sample that works under all of the preceding circumstances. The technique is based on creating a function that maps two-dimensional space into one-dimensional space, thereby defining an ordered spatial address. We use a restricted randomization, called hierarchical randomization (HR) (Stevens and Olsen 2000), to randomly order the address, and then apply a transformation that induces an equiprobable linear structure. Systematic sampling along the randomly ordered linear structure is analogous to sampling a random tessellation of two-dimensional space and results in a spatially well-balanced random sample. We call the resulting design a generalized random-tessellation stratified (GRTS) design. We develop the design in a general setting that applies to finite, linear, and areal resources and that accommodates arbitrary inclusion probability functions. A particularly favorable feature is that we can dynamically add points to the sample as we discover nontarget or inaccessible points, at the same time maintaining a spatially well-balanced sample. Features of the design are demonstrated with a simulation study and are illustrated with an application to rivers and streams in Indiana.

2. GENERALIZED RANDOM-TESSELLATION STRATIFIED DESIGN

Before presenting the theoretical development of the GRTS design, we give a heuristic overview of the process. Assume that the sample frame consists of $N$ points located within a geographic region. Assign each point a unit length, and place each point in some order (say, randomly) on a line. The line has length $N$ units. Select a systematic sample of size $n$ from the line by dividing the line into $N/n$ length intervals, randomly select a starting point in $(0, N/n]$, say $k$, and then take every $(k + iN/n)$th point for $i = 1, \ldots, n - 1$. If the point occurs within one of the units, then that unit is selected (Brewer and Hanif 1983). For a linear resource, use the actual length of the units to construct the line. For an areal resource, randomly place a systematic grid over the region, randomly select a point in each grid cell, and then proceed as in the point case. A GRTS sample results when a process termed hierarchical randomization is used to place the points on the line. First, randomly place a $2 \times 2$ square grid over the region and place the cells in random order in a line. For each cell, repeat the same process, randomly ordering the subcells within each original cell. This second step results in 16 cells in a line. Continue the process until at most one population point occurs in a cell. Use the random order of the cells to place the points on the line. This hierarchical randomization process maps two-dimensional space into one-dimensional space while preserving spatial relationships as much as possible. The combination of hierarchical randomization to create the line and systematic sampling with a random start results in a spatially balanced, equal probability sample. Unequal probability sampling is implemented by giving each point a length proportional to its inclusion probability.
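The heuristic can be sketched for a finite population as follows. This is a simplified illustration, not the authors' implementation: it assumes points already scaled into $[0, 1)^2$, omits the random translation of the grid, refines to a fixed number of levels rather than stopping when each cell holds at most one point, and assumes every point's length is no larger than the sampling interval. All function and parameter names are hypothetical.

```python
import random


def hierarchical_address(x, y, levels, perms, rng):
    """Base-4 address of (x, y), with an independent random permutation of
    the four subcells drawn once for each parent cell (stored in perms)."""
    address, path = [], ()
    for _ in range(levels):
        x, y = 2 * x, 2 * y
        i, j = int(x), int(y)            # which half in each coordinate
        x, y = x - i, y - j
        if path not in perms:            # first visit to this parent cell
            perms[path] = rng.sample(range(4), 4)
        cell = 2 * i + j
        address.append(perms[path][cell])
        path += (cell,)
    return tuple(address)


def grts_sketch(points, lengths, n, levels=12, seed=0):
    """Order points by randomized address, then sample systematically along
    the line on which point k occupies a segment of length lengths[k]."""
    rng = random.Random(seed)
    perms = {}
    order = sorted(range(len(points)),
                   key=lambda k: hierarchical_address(*points[k], levels, perms, rng))
    step = sum(lengths) / n
    start = rng.random() * step          # random start in [0, step)
    targets = [start + i * step for i in range(n)]
    sample, cum, t = [], 0.0, 0
    for k in order:
        cum += lengths[k]                # right end of point k's segment
        while t < n and targets[t] <= cum:
            sample.append(k)
            t += 1
    return sample


rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(200)]
print(grts_sketch(pts, [1.0] * 200, 20, seed=1))
```

Equal lengths give an equal probability sample; passing lengths proportional to the desired inclusion probabilities corresponds to the unequal probability case described above.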

Stevens (1997) derived inclusion and joint inclusion functions for several grid-based designs that were precursors to GRTS designs and share some of their properties. The designs are all generalizations of the random-tessellation stratified (RTS) design (Dalenius et al. 1961; Olea 1984; Overton and Stehman 1993). The RTS design selects random points in space via a two-step process. First, a regular tessellation coherent with a regular grid is randomly located over the domain to be sampled, and second, a random point is selected within each random tessellation cell. The RTS design is a variation on a systematic design that avoids the alignment problems that can occur with a completely regular systematic design. Like a systematic design, an RTS design does not allow variable probability spatial sampling. Stevens (1997) introduced the multiple-density, nested, random-tessellation stratified (MD-NRTS) design to provide for variable spatial sampling intensity. The geometric concept underlying the MD-NRTS was the notion of coherent intensification of a grid, that is, adding points to a regular grid in such a way as to result in a denser regular grid with similarly shaped but smaller tessellation cells. We have since extended the same notion by generalizing to a process that creates a potentially infinite series of nested, coherent grids. In the limit, the process results in a function that maps two-dimensional space into one-dimensional space.

We can cover finite, linear, and areal populations with the same development if we work in the context of general measure and integration theory. Let $R$ be the domain of the population we wish to sample, that is, the set of points occupied by elements of the population. We require that $R$ be a bounded subset of $\mathbb{R}^2$. Thus $R$ can be enclosed in a bounded square, so that by scaling and translation we can define a 1–1 map from $R$ into $(0, 1/2] \times (0, 1/2]$, the lower left quadrant of the unit square. (We map to the lower left quadrant so that we can add a random offset to the image of $R$ and stay within the unit square. The random offset guarantees that the points from any pair can end up in different quadrants.) Clearly, every point in the image is associated with a unique point in $R$ and vice versa, so henceforth we identify $R$ with its image in the unit square.

2.1 Random Quadrant-Recursive Maps

The heart of the GRTS sample selection method is a function $f$ that maps the unit square $I^2 = (0, 1] \times (0, 1]$ onto the unit interval $I = (0, 1]$. To be useful in achieving a spatially balanced sample, $f$ must preserve some proximity relationships, so we need to impose some restrictions on the class of functions to be considered. Mark (1990), in studying discrete two- to one-dimensional maps, defined a property called quadrant recursive, which required that subquadrants be mapped onto sets of adjacent points. To define the continuous analog, let

$$Q^n_{jk} = \left(\frac{j}{2^n}, \frac{j+1}{2^n}\right] \times \left(\frac{k}{2^n}, \frac{k+1}{2^n}\right], \qquad j, k = 0, 1, \ldots, 2^n - 1,$$

and let

$$J^n_m = \left(\frac{m}{4^n}, \frac{m+1}{4^n}\right], \qquad m = 0, 1, \ldots, 4^n - 1.$$

A function $f: I^2 \to I$ is quadrant recursive if, for all $n \ge 0$, there is some $m \in \{0, 1, \ldots, 4^n - 1\}$ such that $f(Q^n_{jk}) = J^n_m$.


We can view a quadrant-recursive function as being defined by the limit of successive intensifications of a grid covering the unit square, where a grid cell is divided into four subcells, each of which is subsequently divided into four sub-subcells, and so on. If we carried this recursion to the limit and paired grid points with an address based on the order in which the divisions were carried out, where each digit of the address represented a step in the subdivision, then we would obtain a quadrant-recursive function. For example, suppose we begin with a point at $(1, 1)$ and replace it with four points $p_3 = (1, 1)$, $p_2 = (1, 1/2)$, $p_1 = (1/2, 1)$, and $p_0 = (1/2, 1/2)$. The next step of the recursion replaces each of the first four points $p_0, \ldots, p_3$ with $p_i - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^2$. Thus the point $p_2 = (1, 1/2)$ is replaced with the four points $p_{23} = (1, 1/2)$, $p_{22} = (1, 1/4)$, $p_{21} = (3/4, 1/2)$, and $p_{20} = (3/4, 1/4)$. The $n$th step replaces each of the $4^n$ points $p_{i_1 i_2 \cdots i_n}$ with $p_{i_1 i_2 \cdots i_n} - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^{n+1}$.

A spatially referenced address can be constructed following the pattern of the partitioning, with each new partition adding a digit position to the address. Thus, in the preceding example, the four points in the first group are assigned the addresses 3, 2, 1, and 0, where 3 is the original point at $(1, 1)$. The successor points to point 2 get the addresses 23, 22, 21, and 20, and so forth. The addresses induce a linear ordering of the subquadrants. Moreover, if we carry the process to the limit and treat the resulting address as digits in a base-4 fraction [e.g., treat $22103\cdots$ as the base-4 number $(.22103\cdots)_4$], then the correspondence between grid point and address is a quadrant-recursive function.
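The correspondence between an address and its grid point can be sketched as follows. The code follows the worked example in the text, in which address digits 3, 2, 1, 0 correspond to subtracting the offsets $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$, scaled by $1/2^{\text{level}}$, from the parent point; the function name `grid_point` is hypothetical.

```python
from fractions import Fraction


def grid_point(address):
    """Grid point (upper right cell corner) reached by following a base-4
    address from the starting point (1, 1)."""
    x = y = Fraction(1)
    for level, digit in enumerate(address, start=1):
        x_bit, y_bit = divmod(digit, 2)        # digit = 2 * x_bit + y_bit
        x -= Fraction(1 - x_bit, 2 ** level)   # step left when x_bit is 0
        y -= Fraction(1 - y_bit, 2 ** level)   # step down when y_bit is 0
    return x, y


for addr in ([2], [2, 3], [2, 2], [2, 1], [2, 0]):
    print(addr, grid_point(addr))
```

The printed points for addresses 23, 22, 21, and 20 reproduce the four successors of $p_2$ listed in the example above.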

Recursive partitioning generates a nested hierarchy of grid cells. The derived addressing has the property that all successor cells of a cell have consecutive addresses. Thus a path from cell to cell following the recursive partitioning address order will connect all successor cells of cell 0 before reaching any successor of cell 2 (Fig. 1).

A 1–1 continuous mapping of $I^2$ onto $I$ is not possible, so quadrant-recursive functions are not continuous. However, they do have the property that all points in a quadrant are mapped onto an interval, all points in any one of the four subquadrants of a quadrant are mapped onto an interval, and so on, ad infinitum. This property tends to preserve proximity relationships; that is, if $s$ is “close to” $t$, then $f(s)$ should “tend to be close to” $f(t)$. In Appendix A we make this statement more precise by showing that if the origin is located at random and $s$ is chosen at random from $I^2$, then $\lim_{|\delta| \to 0} E[\,|f(s) - f(s+\delta)|\,] = 0$. Intuitively, two elements that are close together will tend to fall in the same randomly located cell of a size that decreases as the distance between points decreases. Because the two elements are covered by the same cell, their addresses match to the level of that cell, and thus, in expectation, their addresses will be close.

A fundamental 1–1 quadrant-recursive map is defined by digit interweaving. Let $s = (x, y)$ be a point in $I^2$. Each of the coordinates has an expansion as a binary fraction of the form $x = .x_1x_2x_3\cdots$, $y = .y_1y_2y_3\cdots$, where each $x_i$ and $y_i$ is either 0 or 1. Define $f_0(s)$ by alternating successive digits of $x$ and $y$, that is, $f_0(s) = .x_1y_1x_2y_2\cdots$. Clearly, $f_0$ would be 1–1 except for different expansions of the same number. For example, $.1$ and $.011111\cdots$, where the 1s continue indefinitely, are two representations of the number 1/2. If we always use the binary representation with an infinite number of 1s, then $f_0$ is 1–1. Moreover, every point in $I$ is the image of a point in $I^2$, which is obtained by "digit splitting." That is, if $t = .t_1t_2t_3\cdots$ is in $I$, then $s = f_0^{-1}(t) = (.t_1t_3t_5\cdots, .t_2t_4t_6\cdots)$ is the preimage of $t$. Both $f_0$ and $f_0^{-1}$ are 1–1 if we always use the representation with an infinite number of 1s (Hausdorff 1957, p. 45). To show that $f_0$ is quadrant recursive, note that for $s \in Q^n_{jk}$, the first $4n$ digits of $f_0(s)$ are fixed, so $f_0(s) \in J^n_m$, where $m$ is defined by the first $4n$ digits. Conversely, the preimages of every $t \in J^n_m$ have the same first $2n$ digits and so must be in the same $Q^n_{jk}$.

Figure 1. First Four Levels of a Quadrant-Recursive Partitioning of the Unit Square. The address associated with the cross-hatched cell is .213.

Figure 1 shows the first four levels of the recursive partitioning of the unit square. The address of the cross-hatched subquadrant is, as a base-4 fraction, $(.213)_4$, and the associated grid point is at $(3/4, 1/2)$, the upper right corner of the subquadrant. Following the convention of having an infinite number of 1s in the expansion, we have $(3/4, 1/2) = (.11, .1)_2 = (.1011111\cdots, .0111111\cdots)_2$. Digit interweaving gives the image $(.10011111\cdots)_2 = (.2133333\cdots)_4$, of which the first three digits are the subquadrant address. If we carried the recursive partitioning to the limit, every point in the subquadrant would be assigned an address beginning with $(.213)_4$.
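The digit-interweaving computation for the grid point $(3/4, 1/2)$ can be checked with a short sketch. This is only an illustration, not the authors' implementation; the function names `binary_digits` and `interweave_address` are hypothetical, and exact rational arithmetic is assumed so that the convention of an expansion with infinitely many 1s can be applied without rounding.

```python
from fractions import Fraction
from math import ceil

def binary_digits(x, n):
    """First n binary digits of x in (0, 1], using the expansion that
    ends in infinitely many 1s (so 1/2 is .0111... rather than .1)."""
    return [(ceil(x * 2**i) - 1) % 2 for i in range(1, n + 1)]

def interweave_address(x, y, levels):
    """Base-4 address digits obtained by alternating the binary digits
    of x and y: each pair (x_i, y_i) becomes the base-4 digit 2*x_i + y_i."""
    xd = binary_digits(Fraction(x), levels)
    yd = binary_digits(Fraction(y), levels)
    return [2 * a + b for a, b in zip(xd, yd)]

# Grid point (3/4, 1/2) from the Figure 1 discussion
print(interweave_address(Fraction(3, 4), Fraction(1, 2), 5))  # [2, 1, 3, 3, 3]
```

The first three digits reproduce the subquadrant address 213 quoted in the text.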

The class of all quadrant-recursive functions can be generated from the function $f_0$, which is defined by digit interweaving, by permuting the order in which subquadrants $Q^n_{jk}$ are paired with the intervals $J^n_m$. For example, for $n = 1$, $f_0(Q^1_{jk}) = J^1_{2j+k}$. We obtain a different quadrant-recursive function by permuting the subscripts $\{0, 1, 2, 3\}$ of the image intervals. Thus, under the permutation $\tau = \{2, 1, 3, 0\}$, we get a function such that $f_\tau(Q^1_{jk}) = J^1_{\tau(2j+k)}$, so that $f_\tau(Q^1_{00}) = J^1_2$, $f_\tau(Q^1_{01}) = J^1_1$, $f_\tau(Q^1_{10}) = J^1_3$, and $f_\tau(Q^1_{11}) = J^1_0$. To see that the class of all quadrant-recursive functions is generated by such permutations, express each number in $I$ as a base-4 number; that

266 Journal of the American Statistical Association March 2004

is, as $t = .t_1t_2t_3\cdots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p: I \to I$ is a hierarchical permutation if $h_p(t) = .p_1(t_1)\,p_{t_1,2}(t_2)\,p_{t_1t_2,3}(t_3)\cdots$, where $p_{t_1t_2\cdots t_{n-1},n}(\cdot)$ is a permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$. Again, we ensure that $h_p$ is 1–1 by always using the expansion with an infinite number of nonzero digits. Any quadrant-recursive function can be expressed as the composition of $f_0$ with some hierarchical permutation $h_p$, because the associations $f(Q^n_{jk}) = J^n_m$ determine the series of permutations, and the permutations define the associations.

If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function and call the process of applying $h_p(\cdot)$ hierarchical randomization.
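A hierarchical randomization can be sketched on truncated base-4 addresses as follows. This is a minimal illustration under stated assumptions, not the authors' code: addresses are finite lists of base-4 digits, the name `make_hierarchical_randomization` is hypothetical, and the permutation attached to each digit prefix is drawn lazily the first time that prefix is seen.

```python
import random

def make_hierarchical_randomization(seed=None):
    """Return a function h that applies an independent random permutation
    of {0, 1, 2, 3} at every level, one permutation per digit prefix."""
    rng = random.Random(seed)
    perms = {}  # maps a prefix (t_1, ..., t_{n-1}) to a permutation of 0..3

    def h(digits):
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])
            if prefix not in perms:
                p = [0, 1, 2, 3]
                rng.shuffle(p)
                perms[prefix] = p
            out.append(perms[prefix][d])
        return out

    return h

h = make_hierarchical_randomization(seed=0)
# Addresses that share a prefix keep a shared prefix after randomization,
# which is what keeps the randomized map quadrant recursive.
print(h([2, 1, 3])[:2] == h([2, 1, 0])[:2])  # True
```

Because each level uses a permutation, distinct addresses of a given length always map to distinct images.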

2.2 Sample Selection With Probability Proportional to Arbitrary Intensity Function

We assume that the design specifications define a desired sample intensity function $\pi(s)$, that is, the number of samples per unit measure of the population. For example, if the population were a stream network, $\pi(s)$ might specify the number of samples per kilometer of stream at $s$. For a discrete population, $\pi(s)$ has the usual finite-population-sampling interpretation as the target inclusion probability of the population unit located at $s$. We call $\pi(s)$ an intensity function, because we have not yet introduced a probability measure. In Appendix B we develop the details of a sample selection method that yields an inclusion-probability function equal to $\pi(s)$. The concept behind the method is the composition of a hierarchical randomization function with a function that assigns to every interval in $f(R)$ a weight equal to the total of the intensity function of its preimage in $R$. In effect, we stretch the image interval via a distribution function $F$, so that its total length is equal to the sample size $M$. We pick $M$ points by taking a systematic sample with a unit separation along the stretched image, and we map these points back into the domain $R$ via the inverse function to get the sample of the population. We show in Appendix B that this procedure does indeed give a sample with an inclusion-probability function equal to the intensity function $\pi(s)$.

The technique of randomly mapping two-dimensional space to a line segment, systematically sampling from the range of the distribution function, and then mapping back to the population elements always produces a sample with the desired first-order inclusion-probability function, as long as $f$ is 1–1 and measurable. We required that $f$ be quadrant recursive and claim that this is sufficient to give a spatially balanced sample. This claim follows from the fact that the map $f^{-1} \circ (F/M) \circ f$ transforms the unequal intensity surface defined by $\pi$ into an equiprobable surface. The quadrant-recursive property of $f$ guarantees that the sample is evenly spread over the equiprobable surface (in the sense that each subquadrant receives its expected number of samples) to the resolution determined by the sample size $M$.
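For a finite population, the selection step described in this section can be sketched as a systematic sample with unit spacing along the cumulative inclusion probabilities. The sketch below assumes the units have already been sorted by their hierarchically randomized addresses, that each inclusion probability is at most 1, and that the probabilities sum to the integer sample size $M$; the function name `systematic_pps` is hypothetical.

```python
import random
from itertools import accumulate

def systematic_pps(pi_ordered, rng=random):
    """Select indices by a unit-spaced systematic sample along the
    cumulative inclusion probabilities (units in randomized address order)."""
    cum = list(accumulate(pi_ordered))
    m = round(cum[-1])                    # sample size M
    u = rng.random()                      # random start in [0, 1)
    chosen, i = [], 0
    for k in range(m):
        t = u + k                         # k-th systematic point on (0, M]
        # advance to the unit whose cumulative interval contains t
        while i < len(cum) - 1 and cum[i] <= t:
            i += 1
        chosen.append(i)
    return chosen

# Eight units with inclusion probability 1/2 each give a sample of size 4
print(systematic_pps([0.5] * 8, random.Random(1)))
```

Units with larger inclusion probability occupy longer stretches of the line and so are hit more often, which is how variable probability is accommodated.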

2.3 Reverse Hierarchical Ordering

The sample points selected by mapping the systematic points along $(0, M]$ back to the population domain will be ordered in a way that follows the quadrant-recursiveness of $f$, tempered by an allowance for unequal probability selection. Thus the first quarter of the points all will come from the same "quadrant" of the equiprobable domain, and all will be approximately neighbors in the original population domain. It follows that four points, one picked from each quarter of the sample points ordered by the systematic selection, will be a spatially balanced sample. Because the random permutations that define the hierarchical randomization are selected independently of one another, it makes no difference, from a distributional standpoint, whether we pick the points systematically from each quarter or make random selections from each quarter. Therefore, we lose no randomness by picking the points that occupy positions that correspond to being at the beginning, one-quarter, one-half, and three-quarters of the way through the ordered list of sample points.

Within each quarter of the list, the points are again quadrant-recursively ordered, so points picked at the beginning, one-quarter, one-half, and three-quarters of the way through each quarter of the list will be spread out over the corresponding quadrant, and so on, down through the sequence of subquadrants. We can utilize these properties by reordering the systematically selected list so that, at any point in the reordered list, the samples up to that point are well spread out over the population domain.

The order is most conveniently expressed in terms of a base-4 fraction, where the fraction expresses the relative position in the systematically ordered list. Thus the first four points correspond to the fractions $(.0, .1, .2, .3)_4 = (0, 1/4, 1/2, 3/4)_{10}$. Stepping down a subquadrant level corresponds to adding a digit position to the base-4 fraction, which we fill in such a way as to spread the sequence of points over the population domain. The pattern for the first 16 points is shown in Table 1. Note that the order corresponds to the ranking obtained by reversing the sequence of base-4 digits and treating the reversed sequence as a base-4 fraction.

We can continue this same pattern of adding digit positions through as many positions as necessary to order the entire sample. The resulting order is called reverse hierarchical order. It remains to show that reverse hierarchical order does indeed give a spatially well-balanced sample for any $m \le M$. Clearly, this is the case for $m = 4^k$, because the reduced sample can be viewed as a sample selected from a complete GRTS design. Stevens (1997) derived an analytic expression for the pairwise inclusion density for some special intermediate cases. Here we investigate the spatial balance properties using simulation.

Table 1. Generation of Reverse Hierarchical Order

Order  Base 4  Reverse   Order  Base 4  Reverse   Order  Base 4  Reverse   Order  Base 4  Reverse
               base 4                   base 4                   base 4                   base 4
  1      00      00        5      01      10        9      02      20       13      03      30
  2      10      01        6      11      11       10      12      21       14      13      31
  3      20      02        7      21      12       11      22      22       15      23      32
  4      30      03        8      31      13       12      32      23       16      33      33
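The pattern in Table 1 can be generated by ranking list positions on their reversed base-4 digits. The sketch below is an illustration only and assumes a list whose length is a power of 4; the function name `reverse_hierarchical_order` is hypothetical, and positions are numbered from 0.

```python
def reverse_hierarchical_order(levels):
    """Positions 0 .. 4**levels - 1 of the systematically ordered list,
    arranged in reverse hierarchical order."""
    n = 4 ** levels

    def reversed_base4(i):
        # reverse the base-4 digits of i and read them as a base-4 number
        r = 0
        for _ in range(levels):
            r = 4 * r + i % 4
            i //= 4
        return r

    return sorted(range(n), key=reversed_base4)

print(reverse_hierarchical_order(2))
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

The first four entries (positions 0, 4, 8, and 12 of 16) are the points at the beginning, one-quarter, one-half, and three-quarters of the way through the list, matching orders 1 through 4 in Table 1.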

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 267

3. SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance, or regularity, of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should and can be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1000 points in the unit square, with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides of .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probability for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
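The scaling of relative weights to inclusion probabilities can be written out as a short check. This is a sketch using the weights and the sample size of 50 given in the text; the function name `inclusion_probabilities` is hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def inclusion_probabilities(weights, sample_size):
    """Scale relative weights so that they sum to the sample size."""
    total = sum(weights)
    return [Fraction(sample_size * w, total) for w in weights]

# 750 units of weight 1, 200 of weight 2, and 50 of weight 4
weights = [1] * 750 + [2] * 200 + [4] * 50
pi = inclusion_probabilities(weights, 50)

print(sum(pi))          # 50
print(float(pi[0]))     # inclusion probability of a weight-1 unit, about 0.037
print(float(pi[-1]))    # inclusion probability of a weight-4 unit, about 0.148
```

A cell that contains a single weight-1 unit therefore has an expected sample size of 1/27, which agrees with the smallest expected cell sample size reported for the evaluation cells.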

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) "sample" function with "prob" set to the element inclusion probability.

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03), with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an "extra" sample point. For this step, we again used the S-PLUS "sample" function, this time with "prob" set to the fractional part of the expected sample size. The second step in the sample selection was to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the "sample" function, this time with "prob" set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi "polygons" are collections of population points, and for a linear population they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\, d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\, d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
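For a finite population, the Voronoi-based statistic can be sketched by assigning every population unit to its nearest sample point and summing inclusion probabilities. This is an illustration under assumptions not stated in the text: the variance is taken as the population variance of the $v_i$ within a single sample, Euclidean distance is used, and the function names `voronoi_inclusion_totals` and `zeta` are hypothetical.

```python
from statistics import pvariance

def voronoi_inclusion_totals(population, pi, sample_idx):
    """v_i for each sample point: the total inclusion probability of the
    population units closer to that sample point than to any other."""
    totals = {i: 0.0 for i in sample_idx}
    for (x, y), p in zip(population, pi):
        nearest = min(
            sample_idx,
            key=lambda i: (population[i][0] - x) ** 2 + (population[i][1] - y) ** 2,
        )
        totals[nearest] += p
    return [totals[i] for i in sample_idx]

def zeta(population, pi, sample_idx):
    """Variance of the v_i over the points of one sample (an assumption:
    the text does not say how the variance is computed over replicates)."""
    return pvariance(voronoi_inclusion_totals(population, pi, sample_idx))

# Tiny example: four units, two sample points, inclusion probability 1/2 each
population = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
pi = [0.5, 0.5, 0.5, 0.5]
print(voronoi_inclusion_totals(population, pi, [0, 2]))  # [1.0, 1.0]
print(zeta(population, pi, [0, 2]))                      # 0.0
```

A perfectly regular sample gives every $v_i$ equal to 1 and therefore a statistic of 0; clumped samples give larger values.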

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess. (Separate plot symbols mark generalized random tessellation stratified sampling, independent random sampling, and spatially stratified sampling.)

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time, following the reverse hierarchical order. We also drew 1000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size; that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning: maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

For our extensive population study, we selected 1000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. In the scenario with holes, the holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case, the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dot-dash line, constant polygon size).

In every scenario, the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator, or its continuous population analog (Cordy 1993; Stevens 1997), can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
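The Horvitz–Thompson estimate of a population total is a weighted sum of the sampled responses, each divided by its inclusion probability. A minimal sketch follows; the function name `horvitz_thompson_total` and the numerical values are made up for illustration only.

```python
def horvitz_thompson_total(z, pi):
    """Estimate a population total from sampled responses z and the
    inclusion probabilities (or densities) pi at the sampled points."""
    return sum(zi / p for zi, p in zip(z, pi))

# Hypothetical responses and inclusion probabilities for three sample points
z = [2.0, 3.0, 5.0]
pi = [0.5, 0.25, 0.5]
print(horvitz_thompson_total(z, pi))  # 26.0
```

Each sampled unit stands in for $1/\pi(s_i)$ units of the population, so points sampled with low probability carry proportionally more weight.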

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,$$

where $D(s_i)$ is a local neighborhood of the point $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed as an approximation for the joint inclusion probabilities.

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\, d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi^*_k(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi^*_k$, the expected number of samples in $R$ is

$$M_k = \int_R \pi^*_k(s)\, d\phi(s) = \int_{R^*} I_R(s)\, \pi^*_k(s)\, d\phi(s).$$

We cannot compute $M_k$, because the boundary of $R$ is unknown, but an estimate is

$$\hat{M}_k = \sum_i \frac{I_R(s_i)\, \pi^*_k(s_i)}{\pi^*_k(s_i)} = \sum_i I_R(s_i).$$

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi^*_{\tilde{k}}$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi^*_{\tilde{k}}(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is .8, so that the estimator using $\pi^*_{\tilde{k}}$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target                 Mean estimated domain area
sample size     Exponential     Constant      Linear
    25           .8000979       .7969819     .8010589
    50           .7995775       .7979406     .8005739
   100           .7994983       .7980543     .8002237
   150           .7994777       .7997587     .7995685
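The inverse-sampling calculation can be sketched as follows. This is an illustration under assumptions: the points are already in reverse hierarchical order, membership in the target domain is known for each visited point, and the function name `inverse_sampling_area` and the example values are hypothetical.

```python
def inverse_sampling_area(in_target, pi_star, m_star, m_target):
    """Walk the reverse-hierarchically ordered points until m_target of
    them fall in the target domain, then estimate the domain extent with
    the conditional inclusion density pi*_k = (k / M*) * pi*."""
    hits = 0
    for k, inside in enumerate(in_target, start=1):
        hits += inside
        if hits == m_target:
            area = sum(
                m_star / (k * p)          # 1 / pi*_k(s_i)
                for flag, p in zip(in_target[:k], pi_star[:k])
                if flag
            )
            return k, area
    raise ValueError("fewer than m_target points fall in the target domain")

# Hypothetical example: unit-area frame, equiprobable density of M* = 10
in_target = [True, True, False, True, True, True, False, True, True, True]
pi_star = [10.0] * 10
k_used, area = inverse_sampling_area(in_target, pi_star, m_star=10, m_target=4)
print(k_used)   # 5: five points were visited to obtain four in the target
print(area)     # about 0.8
```

In this example four of the first five points fall in the target, so the estimated extent is four-fifths of the frame area.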

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that, for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                     Strahler order category length (km)
Basin            Total length (km)        1            2            3+
E. Fork White        6802.385         3833.335     2189.494      779.556
Great Miami          2270.018         1501.711      621.039      147.268
L. Wabash            7601.418         4632.484     1331.228     1637.706
U. Illinois          5606.329         4559.123      500.188      547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

Basin          Expected     Evaluated    Nontarget  Target  Nonresponse  Water chemistry  Biological
               sample size  sample size  sites      sites   sites        sites            sites
E Fork White   60           60           5          55      9            35               34
Great Miami    40           40           12         28      5            19               19
L Wabash       128          91           11         80      9            71               39
U Illinois     128          85           8          77      5            72               41

272 Journal of the American Statistical Association March 2004

Figure 7 East Fork White River Basin Sample Sites by Multidensity Categories

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 273

Figure 8 Great Miami River Basin Sample Sites by Multidensity Categories

Figure 9 Upper Illinois River Basin Sample Sites by Multidensity Categories

Figure 10 Lower Wabash River Basin Sample Sites by Multidensity Categories

Table 5. Population Estimates With IRS and Local Variance Estimates

Subpopulation  Indicator score  N sites  Mean  IRS std err  Local std err  Difference (%)
L Wabash       IBI              39       36.1  2.1          1.4            -56.8
U Illinois     IBI              41       32.5  1.7          1.3            -44.8
E Fork White   IBI              32       35.1  1.3          1.2            -22.3
Great Miami    IBI              19       40.8  2.7          2.2            -33.5
L Wabash       QHEI             39       55.6  2.3          1.6            -52.2
U Illinois     QHEI             41       43.3  2.1          1.6            -39.5
E Fork White   QHEI             34       54.3  2.1          1.5            -45.9
Great Miami    QHEI             19       67.8  2.2          1.9            -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line: CDF estimate; dashed line: 95% local confidence limits; dotted line: 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the M subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants according to the probability specified by π.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
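Once the reverse hierarchical ordering exists, forming interpenetrating subsamples or panels is only index arithmetic. The following Python sketch illustrates this under the assumption that the site list is already in reverse hierarchical order; the ordering itself is not computed here, and the function name and site labels are hypothetical.

```python
def consecutive_subsets(ordered_sites, subset_size):
    """Split an already ordered site list into consecutive subsets."""
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    return [ordered_sites[i:i + subset_size]
            for i in range(0, len(ordered_sites), subset_size)]


# Hypothetical labels, assumed to be listed in reverse hierarchical order.
sites = [f"site-{i:03d}" for i in range(1, 101)]

# Ten subsets of ten consecutive sites each, as in the example in the text.
panels = consecutive_subsets(sites, 10)
```

Each element of `panels` could then serve as one interpenetrating subsample or as one panel in a rotating design.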

APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s+\delta)|\} = 0$.

Proof. If for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s+\delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s+\delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to

$$\frac{|Q^n(0) \cap Q^n(\delta)|}{|Q^n(0)|} = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y,$$

where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s+\delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n} \\[1ex] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
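The same-cell probability used in the proof, $1 - 2^n(\delta_x + \delta_y) + 4^n\delta_x\delta_y = (1 - 2^n\delta_x)(1 - 2^n\delta_y)$, can be checked numerically. The Python sketch below is only an illustration: it simulates a randomly located square grid with cell side $1/2^n$ and counts how often the origin and $\delta$ share a cell. The function names, trial count, and seed are assumptions made for the example.

```python
import random


def same_cell_probability(dx, dy, n, trials=100_000, seed=0):
    """Monte Carlo estimate of P(origin and (dx, dy) share a grid cell)
    for a randomly located grid with square cells of side 1 / 2**n."""
    rng = random.Random(seed)
    h = 1.0 / 2 ** n
    hits = 0
    for _ in range(trials):
        # Random grid location: position of the origin within its cell.
        ux, uy = rng.uniform(0, h), rng.uniform(0, h)
        same_x = (ux // h) == ((ux + dx) // h)
        same_y = (uy // h) == ((uy + dy) // h)
        hits += same_x and same_y
    return hits / trials


def closed_form(dx, dy, n):
    """Expression from the proof, valid for 0 <= dx, dy <= 1 / 2**n."""
    return 1 - 2 ** n * (dx + dy) + 4 ** n * dx * dy


estimate = same_cell_probability(0.05, 0.1, n=2)
exact = closed_form(0.05, 0.1, n=2)
```

For a fixed $\delta$ the agreement should hold up to Monte Carlo error, and the closed form tends to 1 as $\delta$ shrinks, which is the step the lemma relies on.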

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant σ-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the σ-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0,x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

$$F(x) = \tilde{F}(x) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M-1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.

Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A–Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.

linear and areal resources with patterned and possibly periodic responses, using arbitrarily variable inclusion probability with imperfect frame information in the presence of substantial nonresponse. In the design discussed here, we generalize the concept of spatial stratification to create a very powerful and flexible technique for selecting a spatially well-distributed probability sample that works under all of the preceding circumstances. The technique is based on creating a function that maps two-dimensional space into one-dimensional space, thereby defining an ordered spatial address. We use a restricted randomization, called hierarchical randomization (HR) (Stevens and Olsen 2000), to randomly order the address, and then apply a transformation that induces an equiprobable linear structure. Systematic sampling along the randomly ordered linear structure is analogous to sampling a random tessellation of two-dimensional space and results in a spatially well-balanced random sample. We call the resulting design a generalized random-tessellation stratified (GRTS) design. We develop the design in a general setting that applies to finite, linear, and areal resources and that accommodates arbitrary inclusion probability functions. A particularly favorable feature is that we can dynamically add points to the sample as we discover nontarget or inaccessible points, at the same time maintaining a spatially well-balanced sample. Features of the design are demonstrated with a simulation study and are illustrated with an application to rivers and streams in Indiana.

2. GENERALIZED RANDOM-TESSELLATION STRATIFIED DESIGN

Before presenting the theoretical development of the GRTS design, we give a heuristic overview of the process. Assume that the sample frame consists of $N$ points located within a geographic region. Assign each point a unit length, and place each point in some order (say, randomly) on a line. The line has length $N$ units. Select a systematic sample of size $n$ from the line by dividing the line into intervals of length $N/n$, randomly selecting a starting point in $(0, N/n]$, say $k$, and then taking every $(k + iN/n)$th point for $i = 1, \ldots, n - 1$. If the point occurs within one of the units, then that unit is selected (Brewer and Hanif 1983). For a linear resource, use the actual length of the units to construct the line. For an areal resource, randomly place a systematic grid over the region, randomly select a point in each grid cell, and then proceed as in the point case. A GRTS sample results when a process termed hierarchical randomization is used to place the points on the line. First, randomly place a $2 \times 2$ square grid over the region and place the cells in random order in a line. For each cell, repeat the same process, randomly ordering the subcells within each original cell. This second step results in 16 cells in a line. Continue the process until at most one population point occurs in a cell. Use the random order of the cells to place the points on the line. This hierarchical randomization process maps two-dimensional space into one-dimensional space while preserving spatial relationships as much as possible. The combination of hierarchical randomization to create the line and systematic sampling with a random start results in a spatially balanced, equal probability sample. Unequal probability sampling is implemented by giving each point a length proportional to its inclusion probability.
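The equal-probability selection step in this overview can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the units have already been placed on the line in the desired order (the hierarchical randomization is not performed here), it assumes $n \le N$, and the function name is hypothetical. For convenience the random start is drawn from $[0, N/n)$ rather than $(0, N/n]$.

```python
import random


def systematic_sample(order, n, rng=None):
    """Pick n units from `order`, a list of units laid out on a line with
    unit length each, by systematic sampling with a random start."""
    rng = rng or random.Random()
    N = len(order)
    step = N / n                       # selection interval N/n
    start = rng.uniform(0, step)       # random start within the first interval
    picks = []
    for i in range(n):
        pos = start + i * step         # position k + i*N/n along the line
        idx = min(int(pos), N - 1)     # unit whose segment [idx, idx + 1) covers pos
        picks.append(order[idx])
    return picks


# Example: 20 units already in line order, sample of size 5.
line = list(range(20))
sample = systematic_sample(line, 5, random.Random(1))
```

Giving each unit a length proportional to its inclusion probability, instead of unit length, would turn the same scheme into the unequal-probability version described in the last sentence of the overview.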

Stevens (1997) derived inclusion and joint inclusion functions for several grid-based designs that were precursors to GRTS designs and share some of their properties. The designs are all generalizations of the random-tessellation stratified (RTS) design (Dalenius et al. 1961; Olea 1984; Overton and Stehman 1993). The RTS design selects random points in space via a two-step process. First, a regular tessellation coherent with a regular grid is randomly located over the domain to be sampled, and second, a random point is selected within each random tessellation cell. The RTS design is a variation on a systematic design that avoids the alignment problems that can occur with a completely regular systematic design. Like a systematic design, an RTS design does not allow variable probability spatial sampling. Stevens (1997) introduced the multiple-density, nested, random-tessellation stratified (MD-NRTS) design to provide for variable spatial sampling intensity. The geometric concept underlying the MD-NRTS was the notion of coherent intensification of a grid, that is, adding points to a regular grid in such a way as to result in a denser regular grid with similarly shaped but smaller tessellation cells. We have since extended the same notion by generalizing to a process that creates a potentially infinite series of nested, coherent grids. In the limit, the process results in a function that maps two-dimensional space into one-dimensional space.

We can cover finite, linear, and areal populations with the same development if we work in the context of general measure and integration theory. Let $R$ be the domain of the population we wish to sample, that is, the set of points occupied by elements of the population. We require that $R$ be a bounded subset of $\mathbb{R}^2$. Thus $R$ can be enclosed in a bounded square, so that by scaling and translation we can define a 1–1 map from $R$ into $(0, 1/2] \times (0, 1/2]$, the lower left quadrant of the unit square. (We map to the lower left quadrant so that we can add a random offset to the image of $R$ and stay within the unit square. The random offset guarantees that the points from any pair can end up in different quadrants.) Clearly, every point in the image is associated with a unique point in $R$, and vice versa, so henceforth we identify $R$ with its image in the unit square.

2.1 Random Quadrant-Recursive Maps

The heart of the GRTS sample selection method is a function $f$ that maps the unit square $I^2 = (0, 1] \times (0, 1]$ onto the unit interval $I = (0, 1]$. To be useful in achieving a spatially balanced sample, $f$ must preserve some proximity relationships, so we need to impose some restrictions on the class of functions to be considered. Mark (1990), in studying discrete two- to one-dimensional maps, defined a property called quadrant recursive, which required that subquadrants be mapped onto sets of adjacent points. To define the continuous analog, let

$$Q^n_{jk} = \left(\frac{j}{2^n}, \frac{j+1}{2^n}\right] \times \left(\frac{k}{2^n}, \frac{k+1}{2^n}\right], \qquad j, k = 0, 1, \ldots, 2^n - 1,$$

and let

$$J^n_m = \left(\frac{m}{4^n}, \frac{m+1}{4^n}\right], \qquad m = 0, 1, \ldots, 4^n - 1.$$

A function $f: I^2 \to I$ is quadrant recursive if, for all $n \ge 0$, there is some $m \in \{0, 1, \ldots, 4^n - 1\}$ such that $f(Q^n_{jk}) = J^n_m$.

We can view a quadrant-recursive function as being defined by the limit of successive intensifications of a grid covering the unit square, where a grid cell is divided into four subcells, each of which is subsequently divided into four sub-subcells, and so on. If we carried this recursion to the limit and paired grid points with an address based on the order in which the divisions were carried out, where each digit of the address represented a step in the subdivision, then we would obtain a quadrant-recursive function. For example, suppose we begin with a point at $(1, 1)$ and replace it with four points $p_3 = (1, 1)$, $p_2 = (1, 1/2)$, $p_1 = (1/2, 1)$, and $p_0 = (1/2, 1/2)$. The next step of the recursion replaces each of the first four points $p_0, \ldots, p_3$ with $p_i - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^2$. Thus the point $p_2 = (1, 1/2)$ is replaced with the four points $p_{23} = (1, 1/2)$, $p_{22} = (1, 1/4)$, $p_{21} = (3/4, 1/2)$, and $p_{20} = (3/4, 1/4)$. The $n$th step replaces each of the $4^n$ points $p_{i_1 i_2 \cdots i_n}$ with $p_{i_1 i_2 \cdots i_n} - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^{n+1}$.

A spatially referenced address can be constructed following the pattern of the partitioning, with each new partition adding a digit position to the address. Thus, in the preceding example, the four points in the first group are assigned the addresses 3, 2, 1, and 0, where 3 is the original point at $(1, 1)$. The successor points to point 2 get the addresses 23, 22, 21, and 20, and so forth. The addresses induce a linear ordering of the subquadrants. Moreover, if we carry the process to the limit and treat the resulting address as digits in a base-4 fraction [e.g., treat $22103\cdots$ as the base-4 number $(.22103\cdots)_4$], then the correspondence between grid point and address is a quadrant-recursive function.

Recursive partitioning generates a nested hierarchy of grid cells. The derived addressing has the property that all successor cells of a cell have consecutive addresses. Thus a path from cell to cell following the recursive partitioning address order will connect all successor cells of cell 0 before reaching any successor of cell 2 (Fig. 1).
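The point-replacement rule and the addresses it produces can be traced with a short Python sketch. The digit-to-offset table below is read off the worked example (digit 3 keeps the parent point, 2 moves down, 1 moves left, 0 moves down and left); the function and variable names are assumptions made for illustration, and exact fractions are used so the grid points can be compared without rounding.

```python
from fractions import Fraction

# Offset subtracted (after scaling) when each new address digit is appended.
OFFSETS = {3: (0, 0), 2: (0, 1), 1: (1, 0), 0: (1, 1)}


def grid_addresses(levels):
    """Return {address string: (x, y)} for all 4**levels grid points obtained
    by recursively replacing the point (1, 1)."""
    points = {"": (Fraction(1), Fraction(1))}
    for level in range(1, levels + 1):
        scale = Fraction(1, 2 ** level)    # halve the spacing at each level
        points = {
            addr + str(d): (x - ox * scale, y - oy * scale)
            for addr, (x, y) in points.items()
            for d, (ox, oy) in OFFSETS.items()
        }
    return points


two_levels = grid_addresses(2)
# Successors of point 2, as in the text: 23, 22, 21, 20.
successors_of_2 = {a: p for a, p in two_levels.items() if a.startswith("2")}
```

Sorting the address strings lists all successors of one cell before any successor of the next, which is the consecutive-address property noted above.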

A 1–1 continuous mapping of $I^2$ onto $I$ is not possible, so quadrant-recursive functions are not continuous. However, they do have the property that all points in a quadrant are mapped onto an interval, all points in any one of the four subquadrants of a quadrant are mapped onto an interval, and so on, ad infinitum. This property tends to preserve proximity relationships, that is, if $s$ is "close to" $t$, then $f(s)$ should "tend to be close to" $f(t)$. In Appendix A we make this statement more precise by showing that if the origin is located at random and $s$ is chosen at random from $I^2$, then $\lim_{|\delta| \to 0} E[|f(s) - f(s + \delta)|] = 0$. Intuitively, two elements that are close together will tend to fall in the same randomly located cell of a size that decreases as the distance between points decreases. Because the two elements are covered by the same cell, their addresses match to the level of that cell, and thus in expectation their addresses will be close.

A fundamental 1–1 quadrant-recursive map is defined by digit interweaving. Let $s = (x, y)$ be a point in $I^2$. Each of the coordinates has an expansion as a binary fraction of the form $x = .x_1x_2x_3\cdots$, $y = .y_1y_2y_3\cdots$, where each $x_i$ and $y_i$ is either 0 or 1. Define $f_0(s)$ by alternating successive digits of $x$ and $y$, that is, $f_0(s) = .x_1y_1x_2y_2\cdots$. Clearly, $f_0$ would be 1–1 except for different expansions of the same number. For example, $.1$ and $.011111\cdots$, where the 1s continue indefinitely, are two representations of the number $1/2$. If we always use the binary representation with an infinite number of 1s, then $f_0$ is 1–1. Moreover, every point in $I$ is the image of a point in $I^2$, which is obtained by "digit splitting." That is, if $t = .t_1t_2t_3\cdots$ is in $I$, then $s = f_0^{-1}(t) = (.t_1t_3t_5\cdots, .t_2t_4t_6\cdots)$ is the preimage of $t$. Both $f_0$ and $f_0^{-1}$ are 1–1 if we always use the representation with an infinite number of 1s (Hausdorff 1957, p. 45). To show that $f_0$ is quadrant recursive, note that for $s \in Q^n_{jk}$, the first $4n$ digits of $f_0(s)$ are fixed, so $f_0(s) \in J^n_m$, where $m$ is defined by the first $4n$ digits. Conversely, the preimages of every $t \in J^n_m$ have the same first $2n$ digits and so must be in the same $Q^n_{jk}$.

Figure 1. First Four Levels of a Quadrant-Recursive Partitioning of the Unit Square. The address associated with the cross-hatched cell is .213.

Figure 1 shows the first four levels of the recursive partitioning of the unit square. The address of the cross-hatched subquadrant is, as a base-4 fraction, $(.213)_4$, and the associated grid point is at $(3/4, 1/2)$, the upper right corner of the subquadrant. Following the convention of having an infinite number of 1s in the expansion, we have $(3/4, 1/2) = (.11, .1)_2 = (.1011111\cdots, .0111111\cdots)_2$. Digit interweaving gives the image $(.10011111\cdots)_2 = (.2133333\cdots)_4$, of which the first three digits are the subquadrant address. If we carried the recursive partitioning to the limit, every point in the subquadrant would be assigned an address beginning with $(.213)_4$.
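The worked example can be reproduced with a short Python sketch of digit interweaving. It is an illustration under stated assumptions: the function names are hypothetical, coordinates are taken in $(0, 1]$, and the binary digits follow the convention of an expansion ending in infinitely many 1s, which for the $i$th digit of $x$ is $(\lceil 2^i x \rceil - 1) \bmod 2$. Exact fractions avoid floating-point surprises at dyadic points such as $3/4$.

```python
import math
from fractions import Fraction


def binary_digits(x, n):
    """First n binary digits of x in (0, 1], using the expansion that ends in
    infinitely many 1s (so 1/2 is .0111... rather than .1000...)."""
    return [(math.ceil(x * 2 ** i) - 1) % 2 for i in range(1, n + 1)]


def interweave(x, y, n):
    """First n base-4 digits of f0(x, y): base-4 digit i is 2*x_i + y_i,
    i.e. the binary digit pair (x_i, y_i) read as one base-4 digit."""
    xd, yd = binary_digits(x, n), binary_digits(y, n)
    return [2 * a + b for a, b in zip(xd, yd)]


# Grid point of the cross-hatched cell in Figure 1.
address = interweave(Fraction(3, 4), Fraction(1, 2), 5)
```

For the point $(3/4, 1/2)$ the leading digits are 2, 1, 3, followed by 3s, matching the expansion $(.2133333\cdots)_4$ given in the text.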

The class of all quadrant-recursive functions can be generated from the function $f_0$, which is defined by digit interweaving, by permuting the order in which subquadrants $Q^n_{jk}$ are paired with the intervals $J^n_m$. For example, for $n = 1$, $f_0(Q^1_{jk}) = J^1_{2j+k}$. We obtain a different quadrant-recursive function by permuting the subscripts $\{0, 1, 2, 3\}$ of the image intervals. Thus, under the permutation $\tau = \{2, 1, 3, 0\}$, we get a function such that $f_\tau(Q^1_{jk}) = J^1_{\tau(2j+k)}$, so that $f_\tau(Q^1_{00}) = J^1_2$, $f_\tau(Q^1_{01}) = J^1_1$, $f_\tau(Q^1_{10}) = J^1_3$, and $f_\tau(Q^1_{11}) = J^1_0$. To see that the class of all quadrant-recursive functions is generated by such permutations, express each number in $I$ as a base-4 number, that is, as $t = .t_1t_2t_3\cdots$, where each digit $t_i$ is either a 0, 1, 2, or 3. A function $h_p: I \to I$ is a hierarchical permutation if $h_p(t) = .p_1(t_1)\,p_{t_1,2}(t_2)\,p_{t_1t_2,3}(t_3)\cdots$, where $p_{t_1t_2\cdots t_{n-1},n}(\cdot)$ is a permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$. Again, we ensure that $h_p$ is 1–1 by always using the expansion with an infinite number of nonzero digits. Any quadrant-recursive function can be expressed as the composition of $f_0$ with some hierarchical permutation $h_p$, because the associations $f(Q^n_{jk}) = J^n_m$ determine the series of permutations, and the permutations define the associations.

If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function and call the process of applying $h_p(\cdot)$ hierarchical randomization.
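A minimal sketch of hierarchical randomization on finite base-4 addresses follows. It assumes that permutations can be drawn lazily, one per digit prefix, the first time that prefix is seen; the class name `HierarchicalRandomization` and its methods are hypothetical and not taken from the paper.

```python
import random


class HierarchicalRandomization:
    """Lazily drawn random permutations of {0, 1, 2, 3}, one per digit prefix."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._perms = {}  # prefix (tuple of digits) -> permutation (list)

    def _perm(self, prefix):
        # Draw the permutation for this prefix once and remember it.
        if prefix not in self._perms:
            p = [0, 1, 2, 3]
            self._rng.shuffle(p)
            self._perms[prefix] = p
        return self._perms[prefix]

    def apply(self, digits):
        """Map base-4 digits t1 t2 t3 ... to permuted digits, where the
        permutation used at position n depends on the first n-1 digits."""
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])
            out.append(self._perm(prefix)[d])
        return out


h = HierarchicalRandomization(seed=1)
a = h.apply([2, 1, 3])
b = h.apply([2, 1, 0])
# Addresses that share a prefix keep a shared image prefix, so every
# subquadrant is still mapped onto a single interval.
print(a[:2] == b[:2])
```

Because each permutation depends only on the preceding digits, the map is 1–1 on addresses of any fixed length and preserves the nesting of subquadrants.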

2.2 Sample Selection With Probability Proportional to Arbitrary Intensity Function

We assume that the design specifications define a desired sample intensity function $\pi(s)$, that is, the number of samples per unit measure of the population. For example, if the population were a stream network, $\pi(s)$ might specify the number of samples per kilometer of stream at $s$. For a discrete population, $\pi(s)$ has the usual finite-population-sampling interpretation as the target inclusion probability of the population unit located at $s$. We call $\pi(s)$ an intensity function, because we have not yet introduced a probability measure. In Appendix B we develop the details of a sample selection method that yields an inclusion-probability function equal to $\pi(s)$. The concept behind the method is the composition of a hierarchical randomization function with a function that assigns to every interval in $f(R)$ a weight equal to the total of the intensity function of its preimage in $R$. In effect, we stretch the image interval via a distribution function $F$, so that its total length is equal to the sample size $M$. We pick $M$ points by taking a systematic sample with a unit separation along the stretched image, and we map these points back into the domain $R$ via the inverse function to get the sample of the population. We show in Appendix B that this procedure does indeed give a sample with an inclusion-probability function equal to the intensity function $\pi(s)$.

The technique of randomly mapping two-dimensional space to a line segment, systematically sampling from the range of the distribution function, and then mapping back to the population elements always produces a sample with the desired first-order inclusion-probability function, as long as $f$ is 1–1 and measurable. We required that $f$ be quadrant recursive and claim that this is sufficient to give a spatially balanced sample. This claim follows from the fact that the map $f^{-1} \circ F/M \circ f$ transforms the unequal intensity surface defined by $\pi$ into an equiprobable surface. The quadrant-recursive property of $f$ guarantees that the sample is evenly spread over the equiprobable surface (in the sense that each subquadrant receives its expected number of samples) to the resolution determined by the sample size $M$.
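For a finite population, the systematic step along the stretched image can be sketched as below. The sketch assumes that the units are already sorted by their randomized addresses, that every target inclusion probability is at most 1, and that the probabilities sum to the sample size $M$; the function name `systematic_sample` is illustrative only.

```python
import random


def systematic_sample(pi, rng=None):
    """Select units by systematic sampling along the cumulative intensity.

    `pi` lists target inclusion probabilities (each <= 1) for units that are
    already sorted by their randomized address.
    """
    rng = rng or random.Random()
    u = rng.random()          # random start in [0, 1)
    selected = []
    upper = 0.0               # value of F at the right end of the current unit
    next_point = u            # systematic points are u, u + 1, u + 2, ...
    for i, p in enumerate(pi):
        upper += p
        if next_point < upper:   # a systematic point lands inside unit i
            selected.append(i)
            next_point += 1.0
    return selected


# Eight units with inclusion probability 0.5 each, so M = 4.
print(len(systematic_sample([0.5] * 8, random.Random(0))))  # 4
```

Each unit occupies a segment of length $\pi_i$ on the stretched line, so a uniformly placed start hits it with probability $\pi_i$, which is the first-order property stated above.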

2.3 Reverse Hierarchical Ordering

The sample points selected by mapping the systematic points along $(0, M]$ back to the population domain will be ordered in a way that follows the quadrant-recursiveness of $f$, tempered by an allowance for unequal probability selection. Thus the first quarter of the points all will come from the same “quadrant” of the equiprobable domain, and all will be approximately neighbors in the original population domain. It follows that four points, one picked from each quarter of the sample points ordered by the systematic selection, will be a spatially balanced sample. Because the random permutations that define the hierarchical randomization are selected independently of one another, it makes no difference, from a distributional standpoint, whether we pick the points systematically from each quarter or make random selections from each quarter. Therefore, we lose no randomness by picking the points that occupy positions that correspond to being at the beginning, one-quarter, one-half, and three-quarters of the way through the ordered list of sample points.

Within each quarter of the list, the points are again quadrant-recursively ordered, so points picked at the beginning, one-quarter, one-half, and three-quarters of the way through each quarter of the list will be spread out over the corresponding quadrant, and so on down through the sequence of subquadrants. We can utilize these properties by reordering the systematically selected list so that at any point in the reordered list, the samples up to that point are well spread out over the population domain.

The order is most conveniently expressed in terms of a base-4 fraction, where the fraction expresses the relative position in the systematically ordered list. Thus the first four points correspond to the fractions $(.0, .1, .2, .3)_4 = (0, 1/4, 1/2, 3/4)_{10}$. Stepping down a subquadrant level corresponds to adding a digit position to the base-4 fraction, which we fill in such a way as to spread the sequence of points over the population domain. The pattern for the first 16 points is shown in Table 1. Note that the order corresponds to the ranking obtained by reversing the sequence of base-4 digits and treating the reversed sequence as a base-4 fraction.

We can continue this same pattern of adding digit positions through as many positions as necessary to order the entire sample. The resulting order is called reverse hierarchical order. It remains to show that reverse hierarchical order does indeed give a spatially well-balanced sample for any $m \le M$. Clearly, this is the case for $m = 4^k$, because the reduced sample can be viewed as a sample selected from a complete GRTS design. Stevens (1997) derived an analytic expression for the pairwise inclusion density for some special intermediate cases. Here we investigate the spatial balance properties using simulation.

Table 1. Generation of Reverse Hierarchical Order

Order   Base 4   Reverse base 4
  1       00          00
  2       10          01
  3       20          02
  4       30          03
  5       01          10
  6       11          11
  7       21          12
  8       31          13
  9       02          20
 10       12          21
 11       22          22
 12       32          23
 13       03          30
 14       13          31
 15       23          32
 16       33          33
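The digit-reversal rule behind Table 1 can be sketched as follows, assuming a systematically ordered list of exactly $4^k$ points indexed from 0; the function name `reverse_hierarchical_order` is illustrative and not from the paper.

```python
def reverse_hierarchical_order(levels):
    """Return 0-based positions in the systematic list, arranged in reverse
    hierarchical order, for a list of 4**levels points."""
    order = []
    for rank in range(4 ** levels):
        # Write the rank in base 4, least significant digit first ...
        digits = []
        r = rank
        for _ in range(levels):
            digits.append(r % 4)
            r //= 4
        # ... then read those digits as most significant first, which
        # reverses the base-4 representation of the rank.
        position = 0
        for d in digits:
            position = position * 4 + d
        order.append(position)
    return order


print(reverse_hierarchical_order(2))
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

The first four entries are positions 0, 4, 8, and 12, that is, the points at the beginning, one-quarter, one-half, and three-quarters of the way through the list, matching the first four rows of Table 1.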


3. SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance or regularity of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should and can be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1000 points in the unit square, with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probability for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
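The scaling of weights to inclusion probabilities can be illustrated with a few lines of arithmetic. The weight counts and the sample size of 50 are taken from the text; the helper name `inclusion_probabilities` is an assumption of this sketch.

```python
def inclusion_probabilities(weights, sample_size):
    """Scale relative weights so that they sum to the sample size."""
    total = sum(weights)
    return [sample_size * w / total for w in weights]


# 750 units of weight 1, 200 of weight 2, and 50 of weight 4 (total weight 1350).
weights = [1] * 750 + [2] * 200 + [4] * 50
pi = inclusion_probabilities(weights, 50)

print(round(min(pi), 3))  # 0.037
print(round(max(pi), 3))  # 0.148
```

A unit of weight 1 receives probability 50/1350, about .037, which matches the smallest expected cell size reported above and would correspond to a cell that holds a single such unit.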

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) “sample” function with “prob” set to the element inclusion probability.

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03), with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case, 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an “extra” sample point. For this step, we again used the S-PLUS “sample” function, this time with “prob” set to the fractional part of the expected sample size. The second step in the sample selection was to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the “sample” function, this time with “prob” set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi “polygons” are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
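For a finite population, the statistic can be computed by brute force, as in the sketch below. It assumes Euclidean distance, breaks ties in favor of the first sample point, and uses the variance with divisor $n$; these choices, and the function name `zeta`, are assumptions of the sketch rather than details given in the paper.

```python
def zeta(pop_xy, pop_pi, sample_idx):
    """Variance of the total inclusion probability in each sample point's
    Voronoi "polygon" (the population units nearest to that sample point).

    Returns (variance, list of per-polygon totals v_i).
    """
    totals = [0.0] * len(sample_idx)
    for (x, y), p in zip(pop_xy, pop_pi):
        # Position (within the sample) of the nearest sample point.
        nearest = min(
            range(len(sample_idx)),
            key=lambda k: (x - pop_xy[sample_idx[k]][0]) ** 2
            + (y - pop_xy[sample_idx[k]][1]) ** 2,
        )
        totals[nearest] += p
    mean = sum(totals) / len(totals)
    variance = sum((v - mean) ** 2 for v in totals) / len(totals)
    return variance, totals


# Four units on a line, each with inclusion probability 0.5 (n = 2).
pop = [(0, 0), (1, 0), (2, 0), (3, 0)]
probs = [0.5] * 4
print(zeta(pop, probs, [0, 3]))  # (0.0, [1.0, 1.0])   well spread
print(zeta(pop, probs, [0, 1]))  # (0.25, [0.5, 1.5])  clumped
```

In both cases the totals average to 1, in line with $E[v_i] = 1$; the clumped sample has the larger variance.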

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess (▲ generalized random tessellation stratified sampling; × independent random sampling; ◇ spatially stratified sampling).

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases, it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size, that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning: maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

For our extensive population study, we selected 1000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case, the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dot-dash line, constant polygon size).

In every scenario, the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
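The Horvitz–Thompson total above is a weighted sum in which each observed response is divided by its inclusion probability. A minimal sketch, with made-up illustrative values and a hypothetical function name:

```python
def horvitz_thompson_total(z, pi):
    """Estimate a population total as the sum of z(s_i) / pi(s_i)."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    return sum(zi / p for zi, p in zip(z, pi))


# Illustrative responses and inclusion probabilities (not data from the paper).
z = [2.0, 4.0, 6.0]
pi = [0.5, 0.5, 0.25]
print(horvitz_thompson_total(z, pi))  # 36.0
```

Units sampled with low probability stand in for more of the population, so they receive proportionally larger weight.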

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a “smoothed” contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,$$

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed to approximate the joint inclusion probabilities.
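Given neighborhoods and weights, the local neighborhood estimator is a double sum that can be evaluated directly. The sketch below assumes the neighborhoods $D(s_i)$ and weights $w_{ij}$ are supplied by the caller; the paper does not specify their construction here, so the equal weights in the example are purely illustrative, as is the function name.

```python
def neighborhood_variance(z, pi, neighbors, weights):
    """Evaluate the local neighborhood variance estimator.

    neighbors[i] lists the indices j of the sample points in D(s_i);
    weights[i][j] holds w_ij for those pairs.
    """
    y = [zi / p for zi, p in zip(z, pi)]   # expanded values z(s)/pi(s)
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        # Weighted local mean of the expanded values around s_i.
        local_mean = sum(weights[i][k] * y[k] for k in nbrs)
        # Weighted squared deviations from that local mean.
        total += sum(weights[i][j] * (y[j] - local_mean) ** 2 for j in nbrs)
    return total


# Three points that are all neighbors of one another, with equal weights 1/3
# (rows and columns sum to 1, as the constraint requires).
z = [1.0, 2.0, 3.0]
pi = [1.0, 1.0, 1.0]
neighbors = [[0, 1, 2]] * 3
weights = [[1 / 3] * 3 for _ in range(3)]
print(round(neighborhood_variance(z, pi, neighbors, weights), 6))  # 2.0
```

If the expanded values are constant within every neighborhood, every contrast vanishes and the estimate is 0.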

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi^*_k(s) = (k/M^*)\pi^*(s)$. Using the inclusion density $\pi^*_k$, the expected number of samples in $R$ is

$$M_k = \int_R \pi^*_k(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi^*_k(s)\,d\phi(s).$$

We cannot compute $M_k$, because the boundary of $R$ is unknown, but an estimate is

$$\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi^*_k(s_i)}{\pi^*_k(s_i)} = \sum_i I_R(s_i).$$

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi^*_{\tilde{k}}$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi^*_{\tilde{k}}(s_i)$.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target         Mean estimated domain area
sample size    Exponential    Constant     Linear
    25          .8000979      .7969819    .8010589
    50          .7995775      .7979406    .8005739
   100          .7994983      .7980543    .8002237
   150          .7994777      .7997587    .7995685

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case, the true area of $R$ is .8, so that the estimator using $\pi^*_{\tilde{k}}$ is either unbiased or nearly so.
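The inverse sampling rule can be sketched for the equiprobable case. The sketch assumes $\pi^* = M^*/|R^*|$, so that $\pi^*_k = k/|R^*|$, and it takes $\tilde{k}$ to be the smallest $k$ at which the target count is reached; both are assumptions of this illustration, and the function name `inverse_sample` is hypothetical.

```python
def inverse_sample(in_target, target_m, area_enclosing):
    """Walk through points in reverse hierarchical order until `target_m`
    of them fall in the target domain R.

    in_target: flags I_R(s_i) for an equiprobable sample of the enclosing
    set R*, whose area is `area_enclosing`.
    Returns (k, estimated area of R).
    """
    hits = 0
    for k, flag in enumerate(in_target, start=1):
        if flag:
            hits += 1
        if hits == target_m:
            # Conditional density: pi*_k = (k / M*) * (M* / |R*|) = k / |R*|.
            density = k / area_enclosing
            # Sum of I_R(s_i) / pi*_k over the first k points.
            return k, target_m / density
    raise ValueError("oversample exhausted before reaching the target size")


# Ten candidate points, of which the 3rd and 6th miss the target domain.
flags = [True, True, False, True, True, False, True, True, True, True]
print(inverse_sample(flags, 4, 1.0))  # (5, 0.8)
```

Four of the first five points fall in the target, so the estimated extent is 4/5 of the enclosing area.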

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case, the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the “no-void” case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the “holes” cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population, due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                      Strahler order category length (km)
Basin           Total length (km)         1            2            3+
E. Fork White       6802.385          3833.335     2189.494      779.556
Great Miami         2270.018          1501.711      621.039      147.268
L. Wabash           7601.418          4632.484     1331.228     1637.706
U. Illinois         5606.329          4559.123      500.188      547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

Basin           Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
                sample size   sample size   sites       sites    sites         sites             sites

E. Fork White    60            60             5          55        9            35                34
Great Miami      40            40            12          28        5            19                19
L. Wabash       128            91            11          80        9            71                39
U. Illinois     128            85             8          77        5            72                41

272 Journal of the American Statistical Association March 2004

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.


Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.


Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.


Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.


Table 5. Population Estimates With IRS and Local Variance Estimates

Subpopulation   Indicator   N sites   Mean   IRS         Local       Difference
                score                        std. err.   std. err.   (%)

L. Wabash       IBI         39        36.1   2.1         1.4         -56.8
U. Illinois     IBI         41        32.5   1.7         1.3         -44.8
E. Fork White   IBI         32        35.1   1.3         1.2         -22.3
Great Miami     IBI         19        40.8   2.7         2.2         -33.5
L. Wabash       QHEI        39        55.6   2.3         1.6         -52.2
U. Illinois     QHEI        41        43.3   2.1         1.6         -39.5
E. Fork White   QHEI        34        54.3   2.1         1.5         -45.9
Great Miami     QHEI        19        67.8   2.2         1.9         -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.
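The replacement idea can be illustrated with a minimal Python sketch. It assumes that the site list is already in reverse hierarchical order and that each site's status is learned only when the site is evaluated; the function name `fill_sample`, the site identifiers, and the set of unusable sites are hypothetical and are not taken from the Indiana surveys.

```python
def fill_sample(ordered_sites, is_sampleable, n_desired):
    """Use sites in reverse hierarchical order until n_desired are sampled.

    Returns the sampled sites and the number of sites that had to be
    evaluated (sampled plus nontarget or inaccessible sites).
    """
    sampled = []
    evaluated = 0
    for site in ordered_sites:
        if len(sampled) == n_desired:
            break
        evaluated += 1
        if is_sampleable(site):
            sampled.append(site)
    return sampled, evaluated


# Hypothetical oversample of 12 sites, already in reverse hierarchical order.
sites = list(range(1, 13))
unusable = {2, 5}  # assumed nontarget or landowner-denied sites
sampled, evaluated = fill_sample(sites, lambda s: s not in unusable, 6)
```

Because any consecutive run of the ordered list is itself spatially balanced, the sites reached by skipping unusable ones remain well spread over the sampleable part of the population.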

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

\[
F_D(u) \ge
\begin{cases}
0, & u \le \dfrac{1}{4^n} \\[2ex]
1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}.
\end{cases}
\]

Because $D$ is positive and bounded above by 1,

\[
E[D(\delta)] = 1 - \int_0^1 F_D(u)\, du
\le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left[1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right] \right\}.
\]

For fixed $n$, we have that

\[
\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},
\]

but this holds for all $n$, so that

\[
\lim_{|\delta| \to 0} E[D(\delta)] = 0.
\]
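The overlap probability used in the proof can be checked directly. The following worked step is a sketch that assumes axis-aligned square cells of side $2^{-n}$ and offsets $0 \le \delta_x, \delta_y \le 2^{-n}$, so that the intersection of the two cells is a rectangle:

\[
\frac{|Q^n(0) \cap Q^n(\delta)|}{|Q^n(0)|}
= \frac{(2^{-n} - \delta_x)(2^{-n} - \delta_y)}{4^{-n}}
= (1 - 2^n \delta_x)(1 - 2^n \delta_y)
= 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y .
\]

For instance, with $n = 2$ and $\delta_x = \delta_y = 1/8$, the product form gives $(1/2)(1/2) = 1/4$ and the expanded form gives $1 - 4(1/4) + 16(1/64) = 1/4$.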

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\, d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\, d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\, d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

\[
F(x) = \tilde{F}(x_i) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).
\]

If we set $F = \tilde{F}$ for the linear and areal cases, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$ and, in turn, induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.


Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA6003-89065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM34020012000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-6004-86026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A: Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.



We can view a quadrant-recursive function as being defined by the limit of successive intensifications of a grid covering the unit square, where a grid cell is divided into four subcells, each of which is subsequently divided into four sub-subcells, and so on. If we carried this recursion to the limit and paired grid points with an address based on the order in which the divisions were carried out, where each digit of the address represented a step in the subdivision, then we would obtain a quadrant-recursive function. For example, suppose we begin with a point at $(1, 1)$ and replace it with four points $p_3 = (1, 1)$, $p_2 = (1, 1/2)$, $p_1 = (1/2, 1)$, and $p_0 = (1/2, 1/2)$. The next step of the recursion replaces each of the first four points $p_0, \ldots, p_3$ with $p_i - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^2$. Thus the point $p_2 = (1, 1/2)$ is replaced with the four points $p_{23} = (1, 1/2)$, $p_{22} = (1, 1/4)$, $p_{21} = (3/4, 1/2)$, and $p_{20} = (3/4, 1/4)$. The $n$th step replaces each of the $4^n$ points $p_{i_1 i_2 \cdots i_n}$ with $p_{i_1 i_2 \cdots i_n} - \{(0, 0), (0, 1), (1, 0), (1, 1)\}/2^{n+1}$.

A spatially referenced address can be constructed following the pattern of the partitioning, with each new partition adding a digit position to the address. Thus, in the preceding example, the four points in the first group are assigned the addresses 3, 2, 1, and 0, where 3 is the original point at $(1, 1)$. The successor points to point 2 get the addresses 23, 22, 21, and 20, and so forth. The addresses induce a linear ordering of the subquadrants. Moreover, if we carry the process to the limit and treat the resulting address as digits in a base-4 fraction [e.g., treat $22103\cdots$ as the base-4 number $(.22103\cdots)_4$], then the correspondence between grid point and address is a quadrant-recursive function.
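The recursion and its addressing can be traced with a short Python sketch. The function name `grid_point` and the lookup table `OFFSETS` are illustrative choices, not notation from the text; the digit-to-offset pairing is the one implied by the example points $p_3, \ldots, p_0$ and $p_{23}, \ldots, p_{20}$, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Offset (in units of the current cell side) subtracted for each address digit,
# read off the example: digit 3 keeps the point, digit 0 moves it down and left.
OFFSETS = {3: (0, 0), 2: (0, 1), 1: (1, 0), 0: (1, 1)}


def grid_point(address):
    """Return the grid point paired with a base-4 address such as "213"."""
    x = y = Fraction(1)  # the recursion starts from the point (1, 1)
    for level, digit in enumerate(address, start=1):
        dx, dy = OFFSETS[int(digit)]
        x -= Fraction(dx, 2 ** level)
        y -= Fraction(dy, 2 ** level)
    return x, y


p2 = grid_point("2")      # the point (1, 1/2)
p21 = grid_point("21")    # the point (3/4, 1/2)
p213 = grid_point("213")  # upper right corner of the cell with address 213
```

Each additional digit refines the location by half a cell side, so addresses that share a prefix always refer to points inside the same cell at that level.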

Recursive partitioning generates a nested hierarchy of grid cells. The derived addressing has the property that all successor cells of a cell have consecutive addresses. Thus a path from cell to cell following the recursive partitioning address order will connect all successor cells of cell 0 before reaching any successor of cell 2 (Fig. 1).

A 1–1 continuous mapping of $I^2$ onto $I$ is not possible, so quadrant-recursive functions are not continuous. However, they do have the property that all points in a quadrant are mapped onto an interval, all points in any one of the four subquadrants of a quadrant are mapped onto an interval, and so on, ad infinitum. This property tends to preserve proximity relationships; that is, if $s$ is "close to" $t$, then $f(s)$ should "tend to be close to" $f(t)$. In Appendix A we make this statement more precise by showing that if the origin is located at random and $s$ is chosen at random from $I^2$, then $\lim_{|\delta| \to 0} E[|f(s) - f(s + \delta)|] = 0$. Intuitively, two elements that are close together will tend to fall in the same randomly located cell of a size that decreases as the distance between points decreases. Because the two elements are covered by the same cell, their addresses match to the level of that cell, and thus, in expectation, their addresses will be close.

A fundamental 1–1 quadrant-recursive map is defined by digit interweaving. Let $s = (x, y)$ be a point in $I^2$. Each of the coordinates has an expansion as a binary fraction of the form $x = .x_1 x_2 x_3 \cdots$, $y = .y_1 y_2 y_3 \cdots$, where each $x_i$ and $y_i$ is either 0 or 1. Define $f_0(s)$ by alternating successive digits of $x$ and $y$, that is, $f_0(s) = .x_1 y_1 x_2 y_2 \cdots$. Clearly, $f_0$ would be 1–1 except for different expansions of the same number. For example, $.1$ and $.011111\cdots$, where the 1s continue indefinitely, are two representations of the number $1/2$. If we always use the binary representation with an infinite number of 1s, then $f_0$ is 1–1. Moreover, every point in $I$ is the image of a point in $I^2$, which is obtained by "digit splitting." That is, if $t = .t_1 t_2 t_3 \cdots$ is in $I$, then $s = f_0^{-1}(t) = (.t_1 t_3 t_5 \cdots, .t_2 t_4 t_6 \cdots)$ is the preimage of $t$. Both $f_0$ and $f_0^{-1}$ are 1–1 if we always use the representation with an infinite number of 1s (Hausdorff 1957, p. 45). To show that $f_0$ is quadrant recursive, note that for $s \in Q^n_{jk}$, the first $2n$ binary digits of $f_0(s)$ are fixed, so $f_0(s) \in J^n_m$, where $m$ is defined by the first $2n$ digits. Conversely, the preimages of every $t \in J^n_m$ have the same first $2n$ digits and so must be in the same $Q^n_{jk}$.

Figure 1. First Four Levels of a Quadrant-Recursive Partitioning of the Unit Square. The address associated with the cross-hatched cell is 213.

Figure 1 shows the first four levels of the recursive partitioning of the unit square. The address of the cross-hatched subquadrant is, as a base-4 fraction, $(.213)_4$, and the associated grid point is at $(3/4, 1/2)$, the upper right corner of the subquadrant. Following the convention of having an infinite number of 1s in the expansion, we have $(3/4, 1/2) = (.11, .1)_2 = (.1011111\cdots, .0111111\cdots)_2$. Digit interweaving gives the image $(.10011111\cdots)_2 = (.2133333\cdots)_4$, of which the first three digits are the subquadrant address. If we carried the recursive partitioning to the limit, every point in the subquadrant would be assigned an address beginning with $(.213)_4$.
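Digit interweaving to a finite depth can be sketched in Python as follows. The function name `f0_address` is hypothetical. The sketch assumes a point in $(0, 1] \times (0, 1]$ and implements the "infinite number of 1s" convention by assigning each coordinate to a half-open cell that is closed on the right, which is why a ceiling rather than a floor appears.

```python
from fractions import Fraction
from math import ceil


def f0_address(x, y, n):
    """First n base-4 digits of f0(x, y) for a point in (0, 1] x (0, 1]."""
    # With the expansion that ends in infinitely many 1s, x lies in the cell
    # (j / 2**n, (j + 1) / 2**n], so its first n binary digits spell out j.
    j = ceil(x * 2 ** n) - 1
    k = ceil(y * 2 ** n) - 1
    digits = []
    for level in range(n - 1, -1, -1):
        xbit = (j >> level) & 1
        ybit = (k >> level) & 1
        # Interweave: one x digit then one y digit form one base-4 digit.
        digits.append(str(2 * xbit + ybit))
    return "".join(digits)


address = f0_address(Fraction(3, 4), Fraction(1, 2), 3)  # the Figure 1 example
```

For the grid point $(3/4, 1/2)$ of Figure 1 this returns the three-digit address 213, in agreement with the interweaving carried out by hand above.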

The class of all quadrant-recursive functions can be generated from the function $f_0$, which is defined by digit interweaving, by permuting the order in which subquadrants $Q^n_{jk}$ are paired with the intervals $J^n_m$. For example, for $n = 1$, $f_0(Q^1_{jk}) = J^1_{2j+k}$. We obtain a different quadrant-recursive function by permuting the subscripts $\{0, 1, 2, 3\}$ of the image intervals. Thus, under the permutation $\tau = \{2, 1, 3, 0\}$, we get a function such that $f_\tau(Q^1_{jk}) = J^1_{\tau(2j+k)}$, so that $f_\tau(Q^1_{00}) = J^1_2$, $f_\tau(Q^1_{01}) = J^1_1$, $f_\tau(Q^1_{10}) = J^1_3$, and $f_\tau(Q^1_{11}) = J^1_0$. To see that the class of all quadrant-recursive functions is generated by such permutations, express each number in $I$ as a base-4 number, that is, as $t = .t_1 t_2 t_3 \cdots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p: I \to I$ is a hierarchical permutation if $h_p(t) = .p_1(t_1)\, p_{t_1, 2}(t_2)\, p_{t_1 t_2, 3}(t_3) \cdots$, where $p_{t_1 t_2 \cdots t_{n-1}, n}(\cdot)$ is a permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$. Again, we ensure that $h_p$ is 1–1 by always using the expansion with an infinite number of nonzero digits. Any quadrant-recursive function can be expressed as the composition of $f_0$ with some hierarchical permutation $h_p$, because the associations $f(Q^n_{jk}) = J^n_m$ determine the series of permutations and the permutations define the associations.

If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function and call the process of applying $h_p(\cdot)$ hierarchical randomization.
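A finite-depth version of hierarchical randomization can be sketched in Python. The factory name `make_hierarchical_randomization` and the lazy, dictionary-based storage of permutations are assumptions made for illustration; the sketch draws one independent random permutation of the digits 0 to 3 for every distinct prefix of original digits, as the definition requires.

```python
import random
from itertools import product


def make_hierarchical_randomization(seed=None):
    """Return a function h that randomizes base-4 address strings."""
    rng = random.Random(seed)
    perms = {}  # prefix of original digits -> random permutation of 0..3

    def h(address):
        out = []
        for i, digit in enumerate(address):
            prefix = address[:i]
            if prefix not in perms:
                # Draw the permutation for this prefix the first time it is
                # needed; draws for different prefixes are independent.
                perm = [0, 1, 2, 3]
                rng.shuffle(perm)
                perms[prefix] = perm
            out.append(str(perms[prefix][int(digit)]))
        return "".join(out)

    return h


h = make_hierarchical_randomization(seed=1)
addresses = ["".join(p) for p in product("0123", repeat=3)]
images = [h(a) for a in addresses]
```

The map is a bijection on addresses of a given length, and addresses that share a prefix are sent to images that share a prefix, so each subquadrant is still mapped onto a single interval.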

2.2 Sample Selection With Probability Proportional to Arbitrary Intensity Function

We assume that the design specifications define a desired sample intensity function $\pi(s)$, that is, the number of samples per unit measure of the population. For example, if the population were a stream network, $\pi(s)$ might specify the number of samples per kilometer of stream at $s$. For a discrete population, $\pi(s)$ has the usual finite-population-sampling interpretation as the target inclusion probability of the population unit located at $s$. We call $\pi(s)$ an intensity function because we have not yet introduced a probability measure. In Appendix B we develop the details of a sample selection method that yields an inclusion-probability function equal to $\pi(s)$. The concept behind the method is the composition of a hierarchical randomization function with a function that assigns to every interval in $f(R)$ a weight equal to the total of the intensity function of its preimage in $R$. In effect, we stretch the image interval via a distribution function $F$ so that its total length is equal to the sample size $M$. We pick $M$ points by taking a systematic sample with a unit separation along the stretched image, and we map these points back into the domain $R$ via the inverse function to get the sample of the population. We show in Appendix B that this procedure does indeed give a sample with an inclusion-probability function equal to the intensity function $\pi(s)$.

The technique of randomly mapping two-dimensional space to a line segment, systematically sampling from the range of the distribution function, and then mapping back to the population elements always produces a sample with the desired first-order inclusion-probability function, as long as $f$ is 1–1 and measurable. We required that $f$ be quadrant recursive and claim that this is sufficient to give a spatially balanced sample. This claim follows from the fact that the map $f^{-1} \circ (F/M) \circ f$ transforms the unequal intensity surface defined by $\pi$ into an equiprobable surface. The quadrant-recursive property of $f$ guarantees that the sample is evenly spread over the equiprobable surface (in the sense that each subquadrant receives its expected number of samples) to the resolution determined by the sample size $M$.
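For a finite population, the selection step can be sketched in Python under some simplifying assumptions: the units already carry hierarchically randomized quadrant-recursive addresses (here plain strings), each intensity is at most 1, and the intensities sum to an integer $M$. The function name `systematic_select` is hypothetical, and the sketch works with the step function of cumulative intensities, picking the unit whose cumulative-intensity interval contains each systematic point.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_select(addresses, intensities, rng=random):
    """Select units by systematic sampling along the address-ordered line."""
    # Order the units by address, i.e., by their position on the line.
    order = sorted(range(len(addresses)), key=lambda i: addresses[i])
    # Cumulative intensity at each ordered unit (the stretched image).
    cum = list(accumulate(intensities[i] for i in order))
    m = round(cum[-1])           # target sample size M, assumed an integer
    start = 1.0 - rng.random()   # random start in (0, 1]
    picks = []
    for k in range(m):
        point = start + k        # unit-length selection interval
        pos = bisect_left(cum, point)  # first unit whose cumulative >= point
        pos = min(pos, len(cum) - 1)   # guard against floating-point round-off
        picks.append(order[pos])
    return picks


# Hypothetical population of 8 units with equal intensity 0.5, so M = 4.
addresses = ["13", "00", "21", "02", "33", "10", "31", "22"]
intensities = [0.5] * 8
picks = systematic_select(addresses, intensities, random.Random(0))
```

In this toy population each consecutive pair of units along the address order carries total intensity 1, so exactly one unit of each pair is selected whatever the random start.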

2.3 Reverse Hierarchical Ordering

The sample points selected by mapping the systematic points along $(0, M]$ back to the population domain will be ordered in a way that follows the quadrant-recursiveness of $f$, tempered by an allowance for unequal probability selection. Thus the first quarter of the points all will come from the same "quadrant" of the equiprobable domain, and all will be approximately neighbors in the original population domain. It follows that four points, one picked from each quarter of the sample points ordered by the systematic selection, will be a spatially balanced sample. Because the random permutations that define the hierarchical randomization are selected independently of one another, it makes no difference, from a distributional standpoint, whether we pick the points systematically from each quarter or make random selections from each quarter. Therefore, we lose no randomness by picking the points that occupy positions that correspond to being at the beginning, one-quarter, one-half, and three-quarters of the way through the ordered list of sample points.

Within each quarter of the list, the points are again quadrant-recursively ordered, so points picked at the beginning, one-quarter, one-half, and three-quarters of the way through each quarter of the list will be spread out over the corresponding quadrant, and so on down through the sequence of subquadrants. We can utilize these properties by reordering the systematically selected list so that, at any point in the reordered list, the samples up to that point are well spread out over the population domain.

The order is most conveniently expressed in terms of a base-4 fraction, where the fraction expresses the relative position in the systematically ordered list. Thus the first four points correspond to the fractions $(0, .1, .2, .3)_4 = (0, 1/4, 1/2, 3/4)_{10}$. Stepping down a subquadrant level corresponds to adding a digit position to the base-4 fraction, which we fill in such a way as to spread the sequence of points over the population domain. The pattern for the first 16 points is shown in Table 1. Note that the order corresponds to the ranking obtained by reversing the sequence of base-4 digits and treating the reversed sequence as a base-4 fraction.

We can continue this same pattern of adding digit positions through as many positions as necessary to order the entire sample. The resulting order is called reverse hierarchical order. It remains to show that reverse hierarchical order does indeed give a spatially well-balanced sample for any $m \le M$. Clearly, this is the case for $m = 4^k$, because the reduced sample can be viewed as a sample selected from a complete GRTS design. Stevens (1997) derived an analytic expression for the pairwise inclusion density for some special intermediate cases. Here we investigate the spatial balance properties using simulation.

Table 1. Generation of Reverse Hierarchical Order

Order   Base 4   Reverse base 4
  1       00          00
  2       10          01
  3       20          02
  4       30          03
  5       01          10
  6       11          11
  7       21          12
  8       31          13
  9       02          20
 10       12          21
 11       22          22
 12       32          23
 13       03          30
 14       13          31
 15       23          32
 16       33          33
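The digit-reversal rule behind Table 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `reverse_hierarchical_order` and the use of 0-based list positions are assumptions made here, and the sketch covers only lists whose length is a power of 4.

```python
def reverse_hierarchical_order(n_levels):
    """Return 0-based positions of a systematic list in reverse hierarchical order.

    A position p (0 <= p < 4**n_levels) is written with n_levels base-4 digits.
    Reversing those digits gives the rank of p in the reordered list.
    """
    n_points = 4 ** n_levels

    def reversed_base4(p):
        r = 0
        for _ in range(n_levels):
            r = r * 4 + p % 4  # move the least significant digit to the front
            p //= 4
        return r

    # Sort the positions by the value of their digit-reversed address.
    return sorted(range(n_points), key=reversed_base4)


order = reverse_hierarchical_order(2)
print(order)
```

With two digit levels, the first four entries are positions 0, 4, 8, and 12, that is, the points at the beginning, one-quarter, one-half, and three-quarters of the way through a 16-point list, which matches the first four rows of Table 1.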


3. SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance, or regularity, of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should, and can, be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1000 points in the unit square, with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides of .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probability for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
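The weight scaling and the per-cell expected sample sizes described above can be sketched as follows. The weights and sample size come from the text; the coordinates are uniform stand-ins (an assumption, since the article's clustered population is not reproduced here), and the helper name `expected_cell_sizes` is hypothetical.

```python
import numpy as np

# Relative weights from the text: 750 units of weight 1, 200 of weight 2, 50 of weight 4.
weights = np.concatenate([np.full(750, 1.0), np.full(200, 2.0), np.full(50, 4.0)])
n_sample = 50

# Scale the weights so that the inclusion probabilities sum to the sample size.
incl_prob = n_sample * weights / weights.sum()


def expected_cell_sizes(xy, incl_prob, side=0.1):
    """Sum inclusion probabilities within square cells that tile the unit square."""
    n_side = int(round(1.0 / side))
    ix = np.minimum((xy[:, 0] / side).astype(int), n_side - 1)
    iy = np.minimum((xy[:, 1] / side).astype(int), n_side - 1)
    return np.bincount(ix * n_side + iy, weights=incl_prob, minlength=n_side * n_side)


# Stand-in coordinates: uniform on the unit square, NOT the article's population.
rng = np.random.default_rng(1)
xy = rng.random((1000, 2))
cell_sizes = expected_cell_sizes(xy, incl_prob)
```

A unit of weight 1 receives inclusion probability 50/1350, about .037, which agrees with the smallest expected cell size quoted in the text.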

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) "sample" function with "prob" set to the element inclusion probability.

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03), with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an "extra" sample point. For this step, we again used the S-PLUS "sample" function, this time with "prob" set to the fractional part of the expected sample size. The second step in the sample selection was to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the "sample" function, this time with "prob" set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.
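The first stage of the two-stage stratified selection can be sketched as below. This is a minimal sketch that uses NumPy's weighted draw without replacement as a stand-in for the S-PLUS "sample" call; whether either routine reproduces the fractional parts exactly as inclusion probabilities is not claimed here. The function name `allocate_stratum_sizes` and the four-stratum input are hypothetical.

```python
import numpy as np


def allocate_stratum_sizes(expected, rng):
    """Stage 1: integer part per stratum, plus one extra point in selected strata."""
    expected = np.asarray(expected, dtype=float)
    base = np.floor(expected).astype(int)
    frac = expected - base
    n_extra = int(round(frac.sum()))  # fractional parts sum to an integer
    extra = np.zeros_like(base)
    if n_extra > 0:
        # Weighted draw without replacement, with weights set to the fractional parts.
        chosen = rng.choice(len(expected), size=n_extra, replace=False,
                            p=frac / frac.sum())
        extra[chosen] = 1
    return base + extra


rng = np.random.default_rng(0)
expected = [0.5, 1.25, 2.25, 1.0]   # hypothetical expected stratum sample sizes
sizes = allocate_stratum_sizes(expected, rng)
```

Stage 2 would then draw `sizes[h]` elements within stratum `h`, weighted by the element inclusion probabilities. The total sample size is fixed by construction, because the integer parts plus the number of extra points equal the sum of the expected sizes.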

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi "polygons" are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
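For a finite population, the Voronoi-based statistic reduces to a nearest-sample-point assignment followed by a weighted count. The sketch below illustrates that computation under stated assumptions: the population is uniform and equiprobable (not the article's population), the sample is a simple random draw rather than a GRTS sample, the variance uses NumPy's default divisor $n$, and the function name `voronoi_inclusion_totals` is hypothetical.

```python
import numpy as np


def voronoi_inclusion_totals(pop_xy, pop_pi, sample_idx):
    """v_i: total inclusion probability of the population units nearest sample point i."""
    sample_xy = pop_xy[sample_idx]
    # Squared distances: population units (rows) by sample points (columns).
    d2 = ((pop_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest sample point for each unit
    return np.bincount(nearest, weights=pop_pi, minlength=len(sample_idx))


rng = np.random.default_rng(0)
pop_xy = rng.random((1000, 2))          # stand-in population on the unit square
pop_pi = np.full(1000, 50 / 1000)       # equiprobable design with n = 50
sample_idx = rng.choice(1000, size=50, replace=False)

v = voronoi_inclusion_totals(pop_xy, pop_pi, sample_idx)
zeta = v.var()                          # variance of the v_i for this one sample
```

Because every population unit is assigned to exactly one sample point, the $v_i$ sum to $n$ and average to 1, in line with $E[v_i] = 1$. Averaging `zeta` over replicate samples from two designs and taking the quotient would give the kind of ratio discussed in the text.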

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess. (N, generalized random tessellation stratified sampling; ×, independent random sampling; ¦, spatially stratified sampling.)

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size; that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning: maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

For our extensive population study, we selected 1000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dot-dash line, constant polygon size).

In every scenario the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
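The estimator of the population total, $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, is a weighted sum of the observed responses. A minimal sketch follows; the function name `horvitz_thompson_total` and the three-point data set are hypothetical.

```python
def horvitz_thompson_total(z, pi):
    """Horvitz-Thompson estimate of a total: sum of responses over inclusion probabilities."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    return sum(z_i / pi_i for z_i, pi_i in zip(z, pi))


# Hypothetical responses and first-order inclusion probabilities for three sample points.
z = [2.0, 4.0, 6.0]
pi = [0.5, 0.25, 0.5]
total = horvitz_thompson_total(z, pi)
print(total)  # 2/0.5 + 4/0.25 + 6/0.5 = 32.0
```

Points with small inclusion probability carry large weights, which is why the point estimate only needs the first-order probabilities while the variance estimators discussed above also need the pairwise ones.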

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

\[
V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].
\]

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

\[
\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,
\]

where $D(s_i)$ is a local neighborhood of $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed in order to approximate the joint inclusion probabilities.
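Given a weight matrix, the double sum in $\hat{V}_{NBH}$ is straightforward to evaluate. The sketch below only evaluates the formula for a user-supplied matrix `W`; how the weights and neighborhoods $D(s_i)$ are actually built is not described in this section, so the 3 × 3 matrix used here is purely hypothetical (it merely satisfies the row and column sum constraints), and the function name `local_neighborhood_variance` is an assumption.

```python
import numpy as np


def local_neighborhood_variance(z, pi, W):
    """Evaluate sum_i sum_j w_ij * (y_j - sum_k w_ik * y_k)**2 with y = z / pi.

    W[i, j] is the weight of sample point j in the neighborhood of point i;
    entries outside the neighborhood D(s_i) are zero.
    """
    y = np.asarray(z, dtype=float) / np.asarray(pi, dtype=float)
    W = np.asarray(W, dtype=float)
    local_mean = W @ y                          # weighted neighborhood mean for each i
    resid = y[None, :] - local_mean[:, None]    # y_j minus the mean of neighborhood i
    return float((W * resid ** 2).sum())


# Hypothetical weights: each point's neighborhood is itself plus one other point.
W = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
v_nbh = local_neighborhood_variance([1.0, 3.0, 5.0], [1.0, 1.0, 1.0], W)
print(v_nbh)
```

When the expanded values $z(s_j)/\pi(s_j)$ are constant within every neighborhood, each contrast vanishes and the estimate is 0, which is the behavior one would want from a variance estimator that exploits local similarity.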

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$ scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi_k^*(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi_k^*$, the expected number of samples in $R$ is

\[
M_k = \int_R \pi_k^*(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi_k^*(s)\,d\phi(s).
\]

We cannot compute $M_k$ because the boundary of $R$ is unknown, but an estimate is

\[
\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi_k^*(s_i)}{\pi_k^*(s_i)} = \sum_i I_R(s_i).
\]

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi_{\tilde{k}}^*$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi_{\tilde{k}}^*(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is .8, so that the estimator using $\pi_{\tilde{k}}^*$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target          Mean estimated domain area
sample size     Exponential    Constant     Linear
    25           .8000979      .7969819    .8010589
    50           .7995775      .7979406    .8005739
   100           .7994983      .7980543    .8002237
   150           .7994777      .7997587    .7995685
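The inverse-sampling calculation can be sketched for the equiprobable case. Assumptions made here: the enclosing set $R^*$ has known area and a constant inclusion density $\pi^* = M^*/|R^*|$, $\tilde{k}$ is taken as the smallest $k$ at which $M$ in-target points have been found, and the function name `conditional_area_estimate` and the ten-point indicator list are hypothetical.

```python
def conditional_area_estimate(in_target, m_target, area_enclosing=1.0):
    """Walk the reverse hierarchically ordered sample until m_target points fall in R.

    in_target: booleans I_R(s_i), listed in reverse hierarchical order, for the
    M* points drawn from the enclosing set R*.
    Returns (k_tilde, estimated area of R).
    """
    m_star = len(in_target)
    pi_star = m_star / area_enclosing        # equiprobable density on R*
    hits = 0
    for k, hit in enumerate(in_target, start=1):
        hits += 1 if hit else 0
        if hits == m_target:
            pi_k = (k / m_star) * pi_star    # conditional inclusion density pi*_k
            return k, hits / pi_k            # sum of I_R(s_i) / pi*_k(s_i)
    raise ValueError("fewer than m_target in-target points in the sample")


# Hypothetical indicators for M* = 10 ordered points; we want M = 4 points in R.
in_target = [True, True, False, True, True, False, True, True, True, False]
k_tilde, area_hat = conditional_area_estimate(in_target, m_target=4)
print(k_tilde, area_hat)
```

In this toy input the fourth in-target point is reached at the fifth ordered point, so the conditional density is 5 per unit area and the area estimate is 4/5.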

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream; two first-order streams joining results in a second-order stream; and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                   Strahler order category length (km)
Basin           Total length (km)      1           2           3+
E. Fork White       6802.385        3833.335    2189.494     779.556
Great Miami         2270.018        1501.711     621.039     147.268
L. Wabash           7601.418        4632.484    1331.228    1637.706
U. Illinois         5606.329        4559.123     500.188     547.018
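An equal number of sites per size category translates into unequal inclusion densities along the stream network, because the categories differ greatly in length. The sketch below uses the East Fork White lengths from Table 3 with a hypothetical allocation of 20 sites per category; the actual 1997 categories (first order, second and third order, fourth and greater) differ from the Table 3 columns, so the numbers are illustrative only.

```python
# Stream length (km) by Strahler order category, East Fork White basin (Table 3).
length_km = {"1": 3833.335, "2": 2189.494, "3+": 779.556}

# Hypothetical allocation: the same expected number of sites in every category.
sites_per_category = {"1": 20, "2": 20, "3+": 20}

# Inclusion density (expected sites per km of stream) within each category.
density = {cat: sites_per_category[cat] / length_km[cat] for cat in length_km}

# Design weight: km of stream represented by one sampled site in each category.
weight_km = {cat: 1.0 / density[cat] for cat in density}

for cat in length_km:
    print(cat, round(density[cat], 5), round(weight_km[cat], 1))
```

Under this allocation the shortest category receives the highest density, so a site on a large stream represents far fewer kilometers than a site on a first-order stream.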

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White        60            60            5         55         9              35               34
Great Miami          40            40           12         28         5              19               19
L. Wabash           128            91           11         80         9              71               39
U. Illinois         128            85            8         77         5              72               41

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

                  Indicator                      IRS        Local      Difference
Subpopulation     score       N sites   Mean    std. err.  std. err.  (%)
L. Wabash         IBI           39      36.1      2.1        1.4       −56.8
U. Illinois       IBI           41      32.5      1.7        1.3       −44.8
E. Fork White     IBI           32      35.1      1.3        1.2       −22.3
Great Miami       IBI           19      40.8      2.7        2.2       −33.5
L. Wabash         QHEI          39      55.6      2.3        1.6       −52.2
U. Illinois       QHEI          41      43.3      2.1        1.6       −39.5
E. Fork White     QHEI          34      54.3      2.1        1.5       −45.9
Great Miami       QHEI          19      67.8      2.2        1.9       −26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
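Forming interpenetrating subsamples or panels from a reverse hierarchically ordered list amounts to cutting the list into consecutive blocks. A minimal sketch, in which the site identifiers are placeholders and the helper name `consecutive_subsets` is hypothetical:

```python
def consecutive_subsets(ordered_sites, size):
    """Split a reverse hierarchically ordered site list into consecutive blocks."""
    return [ordered_sites[i:i + size] for i in range(0, len(ordered_sites), size)]


# Placeholder IDs standing in for 100 sites already in reverse hierarchical order.
ordered_sites = list(range(100))
panels = consecutive_subsets(ordered_sites, 10)
print(len(panels), panels[0])
```

The same blocks could serve as panels visited in different years, with later blocks held in reserve as replacements.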


APPENDIX A: PROOF OF LEMMA

Lemma Let f I2 I be a 1ndash1 quadrant-recursive function andlet s raquo UI2 Then limjplusmnj0 Efjf s iexcl f s C plusmnjg D 0

Proof If for some n gt 0 s and s C plusmn are in the same subquad-rant Qn

jk then f s and f s C plusmn are in the same interval J nm so

that jf s iexcl f s C plusmnj middot 1=4n The probability that s and s C plusmn arein the same subquadrant is the same as the probability of the ori-gin and plusmn D plusmnx plusmny being in the same cell of a randomly locatedgrid with cells congruent to Qn

jk For plusmnx plusmny middot 1=2n that probability

is equal to jQn0 Qnplusmnj=jQn0j D 1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny where Qnx denotes a polygon congruent to Qn

jk centered on x

For Ds plusmn D jf s iexcl f s C plusmnj then we have that P D middot 1=4n cedil1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny Thus the distribution function FD of D isbounded below by

FD u cedil

8gtlt

gt

0 u middot 14n

1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny u gt1

4n

Because D is positive and bounded above by 1

E[Dplusmn] D 1 iexclZ 1

0FD udu

middot 1 iexclraquo

0

4nC

sup31 iexcl 1

4n

acuteiexcl 2nplusmnx C plusmny C 4nplusmnxplusmny

frac14

For xed n we have that

limjplusmnj0

E[Dplusmn] middot 14n

but this holds for all n so that

limjplusmnj0

E[Dplusmn] D 0
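The behavior described by the lemma can be checked numerically for one particular quadrant-recursive function. The sketch below assumes the Morton (Z-order) address as the function $f$, which is a choice made here for illustration and not necessarily the function used in the paper, and it replaces the uniform distribution of $s$ by an average over the cells of a $2^{levels} \times 2^{levels}$ grid. The names `morton_address` and `mean_address_gap` are hypothetical.

```python
def morton_address(ix, iy, levels):
    """Integer Morton (Z-order) address of grid cell (ix, iy) on a 2**levels grid."""
    addr = 0
    for bit in range(levels - 1, -1, -1):
        # One base-4 digit per level: the quadrant the cell falls in at that level.
        digit = 2 * ((iy >> bit) & 1) + ((ix >> bit) & 1)
        addr = 4 * addr + digit
    return addr


def mean_address_gap(levels, shift):
    """Average |f(s) - f(s + delta)| over grid cells, delta = (shift, shift) cells.

    Addresses are rescaled to the unit interval by dividing by 4**levels.
    """
    n = 2 ** levels
    total = 0
    count = 0
    for ix in range(n - shift):
        for iy in range(n - shift):
            total += abs(morton_address(ix + shift, iy + shift, levels)
                         - morton_address(ix, iy, levels))
            count += 1
    return total / (count * 4 ** levels)


# The mean gap shrinks as the displacement shrinks (shift measured in grid cells).
gaps = [mean_address_gap(6, shift) for shift in (8, 4, 2, 1)]
```

This is a discrete check on a single function, not a substitute for the proof; it only illustrates that nearby points tend to receive nearby addresses.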

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$ and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case, it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$, and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion, we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$, set

$$F(x) = \tilde{F}(x) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min_{x_i} \{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population; that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.

278 Journal of the American Statistical Association March 2004

Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA6003-89065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM34020012000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-6004-86026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A–Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


is as $t = .t_1 t_2 t_3 \cdots$, where each digit $t_i$ is either a 0, 1, 2, or 3. A function $h_p: I \to I$ is a hierarchical permutation if $h_p(t) = .p_1(t_1)\,p_{t_1,2}(t_2)\,p_{t_1 t_2,3}(t_3) \cdots$, where $p_{t_1 t_2 \cdots t_{n-1},n}(\cdot)$ is a permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$. Again, we ensure that $h_p$ is 1–1 by always using the expansion with an infinite number of nonzero digits. Any quadrant-recursive function can be expressed as the composition of $f_0$ with some hierarchical permutation $h_p$, because the associations $f(Q^n_{jk}) = J^n_m$ determine the series of permutations, and the permutations define the associations.

If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function and call the process of applying $h_p(\cdot)$ hierarchical randomization.
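A hierarchical randomization of finite base-4 addresses can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: addresses are truncated to a finite list of digits, the permutation attached to each digit prefix is drawn lazily the first time that prefix is seen, and the names `make_hierarchical_randomization` and `seed` are hypothetical.

```python
import random


def make_hierarchical_randomization(seed=0):
    """Return a function that applies a hierarchical permutation to base-4 digits.

    One independent random permutation of {0, 1, 2, 3} is attached to every
    digit prefix (t_1, ..., t_{n-1}); it is drawn on first use and then reused.
    """
    rng = random.Random(seed)
    perms = {}  # maps a digit prefix (tuple) to its permutation of 0..3

    def h(digits):
        out = []
        for n, t in enumerate(digits):
            prefix = tuple(digits[:n])
            if prefix not in perms:
                p = [0, 1, 2, 3]
                rng.shuffle(p)
                perms[prefix] = p
            out.append(perms[prefix][t])
        return out

    return h


h = make_hierarchical_randomization(seed=1)
randomized = h([2, 0, 3])  # randomized address of the cell with digits 2, 0, 3
```

Because the permutation at each level depends only on the preceding original digits, cells that share a quadrant before randomization still share a (relabeled) quadrant afterward, which is what keeps the composed function quadrant recursive.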

2.2 Sample Selection With Probability Proportional to Arbitrary Intensity Function

We assume that the design specifications define a desired sample intensity function $\pi(s)$, that is, the number of samples per unit measure of the population. For example, if the population were a stream network, $\pi(s)$ might specify the number of samples per kilometer of stream at $s$. For a discrete population, $\pi(s)$ has the usual finite-population-sampling interpretation as the target inclusion probability of the population unit located at $s$. We call $\pi(s)$ an intensity function, because we have not yet introduced a probability measure. In Appendix B we develop the details of a sample selection method that yields an inclusion-probability function equal to $\pi(s)$. The concept behind the method is the composition of a hierarchical randomization function with a function that assigns to every interval in $f(R)$ a weight equal to the total of the intensity function of its preimage in $R$. In effect, we stretch the image interval via a distribution function $F$ so that its total length is equal to the sample size $M$. We pick $M$ points by taking a systematic sample with a unit separation along the stretched image, and we map these points back into the domain $R$ via the inverse function to get the sample of the population. We show in Appendix B that this procedure does indeed give a sample with an inclusion-probability function equal to the intensity function $\pi(s)$.

The technique of randomly mapping two-dimensional space to a line segment, systematically sampling from the range of the distribution function, and then mapping back to the population elements always produces a sample with the desired first-order inclusion-probability function, as long as $f$ is 1–1 and measurable. We required that $f$ be quadrant recursive and claim that this is sufficient to give a spatially balanced sample. This claim follows from the fact that the map $f^{-1} \circ F/M \circ f$ transforms the unequal intensity surface defined by $\pi$ into an equiprobable surface. The quadrant-recursive property of $f$ guarantees that the sample is evenly spread over the equiprobable surface (in the sense that each subquadrant receives its expected number of samples) to the resolution determined by the sample size $M$.
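For a finite population, the stretch-and-sample step can be sketched as below. The sketch assumes that the units have already been sorted by their randomized one-dimensional address and that their inclusion probabilities sum to an integer $M$; unit $i$ then occupies the interval $(F_{i-1}, F_i]$ of the stretched line, and a unit is selected whenever a systematic point lands in its interval. The function name `systematic_select` and the example probabilities are hypothetical.

```python
import bisect
import itertools
import random


def systematic_select(pi_ordered, start):
    """Systematic sample with unit spacing along the cumulative-probability line.

    pi_ordered: inclusion probabilities of units sorted by randomized address,
                assumed to sum to an integer M.
    start:      random start in (0, 1].
    Returns the indices (into pi_ordered) of the selected units.
    """
    cum = list(itertools.accumulate(pi_ordered))  # F evaluated at each unit
    m = round(cum[-1])
    picks = []
    for k in range(m):
        y = start + k
        # First unit whose cumulative total reaches y, i.e. F_{i-1} < y <= F_i.
        idx = bisect.bisect_left(cum, y)
        picks.append(min(idx, len(cum) - 1))  # guard against rounding at the end
    return picks


# Hypothetical example: six units in address order, target sample size M = 3.
pi_ordered = [0.5, 0.25, 0.25, 1.0, 0.5, 0.5]
start = 1.0 - random.random()  # uniform on (0, 1]
sample = systematic_select(pi_ordered, start)
```

A unit with inclusion probability 1 (index 3 above) spans a full unit interval and is therefore selected for every start, while a unit with probability 0.25 is hit by a quarter of the possible starts.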

2.3 Reverse Hierarchical Ordering

The sample points selected by mapping the systematic points along $(0, M]$ back to the population domain will be ordered in a way that follows the quadrant-recursiveness of $f$, tempered by an allowance for unequal probability selection. Thus the first quarter of the points all will come from the same "quadrant" of the equiprobable domain, and all will be approximately neighbors in the original population domain. It follows that four points, one picked from each quarter of the sample points ordered by the systematic selection, will be a spatially balanced sample. Because the random permutations that define the hierarchical randomization are selected independently of one another, it makes no difference from a distributional standpoint whether we pick the points systematically from each quarter or make random selections from each quarter. Therefore, we lose no randomness by picking the points that occupy positions that correspond to being at the beginning, one-quarter, one-half, and three-quarters of the way through the ordered list of sample points.

Within each quarter of the list, the points are again quadrant-recursively ordered, so points picked at the beginning, one-quarter, one-half, and three-quarters of the way through each quarter of the list will be spread out over the corresponding quadrant, and so on down through the sequence of subquadrants. We can utilize these properties by reordering the systematically selected list so that at any point in the reordered list, the samples up to that point are well spread out over the population domain.

The order is most conveniently expressed in terms of a base-4 fraction, where the fraction expresses the relative position in the systematically ordered list. Thus the first four points correspond to the fractions $(0, .1, .2, .3)_4 = (0, 1/4, 1/2, 3/4)_{10}$. Stepping down a subquadrant level corresponds to adding a digit position to the base-4 fraction, which we fill in such a way as to spread the sequence of points over the population domain. The pattern for the first 16 points is shown in Table 1. Note that the order corresponds to the ranking obtained by reversing the sequence of base-4 digits and treating the reversed sequence as a base-4 fraction.

We can continue this same pattern of adding digit positions through as many positions as necessary to order the entire sample. The resulting order is called reverse hierarchical order. It remains to show that reverse hierarchical order does indeed give a spatially well-balanced sample for any $m \le M$. Clearly, this is the case for $m = 4^k$, because the reduced sample can be viewed as a sample selected from a complete GRTS design. Stevens (1997) derived an analytic expression for the pairwise inclusion density for some special intermediate cases. Here we investigate the spatial balance properties using simulation.

Table 1. Generation of Reverse Hierarchical Order

Order  Base 4  Reverse base 4    Order  Base 4  Reverse base 4
  1      00         00             9      02         20
  2      10         01            10      12         21
  3      20         02            11      22         22
  4      30         03            12      32         23
  5      01         10            13      03         30
  6      11         11            14      13         31
  7      21         12            15      23         32
  8      31         13            16      33         33
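The digit-reversal rule behind Table 1 can be sketched in Python. The names `reverse_base4` and `reverse_hierarchical_order` are hypothetical, and handling sample sizes that are not a power of 4 by ranking the existing positions on their reversed digits is an assumption made here for illustration.

```python
def reverse_base4(index, ndigits):
    """Reverse the base-4 digits of index, read as an ndigits-long string."""
    rev = 0
    for _ in range(ndigits):
        rev = 4 * rev + index % 4
        index //= 4
    return rev


def reverse_hierarchical_order(points):
    """Reorder a systematically ordered list into reverse hierarchical order.

    Positions in the systematic list are ranked by the value of their
    reversed base-4 digits.
    """
    n = len(points)
    ndigits = 0
    while 4 ** ndigits < n:
        ndigits += 1
    ranked = sorted(range(n), key=lambda i: reverse_base4(i, ndigits))
    return [points[i] for i in ranked]


# Zero-based positions in a systematic list of 16 points.
order = reverse_hierarchical_order(list(range(16)))
```

For 16 points the second point in the new order is position 4 of the systematic list (base-4 digits 10, one-quarter of the way through), which matches the second row of Table 1.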


3. SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance, or regularity, of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should and can be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1000 points in the unit square with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probability for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
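The scaling of relative weights to inclusion probabilities is simple arithmetic and can be sketched as follows. The function name `inclusion_probabilities` is hypothetical; the weight counts and the sample size of 50 are those stated in the text.

```python
def inclusion_probabilities(weights, sample_size):
    """Scale relative weights so that they sum to the target sample size."""
    total = sum(weights)
    return [sample_size * w / total for w in weights]


# 750 units of weight 1, 200 units of weight 2, and 50 units of weight 4.
weights = [1] * 750 + [2] * 200 + [4] * 50
pi = inclusion_probabilities(weights, 50)
```

The weights total 1350, so a unit of weight 1 receives an inclusion probability of 50/1350, about .037, and a unit of weight 4 receives four times that.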

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) "sample" function with "prob" set to the element inclusion probability.

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03) with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case, 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an "extra" sample point. For this step, we again used the S-PLUS "sample" function, this time with "prob" set to the fractional part of the expected sample size. The second step in the sample selection was to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the "sample" function, this time with "prob" set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.
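The bookkeeping behind the two-stage allocation can be sketched as below. Only the split into integer and fractional parts is shown; the random choice of which strata receive the extra point is left out, because the behavior of the S-PLUS "sample" call is not reproduced here. The function name `split_expected_sizes` and the four example strata are hypothetical.

```python
import math


def split_expected_sizes(expected):
    """Split expected stratum sample sizes into integer and fractional parts.

    Returns (base, frac, extras), where base[i] points are always taken in
    stratum i, frac[i] is the stratum's weight in the draw for an extra point,
    and extras is the number of strata that receive one extra point.
    """
    base = [math.floor(e) for e in expected]
    frac = [e - b for e, b in zip(expected, base)]
    extras = round(sum(frac))  # fractional parts sum to an integer
    return base, frac, extras


# Hypothetical expected sample sizes for four strata, summing to 5.
expected = [0.5, 1.25, 2.25, 1.0]
base, frac, extras = split_expected_sizes(expected)
```

Here the integer parts account for four sample points and one extra point is assigned among the strata with nonzero fractional part, so the total is always five.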

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi "polygons" are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
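For a finite population, the statistic can be sketched with a brute-force nearest-sample-point assignment. The function names are hypothetical, ties in distance are broken by taking the first sample point, and the use of the population variance (`statistics.pvariance`) is an assumption, since the text does not say which divisor is used.

```python
import statistics


def voronoi_inclusion_totals(population, pi, sample_idx):
    """Total inclusion probability of the units nearest to each sample point.

    population: list of (x, y) coordinates of all units.
    pi:         inclusion probability of each unit.
    sample_idx: indices of the sampled units.
    """
    totals = [0.0] * len(sample_idx)
    for (x, y), p in zip(population, pi):
        nearest = min(
            range(len(sample_idx)),
            key=lambda j: (x - population[sample_idx[j]][0]) ** 2
            + (y - population[sample_idx[j]][1]) ** 2,
        )
        totals[nearest] += p
    return totals


def zeta(population, pi, sample_idx):
    """Variance of the Voronoi inclusion totals (population variance assumed)."""
    return statistics.pvariance(voronoi_inclusion_totals(population, pi, sample_idx))


# Hypothetical population: two tight pairs of units, sample size n = 2.
population = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
pi = [0.5, 0.5, 0.5, 0.5]
balanced = zeta(population, pi, [0, 2])   # one sample point per pair
clustered = zeta(population, pi, [0, 1])  # both sample points in one pair
```

The spread-out sample gives each sample point a total of 1, so its variance is 0, whereas the clustered sample splits the population unevenly and gives a positive variance.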

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess (▲, generalized random tessellation stratified sampling; ×, independent random sampling; ⋄, spatially stratified sampling).

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases, it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size; that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning: maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

For our extensive population study, we selected 1000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dot-dash line, constant polygon size).

In every scenario, the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4 STATISTICAL PROPERTIES OF GRTS DESIGN

41 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator, or its continuous population analog (Cordy 1993; Stevens 1997), can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
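As a small illustration of the total estimator above, the following sketch computes the Horvitz–Thompson sum of $z(s_i)/\pi(s_i)$ for a handful of sample points. The function name and the data values are hypothetical and are not taken from the paper.

```python
def horvitz_thompson_total(z, pi):
    """Estimate a population total as the sum of z(s_i) / pi(s_i)."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    if any(p <= 0 for p in pi):
        raise ValueError("inclusion probabilities (densities) must be positive")
    return sum(zi / pii for zi, pii in zip(z, pi))


# Hypothetical responses and first-order inclusion probabilities.
z = [4.0, 10.0, 6.0]
pi = [0.5, 0.25, 0.1]
total = horvitz_thompson_total(z, pi)  # 4/0.5 + 10/0.25 + 6/0.1
```

Points with small inclusion probability receive large weights, which is why small pairwise inclusion probabilities in the divisor destabilize the HT and YG variance estimators.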

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

\[
V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].
\]

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

\[
\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,
\]

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed in order to approximate the joint inclusion probabilities.
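The structure of the neighborhood estimator can be sketched as follows. This is only an illustration under strong simplifying assumptions: the neighborhood $D(s_i)$ is taken to be $s_i$ together with its nearest sample points (k points in all), and the weights are equal, $w_{ij} = 1/k$. Equal weights satisfy only $\sum_j w_{ij} = 1$; they are not the weights of Stevens and Olsen (2003), which are tuned to the GRTS pairwise inclusion function and also satisfy $\sum_i w_{ij} = 1$. The function name is hypothetical.

```python
import math


def local_neighborhood_variance(points, z, pi, k=4):
    """Sketch of a neighborhood-based variance estimate for the HT total.

    ASSUMPTIONS (not from the paper): D(s_i) is the k nearest sample points
    to s_i, including s_i itself, and all weights equal 1/k.
    """
    n = len(points)
    if not (len(z) == len(pi) == n):
        raise ValueError("points, z, and pi must have the same length")
    k = min(k, n)
    expanded = [zi / pii for zi, pii in zip(z, pi)]  # z(s_j) / pi(s_j)
    total = 0.0
    for i in range(n):
        # Indices of the k sample points closest to s_i (s_i is at distance 0).
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        neighbors = order[:k]
        w = 1.0 / len(neighbors)
        local_mean = sum(w * expanded[j] for j in neighbors)
        total += sum(w * (expanded[j] - local_mean) ** 2 for j in neighbors)
    return total
```

Because each squared contrast compares a point only with nearby points, a smooth spatial trend in the response contributes little to the estimate, which is the intuition behind the smaller standard errors reported for real data.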

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi_k^*(s) = (k/M^*)\pi^*(s)$. Using the inclusion density $\pi_k^*$, the expected number of samples in $R$ is

\[
M_k = \int_R \pi_k^*(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi_k^*(s)\,d\phi(s).
\]

We cannot compute $M_k$ because the boundary of $R$ is unknown, but an estimate is

\[
\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi_k^*(s_i)}{\pi_k^*(s_i)} = \sum_i I_R(s_i).
\]

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi_{\tilde{k}}^*$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi_{\tilde{k}}^*(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is .8, so that the estimator using $\pi_{\tilde{k}}^*$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target         Mean estimated domain area
sample size    Exponential    Constant     Linear
 25            .8000979       .7969819     .8010589
 50            .7995775       .7979406     .8005739
100            .7994983       .7980543     .8002237
150            .7994777       .7997587     .7995685
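The inverse sampling calculation can be sketched in a few lines. The sketch assumes the sample points are already listed in reverse hierarchical order, that membership in $R$ is observed for each point as it is visited, and that $\tilde{k}$ is taken as the first $k$ at which $M$ points fall in $R$. The function name and the example data are hypothetical.

```python
def inverse_sample_extent(in_target, pi_star, m_target):
    """Return (k_tilde, estimated extent of R).

    in_target : booleans I_R(s_i), in reverse hierarchical order
    pi_star   : inclusion densities pi*(s_i) of the full M*-point sample
    m_target  : desired number M of sample points inside R
    """
    m_star = len(in_target)
    hits = 0
    for k, hit in enumerate(in_target, start=1):
        hits += bool(hit)
        if hits == m_target:
            scale = k / m_star  # pi*_k(s) = (k / M*) * pi*(s)
            extent = sum(
                1.0 / (scale * p)
                for inside, p in zip(in_target[:k], pi_star[:k])
                if inside
            )
            return k, extent
    raise ValueError("fewer than m_target sample points fall in the target domain")


# Hypothetical equiprobable sample of M* = 10 points on the unit square,
# so pi*(s) = 10 everywhere; two of the points fall in holes.
in_target = [True, True, False, True, True, True, False, True, True, True]
k_tilde, extent = inverse_sample_extent(in_target, [10.0] * 10, m_target=4)
```

In this example the fourth target point is reached at the fifth point visited, so the conditional density is $\pi_5^*(s) = 5$ and the estimated extent is $4/5$.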

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                     Strahler order category length (km)
Basin            Total length (km)   1           2           3+
E. Fork White    6802.385            3833.335    2189.494     779.556
Great Miami      2270.018            1501.711     621.039     147.268
L. Wabash        7601.418            4632.484    1331.228    1637.706
U. Illinois      5606.329            4559.123     500.188     547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White     60            60            5           55       9             35                34
Great Miami       40            40           12           28       5             19                19
L. Wabash        128            91           11           80       9             71                39
U. Illinois      128            85            8           77       5             72                41

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

                 Indicator                     IRS        Local      Difference
Subpopulation    score       N sites   Mean    std err    std err    (%)
L. Wabash        IBI         39        36.1    2.1        1.4        -56.8
U. Illinois      IBI         41        32.5    1.7        1.3        -44.8
E. Fork White    IBI         32        35.1    1.3        1.2        -22.3
Great Miami      IBI         19        40.8    2.7        2.2        -33.5
L. Wabash        QHEI        39        55.6    2.3        1.6        -52.2
U. Illinois      QHEI        41        43.3    2.1        1.6        -39.5
E. Fork White    QHEI        34        54.3    2.1        1.5        -45.9
Great Miami      QHEI        19        67.8    2.2        1.9        -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
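Taking consecutive subsets from the ordered sample is simple to express in code. The sketch below assumes the site identifiers are already listed in reverse hierarchical order; the identifiers and the function name are hypothetical.

```python
def consecutive_subsets(ordered_sites, size):
    """Split a reverse-hierarchically ordered site list into consecutive blocks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [ordered_sites[i:i + size] for i in range(0, len(ordered_sites), size)]


# Hypothetical site IDs 1..100, assumed to be in reverse hierarchical order.
sites = list(range(1, 101))
panels = consecutive_subsets(sites, 10)  # 10 interpenetrating subsamples of 10
```

The same blocks could serve as panels in a survey over time, with one block retired and the next block introduced at each occasion.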

APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)|/|Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

\[
F_D(u) \ge
\begin{cases}
0, & u \le \dfrac{1}{4^n}, \\[4pt]
1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}.
\end{cases}
\]

Because $D$ is positive and bounded above by 1,

\[
E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du
\le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.
\]

For fixed $n$, we have that

\[
\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},
\]

but this holds for all $n$, so that

\[
\lim_{|\delta| \to 0} E[D(\delta)] = 0.
\]
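The lemma can be checked numerically for one particular quadrant-recursive function. The sketch below uses a fixed Morton (bit-interleaving) address truncated to a finite depth, which is an assumption made for illustration: it is a deterministic stand-in, not the hierarchically randomized function used in the GRTS design. As a further simplification, $s$ is drawn uniformly from the part of the unit square for which $s + \delta$ stays inside the square. All names are hypothetical.

```python
import random


def quadrant_address(x, y, depth=16):
    """Map (x, y) in [0, 1)^2 to [0, 1) by interleaving binary digits."""
    value = 0.0
    weight = 0.25  # width of a level-1 address interval
    for _ in range(depth):
        x *= 2
        y *= 2
        bx = min(int(x), 1)  # which half in x (guarded against x == 1.0)
        by = min(int(y), 1)  # which half in y
        x -= bx
        y -= by
        value += (2 * by + bx) * weight  # base-4 digit of the subquadrant
        weight /= 4
    return value


def mean_address_gap(delta, trials=5000, seed=0):
    """Monte Carlo estimate of E|f(s) - f(s + (delta, delta))|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random() * (1 - delta)
        y = rng.random() * (1 - delta)
        total += abs(quadrant_address(x, y)
                     - quadrant_address(x + delta, y + delta))
    return total / trials


gaps = [mean_address_gap(d) for d in (0.1, 0.01, 0.001)]
```

Under these assumptions the estimated mean gap should shrink as the offset shrinks, in line with the limit stated in the lemma.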

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion, we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x_i) + [(\tilde{F}(x_{i+1}) - \tilde{F}(x_i))/(x_{i+1} - x_i)](x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal cases, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min_{x_i} \{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]
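For a finite population, the selection step described above reduces to systematic sampling along the cumulative intensity. The sketch below assumes the elements are already listed in address order (the hierarchical randomization is taken as given and is not implemented here), that every intensity lies in $(0, 1]$, and that the intensities sum to an integer $M$. The function name and the example intensities are hypothetical.

```python
import bisect
import itertools
import random


def systematic_select(pi, rng):
    """Pick one element per unit interval of (0, M] using a random start.

    pi  : inclusion intensities of the elements, listed in address order
    rng : a random.Random instance
    Returns the indices of the selected elements.
    """
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("each intensity must lie in (0, 1]")
    cum = list(itertools.accumulate(pi))  # F~ evaluated at the jump points
    m = round(cum[-1])
    if abs(cum[-1] - m) > 1e-9:
        raise ValueError("intensities must sum to an integer M")
    start = 1.0 - rng.random()  # uniform on (0, 1]
    picks = []
    for j in range(m):
        y = start + j  # one selection point in (j, j + 1]
        # G(y): first element whose cumulative intensity reaches y.
        idx = bisect.bisect_left(cum, y)
        picks.append(min(idx, len(pi) - 1))  # guard against rounding at M
    return picks


# Hypothetical intensities summing to M = 3.
pi = [0.2, 0.8, 0.5, 0.5, 1.0]
rng = random.Random(1)
draws = [systematic_select(pi, rng) for _ in range(20000)]
freq_first = sum(0 in d for d in draws) / len(draws)
```

Over repeated draws, the relative frequency with which an element is selected should approach its intensity, which is the finite-population content of the statement $P_2(B) = w(B)$.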

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.

Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


Stevens and Olsen Spatially Balanced Sampling of Natural Resources 267

3. SPATIAL PROPERTIES OF GRTS SAMPLE POINTS

In this section we investigate the spatial balance, or regularity, of the sample points produced by a GRTS design. We noted in the Introduction that, generally, the efficiency of an environmental sample increases as spatial regularity increases. A design with regularity comparable to a maximally stratified sample should have good efficiency. Choosing a suitable statistic to describe regularity is nontrivial, because the population domain itself is likely to have some inherent nonregularity (e.g., variation in spatial density for a finite or linear population) and because of the need to account for variable inclusion probability. The measure of regularity needs to describe regularity over the inclusion-probability-weighted irregular population domain. Various statistics to assess the regularity of a point process have been proposed in the study of stochastic point processes. One class of descriptive statistics is based on counts of event points within cells of a regular grid that covers the process domain. The mean count is a measure of the process intensity, and the variance of the counts is a measure of the regularity. The usual point process approach is to invoke ergodicity and take expectation over a single realization. In the present case, the expectation should and can be taken over replicate sample selections.

We illustrate this approach using an artificial finite population that consists of 1,000 points in the unit square, with a spatial distribution constructed to have high spatial variability, that is, to have voids and regions with densely packed points. Variable probability was introduced by randomly assigning 750 units a relative weight of 1, 200 units a weight of 2, and 50 units a weight of 4. The inclusion probability was obtained by scaling the weights to sum to the sample size. We divided the unit square into 100 square cells with sides of .1 units. Fifty-one of the cells were empty. The expected sample sizes (the sum of the inclusion probabilities for each cell) for the 49 nonempty cells ranged from .037 to 4.111.
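The weight scaling and per-cell expected sample sizes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the coordinates are uniform random placeholders rather than the clustered artificial population used in the paper, so the cell totals will not match the reported range. The smallest inclusion probability, 50/1350 ≈ .037, does follow directly from the stated weights.

```python
import numpy as np

def inclusion_probabilities(weights, n):
    """Scale relative weights so that the inclusion probabilities sum to n."""
    weights = np.asarray(weights, dtype=float)
    return n * weights / weights.sum()

def expected_cell_sizes(xy, pi, cells_per_side=10):
    """Sum inclusion probabilities within each cell of a regular square grid."""
    # Cell index along each axis; clip so a coordinate of exactly 1.0 stays in range.
    idx = np.minimum((xy * cells_per_side).astype(int), cells_per_side - 1)
    flat = idx[:, 0] * cells_per_side + idx[:, 1]
    return np.bincount(flat, weights=pi, minlength=cells_per_side ** 2)

rng = np.random.default_rng(0)
# 750 units of weight 1, 200 of weight 2, 50 of weight 4, randomly assigned.
weights = rng.permutation(np.repeat([1, 2, 4], [750, 200, 50]))
xy = rng.random((1000, 2))          # placeholder coordinates (assumption)
pi = inclusion_probabilities(weights, n=50)
cell_sizes = expected_cell_sizes(xy, pi)
```

Here `pi.min()` is the inclusion probability of a single weight-1 unit, which is also the smallest expected sample size a nonempty cell can have.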

We compared the regularity of three sampling designs: independent random sampling (IRS), spatially stratified sampling (SSS), and GRTS sampling. For each sampling scheme, we selected 1,000 replicates of a sample of 50 points and counted the number of sample points that fell into each of the 49 nonempty cells defined in the previous paragraph. For the IRS sample, we used the S-PLUS (Insightful Corporation 2002) "sample" function with "prob" set to the element inclusion probability.

As we noted in the Introduction, there is no general algorithm for partitioning an arbitrary finite spatial population with variable inclusion probability into spatial strata with equal expected sample sizes. For this exercise, we chose to use equal-area strata with variable expected sample sizes. For simplicity, we chose square strata. We picked a side length and origin so that (1) the strata were not coherent with the .1 × .1 cells used for regularity assessment and (2) about 50 stratum cells had at least one population point. The strata we used were offset from the origin by (.03, .03), with a side length of .095. Exactly 50 stratification cells were nonempty, with expected sample sizes ranging from .037 to 4.111. Figure 2 shows the population with the stratification cells overlaid.

Figure 2. Finite Population Used in Spatial Balance Investigation, Overlaid With Grid Cells Used for Stratification. Cell cross-hatching indicates the expected sample size in each cell.

268 Journal of the American Statistical Association, March 2004

We selected the stratified sample in two stages. The fractional parts of the expected sample sizes will always sum to an integer, in this case, 21. The first step in the sample selection was to select which 21 of the 50 strata would receive an "extra" sample point. For this step, we again used the S-PLUS "sample" function, this time with "prob" set to the fractional part of the expected sample size. The second step in the sample selection was to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the "sample" function, this time with "prob" set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.
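The two-stage selection can be sketched as follows. This is a rough Python analogue, not the authors' S-PLUS code: `two_stage_stratified` is a hypothetical name, NumPy's weighted draw without replacement stands in for the S-PLUS "sample" function, the coordinates are uniform placeholders, and the way the (.03, .03) offset is applied to label strata is an assumption.

```python
import numpy as np

def two_stage_stratified(stratum_of_unit, pi, rng):
    """Stage 1: choose which strata get an 'extra' point; stage 2: sample within strata."""
    stratum_of_unit = np.asarray(stratum_of_unit)
    pi = np.asarray(pi, dtype=float)
    strata = np.unique(stratum_of_unit)
    expected = np.array([pi[stratum_of_unit == h].sum() for h in strata])
    whole = np.floor(expected + 1e-9).astype(int)      # integer parts
    frac = np.clip(expected - whole, 0.0, None)        # fractional parts
    n_extra = int(round(frac.sum()))                   # always an integer total
    size = whole.copy()
    if n_extra > 0:
        extra = rng.choice(len(strata), size=n_extra, replace=False,
                           p=frac / frac.sum())
        size[extra] += 1
    sample = []
    for h, n_h in zip(strata, size):
        if n_h > 0:
            units = np.flatnonzero(stratum_of_unit == h)
            p = pi[units] / pi[units].sum()
            sample.extend(rng.choice(units, size=n_h, replace=False, p=p))
    return np.array(sample)

rng = np.random.default_rng(1)
xy = rng.random((1000, 2))                              # placeholder population
weights = rng.permutation(np.repeat([1, 2, 4], [750, 200, 50]))
pi = 50 * weights / weights.sum()
# Square strata of side .095, shifted by .03 (labeling scheme is an assumption).
ix = np.floor((xy[:, 0] - 0.03) / 0.095).astype(int) + 1
iy = np.floor((xy[:, 1] - 0.03) / 0.095).astype(int) + 1
sample = two_stage_stratified(ix * 12 + iy, pi, rng)
```

The total sample size is fixed at 50 by construction, because the integer parts plus the number of "extra" points equal the sum of the expected stratum sample sizes.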

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi "polygons" are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probabilities of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
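For a finite population, the statistic reduces to a nearest-sample-point assignment followed by a weighted count. The sketch below assumes NumPy and uses a hypothetical function name, `voronoi_zeta`; the population and sample are placeholders chosen only to exercise the calculation.

```python
import numpy as np

def voronoi_zeta(pop_xy, pi, sample_idx):
    """Return (v, zeta): v_i sums the inclusion probabilities of the population
    units nearest to sample point i, and zeta is the variance of the v_i."""
    pop_xy = np.asarray(pop_xy, dtype=float)
    pi = np.asarray(pi, dtype=float)
    sample_xy = pop_xy[sample_idx]
    # Squared distance from every population unit to every sample point.
    d2 = ((pop_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)          # index of the owning Voronoi "polygon"
    v = np.bincount(nearest, weights=pi, minlength=len(sample_idx))
    return v, float(v.var())

rng = np.random.default_rng(2)
pop_xy = rng.random((1000, 2))                          # placeholder population
weights = rng.permutation(np.repeat([1, 2, 4], [750, 200, 50]))
pi = 50 * weights / weights.sum()
sample_idx = rng.choice(1000, size=50, replace=False, p=pi / pi.sum())
v, zeta = voronoi_zeta(pop_xy, pi, sample_idx)
```

Because every population unit belongs to exactly one Voronoi set, the $v_i$ sum to the sample size, so their mean is 1 for any realized sample.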

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1,000 samples of size 50. The achieved sample size is the number of samples that fell into .1 × .1 square cells that tiled the population domain. Lines were fitted with lowess (▲, generalized random tessellation stratified sampling; ×, independent random sampling; ♦, spatially stratified sampling).

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.
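A small simulation shows how such a ratio might be obtained for an equiprobable sample on the unit square. Several things here are assumptions made for illustration: Voronoi areas are approximated by assigning the points of a fine regular grid to their nearest sample point, the "proposed design" is a one-point-per-cell stratified sample standing in for GRTS (which is not implemented), and the ratio is formed from the mean of $\zeta$ over replicates.

```python
import numpy as np

def zeta_areal(sample_xy, grid_xy):
    """Approximate zeta for an equiprobable sample: each grid point carries
    inclusion mass n / (number of grid points)."""
    d2 = ((grid_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    n = len(sample_xy)
    v = np.bincount(nearest, minlength=n) * (n / len(grid_xy))
    return float(v.var())

def irs_sample(n, rng):
    """Independent random sample of n points in the unit square."""
    return rng.random((n, 2))

def one_per_cell_sample(cells_per_side, rng):
    """One uniformly located point in each cell of a square grid."""
    ij = np.stack(np.meshgrid(np.arange(cells_per_side),
                              np.arange(cells_per_side),
                              indexing="ij"), axis=-1).reshape(-1, 2)
    return (ij + rng.random(ij.shape)) / cells_per_side

rng = np.random.default_rng(3)
g = (np.arange(40) + 0.5) / 40
grid_xy = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)

reps = 200
zeta_irs = np.mean([zeta_areal(irs_sample(16, rng), grid_xy) for _ in range(reps)])
zeta_sss = np.mean([zeta_areal(one_per_cell_sample(4, rng), grid_xy) for _ in range(reps)])
zeta_ratio = zeta_sss / zeta_irs
```

A ratio below 1 indicates that the stratified sample is more regular than IRS in this toy setting; the specific value depends on the grid resolution and the number of replicates.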

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1,000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1,000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size, that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning; maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1,000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

For our extensive population study, we selected 1,000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. In the scenario with holes, the holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1,000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long-dashed line, exponentially increasing polygon size; short-dashed line, linearly increasing polygon size; dot-dashed line, constant polygon size).

In every scenario the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
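The point estimate itself is a one-line computation. A minimal sketch, assuming NumPy and a hypothetical function name, with made-up response values:

```python
import numpy as np

def horvitz_thompson_total(z, pi):
    """Horvitz-Thompson estimate of a population total: sum of z(s_i) / pi(s_i)."""
    z = np.asarray(z, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if np.any(pi <= 0):
        raise ValueError("inclusion probabilities (or densities) must be positive")
    return float(np.sum(z / pi))

# Three sampled units, each with inclusion probability 0.5 (illustrative values).
total_hat = horvitz_thompson_total([2.0, 4.0, 6.0], [0.5, 0.5, 0.5])
```

Each sampled value is weighted by the reciprocal of its inclusion probability, so units that were unlikely to be selected stand in for more of the population.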

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,$$

where $D(s_i)$ is a local neighborhood of $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed as an approximation for the joint inclusion probabilities.
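The structure of the neighborhood estimator can be sketched as below. This is not the published estimator: the neighborhood $D(s_i)$ is taken to be the point itself plus its nearest neighbors up to a fixed size `k`, and the weights are simply equal within each neighborhood. Both choices are assumptions made for illustration; equal weights satisfy only the constraint that the weights sum to 1 within a neighborhood, whereas the weights of Stevens and Olsen (2003) are built from the GRTS pairwise inclusion behavior.

```python
import numpy as np

def local_neighborhood_variance(xy, z, pi, k=4):
    """Illustrative neighborhood variance: for each sample point, a weighted
    squared deviation of z/pi around the weighted neighborhood mean."""
    xy = np.asarray(xy, dtype=float)
    y = np.asarray(z, dtype=float) / np.asarray(pi, dtype=float)
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)
    # D(s_i): the point itself plus its k-1 nearest neighbors (assumption).
    nbrs = np.argsort(d2, axis=1)[:, :k]
    w = np.full(k, 1.0 / k)              # equal weights (assumption)
    total = 0.0
    for i in range(len(y)):
        yi = y[nbrs[i]]
        local_mean = np.sum(w * yi)
        total += np.sum(w * (yi - local_mean) ** 2)
    return float(total)

rng = np.random.default_rng(4)
xy = rng.random((30, 2))
pi = np.full(30, 0.5)
v_const = local_neighborhood_variance(xy, 3.0 * pi, pi)       # z/pi constant
v_trend = local_neighborhood_variance(xy, xy[:, 0] * pi, pi)  # z/pi varies smoothly
```

When $z/\pi$ is constant the estimate is 0, and a smooth spatial trend contributes only through local differences, which is the intuition behind the smaller estimates reported relative to the IRS approximation.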

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi_k^*(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi_k^*$, the expected number of samples in $R$ is

$$M_k = \int_R \pi_k^*(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi_k^*(s)\,d\phi(s).$$

We cannot compute $M_k$ because the boundary of $R$ is unknown, but an estimate is

$$\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi_k^*(s_i)}{\pi_k^*(s_i)} = \sum_i I_R(s_i).$$

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi_{\tilde{k}}^*$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi_{\tilde{k}}^*(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is .8, so that the estimator using $\pi_{\tilde{k}}^*$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target                Mean estimated domain area
sample size     Exponential     Constant     Linear
 25             .8000979        .7969819     .8010589
 50             .7995775        .7979406     .8005739
100             .7994983        .7980543     .8002237
150             .7994777        .7997587     .7995685
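The conditional-inclusion calculation can be tried on a toy example. In the sketch below, $R^*$ is the unit square, $R$ is the unit square minus one square hole of area .2, and an independent random point sequence stands in for a reverse hierarchically ordered GRTS sample; all of these, and the function names, are assumptions for illustration only. With an equiprobable density on $R^*$, the estimate reduces to $M\,|R^*|/\tilde{k}$.

```python
import numpy as np

def inverse_sample_area(points, in_target, M, area_enclosing=1.0):
    """Estimate |R| by using points in order until M fall in the target domain."""
    M_star = len(points)
    hits = np.cumsum(in_target(points))            # running count of target points
    k_tilde = int(np.searchsorted(hits, M)) + 1    # smallest k with M target points
    if k_tilde > M_star:
        raise ValueError("fewer than M points fell in the target domain")
    pi_star = M_star / area_enclosing              # equiprobable density on R*
    pi_k = (k_tilde / M_star) * pi_star            # density of the k-point sample
    return M / pi_k                                # sum of I_R(s_i) / pi_k(s_i)

side = np.sqrt(0.2)                                # hole of area .2, so |R| = .8

def in_target(p):
    in_hole = ((p[:, 0] > 0.2) & (p[:, 0] < 0.2 + side) &
               (p[:, 1] > 0.2) & (p[:, 1] < 0.2 + side))
    return ~in_hole

rng = np.random.default_rng(5)
estimates = [inverse_sample_area(rng.random((100, 2)), in_target, M=50)
             for _ in range(2000)]
mean_area = float(np.mean(estimates))
```

The replicate mean lands close to .8 in this toy setting, which mirrors the near-unbiasedness summarized in Table 2; a GRTS sequence would be expected to give less variable estimates than the independent sequence used here.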

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                       Strahler order category length (km)
Basin            Total length (km)     1            2            3+
E. Fork White    6802.385              3833.335     2189.494      779.556
Great Miami      2270.018              1501.711      621.039      147.268
L. Wabash        7601.418              4632.484     1331.228     1637.706
U. Illinois      5606.329              4559.123      500.188      547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.
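The arithmetic behind a multidensity design on a linear network is an inclusion density, in sites per kilometer, for each category. The sketch below uses the Lower Wabash stream lengths from Table 3 and the expected and evaluated sample sizes from Table 4. The 32/32/64 split across categories is an assumption inferred from the description (four equal planned categories, with the third and fourth+ sample sizes combined), and the proportional rescaling by the fraction of sites evaluated is likewise only one plausible reading of "recalculating the inclusion densities by basin."

```python
# Lower Wabash stream length (km) by Strahler order category, from Table 3.
lengths_km = {"1": 4632.484, "2": 1331.228, "3+": 1637.706}

# Assumed allocation of the 128 expected sites (not stated explicitly in the text).
planned_sites = {"1": 32, "2": 32, "3+": 64}

# Initial inclusion density: expected sites per km of stream in each category.
density = {c: planned_sites[c] / lengths_km[c] for c in lengths_km}

# Integrating the density over the frame recovers the expected sample size.
expected_total = sum(density[c] * lengths_km[c] for c in lengths_km)

# Assumed adjustment: only 91 of the 128 ordered sites were evaluated (Table 4),
# so each density is rescaled by 91/128, in the spirit of pi_k = (k / M*) pi.
evaluated, expected = 91, 128
adjusted_density = {c: density[c] * evaluated / expected for c in density}
adjusted_total = sum(adjusted_density[c] * lengths_km[c] for c in lengths_km)
```

Under this allocation, first-order streams receive the lowest density because they account for most of the stream length, which is exactly the effect the equal-sites-per-category structure is meant to produce.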

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White     60            60            5          55       9             35                34
Great Miami       40            40           12          28       5             19                19
L. Wabash        128            91           11          80       9             71                39
U. Illinois      128            85            8          77       5             72                41

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

                 Indicator                     IRS        Local      Difference
Subpopulation    score       N sites   Mean    std. err.  std. err.  (%)
L. Wabash        IBI         39        36.1    2.1        1.4        −56.8
U. Illinois      IBI         41        32.5    1.7        1.3        −44.8
E. Fork White    IBI         32        35.1    1.3        1.2        −22.3
Great Miami      IBI         19        40.8    2.7        2.2        −33.5
L. Wabash        QHEI        39        55.6    2.3        1.6        −52.2
U. Illinois      QHEI        41        43.3    2.1        1.6        −39.5
E. Fork White    QHEI        34        54.3    2.1        1.5        −45.9
Great Miami      QHEI        19        67.8    2.2        1.9        −26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
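The consecutive-subset idea can be sketched in a few lines of Python. This is only an illustration: the site identifiers are placeholders standing in for a sample that has already been put into reverse hierarchical order, and the function name `consecutive_subsets` is hypothetical.

```python
def consecutive_subsets(ordered_sites, subset_size):
    """Split an already-ordered site list into consecutive subsets.

    Assumes `ordered_sites` is in reverse hierarchical order; this
    function does not compute that ordering itself.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    return [
        ordered_sites[i:i + subset_size]
        for i in range(0, len(ordered_sites), subset_size)
    ]


# Placeholder identifiers 1..100 standing in for an ordered GRTS sample.
sites = list(range(1, 101))
subsamples = consecutive_subsets(sites, 10)
```

Each element of `subsamples` could then serve as one interpenetrating subsample or as one panel in a survey over time.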


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E[\,|f(s) - f(s+\delta)|\,] = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s+\delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s+\delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$
F_D(u) \ge
\begin{cases}
0, & u \le \dfrac{1}{4^n}, \\[1ex]
1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}.
\end{cases}
$$

Because $D$ is positive and bounded above by 1,

$$
E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du
\le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.
$$

For fixed $n$, we have that

$$
\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},
$$

but this holds for all $n$, so that

$$
\lim_{|\delta| \to 0} E[D(\delta)] = 0.
$$
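The lemma can be checked numerically with a minimal sketch. The code below uses Morton (bit-interleaved) order as one concrete quadrant-recursive address; that choice, the 16-bit resolution, the equal offset in both coordinates, and the function names are all assumptions made for illustration, not the construction used in the paper.

```python
import numpy as np

BITS = 16  # assumed resolution: 2**16 cells per axis


def morton_address(x, y, bits=BITS):
    """Map points of the unit square to [0, 1) by interleaving the bits
    of the quantized coordinates (Morton order)."""
    scale = 1 << bits
    xi = np.minimum((np.asarray(x) * scale).astype(np.int64), scale - 1)
    yi = np.minimum((np.asarray(y) * scale).astype(np.int64), scale - 1)
    code = np.zeros_like(xi)
    for b in range(bits):
        code |= ((xi >> b) & 1) << (2 * b)
        code |= ((yi >> b) & 1) << (2 * b + 1)
    return code / float(1 << (2 * bits))


def mean_address_gap(delta, n=20000, seed=0):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| with the same
    offset delta applied to both coordinates."""
    rng = np.random.default_rng(seed)
    # Keep s + delta inside the unit square.
    s = rng.uniform(0.0, 1.0 - delta, size=(n, 2))
    gap = np.abs(
        morton_address(s[:, 0], s[:, 1])
        - morton_address(s[:, 0] + delta, s[:, 1] + delta)
    )
    return float(gap.mean())


gaps = [mean_address_gap(d) for d in (0.1, 0.01, 0.001)]
```

The estimated mean gap shrinks as the offset shrinks, which is the behavior the lemma asserts in the limit.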

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$, and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion, we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x) + [(\tilde{F}(x_{i+1}) - \tilde{F}(x_i))/(x_{i+1} - x_i)](x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$; that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population; that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
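For the finite-population case, the selection step described above (accumulate the inclusion intensities along the ordered addresses, then take one point per unit interval with a random start) can be sketched as follows. This is a simplified illustration: the addresses are assumed to be given already, hierarchical randomization is omitted, every intensity is assumed to be at most 1, and the function name `systematic_selection` is hypothetical.

```python
import numpy as np


def systematic_selection(addresses, pi, rng):
    """Select elements by systematic sampling along ordered addresses.

    addresses : one-dimensional address of each element (assumed given)
    pi        : target inclusion intensity per element; sum(pi) ~ integer M
    Returns the indices of the selected elements.
    """
    addresses = np.asarray(addresses, dtype=float)
    pi = np.asarray(pi, dtype=float)
    order = np.argsort(addresses)
    cum = np.cumsum(pi[order])         # cumulative intensity at each ordered element
    m = int(round(cum[-1]))            # target sample size M
    start = rng.uniform(0.0, 1.0)      # random start in the first unit interval
    y = start + np.arange(m)           # one selection point per unit interval
    # First ordered element whose cumulative total reaches y.
    pos = np.searchsorted(cum, y, side="left")
    pos = np.minimum(pos, len(cum) - 1)  # guard against rounding at the top end
    return order[pos]


rng = np.random.default_rng(1)
addr = rng.uniform(size=40)            # placeholder addresses
intensity = np.full(40, 0.25)          # equal intensities, so M = 10
selected = systematic_selection(addr, intensity, rng)
```

Because each element occupies a stretch of the cumulative scale equal to its intensity, an element with intensity $\pi_i \le 1$ is hit by the unit-spaced selection points with probability $\pi_i$.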

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), “Some Optimal Designs for Sampling in Two Dimensions,” Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), “An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey,” Journal of Forestry, 61, 826–833.

278 Journal of the American Statistical Association March 2004

Breidt, F. J. (1995), “Markov Chain Designs for One-per-Stratum Sampling,” Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), “Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations,” The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), “An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe,” Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), “Area Frame Design for Agricultural Surveys,” U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), “On Plane Sampling and Related Geometrical Problems,” in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), “Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling,” Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), “Spatial Data Processing Using Balanced Ternary,” in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), “Water-Quality Modeling With EPA River Reach File System,” Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), “Plane Sampling,” Statistics and Probability Letters, 50, 151–159.

IDEM (2000), “Indiana Water Quality Report 2000,” Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), “S-PLUS 6 for Windows Language Reference,” Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), “Biological Integrity: A Long Neglected Aspect of Water Resource Management,” Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), “Neighbor-Based Properties of Some Orderings of Two-Dimensional Space,” Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), “Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats,” Biometrics, 52, 125–136.

Olea, R. A. (1984), “Sampling Design Optimization for Spatial Functions,” Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), “Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid,” Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), “Sampling on Successive Occasions With Partial Replacement of Units,” Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), “Sur Une Courbe Qui Remplit Toute Une Aire Plane,” Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), “Problems in Plane Sampling,” The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), “Construction of Spatially Articulated List Frames for Household Surveys,” in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), “On the Estimate of the Variance in Sampling With Varying Probabilities,” Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), “Environmental Sampling and Monitoring,” in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), “Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations,” Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), “Spatially Restricted Surveys Over Time for Aquatic Resources,” Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), “Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation,” in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), “Variance Estimation for Spatially Balanced Samples of Environmental Resources,” Environmetrics, 14, 593–610.

Strahler, A. N. (1957), “Quantitative Analysis of Watershed Geomorphology,” Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), “Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns,” in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), “The National Hydrography Dataset,” Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), “Sample Maintenance Based on Peano Keys,” in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), “Selection Without Replacement From Within Strata With Probability Proportional to Size,” Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


to pick samples in each of the 50 strata that had a sample size equal to the integer part of the expected sample size, plus 1 if the stratum was selected in stage 1. Again, the sample was selected with the “sample” function, this time with “prob” set to the element inclusion probability. This two-stage procedure always selects exactly 50 samples with the desired inclusion probability.

In Figure 3 we plot the variance of the achieved sample size in each of the evaluation cells versus the expected sample size, with lowess fitted lines. Of the three designs, the IRS has the largest variance and the GRTS has the smallest; the SSS design is approximately midway between. Stratification with one sample per cell would likely have about the same variance as the GRTS.

Another common way to characterize a one-dimensional point process is via the interevent distance; for example, the mean interevent time for a time series measures the intensity of the process, and the variance measures the regularity. An analogous concept in two dimensions is that of Voronoi polygons. For a set of event points $\{s_1, s_2, \ldots, s_n\}$ in a two-dimensional domain, the Voronoi polygon $\Psi_i$ for the $i$th point is the collection of domain points that are closer to $s_i$ than to any other $s_j$ in the set. Note that in the case of a finite population, the Voronoi “polygons” are collections of population points, and for a linear population, they are collections of line segments.

We propose using a statistic based on Voronoi polygons to describe the regularity of a spatial sample. For the sample $S$ consisting of the points $\{s_1, s_2, \ldots, s_n\}$, let $v_i = \int_{\Psi_i} \pi(s)\,d\phi(s)$, so that $v_i$ is the total inclusion probability of the Voronoi polygon for the $i$th sample point, and set $\zeta = \mathrm{Var}\{v_i\}$. For a finite population with variable inclusion probability, $v_i$ is the sum of the inclusion probability of all population units closer to the sample point $s_i$ than to any other sample point. Because $\sum_i |\Psi_i| = |R|$ and $\int_R \pi(s)\,d\phi(s) = n$, $E[v_i] = 1$. We note that for an equiprobable sample of a two-dimensional continuum, $\zeta$ is equal to the variance of the area of the Voronoi polygons for the points of $S$, multiplied by the square of the inclusion probability.
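For a finite population, the quantities $v_i$ can be computed by assigning each population unit to its nearest sample point and summing inclusion probabilities. The sketch below is a minimal illustration under assumed inputs: the population coordinates, the equal inclusion probabilities, and the simple random sample are placeholders, and the function name `voronoi_inclusion_totals` is hypothetical.

```python
import numpy as np


def voronoi_inclusion_totals(pop_xy, pop_pi, sample_xy):
    """Return v_i: the total inclusion probability of the population
    units that are closer to sample point i than to any other."""
    pop_xy = np.asarray(pop_xy, dtype=float)
    sample_xy = np.asarray(sample_xy, dtype=float)
    # Squared distance from every population unit to every sample point.
    d2 = ((pop_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, weights=pop_pi, minlength=len(sample_xy))


rng = np.random.default_rng(7)
n_pop, n = 500, 20
pop_xy = rng.uniform(size=(n_pop, 2))            # placeholder population
pop_pi = np.full(n_pop, n / n_pop)               # equiprobable, sums to n
sample_idx = rng.choice(n_pop, size=n, replace=False)
v = voronoi_inclusion_totals(pop_xy, pop_pi, pop_xy[sample_idx])

# One sample's contribution to an estimate of zeta, using E[v_i] = 1.
sq_dev = float(np.mean((v - 1.0) ** 2))
```

Averaging `sq_dev` over many replicate samples from the same design would give a simulation estimate of $\zeta$ for that design.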

Figure 3. Comparison of the Regularity of GRTS, SSS, and IRS Designs. Results are based on the mean of 1,000 samples of size 50. The achieved sample size is the number of samples that fell into 1 × 1 square cells that tiled the population domain. Lines were fitted with lowess. (Plot symbols distinguish generalized random tessellation stratified sampling, independent random sampling, and spatially stratified sampling.)

For the kinds of applications that we have in mind, the spatial context of the population is an intrinsic aspect of the sample selection. For a finite population, the spatial context simply comprises the locations of the population units; for a linear population, the spatial context is the network; and for an areal resource, the spatial context is described by the boundary of the resource domain, which may be a series of disconnected polygons. The effect of the interplay of sampling design and spatial context on properties of the sample cannot be ignored. For small to moderate sample sizes, or for highly irregular domains, the spatial context can have a substantial impact on the distribution of $\zeta$. Because of the spatial dependence, the derivation of a closed form for the distribution of $\zeta$ does not seem feasible, even for simple sampling designs such as IRS. However, for most cases it should be relatively easy to simulate the distribution of $\zeta$ under IRS to obtain a standard for comparison. The regularity of a proposed design can then be quantified as the ratio $\zeta(\text{proposed design})/\zeta(\text{IRS})$, where ratios less than 1 indicate more regularity than an IRS design.
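A simulation of the $\zeta$ ratio on the unit square can be sketched as follows. Several simplifications are assumed here: Voronoi areas are approximated by counting points of a fine grid, $\zeta$ is estimated as the mean squared deviation of $v_i$ from 1 over replicates, and the "proposed design" is a simple one-point-per-cell stratification used as a stand-in, not GRTS itself. All function names are hypothetical.

```python
import numpy as np


def voronoi_totals(sample_xy, grid_xy):
    """Approximate v_i for an equiprobable sample of the unit square by
    counting grid points nearest to each sample point."""
    d2 = ((grid_xy[:, None, :] - sample_xy[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(sample_xy))
    return counts * (len(sample_xy) / len(grid_xy))   # scaled so mean(v) = 1


def draw_irs(n, rng):
    """Independent random sample of n points."""
    return rng.uniform(size=(n, 2))


def draw_one_per_cell(k, rng):
    """One uniform point in each cell of a k x k grid (stand-in design)."""
    ii, jj = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    corners = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
    return (corners + rng.uniform(size=(k * k, 2))) / k


def zeta_hat(draw, n_reps, grid_xy, rng):
    """Mean squared deviation of v_i from 1, averaged over replicates."""
    sq = [np.mean((voronoi_totals(draw(rng), grid_xy) - 1.0) ** 2)
          for _ in range(n_reps)]
    return float(np.mean(sq))


rng = np.random.default_rng(2004)
g = (np.arange(40) + 0.5) / 40
grid_xy = np.array([(x, y) for x in g for y in g])     # 1,600 grid points

zeta_irs = zeta_hat(lambda r: draw_irs(16, r), 200, grid_xy, rng)
zeta_strat = zeta_hat(lambda r: draw_one_per_cell(4, r), 200, grid_xy, rng)
ratio = zeta_strat / zeta_irs
```

A ratio below 1 indicates that the stand-in stratified design is more regular than IRS in the sense of the $\zeta$ statistic; the grid resolution and replicate count here are chosen for speed, not accuracy.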

We evaluated spatial balance using the $\zeta$ ratio under three scenarios: (1) a variable probability sample from a finite population, (2) an equiprobable point sample from an areal population defined on the unit square, and (3) an equiprobable point sample from the same extensive population, but with randomly located square holes to model nonresponse and imperfect frame information.

For the finite population study, we drew 1,000 samples of size 50 from the previously described finite population for both the GRTS and the IRS designs. To illustrate the ability of the GRTS design to maintain spatial regularity as the sample size is augmented, we ordered the GRTS points using reverse hierarchical ordering. We then calculated the $\zeta$ ratio beginning with a size of 10 and adding one point at a time following the reverse hierarchical order. We also drew 1,000 samples of size 50 using the previously discussed spatial stratification. Because there is no sensible way to add the stratified sample points one at a time, we can compute the $\zeta$ ratio only for the complete sample of 50 points. Figure 4 is a plot of the $\zeta$ ratio for GRTS and IRS versus sample size. The single $\zeta$ ratio for SSS(50) is also shown. For the GRTS design, the $\zeta$ ratio has a maximum value of .587 with 10 samples and gradually tapers off to .420 with 50 samples. Although it would be difficult to prove, we suspect that the gradual taper is due to lessening edge effect with increasing sample size; that is, fewer of the Voronoi polygons cross the void regions in the population domain. We note that the valleys in the $\zeta$ ratio occur at multiples of 4, with the most extreme dips occurring at powers of 4. This is a consequence of quadrant-recursive partitioning; maximum regularity occurs with one point from each of the four quadrants. We also note that the SSS(50) value of the $\zeta$ ratio is .550, compared to the corresponding value of .420 for the GRTS design. Inasmuch as the GRTS is analogous to a one-sample-per-stratum SSS, we would expect the GRTS to be as efficient as a maximally efficient SSS.

For our extensive population study, we selected 1,000 samples of size $M = 256$ from the unit square using the GRTS design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1,000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1,000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dot-dash line, constant polygon size).

In every scenario the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of .2 to .4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
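The estimator of the population total is a weighted sum, and a minimal sketch makes the weighting explicit. The response values and inclusion probabilities below are made-up numbers for illustration, and the function name `horvitz_thompson_total` is hypothetical.

```python
import numpy as np


def horvitz_thompson_total(z, pi):
    """Estimate a population total as the sum of z(s_i) / pi(s_i)
    over the sampled points."""
    z = np.asarray(z, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if np.any(pi <= 0):
        raise ValueError("inclusion probabilities must be strictly positive")
    return float(np.sum(z / pi))


# Made-up responses and first-order inclusion probabilities.
z_obs = [3.0, 5.0, 4.0]
pi_obs = [0.5, 0.25, 0.5]
z_total_hat = horvitz_thompson_total(z_obs, pi_obs)   # 6 + 20 + 8
```

Each observation is scaled up by the reciprocal of its inclusion probability, so a point sampled with probability .25 stands in for four units of the population.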

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a “smoothed” contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$
V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\middle|\, \text{strata} \right) \right].
$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$
\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,
$$

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed to approximate for the joint inclusion probabilities.
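The structure of the local neighborhood estimator can be illustrated with a deliberately simplified sketch. Two assumptions here are not from the paper: the neighborhood $D(s_i)$ is taken to be the point itself plus its nearest neighbors (`k` points in total), and the weights are set to the equal value $1/k$, which satisfies $\sum_j w_{ij} = 1$ but not the column constraint or the pairwise-inclusion behavior the authors describe. The function name is hypothetical.

```python
import numpy as np


def local_neighborhood_variance(xy, z, pi, k=4):
    """Simplified local-neighborhood variance sketch.

    Assumes D(s_i) = the k sample points nearest to s_i (including s_i)
    and equal weights w_ij = 1/k; these are illustrative choices only.
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(z, dtype=float) / np.asarray(pi, dtype=float)  # z / pi
    if k > len(xy):
        raise ValueError("k cannot exceed the sample size")
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)
    nbrs = np.argsort(d2, axis=1)[:, :k]       # row i lists D(s_i)
    w = 1.0 / k
    local = t[nbrs]                            # z/pi over each neighborhood
    centre = (w * local).sum(axis=1, keepdims=True)   # weighted local mean
    return float((w * (local - centre) ** 2).sum())


rng = np.random.default_rng(3)
pts = rng.uniform(size=(30, 2))
probs = rng.uniform(0.1, 1.0, size=30)
v_flat = local_neighborhood_variance(pts, 2.0 * probs, probs)   # z/pi constant
v_noisy = local_neighborhood_variance(pts, rng.normal(size=30), probs)
```

When $z/\pi$ is constant every local contrast vanishes and the sketch returns zero; spatially smooth responses give small local contrasts, which is the intuition behind the neighborhood approach.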

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi^*_k(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi^*_k$, the expected number of samples in $R$ is

$$
M_k = \int_R \pi^*_k(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi^*_k(s)\,d\phi(s).
$$

We cannot compute $M_k$, because the boundary of $R$ is unknown, but an estimate is

$$
\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi^*_k(s_i)}{\pi^*_k(s_i)} = \sum_i I_R(s_i).
$$

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi^*_{\tilde{k}}$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi^*_{\tilde{k}}(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is .8, so that the estimator using $\pi^*_{\tilde{k}}$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

                      Mean estimated domain area
Target sample size    Exponential    Constant     Linear
 25                   .8000979       .7969819     .8010589
 50                   .7995775       .7979406     .8005739
100                   .7994983       .7980543     .8002237
150                   .7994777       .7997587     .7995685
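The conditional calculation can be sketched for a single ordered sample. The in-target indicators and the equiprobable density below are made-up inputs (an enclosing set $R^*$ of unit area with $M^* = 10$), and the function name `conditional_extent_estimate` is hypothetical.

```python
import numpy as np


def conditional_extent_estimate(in_target, pi_star, target_m):
    """Walk the sample in reverse hierarchical order until target_m
    points fall in the target domain, then estimate the extent of R
    with the rescaled density pi*_k(s) = (k / M*) pi*(s).

    in_target : indicator I_R(s_i) for each ordered sample point
    pi_star   : inclusion density pi*(s_i) for the full M*-point sample
    Returns (k, estimated extent).
    """
    in_target = np.asarray(in_target, dtype=bool)
    pi_star = np.asarray(pi_star, dtype=float)
    m_star = len(in_target)
    hits = np.cumsum(in_target)
    reached = np.nonzero(hits >= target_m)[0]
    if reached.size == 0:
        raise ValueError("oversample exhausted before reaching target_m")
    k = int(reached[0]) + 1                 # number of points actually used
    pi_k = (k / m_star) * pi_star[:k]       # conditional inclusion density
    return k, float(np.sum(in_target[:k] / pi_k))


# Made-up example: R* is the unit square, M* = 10, equiprobable pi* = 10.
flags = [True, True, False, True, True, False, True, True, True, True]
k_used, area_hat = conditional_extent_estimate(flags, np.full(10, 10.0), 4)
```

In this made-up example the fourth in-target point is reached at the fifth ordered point, so the estimate is 4/5 of the area of $R^*$.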

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity, for example, systematic, grid-based, or spatially stratified designs, tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the “no-void” case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the “holes” cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                  Strahler order category length (km)
Basin          Total length (km)  1          2          3+
E. Fork White  6,802.385          3,833.335  2,189.494    779.556
Great Miami    2,270.018          1,501.711    621.039    147.268
L. Wabash      7,601.418          4,632.484  1,331.228  1,637.706
U. Illinois    5,606.329          4,559.123    500.188    547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.
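As a rough illustration of what a multidensity design implies, the sketch below converts an equal allocation of sites across Strahler order categories into an inclusion density (sites per kilometer) for each category. The lengths are the East Fork White River values from Table 3, with decimal points restored; the 60-site total is the expected sample size for that basin in Table 4. The function name, the category labels, and the strictly equal split are assumptions made for this example and are not taken from the IDEM design documents.

```python
# Stream lengths (km) by Strahler order category for the East Fork White
# River Basin, as read from Table 3 (category labels are illustrative).
lengths_km = {"1": 3833.335, "2": 2189.494, "3+": 779.556}


def inclusion_densities(lengths, total_sites):
    """Return sites per km for each category under an equal split of sites."""
    per_category = total_sites / len(lengths)
    return {cat: per_category / km for cat, km in lengths.items()}


densities = inclusion_densities(lengths_km, total_sites=60)

for cat, density in densities.items():
    # The reciprocal of the density is the length of stream that one
    # sampled site represents in a Horvitz-Thompson estimate.
    print(f"order {cat}: {density:.5f} sites/km, {1 / density:.1f} km per site")
```

Because the higher-order categories contain far less stream length, an equal number of sites per category gives them a much higher inclusion density than the first-order streams.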

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

               Expected     Evaluated    Nontarget  Target  Nonresponse  Water chemistry  Biological
Basin          sample size  sample size  sites      sites   sites        sites            sites
E. Fork White   60          60            5         55      9            35               34
Great Miami     40          40           12         28      5            19               19
L. Wabash      128          91           11         80      9            71               39
U. Illinois    128          85            8         77      5            72               41


Figure 7 East Fork White River Basin Sample Sites by Multidensity Categories


Figure 8 Great Miami River Basin Sample Sites by Multidensity Categories


Figure 9 Upper Illinois River Basin Sample Sites by Multidensity Categories


Figure 10 Lower Wabash River Basin Sample Sites by Multidensity Categories


Table 5. Population Estimates With IRS and Local Variance Estimates

               Indicator                  IRS        Local      Difference
Subpopulation  score      N sites  Mean   std. err.  std. err.  (%)
L. Wabash      IBI        39       36.1   2.1        1.4        -56.8
U. Illinois    IBI        41       32.5   1.7        1.3        -44.8
E. Fork White  IBI        32       35.1   1.3        1.2        -22.3
Great Miami    IBI        19       40.8   2.7        2.2        -33.5
L. Wabash      QHEI       39       55.6   2.3        1.6        -52.2
U. Illinois    QHEI       41       43.3   2.1        1.6        -39.5
E. Fork White  QHEI       34       54.3   2.1        1.5        -45.9
Great Miami    QHEI       19       67.8   2.2        1.9        -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample (IRS), and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the M subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
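The subsetting step described above is simple enough to sketch directly. The example assumes the sites have already been placed in reverse hierarchical order by the design software; the site labels and the function name are hypothetical.

```python
def consecutive_subsets(ordered_sites, size):
    """Split a reverse-hierarchically ordered site list into consecutive subsets."""
    if size <= 0 or len(ordered_sites) % size != 0:
        raise ValueError("subset size must be positive and divide the number of sites")
    return [ordered_sites[i:i + size] for i in range(0, len(ordered_sites), size)]


# Hypothetical labels for 100 sites, listed in reverse hierarchical order.
sites = [f"site-{i:03d}" for i in range(1, 101)]

# Ten interpenetrating subsamples (or panels) of ten sites each.
panels = consecutive_subsets(sites, 10)
```

Only the ordering carries the spatial balance; the slicing itself is ordinary list handling, which is why panels and replacement sites are easy to manage in the field.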


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to

\[
\frac{|Q^n(0) \cap Q^n(\delta)|}{|Q^n(0)|} = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y,
\]

where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

\[
F_D(u) \ge
\begin{cases}
0, & u \le 1/4^n \\
1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > 1/4^n.
\end{cases}
\]

Because $D$ is positive and bounded above by 1,

\[
E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du
\le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left(1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right) \right\}.
\]

For fixed $n$, we have that

\[
\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},
\]

but this holds for all $n$, so that

\[
\lim_{|\delta| \to 0} E[D(\delta)] = 0.
\]
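The lemma can be checked numerically for one particular quadrant-recursive map. The sketch below uses bit interleaving (Morton order) as a stand-in for $f$; this choice, the diagonal shift $(\delta, \delta)$, the truncation to 20 quadrant levels, and the function names are all assumptions made for the illustration. To keep $s + \delta$ inside the unit square, $s$ is drawn uniformly from $[0, 1 - \delta)^2$ rather than from all of $I^2$.

```python
import random


def morton_address(x, y, depth=20):
    """Map (x, y) in the unit square to [0, 1) by interleaving binary digits.

    Each pair of bits selects one of four subquadrants, so points in the same
    level-n subquadrant land in the same interval of length 1/4**n.
    """
    address = 0.0
    scale = 0.25
    for _ in range(depth):
        x *= 2
        y *= 2
        bx, by = int(x), int(y)
        x -= bx
        y -= by
        address += (2 * by + bx) * scale
        scale *= 0.25
    return address


def mean_address_gap(delta, n=10000, seed=1):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| for the shift (delta, delta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random() * (1 - delta)
        y = rng.random() * (1 - delta)
        total += abs(morton_address(x, y) - morton_address(x + delta, y + delta))
    return total / n


gaps = {delta: mean_address_gap(delta) for delta in (0.1, 0.01, 0.001)}
for delta, gap in gaps.items():
    print(f"delta = {delta}: mean address gap = {gap:.5f}")
```

The estimated mean gap shrinks as the shift shrinks, which is the behavior the lemma asserts in the limit. The hierarchical randomization used in GRTS permutes subquadrant labels at every level, but that does not change which points share a subquadrant, so the bound in the proof applies in the same way.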

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

\[
F(x) = \tilde{F}(x) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).
\]

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$; that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population; that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$ and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
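For a finite population, the selection step in this argument reduces to systematic sampling along the cumulative target intensity. The following is a minimal sketch under stated assumptions: the elements are already listed in their (hierarchically randomized) address order, each target inclusion probability is at most 1, and the probabilities sum to an integer $M$. The function name and the toy probabilities are illustrative only.

```python
import bisect
import itertools
import random


def systematic_sample(pi, rng):
    """Pick one element per unit interval of the cumulative intensity.

    `pi` holds target inclusion probabilities in address order. Element i
    occupies the interval [cum[i-1], cum[i]) on the line (0, M].
    """
    cum = list(itertools.accumulate(pi))
    m = round(cum[-1])
    start = rng.random()  # random start in [0, 1)
    picks = []
    for j in range(m):
        u = start + j  # one selection point in each unit-length interval
        idx = bisect.bisect_right(cum, u)  # first element whose cumulative total exceeds u
        picks.append(min(idx, len(pi) - 1))  # guard against rounding at the top end
    return picks


pi = [0.5, 0.25, 0.25, 1.0, 0.5, 0.5]  # sums to M = 3
rng = random.Random(0)

# Empirical inclusion frequencies over repeated draws.
reps = 20000
counts = [0] * len(pi)
for _ in range(reps):
    for i in systematic_sample(pi, rng):
        counts[i] += 1
freqs = [c / reps for c in counts]
```

Each draw returns exactly $M$ distinct elements, and the long-run inclusion frequencies approximate the target probabilities, which is the finite-population content of the statement $P_2(B) = w(B)$.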

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.


Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A, Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


Figure 4. The $\zeta$ Ratio as a Function of Sample Size, Based on 1,000 Replicate Samples From an Artificial Finite Population. Sample points were added one point at a time, up to the maximum sample size of 50, following reverse hierarchical order for the GRTS sample. The $\zeta$ ratio for a spatially stratified sample is also indicated on the plot.

design and ordered the samples using the reverse hierarchical order. As for the finite population study, we calculated the $\zeta$ ratio as the points were added to the sample one at a time, beginning with point number 10. The holes represent nontarget or access-denied elements that were a priori unknown. Sample points that fell in the holes were discarded, resulting in a variable number of sample points in the target domain. As for the complete domain, we ordered the points using reverse hierarchical ordering and then calculated the $\zeta$ ratio as the points were added one at a time. Because the sample points that fall into the nontarget areas contribute to the sample point density but not to the sample size, the $\zeta$ ratio was plotted versus point density.

We used three different distributions of hole size: constant, linearly increasing, and exponentially increasing. In each case the holes comprise 20% of the domain area. Figure 5 shows the placement of the holes for each scenario, and Figure 6 shows the $\zeta$ ratio for all four scenarios: no voids, exponentially increasing, linearly increasing, and constant size.

Figure 5. Void Patterns Used to Simulate Inaccessible Population Elements.

Figure 6. The $\zeta$ Ratio as a Function of Point Density, Based on 1,000 Replications of a Sample of Size 256 (solid line, continuous domain with no voids; long dashes, exponentially increasing polygon size; short dashes, linearly increasing polygon size; dots and dashes, constant polygon size).

In every scenario the variance ratio is much less than 1. Except for small sample sizes, the ratio stays in the range of 0.2 to 0.4. The gradual decrease as the sample size increases is due to the decreasing impact of the boundary: as the sample size increases, the proportion of polygons that intersect the boundary decreases. A similar effect is seen with the different inaccessibility scenarios: even though the inaccessible area is constant, the scenarios with greater perimeter cause more increase in variance.

4. STATISTICAL PROPERTIES OF GRTS DESIGN

4.1 Estimation

The GRTS design produces a sample with specified first-order inclusion probabilities, so that the Horvitz–Thompson (Horvitz and Thompson 1952) estimator or its continuous population analog (Cordy 1993; Stevens 1997) can be applied to get estimates of population characteristics. Thus, for example, an estimate of the population total of a response $z$ is given by $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$. Stevens (1997) provided exact expressions for second-order inclusion functions for some special cases of a GRTS. These expressions can also be used to provide accurate approximations for the general case. Unfortunately, the variance estimator based on using these approximations in the usual Horvitz–Thompson (HT) or Yates–Grundy–Sen (YG; Yates and Grundy 1953; Sen 1953) estimator tends to be unstable. The design achieves spatial balance by forcing the pairwise inclusion probability to approach 0 as the distance between the points in the pair goes to 0. Even though the pairwise inclusion density is nonzero almost everywhere, any moderate-sized sample will nevertheless have one or more pairs of points that are close together, with a correspondingly small pairwise inclusion probability. For both the HT and YG variance estimators, the pairwise inclusion probability appears as a divisor. The corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.
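The Horvitz–Thompson total mentioned at the start of this section is a weighted sum, with each response divided by its inclusion probability (or density). A minimal sketch follows; the function name and the three-point sample are made up for illustration.

```python
def horvitz_thompson_total(z, pi):
    """Estimate a population total as the sum of z(s_i) / pi(s_i) over the sample."""
    if len(z) != len(pi):
        raise ValueError("z and pi must have the same length")
    if any(p <= 0 for p in pi):
        raise ValueError("inclusion probabilities must be positive")
    return sum(z_i / pi_i for z_i, pi_i in zip(z, pi))


# Hypothetical responses and first-order inclusion probabilities.
z = [4.0, 10.0, 6.0]
pi = [0.5, 0.25, 0.5]
total = horvitz_thompson_total(z, pi)
print(total)  # 4/0.5 + 10/0.25 + 6/0.5 = 60.0
```

Only first-order inclusion probabilities enter the point estimate; the instability discussed above arises in the variance estimators, where pairwise inclusion probabilities appear as divisors.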

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik} z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood, analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

\[
V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].
\]

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

\[
\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,
\]

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed as an approximation for the joint inclusion probabilities.
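The double sum in the local neighborhood estimator can be evaluated directly once neighborhoods and weights are supplied. The sketch below does only that arithmetic. It does not reproduce the neighborhood construction or the weight function of Stevens and Olsen (2003); the ring-shaped neighborhoods and the equal weights of 1/3 are assumptions chosen so that the row and column sums of the weights both equal 1.

```python
def neighborhood_variance(z, pi, neighborhoods, weights):
    """Evaluate sum_i sum_{j in D_i} w_ij * (y_j - sum_{k in D_i} w_ik * y_k)**2,

    where y_j = z_j / pi_j, `neighborhoods[i]` lists the sample indices in
    D(s_i), and `weights[i][j]` is w_ij.
    """
    y = [z_i / pi_i for z_i, pi_i in zip(z, pi)]
    total = 0.0
    for i, nbrs in enumerate(neighborhoods):
        local_mean = sum(weights[i][k] * y[k] for k in nbrs)
        total += sum(weights[i][j] * (y[j] - local_mean) ** 2 for j in nbrs)
    return total


# Four hypothetical sample points arranged on a ring; each neighborhood is
# the point itself plus its two adjacent points, all weighted equally.
n = 4
neighborhoods = [[(i - 1) % n, i, (i + 1) % n] for i in range(n)]
weights = [{j: 1 / 3 for j in nbrs} for nbrs in neighborhoods]

z = [1.0, 2.0, 1.0, 2.0]
pi = [0.5, 0.5, 0.5, 0.5]
v_nbh = neighborhood_variance(z, pi, neighborhoods, weights)
```

When all values of $z/\pi$ within every neighborhood are equal, each contrast vanishes and the estimate is 0, which reflects the intuition that locally smooth responses yield small variance under a spatially balanced design.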

4.2 Inverse Sampling

The reverse hierarchical ordering provides the ability to do inverse sampling, that is, to sample until a given number of samples are obtained in the target population. The true inclusion probability in this case depends on the spatial configuration of the target population, which may be unknown. However, one can compute an inclusion probability that is conditional on the achieved sample size in the target population being fixed. For example, suppose we want $M$ sample points in our domain $R$. We do not know the exact boundaries of $R$, but are able to enclose $R$ in a larger set $R^*$. We select a sample of size $M^* > M$ from $R^*$ using an inclusion density $\pi^*$, scaled so that $\int_{R^*} \pi^*(s)\,d\phi(s) = M^*$. The inclusion density for the $k$-point reverse hierarchical ordered sample is $\pi_k^*(s) = (k/M^*)\,\pi^*(s)$. Using the inclusion density $\pi_k^*$, the expected number of samples in $R$ is

\[
M_k = \int_R \pi_k^*(s)\,d\phi(s) = \int_{R^*} I_R(s)\,\pi_k^*(s)\,d\phi(s).
\]

We cannot compute $M_k$ because the boundary of $R$ is unknown, but an estimate is

\[
\hat{M}_k = \sum_i \frac{I_R(s_i)\,\pi_k^*(s_i)}{\pi_k^*(s_i)} = \sum_i I_R(s_i).
\]

We pick $\tilde{k}$ so that $\hat{M}_{\tilde{k}} = M$ and base inference on $\pi_{\tilde{k}}^*$. Thus, for example, an estimate of the unknown extent of $R$ is $|\hat{R}| = \sum_i I_R(s_i)/\pi_{\tilde{k}}^*(s_i)$.

We illustrate this using the same inaccessibility scenarios as for the spatial balance simulation. Results are summarized in Table 2. In each case the true area of $R$ is 0.8, so that the estimator using $\pi_{\tilde{k}}^*$ is either unbiased or nearly so.

Table 2. Domain Area Estimates Using Conditional Inclusion Probability

Target       Mean estimated domain area
sample size  Exponential  Constant   Linear
 25          0.8000979    0.7969819  0.8010589
 50          0.7995775    0.7979406  0.8005739
100          0.7994983    0.7980543  0.8002237
150          0.7994777    0.7997587  0.7995685
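The inverse sampling rule can be sketched for the simplest case, an equal-probability design over an enclosing frame $R^*$ of known area, so that the conditional density is constant and equal to $k$ divided by the frame area. The function name, the indicator sequence, and the unit frame area are assumptions made for this example.

```python
def inverse_sample(in_target, m_target, frame_area):
    """Walk the ordered sample until m_target points fall in the target domain.

    `in_target[i]` is the indicator I_R(s_i) for the i-th point in reverse
    hierarchical order. Returns the stopping index k and the estimated
    extent of R, the sum of I_R(s_i) / pi*_k(s_i) over the first k points.
    """
    hits = 0
    for k, flag in enumerate(in_target, start=1):
        hits += bool(flag)
        if hits == m_target:
            density = k / frame_area  # pi*_k(s) under equal probability
            return k, hits / density
    raise ValueError("oversample exhausted before reaching the target size")


# Hypothetical evaluation results: 1 = target and accessible, 0 = discarded.
flags = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
k_used, area_hat = inverse_sample(flags, m_target=4, frame_area=1.0)
print(k_used, area_hat)  # 5 points used; estimated extent 4 / 5 = 0.8
```

With variable inclusion density, the same stopping rule applies, but each retained point contributes the reciprocal of its own conditional density rather than a common value.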

4.3 Statistical Efficiency

As discussed in the Introduction, sampling designs with some degree of spatial regularity (for example, systematic, grid-based, or spatially stratified designs) tend to be more efficient for sampling natural resources than designs with no spatial structure. The GRTS design takes the concept of spatial stratification, carries it to an extreme, and gives it flexibility and robustness. The basis for these claims is that for the case of an equiprobable sample of an areal resource over a continuous, connected domain, a GRTS sample with size $n = 4^k$ is a spatially stratified sample with one sample point per stratum. In this case the strata are square grid cells with a randomly located origin. Generally, the efficiency of a spatially stratified sample increases as the number of strata increases (samples per stratum decreases), so maximal efficiency is obtained for a one-point-per-stratum design. Thus, in this restricted case, the GRTS has the same efficiency as the maximally efficient spatial stratification.

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the "no-void" case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the "holes" cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically, two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                      Strahler order category length (km)
Basin            Total length (km)    1            2            3+
E. Fork White    6802.385             3833.335     2189.494      779.556
Great Miami      2270.018             1501.711      621.039      147.268
L. Wabash        7601.418             4632.484     1331.228     1637.706
U. Illinois      5606.329             4559.123      500.188      547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.
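To make the multidensity idea concrete, the following sketch converts a site allocation per Strahler category into an inclusion density in sites per kilometer of stream. It is an illustration only. The function name is hypothetical, the lengths are the Lower Wabash entries of Table 3 read as kilometers with three decimal places (an assumption about the decimal point lost in extraction), and the 32/32/64 split of 128 sites is an assumed allocation in which the combined third and higher category carries two shares; the allocation actually used by IDEM is not stated in this form.

```python
def category_inclusion_density(lengths_km, sites_per_category):
    """Return the target inclusion density (sites per km) for each category."""
    density = {}
    for category, length in lengths_km.items():
        if length <= 0:
            raise ValueError("stream length must be positive")
        density[category] = sites_per_category[category] / length
    return density


# Lower Wabash stream lengths by Strahler category (km), from Table 3;
# the decimal placement is an assumption.
lower_wabash_km = {"1": 4632.484, "2": 1331.228, "3+": 1637.706}

# Hypothetical allocation of 128 sites (including the oversample).
sites = {"1": 32, "2": 32, "3+": 64}

density = category_inclusion_density(lower_wabash_km, sites)

# Integrating the density over the frame recovers the design sample size.
expected_total = sum(density[c] * lower_wabash_km[c] for c in lower_wabash_km)
```

Because first-order streams dominate the frame length, an equal allocation of sites gives them a much lower density per kilometer than the higher-order categories, which is the reason for using a multidensity design.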

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White     60            60            5          55       9             35                34
Great Miami       40            40           12          28       5             19                19
L. Wabash        128            91           11          80       9             71                39
U. Illinois      128            85            8          77       5             72                41

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

                 Indicator                      IRS         Local       Difference
Subpopulation    score       N sites   Mean     std. err.   std. err.   (%)
L. Wabash        IBI         39        36.1     2.1         1.4         -56.8
U. Illinois      IBI         41        32.5     1.7         1.3         -44.8
E. Fork White    IBI         32        35.1     1.3         1.2         -22.3
Great Miami      IBI         19        40.8     2.7         2.2         -33.5
L. Wabash        QHEI        39        55.6     2.3         1.6         -52.2
U. Illinois      QHEI        41        43.3     2.1         1.6         -39.5
E. Fork White    QHEI        34        54.3     2.1         1.5         -45.9
Great Miami      QHEI        19        67.8     2.2         1.9         -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).
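For a finite population, the proportional scaling described here can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that the auxiliary variable is strictly positive and that no element ends up with an inclusion probability above 1 (elements that would exceed 1 need separate treatment, which is not shown).

```python
def scale_inclusion_probabilities(aux, m):
    """Scale positive auxiliary values so the inclusion probabilities sum to m."""
    if any(a <= 0 for a in aux):
        raise ValueError("auxiliary values must be strictly positive")
    total = sum(aux)
    probs = [m * a / total for a in aux]
    if max(probs) > 1.0:
        raise ValueError("an element would need inclusion probability > 1")
    return probs


def inverse_density_aux(local_density):
    """Auxiliary variable inversely proportional to local population density."""
    return [1.0 / d for d in local_density]


# Two elements sit in a dense cluster (local density 4) and two are isolated
# (local density 1); balancing over geographic space favors the isolated ones.
aux = inverse_density_aux([4.0, 4.0, 1.0, 1.0])
probs = scale_inclusion_probabilities(aux, m=2)
```

With these toy values the isolated elements receive four times the inclusion probability of the clustered ones, while the probabilities still sum to the target sample size of 2.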

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
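Taking consecutive subsets is a one-line operation once the sites are stored in reverse hierarchical order. The sketch below assumes such an ordered list is already available; the site labels and the function name are hypothetical.

```python
def consecutive_subsets(ordered_sites, subset_size):
    """Split an ordered site list into consecutive subsets of equal size.

    The final subset is shorter when the list length is not a multiple of
    subset_size.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    return [
        ordered_sites[start:start + subset_size]
        for start in range(0, len(ordered_sites), subset_size)
    ]


# 100 sites, assumed to be listed in reverse hierarchical order.
sites = [f"site-{i:03d}" for i in range(1, 101)]

# Ten interpenetrating subsamples (or panels) of 10 sites each.
panels = consecutive_subsets(sites, 10)
```

The spatial balance of each panel comes from the ordering itself, not from this slicing step, so the same function applied to an arbitrarily ordered list would not give balanced panels.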


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to

$$\frac{|Q^n(0) \cap Q^n(\delta)|}{|Q^n(0)|} = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y,$$

where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n} \\[1ex] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\, du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left(1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right) \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
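The lemma can be checked numerically for one particular quadrant-recursive function. The sketch below uses bit interleaving (Morton order) as the map from the unit square to the unit interval; this choice, the function names, and the simulation settings are assumptions made for illustration and are not taken from the paper. To keep $s + \delta$ inside the unit square, $s$ is drawn uniformly from a slightly shrunken square, which is a small departure from the setting of the lemma.

```python
import random


def morton_address(x, y, levels=20):
    """Map (x, y) in [0, 1)^2 to [0, 1) by interleaving binary digits.

    At each level the point's quadrant (2 * x_bit + y_bit) supplies one
    base-4 digit, so every level-n subquadrant maps into an interval of
    length 4**-n.
    """
    address = 0.0
    weight = 1.0
    for _ in range(levels):
        x *= 2.0
        y *= 2.0
        x_bit, y_bit = int(x), int(y)
        x -= x_bit
        y -= y_bit
        weight /= 4.0
        address += (2 * x_bit + y_bit) * weight
    return address


def mean_address_shift(delta, n_points=5000, seed=1):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| for a diagonal shift."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        # Draw s so that the shifted point stays inside the unit square.
        x = rng.random() * (1.0 - delta)
        y = rng.random() * (1.0 - delta)
        total += abs(morton_address(x, y) - morton_address(x + delta, y + delta))
    return total / n_points


# The mean shift in address should shrink as the spatial shift shrinks.
shifts = [mean_address_shift(d) for d in (0.1, 0.01, 0.001)]
```

The estimated mean shift decreases with the size of the displacement, in line with the lemma, even though individual pairs that straddle a high-level quadrant boundary can still be far apart on the line.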

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case, it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\, d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\, d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion, we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0,x])} \pi(s)\, d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

$$F(x) = \tilde{F}(x_i) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
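For a finite population, the last step of this construction (systematic sampling on $(0, M]$ with a random start and a unit selection interval) can be sketched as below. The sketch is an illustration under stated assumptions and not the authors' software: the function name is hypothetical, the elements are assumed to be listed already in the order of their hierarchically randomized addresses, and each inclusion probability is assumed to be at most 1. Assigning each element the interval between consecutive cumulative sums plays the role of the map $G$.

```python
import bisect
import random


def systematic_select(pi, rng):
    """Select elements by systematic sampling along the cumulated intensities.

    pi  : inclusion probabilities of the elements, listed in address order;
          they are assumed to sum to an integer M
    rng : random.Random instance supplying the random start
    Returns the indices of the M selected elements.
    """
    cumulative = []
    running = 0.0
    for p in pi:
        running += p
        cumulative.append(running)
    m = round(running)
    if abs(running - m) > 1e-9:
        raise ValueError("inclusion probabilities must sum to an integer M")

    start = 1.0 - rng.random()  # uniform on (0, 1]
    selected = []
    for j in range(m):
        point = start + j  # one point in each interval (j, j + 1]
        # Element i owns the interval (cumulative[i - 1], cumulative[i]].
        index = bisect.bisect_left(cumulative, point)
        selected.append(min(index, len(pi) - 1))  # guard against round-off
    return selected


# Empirical check that selection frequencies match the target probabilities.
pi = [0.5, 0.25, 0.25, 0.75, 0.25]  # sums to M = 2
rng = random.Random(0)
n_draws = 20000
counts = [0] * len(pi)
for _ in range(n_draws):
    for i in systematic_select(pi, rng):
        counts[i] += 1
frequencies = [c / n_draws for c in counts]
```

Each draw returns exactly $M$ distinct elements, and over repeated draws the selection frequencies settle near the target probabilities, which is the finite-population content of the result $P_2(B) = w(B)$.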

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.

Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A–Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


corresponding terms in either HT or YG variance estimators will tend to be large, leading to instability of the variance estimator.

Contrast-based estimators of the form $\hat{V}_{Ctr}(\hat{Z}_T) = \sum_i w_i y_i^2$, where $y_i$ is a contrast of the form $y_i = \sum_k c_{ik}\, z(s_k)$ with $\sum_k c_{ik} = 0$, have been discussed by several authors (Yates 1981; Wolter 1985; Overton and Stehman 1993). For an RTS design, Overton and Stehman also considered a "smoothed" contrast-based estimator of the form $\hat{V}_{SMO}(\hat{Z}_T) = \sum_i w_i (z_i - z_i^*)^2$, where $z_i^*$, called the smoothed value for data point $z_i$, is taken as a weighted mean of a point plus its nearest neighbors in the tessellation.

Stevens and Olsen (2003) proposed a contrast-based estimator for the GRTS design that bears some resemblance to the Overton and Stehman smoothed estimator. The single contrast $(z_i - z_i^*)^2$ is replaced with an average of several contrasts over a local neighborhood analogous to a tessellation cell and its nearest neighbors in the RTS design. A heuristic justification for this approach stems from the observation that the inverse images of the unit-probability intervals on the line form a random spatial stratification of the population domain. The GRTS design, conditional on the stratification, is a one-sample-per-stratum spatially stratified sample. Recall that $\hat{Z}_T = \sum_{s_i \in R} z(s_i)/\pi(s_i)$, where $z(s_i)$ is a sample from the $i$th random stratum. The selections within strata are conditionally independent of one another, so that

$$V(\hat{Z}_T) = \sum_{s_i \in R} E\left[ V\left( \frac{z(s_i)}{\pi(s_i)} \,\Big|\, \text{strata} \right) \right].$$

The proposed variance estimator approximates $E[V(z(s_i)/\pi(s_i) \mid \text{strata})]$ by averaging several contrasts over a local neighborhood of each sample point. The estimator is

$$\hat{V}_{NBH}(\hat{Z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left( \frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik} \frac{z(s_k)}{\pi(s_k)} \right)^2,$$

where $D(s_i)$ is a local neighborhood of the $s_i$. The weights $w_{ij}$ are chosen to reflect the behavior of the pairwise inclusion function for GRTS and are constrained so that $\sum_i w_{ij} = \sum_j w_{ij} = 1$. Simulation studies with a variety of scenarios have shown the proposed estimator to be stable and nearly unbiased. Applications with real data have consistently shown that our local neighborhood variance estimator produces smaller estimates than the Horvitz–Thompson estimator when IRS is assumed to approximate the joint inclusion probabilities.

42 Inverse Sampling

The reverse hierarchical ordering provides the ability to doinverse sampling that is to sample until a given number ofsamples are obtained in the target population The true inclu-sion probability in this case depends on the spatial con gura-tion of the target populationwhich may be unknownHoweverone can compute an inclusion probability that is conditional onthe achieved sample size in the target population being xedFor example suppose we want M sample points in our do-main R We do not know the exact boundaries of R but areable to enclose R in a larger set Rcurren We select a sample of sizeMcurren gt M from Rcurren using an inclusion density frac14 curren scaled so that

Table 2 Domain Area Estimates Using ConditionalInclusion Probability

Targetsamplesize

Mean estimated domain area

Exponential Constant Linear

25 8000979 7969819 801058950 7995775 7979406 8005739

100 7994983 7980543 8002237150 7994777 7997587 7995685

RRcurren frac14currens dAacutes D M curren The inclusion density for the k-point

reverse hierarchical ordered sample is frac14currenk s D k=Mcurrenfrac14currens

Using the inclusion density frac14 currenk the expected number of sam-

ples in R is

Mk DZ

R

frac14currenk s dAacutes D

Z

RcurrenIRsfrac14 curren

k s dAacutes

We cannot compute Mk because the boundaryof R is unknownbut an estimate is

OMk DX

i

IR sifrac14currenk si

frac14currenk si

DX

i

IRsi

We pick Qk so that OMk D M and base inference on frac14currenQk Thus

for example an estimate of the unknown extent of R is j ORj DPiIRsi=frac14curren

Qksi

We illustrate this using the same inaccessibility scenarios asfor the spatial balance simulation Results are summarized inTable 2 In each case the true area of R is 8 so that the esti-mator using frac14curren

Qk is either unbiased or nearly so

43 Statistical Ef ciency

As discussed in the Introduction sampling designs withsome degree of spatial regularity for example systematic grid-based or spatially strati ed designs tend to be more ef cientfor sampling natural resources than designs with no spatialstructure The GRTS design takes the concept of spatial strat-i cation carries it to an extreme and gives it exibility androbustness The basis for these claims is that for the case ofan equiprobable sample of an areal resource over a continuousconnected domain a GRTS sample with size n D 4k is a spa-tially strati ed sample with one sample point per stratum Inthis case the strata are square grid cells with a randomly locatedorigin Generally the ef ciency of a spatially strati ed sampleincreases as the number of strata increases (samples per stratumdecreases) so maximal ef ciency is obtained for a one-point-per-stratum-design Thus in this restricted case the GRTS hasthe same ef ciency as the maximally ef cient spatial strati ca-tion

The spatial regularity simulation studies provide some insight into less restrictive cases. First, the “no-void” case of the continuous domain study shows that the spatial regularity is not seriously degraded for sample sizes that are not powers of 4, so that even for intermediate sample sizes, the GRTS efficiency should be close to the efficiency of maximal spatial stratification. Second, the “holes” cases show that for irregularly shaped domains, GRTS maintains spatial regularity. In this case, GRTS with $n = 4^k$ is again a one-point-per-stratum design, but the strata are no longer regular polygons. Nevertheless, GRTS should have the same efficiency as maximal stratification.

An example of circumstances where efficiency is difficult to evaluate is a finite population study with variable probability and irregular spatial density. In these circumstances, spatial strata can be very difficult to form, and in fact it may be impossible to form strata with a fixed number of samples per stratum. A GRTS sample achieves the regularity of a one-sample-per-stratum stratification and so should have the same efficiency.

The overwhelming advantage of a GRTS design is not that it is more efficient than spatial stratification, but that it can be applied in a straightforward manner in circumstances where spatial stratification is difficult. All of the pathologies that occur in sampling natural populations (poor frame information, inaccessibility, variable probability, uneven spatial pattern, missing data, and panel structures) can be easily accommodated within the GRTS design.

5. EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management (IDEM) conducts water quality and biological assessments of the streams and rivers within Indiana. For administrative purposes, the state is divided into nine hydrologic basins: East Fork White River Basin, West Fork White River Basin, Upper Illinois River Basin, Great Miami River Basin, Lower Wabash River Basin, Patoka River Basin, Upper Wabash River Basin, Great Lakes Basin, and Ohio River Basin. All basins are assessed once during a 5-year period; typically two basins are completed each year. In 1996, IDEM initiated a monitoring strategy that used probability survey designs for the selection of sampling site locations. We collaborated with them on the survey design. In 1997, a GRTS multidensity design was implemented for the East Fork White River Basin and the Great Miami River Basin. In 1999, another GRTS multidensity design was implemented for the Upper Illinois Basin and the Lower Wabash. These designs will be used to illustrate the application of GRTS survey designs to a linear network.

The target population for the studies consists of all streams and rivers with perennially flowing water. A sample frame, River Reach File Version 3 (RF3), for the target population is available from the U.S. Environmental Protection Agency (Horn and Grayman 1993). The RF3 includes attributes that enable perennial streams and rivers to be identified, but results in an overcoverage of the target population due to coding errors. In addition, Strahler order is available to classify streams and rivers into relative size categories (Strahler 1957). A headwater stream is a Strahler first-order stream, two first-order streams joining results in a second-order stream, and so on. Approximately 60% of the stream length in Indiana is first order, 20% is second order, 10% is third order, and 10% is fourth and greater (see Table 3). In 1997, IDEM determined that the sample would be structured so that approximately an equal number of sites would be in first order, second and third order, and fourth+ order for the East Fork White River and the Great Miami River basins. In 1999, the sample was modified to have an equal number of sites in first, second, third, and fourth+ order categories for the Lower Wabash and Upper Illinois basins.

Table 3. Sample Frame Stream and River Length by Basin and Strahler Order Category

                                        Strahler order category length (km)
Basin            Total length (km)          1            2            3+
E. Fork White        6802.385           3833.335     2189.494      779.556
Great Miami          2270.018           1501.711      621.039      147.268
L. Wabash            7601.418           4632.484     1331.228     1637.706
U. Illinois          5606.329           4559.123      500.188      547.018

The GRTS multidensity survey designs were applied. In both years, six multidensity categories were used (three Strahler order categories in each of two basins). Although four Strahler order categories were planned in 1999, the stream lengths associated with the third and fourth+ categories were approximately equal, so a single category that combined the sample sizes was used. To account for frame errors, landowner denials, and physically inaccessible stream sites, a 100% oversample was incorporated in 1999. The intent was to have a minimum of 38 biological sites with field data in 1999; this was not done in 1997. Table 4 summarizes the number of sites expected and actually evaluated, as well as the number of nontarget, target, nonresponse, and sampled sites. Almost all of the nonresponse sites are due to landowner denial. In 1999, the sites were used in reverse hierarchical order until the desired number of actual field sample sites was obtained. The biological sites were a nested subsample of the water chemistry sites and were taken in reverse hierarchical order from the water chemistry sites. Figures 7–10 show the spatial pattern of the stream networks and the GRTS sample sites for each of the four basins by Strahler order categories. Although this is an example of a single realization of a multidensity GRTS design, all realizations will have a similar spatial pattern. Prior to statistical analysis, the initial inclusion densities are adjusted to account for use of oversample sites by recalculating the inclusion densities by basin.

Indiana determined two summary indices related to the ecological condition of the streams and rivers: the IBI score, which is a fish community index of biological integrity (Karr 1991) that assesses water quality using resident fish communities as a tool for monitoring the biological integrity of streams, and the QHEI score, which is a habitat index based on the Ohio Environmental Protection Agency qualitative habitat evaluation index (see IDEM 2000 for detailed descriptions of these indices).

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

                 Expected      Evaluated     Nontarget   Target   Nonresponse   Water chemistry   Biological
Basin            sample size   sample size   sites       sites    sites         sites             sites
E. Fork White         60            60           5         55          9              35               34
Great Miami           40            40          12         28          5              19               19
L. Wabash            128            91          11         80          9              71               39
U. Illinois          128            85           8         77          5              72               41

272 Journal of the American Statistical Association March 2004

Figure 7 East Fork White River Basin Sample Sites by Multidensity Categories


Figure 8 Great Miami River Basin Sample Sites by Multidensity Categories


Figure 9 Upper Illinois River Basin Sample Sites by Multidensity Categories


Figure 10 Lower Wabash River Basin Sample Sites by Multidensity Categories


Table 5. Population Estimates With IRS and Local Variance Estimates

                  Indicator                      IRS        Local      Difference
Subpopulation     score       N sites   Mean     std err    std err    (%)
L. Wabash         IBI           39      36.1      2.1        1.4       −56.8
U. Illinois       IBI           41      32.5      1.7        1.3       −44.8
E. Fork White     IBI           32      35.1      1.3        1.2       −22.3
Great Miami       IBI           19      40.8      2.7        2.2       −33.5
L. Wabash         QHEI          39      55.6      2.3        1.6       −52.2
U. Illinois       QHEI          41      43.3      2.1        1.6       −39.5
E. Fork White     QHEI          34      54.3      2.1        1.5       −45.9
Great Miami       QHEI          19      67.8      2.2        1.9       −26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample (IRS), and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the “important” subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that “balances” over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to “replace” samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
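As a small illustration of the subsampling idea, the sketch below splits an ordered site list into consecutive subsets. The function name is hypothetical, and the list is assumed to be in reverse hierarchical order already; computing that order is outside the scope of the sketch.

```python
def consecutive_subsets(ordered_sites, size):
    """Split a reverse-hierarchically ordered site list into consecutive subsets."""
    if size <= 0 or len(ordered_sites) % size != 0:
        raise ValueError("subset size must be positive and divide the sample size")
    return [ordered_sites[i:i + size] for i in range(0, len(ordered_sites), size)]


# Placeholder site identifiers 0..99, assumed to be in reverse hierarchical order.
sites = list(range(100))
panels = consecutive_subsets(sites, 10)  # 10 interpenetrating subsamples of 10 sites
```

The same slices can serve as panels in a survey over time, with later panels brought in as earlier ones are retired.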


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n} \\[1ex] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left[1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right] \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
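The lemma can be checked numerically with any concrete quadrant-recursive address. The sketch below uses a Morton (bit-interleaving) address as a stand-in; that choice, the function names, and the half-open convention $[0, 1)$ are assumptions made for illustration rather than anything prescribed above. To keep $s + \delta$ inside the unit square, $s$ is drawn from a slightly shrunken square, which is a further simplification.

```python
import random


def morton_address(x, y, depth=16):
    """Map (x, y) in [0, 1)^2 to [0, 1) by interleaving binary digits of x and y."""
    addr = 0.0
    weight = 0.25
    for _ in range(depth):
        x *= 2
        y *= 2
        bx, by = int(x), int(y)
        x -= bx
        y -= by
        addr += (2 * by + bx) * weight  # base-4 digit naming the subquadrant
        weight *= 0.25
    return addr


def mean_address_gap(delta, trials, rng, depth=16):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| for a diagonal shift."""
    total = 0.0
    for _ in range(trials):
        # Keep s + delta inside the unit square (a simplification of s ~ U(I^2)).
        x = rng.random() * (1 - delta)
        y = rng.random() * (1 - delta)
        total += abs(morton_address(x, y, depth)
                     - morton_address(x + delta, y + delta, depth))
    return total / trials


rng = random.Random(0)
gap_large = mean_address_gap(0.1, 2000, rng)
gap_small = mean_address_gap(0.001, 2000, rng)
```

Two points that share a level-$n$ subquadrant always receive addresses within $1/4^n$ of each other, which is the property the proof relies on, and the estimated mean gap shrinks as the shift shrinks.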

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$ and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

$$F(x) = \tilde{F}(x_i) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min_{x_i}\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
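For a finite population, the selection step described above can be sketched as systematic sampling along the cumulative intensity. This is a simplified sketch under stated assumptions: elements are already listed in address order, each intensity is at most 1, the intensities sum to an integer $M$, and the hierarchical randomization and the linear interpolation of $\tilde{F}$ are omitted. The function name `systematic_select` is hypothetical.

```python
def systematic_select(pi, start):
    """Select elements by systematic sampling along the cumulative intensity.

    pi    -- target inclusion intensities in address order (each <= 1,
             summing to an integer M)
    start -- random start u in (0, 1]
    Element i is chosen when a sample point u + j falls in (F_{i-1}, F_i],
    where F is the running total of pi.
    """
    M = round(sum(pi))
    selected = []
    upper = 0.0  # running total F_{i-1}
    i = 0
    for j in range(M):
        point = start + j  # one point in each unit-length interval (j, j + 1]
        while i < len(pi) and upper + pi[i] < point:
            upper += pi[i]
            i += 1
        selected.append(min(i, len(pi) - 1))  # guard against rounding at the far end
    return selected


# Hypothetical intensities summing to M = 4.
pi = [0.5, 0.5, 0.25, 0.75, 1.0, 0.5, 0.5]
chosen = systematic_select(pi, start=0.3)
```

Averaged over a uniformly distributed start, each element is selected with probability equal to its intensity, because the set of starts that selects element $i$ has length $\pi_i$.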

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), “Some Optimal Designs for Sampling in Two Dimensions,” Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), “An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey,” Journal of Forestry, 61, 826–833.

Breidt, F. J. (1995), “Markov Chain Designs for One-per-Stratum Sampling,” Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), “Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations,” The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), “An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe,” Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), “Area Frame Design for Agricultural Surveys,” U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), “On Plane Sampling and Related Geometrical Problems,” in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), “Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling,” Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), “Spatial Data Processing Using Balanced Ternary,” in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), “Water-Quality Modeling With EPA River Reach File System,” Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), “Plane Sampling,” Statistics and Probability Letters, 50, 151–159.

IDEM (2000), “Indiana Water Quality Report 2000,” Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), “S-PLUS 6 for Windows Language Reference,” Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), “Biological Integrity: A Long Neglected Aspect of Water Resource Management,” Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), “Neighbor-Based Properties of Some Orderings of Two-Dimensional Space,” Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), “Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats,” Biometrics, 52, 125–136.

Olea, R. A. (1984), “Sampling Design Optimization for Spatial Functions,” Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), “Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid,” Communications in Statistics, Part A, Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), “Sampling on Successive Occasions With Partial Replacement of Units,” Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), “Sur Une Courbe Qui Remplit Toute Une Aire Plane,” Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), “Problems in Plane Sampling,” The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), “Construction of Spatially Articulated List Frames for Household Surveys,” in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), “On the Estimate of the Variance in Sampling With Varying Probabilities,” Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), “Environmental Sampling and Monitoring,” in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), “Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations,” Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), “Spatially Restricted Surveys Over Time for Aquatic Resources,” Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), “Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation,” in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), “Variance Estimation for Spatially Balanced Samples of Environmental Resources,” Environmetrics, 14, 593–610.

Strahler, A. N. (1957), “Quantitative Analysis of Watershed Geomorphology,” Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), “Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns,” in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), “The National Hydrography Dataset,” Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), “Sample Maintenance Based on Peano Keys,” in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), “Selection Without Replacement From Within Strata With Probability Proportional to Size,” Journal of the Royal Statistical Society, Ser. B, 15, 253–261.

Page 10: Spatially Balanced Sampling of Natural Resources · Spatially Balanced Sampling of Natural Resources DonL.STEVENSJr. and Anthony R. OLSEN The spatial distribution of a natural resource

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 271

design but the strata are no longer regular polygonsNeverthe-less GRTS should have the same ef ciency as maximal strati- cation

An example of circumstances where ef ciency is dif cultto evaluate is a nite population study with variable probabil-ity and irregular spatial density In these circumstances spatialstrata can be very dif cult to form and in fact it may be impos-sible to form strata with a xed number of samples per stratumA GRTS sample achieves the regularity of a one-sample-per-stratum strati cation and so should have the same ef ciency

The overwhelming advantage of a GRTS design is not that itis more ef cient than spatial strati cation but that it can be ap-plied in a straightforward manner in circumstances where spa-tial strati cation is dif cult All of the pathologies that occurin sampling natural populations (poor frame information inac-cessibility variable probability uneven spatial pattern missingdata and panel structures) can be easily accommodated withinthe GRTS design

5 EXAMPLE APPLICATION TO STREAMS

The Indiana Department of Environmental Management(IDEM) conducts water quality and biological assessments ofthe streams and rivers within Indiana For administrative pur-poses the state is divided into nine hydrologicbasins East ForkWhite River Basin West Fork White River Basin Upper IllinoisRiver Basin Great Miami River Basin Lower Wabash RiverBasin Patoka River Basin Upper Wabash River Basin GreatLakes Basin and Ohio River Basin All basins are assessedonce during a 5-year period typically two basins are completedeach year In 1996 IDEM initiated a monitoring strategy thatused probability survey designs for the selection of samplingsite locations We collaborated with them on the survey designIn 1997 a GRTS multidensity design was implemented for theEast Fork White River Basin and the Great Miami River BasinIn 1999 another GRTS multidensity design was implementedfor the Upper Illinois Basin and the Lower Wabash These de-signs will be used to illustrate the application of GRTS surveydesigns to a linear network

The target population for the studies consists of all streamsand rivers with perennially owing water A sample frameRiver Reach File Version 3 (RF3) for the target populationis available from the US Environmental Protection Agency(Horn and Grayman 1993) The RF3 includes attributes that en-able perennial streams and rivers to be identi ed but results inan overcoverage of the target population due to coding errorsIn addition Strahler order is available to classify streams andrivers into relative size categories (Strahler 1957) A headwaterstream is a Strahler rst-order stream two rst-order streamsjoining results in a second-order stream and so on Approx-imately 60 of the stream length in Indiana is rst order

Table 3 Sample Frame Stream and River Length by Basin andStrahler Order Category

Strahler order category length (km)

Basin Total length (km) 1 2 3 C

E Fork White 6802385 3833335 2189494 779556Great Miami 2270018 1501711 621039 147268L Wabash 7601418 4632484 1331228 1637706U Illinois 5606329 4559123 500188 547018

20 is second order 10 is third order and 10 is fourth andgreater (see Table 3) In 1997 IDEM determined that the sam-ple would be structured so that approximately an equal num-ber of sites would be in rst order second and third order andfourth C order for the East Fork White River and the Great Mi-ami River basins In 1999 the sample was modi ed to have anequal number of sites in rst second third and fourth C ordercategories for the Lower Wabash and Upper Illinois basins

The GRTS multidensity survey designs were applied In bothyears six multidensity categories were used (three Strahler or-der categories in each of two basins) Although four Strahlerorder categories were planned in 1999 the stream lengths as-sociated with the third and fourth C categories were approxi-mately equal so a single category that combined the samplesizes was used To account for frame errors landowner denialsand physically inaccessible stream sites a 100 oversamplewas incorporated in 1999 The intent was to have a minimumof 38 biological sites with eld data in 1999 this was not donein 1997 Table 4 summarizes the number of sites expected andactually evaluated as well as the number of nontarget targetnonresponse and sampled sites Almost all of the nonresponsesites are due to landowner denial In 1999 the sites were usedin reverse hierarchical order until the desired number of ac-tual eld sample sites was obtained The biological sites werea nested subsample of the water chemistry sites and were takenin reverse hierarchicalorder from the water chemistry sites Fig-ures 7ndash10 show the spatial pattern of the stream networks andthe GRTS sample sites for each of the four basins by Strahlerorder categories Although this is an example of a single real-izationof a multidensityGRTS design all realizationswill havea similar spatial pattern Prior to statistical analysis the initialinclusion densities are adjusted to account for use of oversam-ple sites by recalculating the inclusion densities by basin

Indiana determined two summary indices related to the eco-logical conditionof the streams and rivers the IBI score whichis a sh community index of biological integrity (Karr 1991)that assesses water quality using resident sh communities asa tool for monitoring the biological integrity of streams and theQHEI score which is a habitat index based on the Ohio Envi-ronmental Protection Agency qualitative habitat evaluation in-dex (see IDEM 2000 for detailed descriptions of these indices)

Table 4. Survey Design Sample Sizes for Basins Sampled in 1997 and 1999

| Basin | Expected sample size | Evaluated sample size | Nontarget sites | Target sites | Nonresponse sites | Water chemistry sites | Biological sites |
|---|---|---|---|---|---|---|---|
| E. Fork White | 60 | 60 | 5 | 55 | 9 | 35 | 34 |
| Great Miami | 40 | 40 | 12 | 28 | 5 | 19 | 19 |
| L. Wabash | 128 | 91 | 11 | 80 | 9 | 71 | 39 |
| U. Illinois | 128 | 85 | 8 | 77 | 5 | 72 | 41 |

272 Journal of the American Statistical Association March 2004

Figure 7. East Fork White River Basin Sample Sites by Multidensity Categories.

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 273

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

| Subpopulation | Indicator score | N sites | Mean | IRS std err | Local std err | Difference (%) |
|---|---|---|---|---|---|---|
| L. Wabash | IBI | 39 | 36.1 | 2.1 | 1.4 | −56.8 |
| U. Illinois | IBI | 41 | 32.5 | 1.7 | 1.3 | −44.8 |
| E. Fork White | IBI | 32 | 35.1 | 1.3 | 1.2 | −22.3 |
| Great Miami | IBI | 19 | 40.8 | 2.7 | 2.2 | −33.5 |
| L. Wabash | QHEI | 39 | 55.6 | 2.3 | 1.6 | −52.2 |
| U. Illinois | QHEI | 41 | 43.3 | 2.1 | 1.6 | −39.5 |
| E. Fork White | QHEI | 34 | 54.3 | 2.1 | 1.5 | −45.9 |
| Great Miami | QHEI | 19 | 67.8 | 2.2 | 1.9 | −26.0 |

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line: CDF estimate; dashed line: 95% local confidence limits; dotted line: 95% IRS confidence limits).

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.
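As a minimal sketch of how inclusion probabilities might be made proportional (or inversely proportional) to an auxiliary variable for a finite population, the following scales positive values so that they sum to the target sample size. The function names and the example values are hypothetical, and rejecting any probability above 1 is an assumption of this sketch rather than something the text prescribes.

```python
def inclusion_probabilities(aux, sample_size):
    """Scale strictly positive auxiliary values so they sum to sample_size."""
    if any(a <= 0 for a in aux):
        raise ValueError("auxiliary values must be strictly positive")
    total = sum(aux)
    probs = [sample_size * a / total for a in aux]
    # Assumption of this sketch: refuse designs where an element would
    # need an inclusion probability larger than 1.
    if max(probs) > 1:
        raise ValueError("an element would get inclusion probability > 1")
    return probs


def inverse_density_probabilities(density, sample_size):
    """Inclusion probabilities inversely proportional to local density."""
    return inclusion_probabilities([1.0 / d for d in density], sample_size)


# Hypothetical auxiliary values for five population elements.
aux = [1.0, 2.0, 2.0, 5.0, 10.0]
pi = inclusion_probabilities(aux, sample_size=2)

# Hypothetical local densities for three elements.
pi_inv = inverse_density_probabilities([4.0, 2.0, 1.0], sample_size=1)
```

In the inverse-density case, elements in sparsely populated regions receive the larger probabilities, which is what pushes the sample toward balance over geographic space.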

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the M subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants according to the probability specified by π.
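One way to read the resolution requirement for a finite population is sketched below: subdivide the unit square until no grid cell carries a total inclusion probability greater than 1. That stopping rule, the function name `required_level`, and the example coordinates are assumptions made for illustration only.

```python
from collections import defaultdict


def required_level(points, probs, max_level=20):
    """Smallest level n at which every cell of the 2^n x 2^n grid on the unit
    square carries a total inclusion probability of at most 1."""
    for level in range(max_level + 1):
        cells_per_side = 2 ** level
        totals = defaultdict(float)
        for (x, y), p in zip(points, probs):
            # Clamp so that a coordinate equal to 1.0 falls in the last cell.
            col = min(int(x * cells_per_side), cells_per_side - 1)
            row = min(int(y * cells_per_side), cells_per_side - 1)
            totals[(row, col)] += p
        if max(totals.values()) <= 1.0:
            return level
    raise ValueError("no level up to max_level separates the inclusion mass")


# Hypothetical population of four points in the unit square.
points = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.1)]
level_equal = required_level(points, [0.5, 0.5, 0.5, 0.5])
level_heavy = required_level(points, [0.6, 0.6, 0.6, 0.6])
```

The heavier weights force a finer grid because the two nearby points can no longer share a cell.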

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.
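The replacement idea can be sketched as a walk down the ordered site list that stops once enough usable sites have been found. The site labels, the `is_sampleable` callback, and the set of lost sites are hypothetical; the sketch assumes the list is already in reverse hierarchical order.

```python
def select_field_sites(ordered_sites, is_sampleable, n_required):
    """Walk the ordered site list and keep the first n_required usable sites."""
    used, rejected = [], []
    for site in ordered_sites:
        if len(used) == n_required:
            break
        if is_sampleable(site):
            used.append(site)
        else:
            rejected.append(site)
    return used, rejected


# Hypothetical list of 8 sites, assumed to be in reverse hierarchical order
# (a 100% oversample for a target of 4 field sites).
ordered_sites = ["s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08"]
lost = {"s02", "s04"}  # e.g., nontarget sites or landowner denials
used, rejected = select_field_sites(ordered_sites, lambda s: s not in lost, 4)
```

Because sites are consumed strictly in order, the sites that end up being used form a consecutive run of the ordering with the lost sites removed.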

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
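The subsample construction described here amounts to slicing the ordered list. A short sketch, with placeholder integer labels standing in for sites that are assumed to be in reverse hierarchical order already:

```python
def consecutive_subsets(ordered_sites, subset_size):
    """Split an ordered site list into consecutive, non-overlapping subsets."""
    if len(ordered_sites) % subset_size != 0:
        raise ValueError("sample size must be a multiple of subset_size")
    return [
        ordered_sites[i:i + subset_size]
        for i in range(0, len(ordered_sites), subset_size)
    ]


# Placeholder labels 1..100 for a sample of size 100.
ordered_sites = list(range(1, 101))
panels = consecutive_subsets(ordered_sites, 10)
```

Each element of `panels` could then serve as one interpenetrating subsample or as one panel in a rotating design.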

APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n}, \\[4pt] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left(1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right) \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
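The lemma can be checked numerically under a simplifying assumption: the sketch below uses fixed Morton (Z-order) bit interleaving as a stand-in for a quadrant-recursive function, not the randomized function used by GRTS. It also draws $s$ from a slightly shrunken square so that $s + \delta$ stays inside the unit square. The function names, the number of levels, and the seed are arbitrary choices made for illustration.

```python
import random


def morton_address(x, y, levels=16):
    """Map (x, y) in [0, 1)^2 to [0, 1) by interleaving binary digits, so that
    each level-n subquadrant maps into one interval of length 4**(-n)."""
    address = 0.0
    scale = 1.0
    for _ in range(levels):
        x *= 2
        y *= 2
        bx, by = int(x), int(y)
        x -= bx
        y -= by
        scale /= 4
        address += (2 * by + bx) * scale
    return address


def mean_address_gap(delta, n_draws=10000, seed=0):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| for random s."""
    rng = random.Random(seed)
    dx, dy = delta
    total = 0.0
    for _ in range(n_draws):
        # Draw s so that s + delta stays inside the unit square.
        x = rng.random() * (1 - dx)
        y = rng.random() * (1 - dy)
        total += abs(morton_address(x, y) - morton_address(x + dx, y + dy))
    return total / n_draws


# Estimated mean gaps for shrinking offsets delta = (d, d).
gaps = [mean_address_gap((d, d)) for d in (0.1, 0.01, 0.001)]
```

Under these assumptions the estimated mean gap should shrink as the offset shrinks, which is the behavior the lemma asserts in the limit.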

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion, we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set

$$F(x) = \tilde{F}(x) + \frac{\tilde{F}(x_{i+1}) - \tilde{F}(x_i)}{x_{i+1} - x_i}\,(x - x_i).$$

If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min_{x_i} \{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$ and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
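For a finite population, the selection step can be sketched as systematic sampling along the cumulative inclusion probabilities of elements sorted by their one-dimensional address. This is only an illustration: the addresses are taken as given (the hierarchical randomization that produces them is not shown), each probability is assumed to be at most 1, $M$ is assumed to be an integer, and the function name and example values are hypothetical.

```python
import bisect
import itertools
import random


def systematic_sample(addresses, probs, rng=None):
    """Select elements of a finite population by unit-interval systematic
    sampling along cumulative inclusion probability, ordered by address."""
    rng = rng or random.Random()
    order = sorted(range(len(addresses)), key=lambda i: addresses[i])
    cumulative = list(itertools.accumulate(probs[i] for i in order))
    sample_size = round(cumulative[-1])  # M, assumed to be an integer
    start = rng.random()                 # random start in [0, 1)
    picks = []
    for k in range(sample_size):
        y = start + k                    # one point in each unit interval
        # Index of the first element whose cumulative total exceeds y.
        j = bisect.bisect_right(cumulative, y)
        # Guard against floating-point shortfall in the final total.
        picks.append(order[min(j, len(order) - 1)])
    return picks


# Hypothetical addresses and equal inclusion probabilities (M = 2).
addresses = [0.9, 0.1, 0.5, 0.3]
probs = [0.5, 0.5, 0.5, 0.5]
picks = systematic_sample(addresses, probs, random.Random(1))
```

With these values the sample is either the first and third elements in address order or the second and fourth, each with probability one half, so every element has inclusion probability 0.5 as targeted.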

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.

Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.

Page 11: Spatially Balanced Sampling of Natural Resources · Spatially Balanced Sampling of Natural Resources DonL.STEVENSJr. and Anthony R. OLSEN The spatial distribution of a natural resource

272 Journal of the American Statistical Association March 2004

Figure 7 East Fork White River Basin Sample Sites by Multidensity Categories

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 273

Figure 8 Great Miami River Basin Sample Sites by Multidensity Categories

274 Journal of the American Statistical Association March 2004

Figure 9 Upper Illinois River Basin Sample Sites by Multidensity Categories

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 275

Figure 10 Lower Wabash River Basin Sample Sites by Multidensity Categories

276 Journal of the American Statistical Association March 2004

Table 5 Population Estimates With IRS and Local Variance Estimates

Indicator IRS Local DifferenceSubpopulation score N sites Mean std err std err ()

L Wabash IBI 39 361 21 14 iexcl568U Illinois IBI 41 325 17 13 iexcl448E Fork White IBI 32 351 13 12 iexcl223Great Miami IBI 19 408 27 22 iexcl335L Wabash QHEI 39 556 23 16 iexcl522U Illinois QHEI 41 433 21 16 iexcl395E Fork White QHEI 34 543 21 15 iexcl459Great Miami QHEI 19 678 22 19 iexcl260

Table 5 summarizes the population estimates for IBI andQHEI scores for each of the four basins The associated stan-dard error estimates are based on the HorvitzndashThompson ratiovariance estimator assuming an independent random sampleand on the local neighborhoodvariance estimator described inSection 41 On average the neighborhood variance estimatoris 38 smaller than the IRS variance estimator Figure 11 il-lustrates the impact of the variance estimators on con denceintervals for cumulative distribution function estimates for theLower Wabash Basin

6 DISCUSSION

There are a number of designs that provide good disper-sion of sample points over a spatial domain When we appliedthese designs to large-scale environmental sampling programsit quickly became apparent that we needed a means (1) toaccommodate variable inclusion probability and (2) to adjustsample sizes dynamically These requirements are rooted inthe very fundamentals of environmental management The rstrequirement stems from the fact that an environmental re-source is rarely uniformly important in the objective of themonitoring there are always scienti c economic or politi-cal reasons for sampling some portions of a resource moreintensively than others Two features of environmental moni-toring programs drive the second requirement First these pro-grams tend to be long lived so that even if the objectives ofthe program remain unchangedthe ldquoimportantrdquosubpopulationschange necessitating a corresponding change in sampling in-tensity Second a high-quality sampling frame is often lack-ing for environmental resource populationsAs far as we knowthere is no other technique for spatial sampling that ldquobalancesrdquoover an intensity metric instead of a Euclidean distance metricor permits dynamic modi cation of sample intensity

Adaptive sampling (Thompson 1992 pp 261ndash319) is an-other way to modify sample intensity However there are somesigni cant differences between GRTS and adaptive sampling inthe way the modi cation is accomplished Adaptive samplingincreases the sampling intensity locally depending on the re-sponse observed at a sample point whereas the GRTS intensitychange is global

The GRTS rst-order inclusion probability (or density) canbe made proportional to an arbitrary positive auxiliary vari-able for example a signal from a remote sensing platformor a sample intensity that varies by geographical divisions orknown physical characteristics of the target populationIn somepoint and linear situations it may be desirable to have thesample be spatially balanced with respect to geographic spacerather than with respect to the population density This can be

Figure 11 Stream Network and Sample Site Spatial Patterns by Mul-tidensity Category for the Lower Wabash Basin (mdashmdash- CDF estimateiexcl iexcl iexcl iexcl iexcl 95 local condence limits cent cent cent cent cent cent cent cent cent95 IRS condencelimits)

achieved by making the inclusion probability inversely propor-tional to the population density Although the development ofGRTS has focused on applications in geographic space it canbe applied in other spaces For exampleone applicationde nedtwo-dimensionalspace by the rst two principal componentsofclimate variables and selected a GRTS sample of forest plots inthat space

The computational burden in hierarchical randomization canbe substantial However it needs to be carried out only to a res-olution suf cient to obtain no more than one sample point persubquadrant The actual point selection can be carried out bytreating the subquadrants as if they are elements of a nitepopulation selecting the M subquadrants to receive samplepoints and then selecting one population element at randomfrom among the elements contained within the selected sub-quadrants according to the probability speci ed by frac14

Reverse hierarchical ordering adds a feature that is im-mensely popular with eld practitioners namely the ability toldquoreplacerdquo samples that are lost due to being nontarget or inac-cessible Moreover we can replace the samples in such a wayas to achieve good spatial balance over the population that isactually sampleable even when sampleability cannot be deter-mined prior to sample selectionOf course this feature does noteliminate the nonresponse or the bias of an inference to the in-accessible population It does however allow investigators toobtain the maximum number of samples that their budget willpermit them to analyze

Reverse hierarchical ordering has other uses as well One isto generate interpenetrating subsamples (Mahalanobis 1946)For example 10 interpenetrating subsamples from a samplesize of 100 can be obtained simply by taking consecutive sub-sets of 10 from the reverse hierarchical ordering Each subsethas the same properties as the complete design Consecutivesubsets can also be used to de ne panels of sites for applica-tion in surveys over time for example sampling with partial re-placement (Patterson 1950 Kish 1987 Urquhart Overton andBirkes 1993)

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 277

APPENDIX A PROOF OF LEMMA

Lemma Let f I2 I be a 1ndash1 quadrant-recursive function andlet s raquo UI2 Then limjplusmnj0 Efjf s iexcl f s C plusmnjg D 0

Proof If for some n gt 0 s and s C plusmn are in the same subquad-rant Qn

jk then f s and f s C plusmn are in the same interval J nm so

that jf s iexcl f s C plusmnj middot 1=4n The probability that s and s C plusmn arein the same subquadrant is the same as the probability of the ori-gin and plusmn D plusmnx plusmny being in the same cell of a randomly locatedgrid with cells congruent to Qn

jk For plusmnx plusmny middot 1=2n that probability

is equal to jQn0 Qnplusmnj=jQn0j D 1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny where Qnx denotes a polygon congruent to Qn

jk centered on x

For Ds plusmn D jf s iexcl f s C plusmnj then we have that P D middot 1=4n cedil1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny Thus the distribution function FD of D isbounded below by

FD u cedil

8gtlt

gt

0 u middot 14n

1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny u gt1

4n

Because D is positive and bounded above by 1

E[Dplusmn] D 1 iexclZ 1

0FD udu

middot 1 iexclraquo

0

4nC

sup31 iexcl 1

4n

acuteiexcl 2nplusmnx C plusmny C 4nplusmnxplusmny

frac14

For xed n we have that

limjplusmnj0

E[Dplusmn] middot 14n

but this holds for all n so that

limjplusmnj0

E[Dplusmn] D 0

APPENDIX B PROOF THAT THE PROBABILITYINCLUSION FUNCTION EQUALS THE

TARGET INTENSITY FUNCTION

We need the measure space XB Aacute where X is the unit inter-val I D 0 1] or the unit square I2 D 0 1] pound 01] and the rele-vant frac34 elds are BI and BI2 the frac34 elds of the Borel subsetsof I and I2 respectively For each of the three types of populationswe de ne a measure Aacute of population size We use the same symbolfor all three cases but the speci cs vary from case to case For a -nite population we take Aacute to be counting measure restricted to R sothat for any subset B 2 BI2 AacuteB is the number of population ele-ments in B R For linear populations we take AacuteB to be the lengthof the linear population contained within B Clearly Aacute is nonnega-tive countably additive de ned for all Borel sets and Aacute D 0 soAacute is a measure Finally for areal populations we take AacuteB to be theLebesgue measure of B R

We begin by randomly translating the image of R in the unit squareby adding independent U0 1=2 offsets to the xy coordinates Thisrandom translation plays the same role as random grid location doesin an RTS design namely it guarantees that pairwise inclusion prob-abilities are nonzero In particular in this case it ensures that any pairof points in R has a nonzero chance of being mapped into differentquadrants

Let frac14s be an inclusion intensity function that is a function thatspeci es the target number of samples per unit measure We assumethat any linear population consists of a nite number m of smoothrecti able curves R D

SmiD1fdegit D xi t yi t jt 2 [ai bi ]g with

xi and yi continuous and differentiable on [ai bi ] We set frac14s equalto the target number of samples per unit length at s for s 2 L and

equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0,x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x) + [(\tilde{F}(x_{i+1}) - \tilde{F}(x_i))/(x_{i+1} - x_i)](x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), “Some Optimal Designs for Sampling in Two Dimensions,” Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), “An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey,” Journal of Forestry, 61, 826–833.

278 Journal of the American Statistical Association March 2004

Breidt, F. J. (1995), “Markov Chain Designs for One-per-Stratum Sampling,” Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), “Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations,” The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), “An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe,” Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), “Area Frame Design for Agricultural Surveys,” U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), “On Plane Sampling and Related Geometrical Problems,” in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), “Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling,” Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), “Spatial Data Processing Using Balanced Ternary,” in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), “Water-Quality Modeling With EPA River Reach File System,” Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), “Plane Sampling,” Statistics and Probability Letters, 50, 151–159.

IDEM (2000), “Indiana Water Quality Report 2000,” Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), “S-PLUS 6 for Windows Language Reference,” Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), “Biological Integrity: A Long Neglected Aspect of Water Resource Management,” Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), “Neighbor-Based Properties of Some Orderings of Two-Dimensional Space,” Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), “Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats,” Biometrics, 52, 125–136.

Olea, R. A. (1984), “Sampling Design Optimization for Spatial Functions,” Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), “Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid,” Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), “Sampling on Successive Occasions With Partial Replacement of Units,” Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), “Sur Une Courbe Qui Remplit Toute Une Aire Plane,” Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), “Problems in Plane Sampling,” The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), “Construction of Spatially Articulated List Frames for Household Surveys,” in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), “On the Estimate of the Variance in Sampling With Varying Probabilities,” Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), “Environmental Sampling and Monitoring,” in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), “Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations,” Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), “Spatially Restricted Surveys Over Time for Aquatic Resources,” Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), “Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation,” in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), “Variance Estimation for Spatially Balanced Samples of Environmental Resources,” Environmetrics, 14, 593–610.

Strahler, A. N. (1957), “Quantitative Analysis of Watershed Geomorphology,” Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), “Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns,” in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), “The National Hydrography Dataset,” Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), “Sample Maintenance Based on Peano Keys,” in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), “Selection Without Replacement From Within Strata With Probability Proportional to Size,” Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


Stevens and Olsen Spatially Balanced Sampling of Natural Resources 273

Figure 8. Great Miami River Basin Sample Sites by Multidensity Categories.

Figure 9. Upper Illinois River Basin Sample Sites by Multidensity Categories.

Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.

Table 5. Population Estimates With IRS and Local Variance Estimates

Subpopulation   Indicator score   N sites   Mean   IRS std err   Local std err   Difference (%)
L. Wabash       IBI               39        36.1   2.1           1.4             -56.8
U. Illinois     IBI               41        32.5   1.7           1.3             -44.8
E. Fork White   IBI               32        35.1   1.3           1.2             -22.3
Great Miami     IBI               19        40.8   2.7           2.2             -33.5
L. Wabash       QHEI              39        55.6   2.3           1.6             -52.2
U. Illinois     QHEI              41        43.3   2.1           1.6             -39.5
E. Fork White   QHEI              34        54.3   2.1           1.5             -45.9
Great Miami     QHEI              19        67.8   2.2           1.9             -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the “important” subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that “balances” over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line: CDF estimate; dashed lines: 95% local confidence limits; dotted lines: 95% IRS confidence limits).

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.
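As a rough illustration of the first point in this paragraph, the sketch below scales a positive auxiliary variable so that the resulting first-order inclusion probabilities sum to the target sample size, and then applies the same scaling to inverse densities. The function names and the toy densities are assumptions made for this example; they are not taken from any GRTS software.

```python
def proportional_inclusion(aux, sample_size):
    """Scale strictly positive auxiliary values so they sum to sample_size."""
    if any(a <= 0 for a in aux):
        raise ValueError("auxiliary values must be strictly positive")
    total = sum(aux)
    return [sample_size * a / total for a in aux]


def inverse_density_inclusion(density, sample_size):
    """Inclusion probabilities inversely proportional to local density."""
    return proportional_inclusion([1.0 / d for d in density], sample_size)


# Hypothetical local densities (elements per unit area) at five sites.
density = [1.0, 2.0, 4.0, 4.0, 8.0]
probs = inverse_density_inclusion(density, sample_size=2)
```

For a finite population, any scaled value above 1 would have to be capped and the remainder rescaled; that step is left out of this sketch.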

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.
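One way to read the stopping rule above is to subdivide the unit square until no subquadrant carries a total intensity greater than 1, so that no cell is expected to receive more than one sample point. The sketch below follows that reading for a finite population; the function name, the `(x, y, intensity)` tuple layout, and the `max_depth` guard are assumptions made for illustration.

```python
from collections import defaultdict


def required_depth(points, max_depth=16):
    """Smallest subdivision level at which every cell has total intensity <= 1.

    points: iterable of (x, y, intensity) with x and y in [0, 1).
    """
    for depth in range(max_depth + 1):
        cells = 2 ** depth  # cells per side at this level
        totals = defaultdict(float)
        for x, y, intensity in points:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            totals[(col, row)] += intensity
        if max(totals.values()) <= 1.0:
            return depth
    raise ValueError("intensity too concentrated for max_depth")


# Hypothetical population: two nearby elements and one distant element.
points = [(0.1, 0.1, 0.6), (0.2, 0.2, 0.6), (0.8, 0.8, 0.8)]
depth = required_depth(points)
```

In this toy population the two nearby elements share a cell until the third level of subdivision, so the randomization would need to go no deeper than that.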

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to “replace” samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
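The idea of taking consecutive subsets can be sketched as follows, assuming each sample point carries a base-4 quadrant address (coarsest digit first) and that reverse hierarchical order means sorting on the reversed digit string. The helper names and the 16-point toy sample are hypothetical.

```python
def reverse_hierarchical_order(addresses):
    """Sort equal-length base-4 address strings by their reversed digits."""
    return sorted(addresses, key=lambda address: address[::-1])


def consecutive_subsets(ordered, size):
    """Split an ordered sample into consecutive panels of the given size."""
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy sample: one point in each of the 16 second-level subquadrants.
addresses = [a + b for a in "0123" for b in "0123"]
ordered = reverse_hierarchical_order(addresses)
panels = consecutive_subsets(ordered, 4)
```

In this toy case every panel of four contains one point from each first-level quadrant, which is the sense in which consecutive subsets stay spatially spread out.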


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n}, \\ 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
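The lemma can be checked numerically with a simple stand-in for $f$. The sketch below uses a fixed Morton (Z-order) bit interleaving, which is one quadrant-recursive map but not the randomized function of the paper, and estimates $E|f(s) - f(s + \delta)|$ by Monte Carlo for shrinking diagonal offsets. The function names, the 20-level depth, the seed, and the restriction of $s$ to points whose offset stays inside the unit square are all assumptions of this example.

```python
import random


def morton_address(x, y, depth=20):
    """Map (x, y) in [0, 1)^2 into [0, 1) by interleaving binary digits."""
    value, scale = 0.0, 1.0
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        x_bit, y_bit = min(int(x), 1), min(int(y), 1)
        x, y = x - x_bit, y - y_bit
        scale /= 4
        value += (2 * y_bit + x_bit) * scale  # quadrant index at this level
    return value


def mean_address_gap(delta, n_points=20000, seed=1):
    """Monte Carlo estimate of E|f(s) - f(s + (delta, delta))|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        # Keep s + delta inside the unit square.
        x = rng.random() * (1 - delta)
        y = rng.random() * (1 - delta)
        total += abs(morton_address(x, y)
                     - morton_address(x + delta, y + delta))
    return total / n_points


gaps = {delta: mean_address_gap(delta) for delta in (1e-1, 1e-2, 1e-4)}
```

The estimated gap shrinks as the offset shrinks, in line with the limit stated in the lemma, although a simulation of this kind is only a plausibility check and not a substitute for the proof.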

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{jk}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0,x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x) + [(\tilde{F}(x_{i+1}) - \tilde{F}(x_i))/(x_{i+1} - x_i)](x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal case, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
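For a finite population, the selection step described above reduces to systematic sampling along the cumulative intensity. The sketch below assumes the elements are already listed in the order of their addresses $f(s)$ and that their intensities sum to an integer $M$; the function name, the use of `bisect`, and the example intensities are choices made for this illustration.

```python
import bisect
import itertools
import random


def systematic_sample(intensities, rng):
    """Pick one point in each unit interval of (0, M] and map it to an element.

    intensities: target inclusion intensities of the elements, listed in
    address order and summing to an integer M. Returns element indices.
    """
    cumulative = list(itertools.accumulate(intensities))  # jump heights of F
    sample_size = round(cumulative[-1])
    start = 1.0 - rng.random()  # random start in (0, 1]
    picks = []
    for k in range(sample_size):
        y = start + k
        # First element whose cumulative intensity reaches y, i.e. G(y).
        index = bisect.bisect_left(cumulative, y)
        picks.append(min(index, len(cumulative) - 1))
    return picks


# Hypothetical intensities for seven elements; they sum to M = 4.
intensities = [0.2, 0.8, 0.5, 0.5, 1.0, 0.25, 0.75]
picks = systematic_sample(intensities, random.Random(0))
```

Because each element occupies a stretch of $(0, M]$ whose length equals its intensity, an element with intensity 1 is selected on every draw, and an element with intensity 0.2 is selected on roughly one draw in five.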

[Received August 2002 Revised September 2003]

REFERENCES

Bellhouse D R (1977) ldquoSome Optimal Designs for Sampling in Two Dimen-sionsrdquo Biometrika 64 605ndash611

Bickford C A Mayer C E and Ware K D (1963) ldquoAn Ef cient Sam-pling Design for Forest Inventory The Northeast Forest Resurveyrdquo Journalof Forestry 61 826ndash833

278 Journal of the American Statistical Association March 2004

Breidt F J (1995) ldquoMarkov Chain Designs for One-per-Stratum SamplingrdquoSurvey Methodology 21 63ndash70

Brewer K R W and Hanif M (1983) Sampling With Unequal ProbabilitiesNew York Springer-Verlag

Cochran W G (1946) ldquoRelative Accuracy of Systematic and Strati ed Ran-dom Samples for a Certain Class of Populationsrdquo The Annals of Mathemati-cal Statistics 17 164ndash177

Cordy C (1993) ldquoAn Extension of the HorvitzndashThompson Theorem to PointSampling From a Continuous Universerdquo Probability and Statistics Letters18 353ndash362

Cotter J and Nealon J (1987) ldquoArea Frame Design for Agricultural SurveysrdquoUS Department of Agriculture National Agricultural Statistics Service Re-search and Applications Division Area Frame Section

Dalenius T Haacutejek J and Zubrzycki S (1961) ldquoOn Plane Sampling and Re-lated Geometrical Problemsrdquo in Proceedings of the 4th Berkeley Symposiumon Probability and Mathematical Statistics 1 125ndash150

Das A C (1950) ldquoTwo-Dimensional Systematic Sampling and the AssociatedStrati ed and Random Samplingrdquo Sankhya 10 95ndash108

Gibson L and Lucas D (1982) ldquoSpatial Data Processing Using BalancedTernaryrdquo in Proceedings of the IEEE Computer Society Conference on Pat-tern Recognition and Image Processing Silver Springs MD IEEE ComputerSociety Press

Gilbert R O (1987) Statistical Methods for Environmental Pollution Moni-toring New York Van Nostrand Reinhold

Hausdorff F (1957) Set Theory New York ChelseaHazard J W and Law B E (1989) Forest Survey Methods Used in the USDA

Forest Service EPA6003-89065 Corvallis Oregon US EnvironmentalProtection Agency Of ce of Research and Development Environmental Re-search Laboratory

Horn C R and Grayman W M (1993) ldquoWater-Quality Modeling With EPARiver Reach File Systemrdquo Journal of Water Resources Planning and Man-agement 119 262ndash274

Horvitz D G and Thompson D J (1952) ldquoA Generalization of SamplingWithout Replacement From a Finite Universerdquo Journal of the American Sta-tistical Association 47 663ndash685

Iachan R (1985) ldquoPlane Samplingrdquo Statistics and Probability Letters 50151ndash159

IDEM (2000) ldquoIndiana Water Quality Report 2000rdquo Report IDEM34020012000 Indiana Department of Environmental Management Of ce of Wa-ter Management Indianapolis Indiana

Insightful Corporation (2002) ldquoS-PLUS 6 for Windows Language ReferencerdquoInsightful Corporation Seattle WA

Karr J R (1991) ldquoBiological Integrity A Long Neglected Aspect of WaterResource Managementrdquo Ecological Applications 1 66ndash84

Kish L (1987) Statistical Design for Research New York WileyMahalanobis P C (1946) ldquoRecent Experiments in Statistical Sampling in

the Indian Statistical Instituterdquo Journal of the Royal Statistical Society 109325ndash370

Mark D M (1990) ldquoNeighbor-Based Properties of Some Orderings of Two-Dimensional Spacerdquo Geographical Analysis 2 145ndash157

Mateacutern B (1960) Spatial Variation Stockholm Sweden Meddelanden fraringnStatens Skogsforskningsinstitut

Messer J J Arsiss C W Baker J R Drouseacute S K Eshleman K NKaufmann P R Linthurst R A Omernik J M Overton W S Sale M JSchonbrod R D Stambaugh S M and Tuschall J R Jr (1986) Na-tional Surface Water Survey National Stream Survey Phase I-Pilot SurveyEPA-6004-86026 Washington DC US Environmental ProtectionAgency

MunhollandP L and Borkowski J J (1996) ldquoSimple Latin Square SamplingC 1 A Spatial Design Using Quadratsrdquo Biometrics 52 125ndash136

Olea R A (1984) ldquoSampling Design Optimization for Spatial FunctionsrdquoMathematical Geology 16 369ndash392

Overton W S and Stehman S V (1993) ldquoProperties of Designs for SamplingContinuous Spatial Resources From a Triangular Gridrdquo Communications inStatistics Part AmdashTheory and Methods 22 2641ndash2660

Patterson H D (1950) ldquoSampling on Successive Occasions With Partial Re-placement of Unitsrdquo Journal of the Royal Statistical Society Ser B 12241ndash255

Peano G (1890) ldquoSur Une Courbe Qui Remplit Toute Une Aire Planerdquo Math-ematische Annalen 36 157ndash160

Quenouille M H (1949) ldquoProblems in Plane Samplingrdquo The Annals of Math-ematical Statistics 20 335ndash375

Saalfeld A (1991) ldquoConstruction of Spatially Articulated List Frames forHousehold Surveysrdquo in Proceedings of Statistics Canada Symposium 91Spatial Issues in Statistics Ottawa Canada Statistics Canada pp 41ndash53

Sen A R (1953) ldquoOn the Estimate of the Variance in Sampling With Vary-ing Probabilitiesrdquo Journal of the Indian Society of Agricultural Statistics 7119ndash127

Simmons G F (1963) Introduction to Topology and Modern Analysis NewYork McGrawndashHill

Stehman S V and Overton W S (1994) ldquoEnvironmental Sampling and Mon-itoringrdquo in Handbook of Statistics Vol 12 eds G P Patil and C R RaoAmsterdam The Netherlands Elsevier Science pp 263ndash305

Stevens D L Jr (1997) ldquoVariable Density Grid-Based Sampling Designs forContinuous Spatial Populationsrdquo Environmetrics 8 167ndash195

Stevens D L Jr and Olsen A R (1999) ldquoSpatially Restricted Surveys OverTime for Aquatic Resourcesrdquo Journal of Agricultural Biological and Envi-ronmental Statistics 4 415ndash428

(2000) ldquoSpatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimationrdquo in Accuracy 2000 Proceedings of the4th International Symposium on Spatial Accuracy Assessment in Natural Re-sources and Environmental Sciences Delft The Netherlands Delft Univer-sity Press pp 609ndash616

(2003) ldquoVariance Estimation for Spatially Balanced Samples of Envi-ronmental Resourcesrdquo Environmetrics 14 593ndash610

Strahler A N (1957) ldquoQuantitative Analysis of Watershed GeomorphologyrdquoTransactions of the American Geophysical Union 38 913ndash920

Thompson S K (1992) Sampling New York WileyUrquhart N S Overton W S and Birkes D S (1993) ldquoComparing

Sampling Designs for Monitoring Ecological Status and Trends Impact ofTemporal Patternsrdquo in Statistics for the Environment eds V Barnett andK F Turkman New York Wiley pp 71ndash86

USGS (1999) ldquoThe National Hydrography Datasetrdquo Fact Sheet 106-99 USGeological Survey

Wolter K (1985) Introduction to Variance Estimation New York Springer-Verlag

Wolter K M and Harter R M (1990) ldquoSample Maintenance Based on PeanoKeysrdquo in Proceedings of the 1989 International Symposium Analysis of Datain Time Ottawa Canada Statistics Canada pp 21ndash31

Yates F (1981) Sampling Methods for Censuses and Surveys (4th ed) Lon-don Grif n

Yates F and Grundy P M (1953) ldquoSelection Without Replacement FromWithin Strata With Probability Proportional to Sizerdquo Journal of the RoyalStatistical Society Ser B 15 253ndash261

Page 13: Spatially Balanced Sampling of Natural Resources · Spatially Balanced Sampling of Natural Resources DonL.STEVENSJr. and Anthony R. OLSEN The spatial distribution of a natural resource

274 Journal of the American Statistical Association March 2004

Figure 9 Upper Illinois River Basin Sample Sites by Multidensity Categories

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 275

Figure 10 Lower Wabash River Basin Sample Sites by Multidensity Categories

276 Journal of the American Statistical Association March 2004

Table 5 Population Estimates With IRS and Local Variance Estimates

Indicator IRS Local DifferenceSubpopulation score N sites Mean std err std err ()

L Wabash IBI 39 361 21 14 iexcl568U Illinois IBI 41 325 17 13 iexcl448E Fork White IBI 32 351 13 12 iexcl223Great Miami IBI 19 408 27 22 iexcl335L Wabash QHEI 39 556 23 16 iexcl522U Illinois QHEI 41 433 21 16 iexcl395E Fork White QHEI 34 543 21 15 iexcl459Great Miami QHEI 19 678 22 19 iexcl260

Table 5 summarizes the population estimates for IBI andQHEI scores for each of the four basins The associated stan-dard error estimates are based on the HorvitzndashThompson ratiovariance estimator assuming an independent random sampleand on the local neighborhoodvariance estimator described inSection 41 On average the neighborhood variance estimatoris 38 smaller than the IRS variance estimator Figure 11 il-lustrates the impact of the variance estimators on con denceintervals for cumulative distribution function estimates for theLower Wabash Basin

6 DISCUSSION

There are a number of designs that provide good disper-sion of sample points over a spatial domain When we appliedthese designs to large-scale environmental sampling programsit quickly became apparent that we needed a means (1) toaccommodate variable inclusion probability and (2) to adjustsample sizes dynamically These requirements are rooted inthe very fundamentals of environmental management The rstrequirement stems from the fact that an environmental re-source is rarely uniformly important in the objective of themonitoring there are always scienti c economic or politi-cal reasons for sampling some portions of a resource moreintensively than others Two features of environmental moni-toring programs drive the second requirement First these pro-grams tend to be long lived so that even if the objectives ofthe program remain unchangedthe ldquoimportantrdquosubpopulationschange necessitating a corresponding change in sampling in-tensity Second a high-quality sampling frame is often lack-ing for environmental resource populationsAs far as we knowthere is no other technique for spatial sampling that ldquobalancesrdquoover an intensity metric instead of a Euclidean distance metricor permits dynamic modi cation of sample intensity

Adaptive sampling (Thompson 1992 pp 261ndash319) is an-other way to modify sample intensity However there are somesigni cant differences between GRTS and adaptive sampling inthe way the modi cation is accomplished Adaptive samplingincreases the sampling intensity locally depending on the re-sponse observed at a sample point whereas the GRTS intensitychange is global

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the $M$ subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants, according to the probability specified by $\pi$.

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely the ability to “replace” samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.
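One way such replacement might be carried out is sketched below. It is only an illustration: it assumes the sites are already listed in reverse hierarchical order, and the function name `fill_sample`, the integer site labels, and the set of inaccessible sites are all hypothetical.

```python
def fill_sample(ordered_sites, is_sampleable, n_target):
    """Walk the reverse hierarchical order and keep the first n_target
    sites that turn out to be sampleable (target and accessible)."""
    kept = []
    for site in ordered_sites:
        if len(kept) == n_target:
            break
        if is_sampleable(site):
            kept.append(site)
    return kept


# Hypothetical example: 15 ordered sites, of which sites 3 and 7 are lost.
ordered = list(range(1, 16))
inaccessible = {3, 7}
sample = fill_sample(ordered, lambda s: s not in inaccessible, 10)
```

Because the sites are used in order, the retained sites are close to a set of consecutively numbered points, which is the property the ordering is meant to exploit.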

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
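The consecutive-subset construction can be illustrated with a short sketch. It assumes the 100 sites are already in reverse hierarchical order; the helper name `consecutive_subsets` and the site labels are hypothetical placeholders.

```python
def consecutive_subsets(ordered_sites, n_subsets):
    """Split a list that is already in reverse hierarchical order into
    n_subsets consecutive blocks of equal size."""
    if len(ordered_sites) % n_subsets != 0:
        raise ValueError("sample size must be a multiple of n_subsets")
    size = len(ordered_sites) // n_subsets
    return [ordered_sites[i * size:(i + 1) * size] for i in range(n_subsets)]


# Hypothetical labels standing in for 100 sites in reverse hierarchical order.
sites = [f"site-{i:03d}" for i in range(1, 101)]
panels = consecutive_subsets(sites, 10)
```

Each element of `panels` could then serve as one interpenetrating subsample or as one panel in a survey over time.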


APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, we then have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

$$F_D(u) \ge \begin{cases} 0, & u \le \dfrac{1}{4^n}, \\[4pt] 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}. \end{cases}$$

Because $D$ is positive and bounded above by 1,

$$E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du \le 1 - \left[\frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right)\left(1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y\right)\right].$$

For fixed $n$, we have that

$$\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},$$

but this holds for all $n$, so that

$$\lim_{|\delta| \to 0} E[D(\delta)] = 0.$$
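The lemma can be checked numerically with a small simulation. The sketch below is an illustration under stated assumptions, not the randomized function used in the design: it takes one particular quadrant-recursive map, the fixed bit-interleaving (Morton) address, uses equal offsets $\delta_x = \delta_y = d$, and draws $s$ so that $s + \delta$ stays inside the unit square. The names `morton_address` and `mean_address_gap` are hypothetical.

```python
import random

LEVELS = 26  # bits kept per coordinate; 2 * 26 = 52 bits fit exactly in a float


def morton_address(x, y, levels=LEVELS):
    """Map (x, y) in [0, 1)^2 to [0, 1) by interleaving binary digits.

    Each pair of bits forms one base-4 digit, so every level-n subquadrant
    is sent to an interval of length 4**-n (quadrant-recursive).
    """
    top = (1 << levels) - 1
    xi = min(int(x * (1 << levels)), top)
    yi = min(int(y * (1 << levels)), top)
    address = 0
    for k in range(levels - 1, -1, -1):  # most significant bit first
        digit = 2 * ((xi >> k) & 1) + ((yi >> k) & 1)
        address = 4 * address + digit
    return address / 4 ** levels


def mean_address_gap(d, n_draws=20000, seed=1):
    """Monte Carlo estimate of E|f(s) - f(s + delta)| with delta = (d, d)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Simplification: restrict s so that s + delta stays in the square.
        x = rng.random() * (1.0 - d)
        y = rng.random() * (1.0 - d)
        total += abs(morton_address(x, y) - morton_address(x + d, y + d))
    return total / n_draws


gaps = {d: mean_address_gap(d) for d in (0.1, 0.01, 0.001)}
```

In such a run the estimated mean gap should shrink as `d` shrinks, which is the behavior the lemma describes.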

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$, and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $xy$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x) + \{[\tilde{F}(x_{i+1}) - \tilde{F}(x_i)] / (x_{i+1} - x_i)\}(x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal cases, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min\{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional “randomness” by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$ and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.
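For a finite population, the argument above can be illustrated with a small numerical check. The sketch below rests on several assumptions: the elements are already sorted by their address $f(s)$, the intensities in `pi` are made-up values no larger than 1 that sum to $M = 4$, the step function is used directly so that a point $y$ in $(\tilde{F}(x_{i-1}), \tilde{F}(x_i)]$ is mapped to element $i$, and the uniform random start is replaced by an evenly spaced grid of starts so that the result is exactly reproducible. The name `systematic_sample` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def systematic_sample(pi, start):
    """Indices selected by systematic sampling with a unit-length interval.

    pi:    inclusion intensities of the population elements, sorted by
           address; each is at most 1 and they sum to an integer M.
    start: the random start u in (0, 1].
    """
    cum = list(accumulate(pi))  # values of the step function at its jumps
    m = round(cum[-1])
    # A point y in (cum[i-1], cum[i]] selects element i.
    return [bisect_left(cum, start + k) for k in range(m)]


# Illustrative intensities (assumed values); they sum to M = 4.
pi = [0.5, 0.25, 0.75, 0.5, 1.0, 0.25, 0.5, 0.25]

# Average over an evenly spaced grid of starts in place of a uniform draw.
n_starts = 1024
counts = [0] * len(pi)
for j in range(n_starts):
    u = (j + 0.5) / n_starts
    for i in systematic_sample(pi, u):
        counts[i] += 1
empirical = [c / n_starts for c in counts]
```

With these values, the fraction of starts that select each element equals its target intensity, and every start yields exactly $M$ elements.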

[Received August 2002. Revised September 2003.]

REFERENCES

Bellhouse, D. R. (1977), “Some Optimal Designs for Sampling in Two Dimensions,” Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), “An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey,” Journal of Forestry, 61, 826–833.


Breidt, F. J. (1995), “Markov Chain Designs for One-per-Stratum Sampling,” Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), “Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations,” The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), “An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe,” Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), “Area Frame Design for Agricultural Surveys,” U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), “On Plane Sampling and Related Geometrical Problems,” in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), “Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling,” Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), “Spatial Data Processing Using Balanced Ternary,” in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), “Water-Quality Modeling With EPA River Reach File System,” Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), “Plane Sampling,” Statistics and Probability Letters, 50, 151–159.

IDEM (2000), “Indiana Water Quality Report 2000,” Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), “S-PLUS 6 for Windows Language Reference,” Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), “Biological Integrity: A Long Neglected Aspect of Water Resource Management,” Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), “Recent Experiments in Statistical Sampling in the Indian Statistical Institute,” Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), “Neighbor-Based Properties of Some Orderings of Two-Dimensional Space,” Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), “Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats,” Biometrics, 52, 125–136.

Olea, R. A. (1984), “Sampling Design Optimization for Spatial Functions,” Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), “Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid,” Communications in Statistics, Part A – Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), “Sampling on Successive Occasions With Partial Replacement of Units,” Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), “Sur Une Courbe Qui Remplit Toute Une Aire Plane,” Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), “Problems in Plane Sampling,” The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), “Construction of Spatially Articulated List Frames for Household Surveys,” in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), “On the Estimate of the Variance in Sampling With Varying Probabilities,” Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), “Environmental Sampling and Monitoring,” in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), “Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations,” Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), “Spatially Restricted Surveys Over Time for Aquatic Resources,” Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), “Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation,” in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), “Variance Estimation for Spatially Balanced Samples of Environmental Resources,” Environmetrics, 14, 593–610.

Strahler, A. N. (1957), “Quantitative Analysis of Watershed Geomorphology,” Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), “Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns,” in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), “The National Hydrography Dataset,” Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), “Sample Maintenance Based on Peano Keys,” in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), “Selection Without Replacement From Within Strata With Probability Proportional to Size,” Journal of the Royal Statistical Society, Ser. B, 15, 253–261.


Figure 10. Lower Wabash River Basin Sample Sites by Multidensity Categories.


Table 5. Population Estimates With IRS and Local Variance Estimates

Subpopulation    Indicator score    N sites    Mean    IRS std. err.    Local std. err.    Difference (%)
L. Wabash        IBI                39         36.1    2.1              1.4                −56.8
U. Illinois      IBI                41         32.5    1.7              1.3                −44.8
E. Fork White    IBI                32         35.1    1.3              1.2                −22.3
Great Miami      IBI                19         40.8    2.7              2.2                −33.5
L. Wabash        QHEI               39         55.6    2.3              1.6                −52.2
U. Illinois      QHEI               41         43.3    2.1              1.6                −39.5
E. Fork White    QHEI               34         54.3    2.1              1.5                −45.9
Great Miami      QHEI               19         67.8    2.2              1.9                −26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.


(2000) ldquoSpatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimationrdquo in Accuracy 2000 Proceedings of the4th International Symposium on Spatial Accuracy Assessment in Natural Re-sources and Environmental Sciences Delft The Netherlands Delft Univer-sity Press pp 609ndash616

(2003) ldquoVariance Estimation for Spatially Balanced Samples of Envi-ronmental Resourcesrdquo Environmetrics 14 593ndash610

Strahler A N (1957) ldquoQuantitative Analysis of Watershed GeomorphologyrdquoTransactions of the American Geophysical Union 38 913ndash920

Thompson S K (1992) Sampling New York WileyUrquhart N S Overton W S and Birkes D S (1993) ldquoComparing

Sampling Designs for Monitoring Ecological Status and Trends Impact ofTemporal Patternsrdquo in Statistics for the Environment eds V Barnett andK F Turkman New York Wiley pp 71ndash86

USGS (1999) ldquoThe National Hydrography Datasetrdquo Fact Sheet 106-99 USGeological Survey

Wolter K (1985) Introduction to Variance Estimation New York Springer-Verlag

Wolter K M and Harter R M (1990) ldquoSample Maintenance Based on PeanoKeysrdquo in Proceedings of the 1989 International Symposium Analysis of Datain Time Ottawa Canada Statistics Canada pp 21ndash31

Yates F (1981) Sampling Methods for Censuses and Surveys (4th ed) Lon-don Grif n

Yates F and Grundy P M (1953) ldquoSelection Without Replacement FromWithin Strata With Probability Proportional to Sizerdquo Journal of the RoyalStatistical Society Ser B 15 253ndash261

Page 15: Spatially Balanced Sampling of Natural Resources · Spatially Balanced Sampling of Natural Resources DonL.STEVENSJr. and Anthony R. OLSEN The spatial distribution of a natural resource

276 Journal of the American Statistical Association March 2004

Table 5. Population Estimates With IRS and Local Variance Estimates

Subpopulation    Indicator score   N sites   Mean   IRS std. err.   Local std. err.   Difference (%)
L. Wabash        IBI               39        36.1   2.1             1.4               -56.8
U. Illinois      IBI               41        32.5   1.7             1.3               -44.8
E. Fork White    IBI               32        35.1   1.3             1.2               -22.3
Great Miami      IBI               19        40.8   2.7             2.2               -33.5
L. Wabash        QHEI              39        55.6   2.3             1.6               -52.2
U. Illinois      QHEI              41        43.3   2.1             1.6               -39.5
E. Fork White    QHEI              34        54.3   2.1             1.5               -45.9
Great Miami      QHEI              19        67.8   2.2             1.9               -26.0

Table 5 summarizes the population estimates for IBI and QHEI scores for each of the four basins. The associated standard error estimates are based on the Horvitz–Thompson ratio variance estimator, assuming an independent random sample, and on the local neighborhood variance estimator described in Section 4.1. On average, the neighborhood variance estimator is 38% smaller than the IRS variance estimator. Figure 11 illustrates the impact of the variance estimators on confidence intervals for cumulative distribution function estimates for the Lower Wabash Basin.

6. DISCUSSION

There are a number of designs that provide good dispersion of sample points over a spatial domain. When we applied these designs to large-scale environmental sampling programs, it quickly became apparent that we needed a means (1) to accommodate variable inclusion probability and (2) to adjust sample sizes dynamically. These requirements are rooted in the very fundamentals of environmental management. The first requirement stems from the fact that an environmental resource is rarely uniformly important in the objective of the monitoring; there are always scientific, economic, or political reasons for sampling some portions of a resource more intensively than others. Two features of environmental monitoring programs drive the second requirement. First, these programs tend to be long lived, so that even if the objectives of the program remain unchanged, the "important" subpopulations change, necessitating a corresponding change in sampling intensity. Second, a high-quality sampling frame is often lacking for environmental resource populations. As far as we know, there is no other technique for spatial sampling that "balances" over an intensity metric instead of a Euclidean distance metric or permits dynamic modification of sample intensity.

Adaptive sampling (Thompson 1992, pp. 261–319) is another way to modify sample intensity. However, there are some significant differences between GRTS and adaptive sampling in the way the modification is accomplished. Adaptive sampling increases the sampling intensity locally, depending on the response observed at a sample point, whereas the GRTS intensity change is global.

Figure 11. Stream Network and Sample Site Spatial Patterns by Multidensity Category for the Lower Wabash Basin (solid line, CDF estimate; dashed line, 95% local confidence limits; dotted line, 95% IRS confidence limits).

The GRTS first-order inclusion probability (or density) can be made proportional to an arbitrary positive auxiliary variable, for example, a signal from a remote sensing platform or a sample intensity that varies by geographical divisions or known physical characteristics of the target population. In some point and linear situations, it may be desirable to have the sample be spatially balanced with respect to geographic space rather than with respect to the population density. This can be achieved by making the inclusion probability inversely proportional to the population density. Although the development of GRTS has focused on applications in geographic space, it can be applied in other spaces. For example, one application defined two-dimensional space by the first two principal components of climate variables and selected a GRTS sample of forest plots in that space.
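As a minimal sketch of inclusion probabilities proportional to a positive auxiliary variable, the following scales the auxiliary values so that they sum to the target sample size. The function name, the list-based interface, and the check that no element exceeds probability 1 are illustrative assumptions, not part of the GRTS software.

```python
def inclusion_probabilities(aux, sample_size):
    """Scale positive auxiliary values so that they sum to sample_size.

    Hypothetical helper for a finite population: element i receives
    first-order inclusion probability sample_size * aux[i] / sum(aux).
    """
    if any(a <= 0 for a in aux):
        raise ValueError("auxiliary values must be strictly positive")
    total = sum(aux)
    probs = [sample_size * a / total for a in aux]
    if max(probs) > 1:
        # Assumption of this sketch: such elements would need separate handling.
        raise ValueError("an element would get inclusion probability above 1")
    return probs


def inverse_density_probabilities(density, sample_size):
    """Probabilities inversely proportional to a local population density."""
    return inclusion_probabilities([1.0 / d for d in density], sample_size)


if __name__ == "__main__":
    print(inclusion_probabilities([1, 2, 3, 4], sample_size=2))
    print(inverse_density_probabilities([1, 2, 4, 4], sample_size=2))
```

The second helper corresponds to the case in which balance with respect to geographic space, rather than population density, is wanted.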

The computational burden in hierarchical randomization can be substantial. However, it needs to be carried out only to a resolution sufficient to obtain no more than one sample point per subquadrant. The actual point selection can be carried out by treating the subquadrants as if they are elements of a finite population, selecting the M subquadrants to receive sample points, and then selecting one population element at random from among the elements contained within the selected subquadrants according to the probability specified by π.
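To make the idea of a hierarchically randomized subquadrant address concrete, here is a small sketch that computes base-4 addresses to a chosen depth and applies an independent random permutation of the four subquadrant labels within every cell. The class name, the digit convention (2*iy + ix), the use of the half-open square [0, 1) x [0, 1), and the lazy storage of permutations are all assumptions made for illustration.

```python
import random


def quadrant_digits(x, y, depth):
    """Base-4 digits of the nested subquadrants containing (x, y) in [0, 1)^2."""
    digits = []
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        ix, iy = int(x), int(y)          # which half in each coordinate
        digits.append(2 * iy + ix)       # assumed labeling of the four quadrants
        x, y = x - ix, y - iy            # rescale into the chosen subquadrant
    return digits


class HierarchicalRandomizer:
    """Permute subquadrant labels independently within every cell (a sketch)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._perms = {}                 # cell prefix -> permutation of 0..3

    def address(self, x, y, depth):
        raw = quadrant_digits(x, y, depth)
        out = []
        for level, digit in enumerate(raw):
            prefix = tuple(raw[:level])
            if prefix not in self._perms:
                perm = [0, 1, 2, 3]
                self._rng.shuffle(perm)
                self._perms[prefix] = perm
            out.append(self._perms[prefix][digit])
        return tuple(out)


def address_to_unit_interval(digits):
    """Read a base-4 digit sequence as a number in [0, 1)."""
    return sum(d / 4 ** (k + 1) for k, d in enumerate(digits))


if __name__ == "__main__":
    hr = HierarchicalRandomizer(seed=1)
    addr = hr.address(0.3, 0.8, depth=3)
    print(addr, address_to_unit_interval(addr))
```

Only the cells that actually contain points ever receive a permutation, which is one way to limit the work to the resolution that is needed.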

Reverse hierarchical ordering adds a feature that is immensely popular with field practitioners, namely, the ability to "replace" samples that are lost due to being nontarget or inaccessible. Moreover, we can replace the samples in such a way as to achieve good spatial balance over the population that is actually sampleable, even when sampleability cannot be determined prior to sample selection. Of course, this feature does not eliminate the nonresponse or the bias of an inference to the inaccessible population. It does, however, allow investigators to obtain the maximum number of samples that their budget will permit them to analyze.

Reverse hierarchical ordering has other uses as well. One is to generate interpenetrating subsamples (Mahalanobis 1946). For example, 10 interpenetrating subsamples from a sample size of 100 can be obtained simply by taking consecutive subsets of 10 from the reverse hierarchical ordering. Each subset has the same properties as the complete design. Consecutive subsets can also be used to define panels of sites for application in surveys over time, for example, sampling with partial replacement (Patterson 1950; Kish 1987; Urquhart, Overton, and Birkes 1993).
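The following sketch illustrates consecutive subsets under the assumption that reverse hierarchical ordering amounts to sorting sample points by their base-4 addresses with the digit order reversed. The function names and the tuple representation of addresses are hypothetical.

```python
def reverse_hierarchical_order(addresses):
    """Indices of sample points sorted by digit-reversed base-4 address.

    `addresses` is a list of equal-length digit tuples; reversing the digits
    makes the finest-level digit vary slowest, so that neighbors in the new
    order fall in different top-level quadrants.
    """
    return sorted(range(len(addresses)), key=lambda i: addresses[i][::-1])


def consecutive_subsets(order, size):
    """Split an ordering into consecutive subsets (panels) of a given size."""
    return [order[k:k + size] for k in range(0, len(order), size)]


if __name__ == "__main__":
    # Sixteen points, one per depth-2 subquadrant, in hierarchical order.
    addresses = [(a, b) for a in range(4) for b in range(4)]
    order = reverse_hierarchical_order(addresses)
    for panel in consecutive_subsets(order, 4):
        print([addresses[i] for i in panel])
```

In this toy case each panel of four contains one point from every top-level quadrant, which is the sense in which a consecutive subset remains spatially spread out.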

APPENDIX A: PROOF OF LEMMA

Lemma. Let $f: I^2 \to I$ be a 1–1 quadrant-recursive function, and let $s \sim U(I^2)$. Then $\lim_{|\delta| \to 0} E\{|f(s) - f(s + \delta)|\} = 0$.

Proof. If, for some $n > 0$, $s$ and $s + \delta$ are in the same subquadrant $Q^n_{jk}$, then $f(s)$ and $f(s + \delta)$ are in the same interval $J^n_m$, so that $|f(s) - f(s + \delta)| \le 1/4^n$. The probability that $s$ and $s + \delta$ are in the same subquadrant is the same as the probability of the origin and $\delta = (\delta_x, \delta_y)$ being in the same cell of a randomly located grid with cells congruent to $Q^n_{jk}$. For $\delta_x, \delta_y \le 1/2^n$, that probability is equal to $|Q^n(0) \cap Q^n(\delta)| / |Q^n(0)| = 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$, where $Q^n(x)$ denotes a polygon congruent to $Q^n_{jk}$ centered on $x$.

For $D(s, \delta) = |f(s) - f(s + \delta)|$, then, we have that $P(D \le 1/4^n) \ge 1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y$. Thus the distribution function $F_D$ of $D$ is bounded below by

\[
F_D(u) \ge
\begin{cases}
0, & u \le \dfrac{1}{4^n} \\[1ex]
1 - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y, & u > \dfrac{1}{4^n}.
\end{cases}
\]

Because $D$ is positive and bounded above by 1,

\[
E[D(\delta)] = 1 - \int_0^1 F_D(u)\,du
\le 1 - \left\{ \frac{0}{4^n} + \left(1 - \frac{1}{4^n}\right) - 2^n(\delta_x + \delta_y) + 4^n \delta_x \delta_y \right\}.
\]

For fixed $n$, we have that

\[
\lim_{|\delta| \to 0} E[D(\delta)] \le \frac{1}{4^n},
\]

but this holds for all $n$, so that

\[
\lim_{|\delta| \to 0} E[D(\delta)] = 0.
\]
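The lemma can be checked numerically. The sketch below is only an illustration: it uses Morton (Z-order) interleaving, truncated at a fixed depth, as one example of a quadrant-recursive function, and it estimates the expected address gap by Monte Carlo for a diagonal offset. The function names, the depth, the number of draws, and the choice to discard pairs that leave the unit square are all assumptions.

```python
import random


def morton_value(x, y, depth=20):
    """Example quadrant-recursive map from [0, 1)^2 to [0, 1), truncated at depth."""
    value, scale = 0.0, 1.0
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        ix, iy = int(x), int(y)
        scale /= 4
        value += (2 * iy + ix) * scale   # next base-4 digit of the address
        x, y = x - ix, y - iy
    return value


def mean_address_gap(delta, n_draws=10000, seed=0):
    """Monte Carlo estimate of E|f(s) - f(s + (delta, delta))| for uniform s."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_draws):
        sx, sy = rng.random(), rng.random()
        tx, ty = sx + delta, sy + delta
        if tx >= 1 or ty >= 1:
            continue                     # skip pairs that leave the unit square
        total += abs(morton_value(sx, sy) - morton_value(tx, ty))
        count += 1
    return total / count


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, mean_address_gap(delta))
```

The printed averages should shrink as the offset shrinks, in line with the limit stated in the lemma.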

APPENDIX B: PROOF THAT THE PROBABILITY INCLUSION FUNCTION EQUALS THE TARGET INTENSITY FUNCTION

We need the measure space $(X, \mathcal{B}, \phi)$, where $X$ is the unit interval $I = (0, 1]$ or the unit square $I^2 = (0, 1] \times (0, 1]$ and the relevant $\sigma$-fields are $\mathcal{B}(I)$ and $\mathcal{B}(I^2)$, the $\sigma$-fields of the Borel subsets of $I$ and $I^2$, respectively. For each of the three types of populations, we define a measure $\phi$ of population size. We use the same symbol for all three cases, but the specifics vary from case to case. For a finite population, we take $\phi$ to be counting measure restricted to $R$, so that for any subset $B \in \mathcal{B}(I^2)$, $\phi(B)$ is the number of population elements in $B \cap R$. For linear populations, we take $\phi(B)$ to be the length of the linear population contained within $B$. Clearly, $\phi$ is nonnegative, countably additive, defined for all Borel sets, and $\phi(\emptyset) = 0$, so $\phi$ is a measure. Finally, for areal populations, we take $\phi(B)$ to be the Lebesgue measure of $B \cap R$.

We begin by randomly translating the image of $R$ in the unit square by adding independent $U(0, 1/2)$ offsets to the $x$ and $y$ coordinates. This random translation plays the same role as random grid location does in an RTS design; namely, it guarantees that pairwise inclusion probabilities are nonzero. In particular, in this case it ensures that any pair of points in $R$ has a nonzero chance of being mapped into different quadrants.

Let $\pi(s)$ be an inclusion intensity function, that is, a function that specifies the target number of samples per unit measure. We assume that any linear population consists of a finite number $m$ of smooth, rectifiable curves, $R = \bigcup_{i=1}^{m} \{\gamma_i(t) = (x_i(t), y_i(t)) \mid t \in [a_i, b_i]\}$, with $x_i$ and $y_i$ continuous and differentiable on $[a_i, b_i]$. We set $\pi(s)$ equal to the target number of samples per unit length at $s$ for $s \in L$ and equal to zero elsewhere. For example, if the linear population were a stream network, $\pi(s)$ would specify the desired number of samples per kilometer of stream at the point $s$. Finally, an areal population is a finite collection of closed polygons. In this case, $\pi(s)$ specifies the target intensity as number of samples per unit area. Note that for one- and two-dimensional resources, $\pi(s)$ could be a continuous, smoothly varying function. Formally, we require $\pi(s)$ to be bounded and measurable, strictly positive on $R$ and zero elsewhere, and scaled so that $M = \int_R \pi(s)\,d\phi(s)$. From these definitions of $\pi(\cdot)$ and $\phi(\cdot)$, it follows that $w(B) = \int_B \pi(s)\,d\phi(s)$ is a measure and that $w(B)$ is the target number of samples in $B$. In particular, $M = w(I^2)$ is the target sample size. In the following discussion we assume that $M$ is an integer; the noninteger case is a simple extension.

Let $f(\cdot)$ be a quadrant-recursive function that maps $I^2$ into $I$. Because $\mathcal{B}(I)$ can be generated by sets of the form $J^n_m$ and $\mathcal{B}(I^2)$ can be generated by sets of the form $Q^n_{j,k}$, both $f$ and $f^{-1}$ are measurable. Because $f$ is measurable, $f^{-1}(B)$ is measurable for $B \in \mathcal{B}(I)$, so that $\tilde{F}(x) = \int_{f^{-1}((0, x])} \pi(s)\,d\phi(s)$ exists. In fact, $\tilde{F}$ is a distribution function, that is, nonnegative, increasing, and right continuous. For linear and areal resources, $\tilde{F}$ is a continuous, increasing function, but for finite resource populations, $\tilde{F}$ is a step function with jumps at the images of population elements. We can modify $\tilde{F}$ to obtain continuity in the finite case via linear interpolation; that is, let $x_i$, $i = 1, \ldots, N$, be the ordered jump points of $\tilde{F}$, set $x_0 = 0$, $x_{N+1} = 1$, and for $x_i < x \le x_{i+1}$ set $F(x) = \tilde{F}(x_i) + \{[\tilde{F}(x_{i+1}) - \tilde{F}(x_i)] / (x_{i+1} - x_i)\}(x - x_i)$. If we set $F = \tilde{F}$ for the linear and areal cases, then in every case we have that $F$ is a continuous distribution function with range $(0, M]$.

In the finite case, $F^{-1}$ is single-valued, so that $G(y) = \min_{x_i} \{x_i \mid F^{-1}(y) \le x_i\}$ is well defined. In the linear and areal cases, $F^{-1}$ may not be single-valued. Points that are in the unit square but not in $R$ lead to flats in $F$ that correspond to regions in the unit square with $\pi(s) = 0$. However, $F^{-1}(y)$ always will be closed and bounded, so that $G(y) = \min\{x \mid x \in F^{-1}(y)\}$ is well defined. In all cases, the intensity function $\pi$ is positive at $s = f^{-1}(G(y))$, that is, there is a population element at $s$. Thus $f^{-1} \circ G$ maps $(0, M]$ onto the target population, that is, $f^{-1} \circ G$ associates every point in $(0, M]$ with a unique element in the population.

It follows that selecting a sample from $(0, M]$ also selects population elements via the mapping $f^{-1} \circ G$. To get a sample with an inclusion function equal to the target inclusion density, we select a sample from $(0, M]$ by splitting the range into $M$ unit-length intervals $(0, 1], (1, 2], \ldots, (M - 1, M]$ and picking one point in each interval. Because of hierarchical randomization, we gain no additional "randomness" by picking the points independently, so we use systematic sampling with a random start and a unit-length selection interval. The selection procedure defines an inclusion probability density function on $(0, M]$ with a corresponding measure $P_M(\cdot)$. Note that $P_M$ coincides with Lebesgue measure on $(0, M]$; in particular, the measure of a subinterval of $(0, M]$ is its length. We induce a measure $P_1$ on $I$ via $P_1(B) = \int_{G^{-1}(B)} dP_M$, and in turn induce a measure $P_2$ on $I^2$ via $P_2(B) = \int_{f^{-1}(B)} dP_1$. The measure $P_2$ is an inclusion probability measure on $I^2$, and $P_2(B) = w(B)$, so the sample selection method does give an inclusion probability function equal to the target sample intensity function.

[Received August 2002. Revised September 2003.]
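For a finite population, the selection step can be sketched as follows. The code assumes the population elements are already listed in the order of their (randomized) addresses and that their intensities sum to an integer M; it then draws one random start, steps through (0, M] in unit increments, and maps each point to the first element whose cumulative intensity reaches it, which plays the role of G. The function name and the list-based interface are illustrative assumptions.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_select(intensities, seed=None):
    """Systematic sample, with unit spacing, from elements in address order.

    `intensities` holds the target inclusion intensity of each element
    (hypothetical input, assumed to sum to an integer M). Returns the
    indices of the selected elements.
    """
    cumulative = list(accumulate(intensities))   # step function at the jump points
    m = round(cumulative[-1])
    if abs(cumulative[-1] - m) > 1e-9:
        raise ValueError("intensities must sum to an integer sample size")
    start = random.Random(seed).random()         # one random start in the first interval
    picks = []
    for k in range(m):
        y = start + k                            # one point per unit-length interval
        idx = bisect_left(cumulative, y)         # first element with cumulative >= y
        picks.append(min(idx, len(cumulative) - 1))  # guard against rounding at the top end
    return picks


if __name__ == "__main__":
    # Eight elements with unequal intensities summing to M = 3.
    pis = [0.25, 0.5, 0.25, 0.5, 0.5, 0.25, 0.5, 0.25]
    print(systematic_select(pis, seed=42))
```

Because the sampled points are uniformly distributed within each unit interval, an element with intensity at most 1 is selected with probability equal to the length of its slice of (0, M], that is, its intensity.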

REFERENCES

Bellhouse, D. R. (1977), "Some Optimal Designs for Sampling in Two Dimensions," Biometrika, 64, 605–611.

Bickford, C. A., Mayer, C. E., and Ware, K. D. (1963), "An Efficient Sampling Design for Forest Inventory: The Northeast Forest Resurvey," Journal of Forestry, 61, 826–833.


Breidt, F. J. (1995), "Markov Chain Designs for One-per-Stratum Sampling," Survey Methodology, 21, 63–70.

Brewer, K. R. W., and Hanif, M. (1983), Sampling With Unequal Probabilities, New York: Springer-Verlag.

Cochran, W. G. (1946), "Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations," The Annals of Mathematical Statistics, 17, 164–177.

Cordy, C. (1993), "An Extension of the Horvitz–Thompson Theorem to Point Sampling From a Continuous Universe," Probability and Statistics Letters, 18, 353–362.

Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," U.S. Department of Agriculture, National Agricultural Statistics Service, Research and Applications Division, Area Frame Section.

Dalenius, T., Hájek, J., and Zubrzycki, S. (1961), "On Plane Sampling and Related Geometrical Problems," in Proceedings of the 4th Berkeley Symposium on Probability and Mathematical Statistics, 1, 125–150.

Das, A. C. (1950), "Two-Dimensional Systematic Sampling and the Associated Stratified and Random Sampling," Sankhya, 10, 95–108.

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A - Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91: Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.

Page 16: Spatially Balanced Sampling of Natural Resources · Spatially Balanced Sampling of Natural Resources DonL.STEVENSJr. and Anthony R. OLSEN The spatial distribution of a natural resource

Stevens and Olsen Spatially Balanced Sampling of Natural Resources 277

APPENDIX A PROOF OF LEMMA

Lemma Let f I2 I be a 1ndash1 quadrant-recursive function andlet s raquo UI2 Then limjplusmnj0 Efjf s iexcl f s C plusmnjg D 0

Proof If for some n gt 0 s and s C plusmn are in the same subquad-rant Qn

jk then f s and f s C plusmn are in the same interval J nm so

that jf s iexcl f s C plusmnj middot 1=4n The probability that s and s C plusmn arein the same subquadrant is the same as the probability of the ori-gin and plusmn D plusmnx plusmny being in the same cell of a randomly locatedgrid with cells congruent to Qn

jk For plusmnx plusmny middot 1=2n that probability

is equal to jQn0 Qnplusmnj=jQn0j D 1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny where Qnx denotes a polygon congruent to Qn

jk centered on x

For Ds plusmn D jf s iexcl f s C plusmnj then we have that P D middot 1=4n cedil1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny Thus the distribution function FD of D isbounded below by

FD u cedil

8gtlt

gt

0 u middot 14n

1 iexcl 2nplusmnx C plusmny C 4nplusmnxplusmny u gt1

4n

Because D is positive and bounded above by 1

E[Dplusmn] D 1 iexclZ 1

0FD udu

middot 1 iexclraquo

0

4nC

sup31 iexcl 1

4n

acuteiexcl 2nplusmnx C plusmny C 4nplusmnxplusmny

frac14

For xed n we have that

limjplusmnj0

E[Dplusmn] middot 14n

but this holds for all n so that

limjplusmnj0

E[Dplusmn] D 0

APPENDIX B PROOF THAT THE PROBABILITYINCLUSION FUNCTION EQUALS THE

TARGET INTENSITY FUNCTION

We need the measure space XB Aacute where X is the unit inter-val I D 0 1] or the unit square I2 D 0 1] pound 01] and the rele-vant frac34 elds are BI and BI2 the frac34 elds of the Borel subsetsof I and I2 respectively For each of the three types of populationswe de ne a measure Aacute of population size We use the same symbolfor all three cases but the speci cs vary from case to case For a -nite population we take Aacute to be counting measure restricted to R sothat for any subset B 2 BI2 AacuteB is the number of population ele-ments in B R For linear populations we take AacuteB to be the lengthof the linear population contained within B Clearly Aacute is nonnega-tive countably additive de ned for all Borel sets and Aacute D 0 soAacute is a measure Finally for areal populations we take AacuteB to be theLebesgue measure of B R

We begin by randomly translating the image of R in the unit squareby adding independent U0 1=2 offsets to the xy coordinates Thisrandom translation plays the same role as random grid location doesin an RTS design namely it guarantees that pairwise inclusion prob-abilities are nonzero In particular in this case it ensures that any pairof points in R has a nonzero chance of being mapped into differentquadrants

Let frac14s be an inclusion intensity function that is a function thatspeci es the target number of samples per unit measure We assumethat any linear population consists of a nite number m of smoothrecti able curves R D

SmiD1fdegit D xi t yi t jt 2 [ai bi ]g with

xi and yi continuous and differentiable on [ai bi ] We set frac14s equalto the target number of samples per unit length at s for s 2 L and

equal to zero elsewhere For example if the linear population werea stream network frac14s would specify the desired number of samplesper kilometer of stream at the point s Finally an areal population isa nite collection of closed polygons In this case frac14s speci es thetarget intensity as number of samples per unit area Note that for one-and two-dimensional resources frac14s could be a continuous smoothlyvarying function Formally we require frac14s to be bounded and mea-surable strictly positive on R and zero elsewhere and scaled so thatM D

RR frac14s dAacutes From these de nitions of frac14cent and Aacutecent it follows

that wB DR

B frac14sdAacutes is a measure and that wB is the targetnumber of samples in B In particular M D wI2 is the target samplesize In the following discussion we assume that M is an integer thenoninteger case is a simple extension

Let f cent be a quadrant-recursive function that maps I2 into I Be-cause BI can be generated by sets of the form J n

m and BI2 can begenerated by sets of the form Qn

j k both f and f iexcl1 are measurable

Because f is measurable f iexcl1B is measurable for B 2 BI so thatQF x D

Rf iexcl10x] frac14sdAacutes exists In fact QF is a distribution func-

tion that is nonnegative increasing and right continuous For linearand areal resources QF is a continuous increasing function but for -nite resourcepopulations QF is a step function with jumps at the imagesof populationelements We can modify QF to obtain continuity in the -nite case via linear interpolation that is let xi i D 1 N be the or-dered jump points of QF set x0 D 0 xNC1 D 1 and for xi lt x middot xiC1set F x D QF x C QFxiC1 iexcl QFxi =xiC1 iexcl xi x iexcl xi If we setF D QF for the linear and areal case then in every cases we have thatF is a continuous distribution function with range 0M]

In the nite case Fiexcl1 is single-valued so that Gy D minxi jF iexcl1y middot xi is well de ned In the linear and areal cases F iexcl1

may not be single-valued Points that are in the unit square but notin R lead to ats in F that correspond to regions in the unit squarewith frac14s D 0 However Fiexcl1y always will be closed and boundedso that Gy D minfxjx 2 Fiexcl1yg is well de ned In all cases theintensity function frac14 is positive at s D f iexcl1Gy that is there isa population element at s Thus f iexcl1 plusmn G maps 0M] onto the tar-get population that is f iexcl1 plusmn G associates every point in 0 M] witha unique element in the population

It follows that selecting a sample from 0 M] also selects pop-ulation elements via the mapping f iexcl1 plusmn G To get a sample withan inclusion function equal to the target inclusion density we selecta sample from 0M] by splitting the range into M unit-length in-tervals 01] 1 2] M iexcl 1M] and picking one point in eachinterval Because of hierarchical randomizationwe gain no additionalldquorandomnessrdquo by picking the points independently so we use system-atic sampling with a random start and a unit-length selection intervalThe selection procedure de nes an inclusion probability density func-tion on 0M] with a correspondingmeasure PM cent Note that PM co-incides with Lebesgue measure on 0 M] in particular the measureof a subinterval of 0 M] is its length We induce a measure P1 on I

via P1B DRGiexcl1B dP M and in turn induce a measure P2 on I2

via P2B DR

f iexcl1B dP 1 The measure P2 is an inclusion probability

measure on I2 and P2B D wB so the sample selection methoddoes give an inclusion probability function equal to the target sampleintensity function

[Received August 2002 Revised September 2003]

REFERENCES

Bellhouse D R (1977) ldquoSome Optimal Designs for Sampling in Two Dimen-sionsrdquo Biometrika 64 605ndash611

Bickford C A Mayer C E and Ware K D (1963) ldquoAn Ef cient Sam-pling Design for Forest Inventory The Northeast Forest Resurveyrdquo Journalof Forestry 61 826ndash833

278 Journal of the American Statistical Association March 2004

Breidt F J (1995) ldquoMarkov Chain Designs for One-per-Stratum SamplingrdquoSurvey Methodology 21 63ndash70

Brewer K R W and Hanif M (1983) Sampling With Unequal ProbabilitiesNew York Springer-Verlag

Cochran W G (1946) ldquoRelative Accuracy of Systematic and Strati ed Ran-dom Samples for a Certain Class of Populationsrdquo The Annals of Mathemati-cal Statistics 17 164ndash177

Cordy C (1993) ldquoAn Extension of the HorvitzndashThompson Theorem to PointSampling From a Continuous Universerdquo Probability and Statistics Letters18 353ndash362

Cotter J and Nealon J (1987) ldquoArea Frame Design for Agricultural SurveysrdquoUS Department of Agriculture National Agricultural Statistics Service Re-search and Applications Division Area Frame Section

Dalenius T Haacutejek J and Zubrzycki S (1961) ldquoOn Plane Sampling and Re-lated Geometrical Problemsrdquo in Proceedings of the 4th Berkeley Symposiumon Probability and Mathematical Statistics 1 125ndash150

Das A C (1950) ldquoTwo-Dimensional Systematic Sampling and the AssociatedStrati ed and Random Samplingrdquo Sankhya 10 95ndash108

Gibson, L., and Lucas, D. (1982), "Spatial Data Processing Using Balanced Ternary," in Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, Silver Springs, MD: IEEE Computer Society Press.

Gilbert, R. O. (1987), Statistical Methods for Environmental Pollution Monitoring, New York: Van Nostrand Reinhold.

Hausdorff, F. (1957), Set Theory, New York: Chelsea.

Hazard, J. W., and Law, B. E. (1989), Forest Survey Methods Used in the USDA Forest Service, EPA/600/3-89/065, Corvallis, Oregon: U.S. Environmental Protection Agency, Office of Research and Development, Environmental Research Laboratory.

Horn, C. R., and Grayman, W. M. (1993), "Water-Quality Modeling With EPA River Reach File System," Journal of Water Resources Planning and Management, 119, 262–274.

Horvitz, D. G., and Thompson, D. J. (1952), "A Generalization of Sampling Without Replacement From a Finite Universe," Journal of the American Statistical Association, 47, 663–685.

Iachan, R. (1985), "Plane Sampling," Statistics and Probability Letters, 50, 151–159.

IDEM (2000), "Indiana Water Quality Report 2000," Report IDEM/34/02/001/2000, Indiana Department of Environmental Management, Office of Water Management, Indianapolis, Indiana.

Insightful Corporation (2002), "S-PLUS 6 for Windows Language Reference," Insightful Corporation, Seattle, WA.

Karr, J. R. (1991), "Biological Integrity: A Long Neglected Aspect of Water Resource Management," Ecological Applications, 1, 66–84.

Kish, L. (1987), Statistical Design for Research, New York: Wiley.

Mahalanobis, P. C. (1946), "Recent Experiments in Statistical Sampling in the Indian Statistical Institute," Journal of the Royal Statistical Society, 109, 325–370.

Mark, D. M. (1990), "Neighbor-Based Properties of Some Orderings of Two-Dimensional Space," Geographical Analysis, 2, 145–157.

Matérn, B. (1960), Spatial Variation, Stockholm, Sweden: Meddelanden från Statens Skogsforskningsinstitut.

Messer, J. J., Arsiss, C. W., Baker, J. R., Drousé, S. K., Eshleman, K. N., Kaufmann, P. R., Linthurst, R. A., Omernik, J. M., Overton, W. S., Sale, M. J., Schonbrod, R. D., Stambaugh, S. M., and Tuschall, J. R., Jr. (1986), National Surface Water Survey: National Stream Survey, Phase I-Pilot Survey, EPA-600/4-86/026, Washington, DC: U.S. Environmental Protection Agency.

Munholland, P. L., and Borkowski, J. J. (1996), "Simple Latin Square Sampling + 1: A Spatial Design Using Quadrats," Biometrics, 52, 125–136.

Olea, R. A. (1984), "Sampling Design Optimization for Spatial Functions," Mathematical Geology, 16, 369–392.

Overton, W. S., and Stehman, S. V. (1993), "Properties of Designs for Sampling Continuous Spatial Resources From a Triangular Grid," Communications in Statistics, Part A—Theory and Methods, 22, 2641–2660.

Patterson, H. D. (1950), "Sampling on Successive Occasions With Partial Replacement of Units," Journal of the Royal Statistical Society, Ser. B, 12, 241–255.

Peano, G. (1890), "Sur Une Courbe Qui Remplit Toute Une Aire Plane," Mathematische Annalen, 36, 157–160.

Quenouille, M. H. (1949), "Problems in Plane Sampling," The Annals of Mathematical Statistics, 20, 335–375.

Saalfeld, A. (1991), "Construction of Spatially Articulated List Frames for Household Surveys," in Proceedings of Statistics Canada Symposium 91, Spatial Issues in Statistics, Ottawa, Canada: Statistics Canada, pp. 41–53.

Sen, A. R. (1953), "On the Estimate of the Variance in Sampling With Varying Probabilities," Journal of the Indian Society of Agricultural Statistics, 7, 119–127.

Simmons, G. F. (1963), Introduction to Topology and Modern Analysis, New York: McGraw–Hill.

Stehman, S. V., and Overton, W. S. (1994), "Environmental Sampling and Monitoring," in Handbook of Statistics, Vol. 12, eds. G. P. Patil and C. R. Rao, Amsterdam, The Netherlands: Elsevier Science, pp. 263–305.

Stevens, D. L., Jr. (1997), "Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations," Environmetrics, 8, 167–195.

Stevens, D. L., Jr., and Olsen, A. R. (1999), "Spatially Restricted Surveys Over Time for Aquatic Resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415–428.

Stevens, D. L., Jr., and Olsen, A. R. (2000), "Spatially-Restricted Random Sampling Designs for Design-Based and Model-Based Estimation," in Accuracy 2000: Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Delft, The Netherlands: Delft University Press, pp. 609–616.

Stevens, D. L., Jr., and Olsen, A. R. (2003), "Variance Estimation for Spatially Balanced Samples of Environmental Resources," Environmetrics, 14, 593–610.

Strahler, A. N. (1957), "Quantitative Analysis of Watershed Geomorphology," Transactions of the American Geophysical Union, 38, 913–920.

Thompson, S. K. (1992), Sampling, New York: Wiley.

Urquhart, N. S., Overton, W. S., and Birkes, D. S. (1993), "Comparing Sampling Designs for Monitoring Ecological Status and Trends: Impact of Temporal Patterns," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp. 71–86.

USGS (1999), "The National Hydrography Dataset," Fact Sheet 106-99, U.S. Geological Survey.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Wolter, K. M., and Harter, R. M. (1990), "Sample Maintenance Based on Peano Keys," in Proceedings of the 1989 International Symposium: Analysis of Data in Time, Ottawa, Canada: Statistics Canada, pp. 21–31.

Yates, F. (1981), Sampling Methods for Censuses and Surveys (4th ed.), London: Griffin.

Yates, F., and Grundy, P. M. (1953), "Selection Without Replacement From Within Strata With Probability Proportional to Size," Journal of the Royal Statistical Society, Ser. B, 15, 253–261.
