Arcuser Optimizing Sampling

8/6/2019 Arcuser Optimizing Sampling

http://slidepdf.com/reader/full/arcuser-optimizing-sampling 1/8

July–September 2007Vol. 10 No. 3

Optimizing a Sampling NetworkAutomating the Use of Geostatistical Tools for Lake Tahoe Area Study

By Witold Fraczek, ESRI Application Prototype Lab, and Andrzej Bytnerowicz, USDA Forest Service

A reference relief map showing the terrainsurrounding Lake Tahoe. The original ozone monitoring station locations used

to study air pollution in the area are indi-cated with red markers.

Editor’s note: This article describes how pow-erful analysis tools in the ArcGIS Geostatistical

Analyst 9.2 extension were applied in a studyof air quality degradation around Lake Tahoe,a resort destination located on the California/

Nevada border. As part of the study, a model

to optimize the monitoring network by locatingadditional monitoring stations was built using

ModelBuilder. An accompanying article, “Mak-ing Effective Use of Geostatistics,” introducesthis class of statistics.

About the StudyThe transparency and purity of Lake Tahoe’s wa-ter has been deteriorating since the 1950s, partial-ly due to increased deposition of nitrogenous airpollutants. Forests in the Lake Tahoe watershedhave also suffered from stresses such as drought,overstocking, and elevated concentrations of phytotoxic air pollutants, mainly ozone.

Ozone, one of the most damaging air pollut-ants, has strong toxic effects on human healthand vegetation and is most indicative of photo-chemical smog.

One of the main questions for scientists andforest managers in this area is whether air pollu-tion (speci cally ozone) is generated locally oris migrating with the prevailing westerly windsfrom California’s Central Valley, an area knownfor high levels of air pollution. A study to de-termine the origin of ozone found in the vicin-ity of Lake Tahoe was undertaken by the ForestService and ESRI. Initially, ambient ozone con-centrations were measured using a network of 31 sampling stations established by the UnitedStates Department of Agriculture (USDA) For -est Service Paci c Southwest Research Stationscientists from the Riverside Fire Laboratory inRiverside, California.

Looking at the Original NetworkKnowledge of the central Sierra Nevada Moun-tains, both the general patterns of ozone dis-tribution and the westerly wind pattern in thisarea, led to the positioning of several monitor-ing points on the western slopes of these moun-tains. Most monitoring stations were locatedinside the Lake Tahoe Basin.

Right: An ozone concentra-tion prediction map showingthe west–east trend in ozoneconcentration.

Three elevation transects wereset to measure ozone concentra-tion at different altitudes to learnif these concentrations could be

correlated with elevation.Prediction maps of ozone con-centration were generated usingGeostatistical Analyst. Becauseno strong correlation was de-tected between ozone and elevation, cokrigingcould not be applied. Most maps of ozone con-centration showed a noticeable trend. The mainrange of the Sierra Nevada Mountains apparent-ly blocks the transport of ozone from the Cen-tral Valley, located to the west of the study area.This helps explain why no correlation betweenozone concentration and elevation was detected.Monitoring stations located at similar elevations

but on opposite sides of the main range reportedsigni cantly different ozone values.

Looking at ErrorA map of prediction standard error was createdusing the same kriging method and parametersthat were used to generate ozone prediction maps.The bright yellow colors indicated areas where theprediction standard error for the existing network









Performing DiagnosticsThe Cross Validation Comparison dialog boxcompares ve prediction error parameters. Allerror parameters, with the exception of the Root-Mean-Square Standardized, should be as closeto 0 as possible for the most accurate output. Forthe Root-Mean-Square Standardized parameter,the result should be close to 1. In this example,four out of ve of the error parameters indicatedthat the surface created by applying both trendand error modeling was more accurate. Themost critical indicator of prediction accuracy,Root-Mean-Square Standardized, also indicatedthat the surface created using trend and errormodeling was the winner. Obviously, additionalprediction standard error surfaces could be gen-erated using different sets of parameters.

After selecting the most accurate surface, thespatial distribution of the reliability of that layer(or in other words, the levels of uncertainty inthe generated surface of ozone concentration)can be determined by creating a prediction stan-dard error map. This map was generated usingexactly the same parameters that were used forthe latest prediction map. From the MethodSummary interface, the methods and parame-ters that were used to generate the most accurategeostatistical surface were saved to an XMLle for use later in the process. The same sym -bology was used—bright yellow color for theareas of highest reliability and dark brown forareas of lowest reliability. This prediction mapshowed that areas at the map edges and in thecentral part of the study area had low predictionreliability.

To improve the reliability of the ozone dis-tribution surface and determine whether therewas a signi cant ozone-generating source in thevicinity of Lake Tahoe, more sampling pointswere needed. The project’s budget allowed forsix additional measurement stations for the nextseason. Locations of new monitoring pointswere chosen to improve the overall reliabilityof the geostatistical interpolation by sampling atthe locations within the study area where reli-ability was the lowest. In addition, all supple-mental points had to be located within the for-ested portion of the study area.

Locations for the new points could be se-lected manually in ArcMap based on the cri-teria previously stipulated. Alternatively,locations could be selected using an automatedmethod—a model. Part of the ArcGIS geopro-cessing framework, ModelBuilder provides agraphic environment for creating, running, andsaving models. Introducing a model would re-duce subjectivity, make the selection of the pro-spective locations reproducible, and make therules transparent.

The Automated Network Densi er modelcreated for this project generated an enhancedmonitoring network by adding supplementalpoints at the locations where they were mostneeded to reduce the overall prediction uncer-tainty. The prediction standard error geostatisti-cal surface, the input data for the model, was

The Automated Network Densifer Model

The model convertsthe two geostatistical

sur aces to the grid raster ormat and clipboth to the geographic extent o the studyarea. The maximumvalue on the standarderror o prediction

grid is ound, and thacell is converted intoa point shape le witha single eature and the value o the ozoneconcentration as itsattribute. This is theoptimal point or adding a station.

In the next iteration,this eature is ap-

pended to the origina sampling network shape le and a new geostatistical layer o prediction standard error is generated rom the original

31 stations plus thenew stations. The

XML le containingthe initial prediction

standard error sur ace

parameters is used to generate eachiteration. The procesis repeated to iden-ti y as many network

sampling locations asdesired.

In addition to veinput datasets and 18 utilized unctions,the model consistso two precondi-tions. First, since the

ozone concentration geostatistical layer isnot changed duringthe workfow, it isconverted into a gridonly during the rst iteration. Second, onthe last output grid o

prediction standard error is converted intisolines o equal valuo prediction standarderror (contours).



updated at each iteration as the model appendedone additional sampling point to the currentmonitoring network. The model could be runonce to indicate where the most crucial miss-ing point was located. It could also be run fora speci ed number of iterations to generate asmany additional points as the project’s budgetallowed or until a variable was no longer equalto a predetermined condition. For example, itcould be run until the maximum standard errorof prediction for the study area was less than thelargest acceptable potential error of prediction.The new points were sequentially placed at thelocation of the largest current potential standarderror of prediction.

The Automated Network Densi er modeltakes ve input data layers:n The established monitoring network of

31 sites (as a shape le)n The geostatistical surface of ozone concen-

tration that is based on the 31 original ozonemonitoring sites

n The geostatistical surface of predictionstandard error that is based on the original31 monitoring sites

n The XML le containing the methods andparameters used to create the predictionstandard error geostatistical layer originallyselected

n The forested area at the vicinity of Lake Tahoethat constitutes the study area (as a grid le)This model does not account for proximity

to roads or access restrictions because adequatedata for these factors was not available. As-

Classi cation using Geometrical Interval is now available.

suming the project budget allows for severalmonitoring stations to supplement the initialnetwork, six iterations of the model were runand the locations for six new sampling stationswere determined. The geostatistical surface of the standard error of prediction resulting fromthe sixth iteration was displayed together withthe relevant vector version of the isolines of pre-diction standard error.

The points added to the network can be moreeasily seen by turning off the newly created lay-ers in the table of contents and looking for thediscrepancies between the geostatistical layerand the isolines of prediction trust. Because thesupplemental points had to be located withinthe forested area, displaying the forested areagrid as a semitransparent green polygon madeit easier to understand why these locations werechosen.

As expected, increasing the density of themonitoring network decreased the standard er-ror of prediction for the entire study area. Tobetter measure the increase in reliability causedby adding supplemental stations, the outputgeostatistical surfaces of prediction standard er-ror generated by the model were converted intorasters with the same color schema.

In ArcGIS 9.2, users can now apply the Geo-metrical Interval classi cation to both rasters andgeostatistical surfaces simply by right-clickingon the layer, choosing Properties > Symbology,and using the Classi cation option to changethe classi cation to Geometric Interval. Thismethod was applied to the output grid from the

nal iteration of the Automated Network Densier model with the number of classes set to 10and a yellow to dark red color ramp.

Comparing the prediction standard error sur-face generated based on the initial 31 stationswith the surface that used all 37 stations illus-trated how the level of certainty of the predic-tion improved when the nal grid was renderedusing the same color symbology. The predictionstandard error based on the original number of stations was displayed with the additional sta-tions. The surface of prediction standard errorbased on 37 sampling points was considerablybrighter, signifying that the prediction of ozoneconcentration based on the enhanced networkwould be more reliable. Without going into nu-merical details, the network of 37 stations canprovide enough sampling data to signi cantlyimprove the trustworthiness of prediction overthe entire study area. Whether the six additionalstations for this network were suf cient to meeta minimum acceptable threshold of reliability isbeyond the scope of this article.

The nal result seems to con rm that the Automated Network Densi er model can improvethe network design process during the secondstage of sequential sampling. The proposedmethod appends the supplemental sites in a rea-sonable manner. It adds new points where theyare most effective in enhancing reliability. Withthe model, as each new point is added, the im-provement in the network can be observed.



Making Effective Use of Geostatistics

Geostatistics is a branch o science that applies statistical methods to spatial interpolation. Athough geostatistics was developed independently o GIS, it has become an integral part o GWithout a computer and GIS mapping ability, it wouldn’t be known outside a small group o gestatistical gurus. Just as one does not have to be a GIS expert to use GIS, one doesn’t need to ba geostatistician to make e ective use o geostatistics. Meteorologists, soil scientists, geologistoceanographers, oresters, and other scientists can bene t rom using appropriate geostatisticalmethods.

The unctionality o geostatistics is applicable when the studied phenomena are regionalizevariables that all between random and deterministic variables. The geographic distribution regionalized variables cannot be mathematically described as deterministic; yet the distributio intensity o those phenomena is not random. Most o the natural phenomena that take placein the atmosphere, seawater, or soil meet the criteria o this category. The distribution o air tem

perature, the salinity o an ocean, soil moisture, or ore deposit concentrations in a geologic layare examples o regionalized variables. Even though they don’t represent truly natural phenomecrop yield prediction and air pollution might also be subjects or geostatistical analysis.

It is not practical or possible to make exhaustive real-world observations so sampling is usor these analyses. The ultimate goal o sampling is to get a good representation o the phenomenon under study. Spatial sampling is an important consideration in environmental studies bcause sample con guration infuences the reliability, e ectiveness, and cost o a survey. Intensivsampling is expensive but gives a precise picture o spatial variability or a given phenomenoHowever, sparse sampling is less expensive but may miss signi cant spatial eatures. Practicsampling constraints and the availability o existing in ormation can enhance the development a sampling scheme.

To ensure a high level o con dence in the results o any geostatistical interpolation, it is impotant to have a su cient number o well-distributed sampling stations in the monitoring networkHow many stations are su cient and how can their distribution be optimized? GIS, and particula

the ArcGIS Geostatistical Analyst extension, can help answer this question.One technique used to design an optimal sampling network or a regionalized variable, such air pollution, is sequential sampling. Sequential sampling is based on extended knowledge o tarea to be sampled and expertise in the actors controlling the distribution o a regionalized varable. Familiarity with the terrain and the phenomena should in orm the initial choice o site or tsampling network. The results o this preliminary study are used to optimize the scheme by addnew sampling points both in areas having the lowest reliability and in possible hot spot areas (e.areas o maximum concentration, high variability, or uncertain measurements).

The kriging interpolator is considered the most sophisticated and accurate way to determinthe intensity o a phenomenon at unmeasured locations. Kriging weights surrounding measurevalues are based not only on the distance between measured points and the prediction locatiobut also on the overall spatial arrangement o the measured points. Except or generating an estmated prediction, kriging can provide a measure o an error, or uncertainty o the estimated suace. Since the estimation variances can be mapped, a con dence placed in the estimates can be

calculated and their spatial distribution can be presented on a map to assist in the decision-makinprocess. The prediction standard error maps show a distribution o a square root o a predictiovariance, which is a variation associated with di erences between the measured and calculatevalues. The prediction standard error quanti es an uncertainty o a prediction.

By Witold Fraczek, ESRI Application Prototype Lab,and Andrzej Bytnerowicz, USDA Forest Service

Applying GIS tools for studying the geographicdistribution of regionalized variables

About the AuthorsWitold Fraczek is a longtime employee of ESRIwho currently works in the Application Proto-type Lab. He received master’s degrees in hy-drology from the University of Warsaw, Poland,and remote sensing from the University of Wis-consin, Madison.

Andrzej Bytnerowicz is a senior scientist withthe USDA Forest Service Paci c SouthwestResearch Station in Riverside, California. Hisresearch focuses on various aspects of air pol-lution effects on forest and other ecosystems.He received his master’s degree in food chem-istry from the Warsaw Agricultural Univer-sity, Warsaw, Poland, and doctorate in natu-ral sciences from the Silesian University inKatowice, Poland.

ReferencesArbaugh, M. J., P. R. Miller, J. J. Carroll,B. Takemoto, and T. Procter. 1998. “Relation-ship of Ambient Ozone with Injury to Pines inthe Sierra Nevada and San Bernardino Moun-tains in California, USA.” Environmental Pol-lution, 101, 291–301.

Fraczek, W., A. Bytnerowicz, and M. Arbaugh.2003. “Use of Geostatistics to Estimate SurfaceOzone Patterns.” Ozone Air Pollution in the Si-erra Nevada: Distribution and Effects on For-ests. Elsevier, 215–247.

Gertler, A. W., A. Bytnerowicz, T. A. Cahill,M. Arbaugh, S. Cliff, J. Kahyaoglu-Koracin, L.Tarnay, R. Alonso, and W. Fraczek. 2006. “Lo-cal Air Pollutants Threaten Lake Tahoe’s Clar-ity.” California Agriculture, 60, No. 2, 53–58.

Goldman, C. R. 2006. “Science: a Decisive Fac -tor in Restoring Tahoe Clarity.” California Ag-riculture, 60, No. 2, 45–46.

Krupa, S. V., A. E. G. Tonneijck, andW. J. Manning. 1998. “Ozone.” In Recognitionof Air Pollution Injury to Vegetation: A Pictorial

Atlas, R. B. Flagler, ed. Air and Waste Manage-ment Association, Pittsburgh, Pennsylvania. 2-1through 2-28.

ESRI. 2001. ArcGIS Geostatistical Analyst: Sta-tistical Tools for Data Exploration, Modeling,and Advanced Surface Generation, an ESRIwhite paper .


http://slidepdf.com/reader/full/arcuser-optimizing-sampling 8/8Printed in USA

ESRI International O fces

ESRI Regional O fces

Olympia360-754-4727

St. Louis636-949-6620

Minneapolis651-454-0600

Boston978-777-4543

Washington, D.C.703-506-9515

Charlotte704-541-9810

San Antonio210-499-1044

Denver303-449-7779

Cali ornia909-793-2853ext. 1-1906

1103182/08sp

Copyright © 2008 ESRI. All rights reserved. ESRI, GIS by ESRI, ArcView, ArcIMS, the ESRI globe logo, ArcIn o, ArcEditor, ArcSDE, ArcGIS, @esri.com, www.esri.com,and Geography Network are trademarks, registered trademarks, or service marks o ESRI in the United States, the European Community, or certain other jurisdictions.Other companies and products mentioned herein may be trademarks or registered trademarks o their respective trademark owners.

1-800-GIS-XPRT (1-800-447-9778)

www.esri.com

Locate an ESRI value-added resellernear you atwww.esri.com/resellers

Outside the United States,contact your local ESRI distributor.For the number o your distributor,call ESRI at 909-793-2853,ext. 1-1235, or visit our Web site atwww.esri.com/distributors

For More In ormation

ESRI 380 New York StreetRedlands, Cali ornia92373-8100 USA

Phone: 909-793-2853Fax: 909-793-5953E-mail: in [email protected]

For more than 35 years, ESRI hasbeen helping people make better decisions through management

and analysis o geographic in ormation. A ull-service GIS company, ESRI o ers a ramework or implementing GIS technology and business logic in any organization rom personal GIS onthe desktop to enterprise-wide GIS

servers (including the Web) and mobile devices. ESRI GIS solutionsare fexible and can be customized to meet the needs o our users.

Philadelphia610-644-3374

Australia

www.esriaustralia.com.auBelgium/Luxembourgwww.esribelux.com

Bulgariawww.esribulgaria.com

Canadawww.esricanada.com

Chilewww.esri-chile.com

China (Beijing)www.esrichina-bj.cn

China (Hong Kong)www.esrichina-hk.com

Eastern A ricawww.esriea.co.ke

Finlandwww.esri-fnland.com

Francewww.esri rance. r

Germany/Switzerland

www.esri-germany.dewww.esri-suisse.ch

Hungarywww.esrihu.hu

Indiawww.esriindia.com

Indonesiawww.esrisa.com.my

Italywww.esriitalia.it

Japanwww.esrij.com

Koreawww.esrikr.co.kr

Malaysiawww.esrisa.com.my

Netherlandswww.esrinl.com

Northeast A rica202-516-7485

Poland

www.esripolska.com.plPortugalwww.esri-portugal.pt

Romaniawww.esriro.ro

Singaporewww.esrisa.com

Spainwww.esri-es.com

Swedenwww.esri-sgroup.se

Thailandwww.esrith.com

Turkeywww.esriturkey.com.tr

United Kingdomwww.esriuk.com

Venezuelawww.esriven.com

Documents

Arcuser Optimizing Sampling