Best practices for sensor networks
and sensor data management
Citation
ESIP EnviroSensing Cluster (2014). Community Wiki Document, “Best practices for sensor
networks and sensor data management”, Federation of Earth Science Information Partners.
http://wiki.esipfed.org/index.php/EnviroSensing_Cluster. (wiki document accessed 12-1-2014).
Contributors (as of December 2014)
Each chapter has a lead editor who is responsible for periodically compiling comments and
contributions into stable versions of this document which will be archived as PDF versions and
can be found here. If you contribute to this document by editing or adding text, images or
comments you agree to the use of that material in the regularly published PDF versions of this
document. Please add your name to the list of contributors if you feel you made a significant
contribution.
Corinna Gries, University of Wisconsin-Madison, North Temperate Lakes LTER
Don Henshaw, USFS Pacific Northwest Research Station, Andrews Forest LTER
Scotty Strachan, University of Nevada, Reno
Renee F. Brown, University of New Mexico, Sevilleta LTER & UNM Sevilleta Field Station
Christopher Jones, National Center for Ecological Analysis and Synthesis
Christine Laney, University of Texas at El Paso, Jornada Basin LTER
Branko Zdravkovic, University of Saskatchewan
Richard Cary, University of Georgia, Coweeta LTER
Jason Downing, University of Alaska Fairbanks, Bonanza Creek LTER
Adam Kennedy, Oregon State University, Andrews Forest LTER
Mary Martin, University of New Hampshire, Hubbard Brook LTER
Jennifer Morse, University of Colorado Boulder, Niwot Ridge LTER
Fox Peterson, Oregon State University, Andrews Forest LTER
John Porter, University of Virginia, Virginia Coast Reserve LTER
Jordan Read, US Geological Survey
Andrew Rettig, University of Cincinnati
Wade Sheldon, University of Georgia, Georgia Coastal Ecosystems LTER
Scope
This document on best practices for sensor networks and sensor data management provides
information for establishing and managing a fixed environmental sensor network for on or near
surface point measurements with the purpose of long-term or “permanent” environmental data
acquisition. It does not cover remotely sensed data (satellite imagery, aerial photography, etc.),
although a few marginal cases where this distinction is not entirely clear are discussed, e.g.,
phenology and animal behavior webcams. The best practices covered in this document may not
all apply to temporary or transitory sensing efforts such as distributed “citizen science”
initiatives, which do not focus on building infrastructure. Furthermore, it is assumed that the
scientific goals for establishing a sensor network are thought out and discussed with all
members of the team responsible for establishing and maintaining the sensor network, i.e., the
appropriateness of certain sensors or installations to answer specific questions is not discussed.
Information is provided here for various stages of establishing and maintaining an environmental
sensor network: planning a completely new system, upgrading an existing system, improving
streaming data management, and archiving data.
Below are chapters of a living document to which contributions can be made by anybody
interested in this subject. Please post questions, answers, experiences with particular
software/hardware/setup, comments, additions, edits, resources, and publications. Please use
common online etiquette. If conflicting views arise they should be discussed in the
EnviroSensing e-mail list.
Contents
1 Planning Process
2 Implementation Feasibility
3 Assembling the Team
4 Overview of Chapters
Planning Process
Throughout this document it is emphasized that the initial planning is extremely important, as is the inclusion of expertise in many different areas, scientific and technical, in the early discussion and planning phase before a proposal is written. If all fields of expertise are not consulted/incorporated prior to making location, budget, deployment, and timeline decisions, critical interdependencies are likely to be overlooked (e.g. power requirements, topographic constraints, construction tools required, etc.).
Although the discussion here is geared toward maintaining sensor networks over an extended period of time, planning is equally important for short term installations. Experience has shown that many short term installations have become long term even if that was not intended initially, and many small installations have been expanded to cover more area or measure more parameters.
Clearly, sensor network deployments are driven by ambitious science questions. However, good planning can help anticipate limitations and prevent time issues from becoming the driving force. Focusing on the overarching imperatives of good design, proper placement, organized data flow, and a well trained and motivated team will result in successful implementation and continued maintenance. Compromised installations diminish the impacts of the original study, can drain operating budgets unnecessarily, and inhibit leveraging of the science for future work and funding.
Implementation Feasibility
During the experiment or project design phase, defining the primary measurement objective is the first step to planning an observation site and platform. Answering these general questions is helpful before addressing specific technical issues:
Where is the geographic area of interest?
What are the measurements of interest?
What is the desired accuracy and frequency of measurement?
How critical are the sensor measurements? Can data gaps be tolerated? Is sensor redundancy necessary?
What type of experimental manipulation is desired (if any)?
What types of localized topography are likely to yield “representative” measurements at the time frequency of interest?
What is the total funding amount for personnel, travel, tools/equipment, fees, and science instruments?
What is the expected scope/lifetime of the deployment? Will it be expanded in the future? Consider scaling possibility (more sites, more sensors) even if it is not the immediate goal.
Evaluate commercial turnkey installations vs. systems developed from commercial or open source components. Considerations: cost, skills, maintenance, longevity of the company providing the whole system or each component, functionality, interoperability, access to continued support.
Assembling the Team
Several very different areas of expertise are required to successfully plan, install, and maintain sensing systems. Some of these roles/skill sets can certainly be provided by a single individual or individuals.
Roles within a team establishing and maintaining an environmental sensor network:
Scientist - determines the type of data and sampling frequency needed to answer the scientific questions within budget limitations.
Sensor system expert - knows the types of sensors and platforms, and the installation and programming needed to answer scientific questions. Is familiar with specific climate and terrain issues and QA/QC approaches in the field.
Field logistics expert (for major site construction) - is familiar with transport, construction, weather, tools, and supplies for construction.
Field construction and fabrication expert - understands concrete, metal structure, tower design, fencing, underwater anchors, floating devices, load estimates.
Field workers/assistants - many people are needed for remote construction tasks, sensor wiring, initial site setup, cable management.
Field technician - is familiar with maintenance tasks including minor repairs, maintaining a calibration schedule, and other regular sensor maintenance tasks. Field technicians need a good understanding of the science application and the end user. They need to be comfortable with technology and with applying knowledge from one area to another, have creative problem solving and critical thinking skills, and pay attention to detail. They should have basic electrical and mechanical knowledge (e.g., multimeter use, basic equipment installation, repair and programming). Depending on site conditions they also need to be certified in tower/rock/tree climbing, boat handling, or SCUBA diving, with the respective safety training, and should enjoy skiing, hiking, off-road driving etc. They also need to be skilled in GPS orienteering, navigation, and basic map making.
Communications/data transport expert/Licensed Commercial radio operator (ideal, but not required) - needs to be familiar with moving digital data over wired or wireless networks from remote points to project servers and should have basic knowledge of radio communication (e.g., technician-level amateur radio license, basic antenna theory, IP networking).
Network administrator/System administrator - is responsible for network architecture, redundancy of systems from data center to field sites, backup, data security.
Software developer - has skills in the preferred programming language.
Data manager - needs to be familiar with means of documenting procedures for maintaining communication between all roles involved, specifically, means for documenting field events and their ramifications for data quality. Needs to know approaches/software for managing high frequency streaming data, standard QA/QC routines for such data, and approaches to documenting data provenance and data archiving (space requirements, backup, storage of different QC levels), and needs database/software package programming/configuration expertise.
Data technician - needs to be thorough and reliable during tasks like ‘eye on’ quality control, manual data entry etc.
Overview of Chapters
The following chapters contained in this guide are structured to provide a general overview of the specific subject, an introduction to methods used, and a list of best practice recommendations based on the previous discussions. Case studies provide specific examples of implementations at certain sites.
Sensor Site and Platform Selection considers environmental issues, site accessibility, system specifications, site layout, and common points of failure.
Data Acquisition and Transmission is concerned with the acquisition of sensor data from the field and remote control of the system, while ensuring the integrity of those data.
Sensor Management Tracking and Documentation outlines the importance of communication between field and data management personnel, as field events may alter the data streams and need to be documented.
Streaming Data Management Middleware discusses software features for managing streaming sensor data.
Sensor Data Quality discusses different ways sensor data may be compromised and how to automatically control for this in the data stream.
Sensor Data Archiving introduces different approaches and repositories for archiving and publishing data sets of sensor data.
Sensor Site and Platform Selection
Considers environmental issues, site accessibility, system
specifications, site layout, and common points of failure.
Fig. 1 Typical environmental sensor deployment with science, support, and communication systems. Photo 2013 Scotty Strachan, NevCAN Sheep Range Blackbrush station
Contents
1 Contacts
2 Reviewers
3 Overview
4 Introduction
5 Methods
5.1 Environmental concerns
5.2 Site accessibility
5.3 Science platform selection
5.4 Support system specification
5.5 Site layout
6 Best Practices
6.1 Common Points of Failure
7 Case Studies
Contacts
The lead editors for this page may be contacted for questions, comments, or help with content additions.
Scotty Strachan - Department of Geography, University of Nevada, Reno - scotty at dayhike.net
Adam Kennedy - Andrews Forest LTER - adam.kennedy at oregonstate.edu
Reviewers
This page was reviewed by:
Jason Downing, Bonanza Creek LTER Information Manager, on 5/2/2014
Richard Jasoni, Associate Research Ecologist, Desert Research Institute, on 6/30/2014
Overview
Selection of exactly where and how to acquire data via in-situ sensing efforts is a crucial point in the science process where environmental research is concerned. Decisions made when choosing sites, sensor packages, and support infrastructure in turn place boundaries on what the final science deliverables can be. Data types, quantity, and quality are more or less set in stone during this process. Initial costs, time frames, and sustainability are also determined by these choices. Selections need to be made based on the desired science products, but also in consideration of a wide array of variables including land ownership, access, equipment budget, long-term maintenance capability, previous research, and construction/deconstruction logistics.
Setting up terrestrial sensing systems is a major infrastructure/personnel commitment with budgetary and environmental concerns, and every effort towards maintaining a robust, low-impact, and long-term data stream should be made. Because each region possesses unique geography, there is no “one size fits all” solution. Instead, a series of decisions needs to be made, with the goals and capabilities of the research team defined in the context of clearly-articulated science questions and objectives.
Introduction
Fig. 2 Progression of work in selecting a site and designing a science deployment.
Identifying both the deployment strategy (site, process) and the physical hardware (sensor platforms and support infrastructure) for environmental sensing is usually a daunting task. A key objective of the research team should be to keep the science context in view during this process, as logistic realities will often clash with "ideal" scientific conditions. Very often the decision tree for choosing exact locations and deployment schemes is dependent on interacting factors (such as permitting/geography/access; Fig. 2). There is also a vast array of possible sensor/hardware packages available for a multitude of science applications.
It is critical that Principal Investigators (PIs), logistical techs, and sensor specialists work together to develop specific deployment plans and alternatives, ideally in the pre-proposal stage. Planning topics must include science objectives, operating budgets, proposed locations, seasonal weather patterns, power sources, communications options, land ownership, distance from managing institutions, available personnel/expertise, and potential expansion/future-proofing. All of these categories are equally critical for discussion as proposed instrumentation projects move towards implementation.
Methods
Site visits, permit/agreement negotiations, equipment specifications, and deployment timelines need to be initiated concurrently because all phases of deployment are interdependent (Fig. 2). The P.I., together with the technical personnel, should identify sites for sensor and equipment deployment based on science needs, local topography, permit/agreement availability, logistical access, and availability of services (such as power and communications). Portions of the plan (such as some purchasing decisions) should remain flexible until the precise sites, permits/agreements, and data flow plan have been positively determined.
Environmental concerns
Environmental conditions have considerable bearing on science application, platform design, construction logistics, access restrictions, equipment reliability, and maintenance cost/longevity. Conditions for in situ sensing can vary tremendously from region to region; therefore, site and equipment selection must be considered on a case-by-case basis.
Local topographic variables include: northern versus southern exposure, which can affect hours of direct sunlight and snow persistence; and valley/sink versus ridgeline settings, which can affect daily temperature cycle and wind characteristics. The differences in air flow, wind exposure, cold sinks, snow drifts, sky exposure for solar panels, and possible radio/communications pathways are all important variables when selecting a site and what type of equipment will be deployed.
Dominant vegetation conditions and potential long-term growth can alter sensor readings via shading effects, affecting temperature, radiation, and snow-related measurements. Radio communications are also affected by vegetation, with most microwave frequencies used by high-speed data radios being strongly attenuated by trees and brush. Vegetation can also be a long-term hazard in the forms of fire fuels and deadfalls.
Visibility and the visual impact of deployments should be considered for both security and aesthetic reasons. Sometimes reduction of visual impact is required by landowners, but in general it is simply good practice. Metal structures can be camouflaged with paint to reduce visibility, structure heights may be reduced to blend with vegetation, and ground disturbance can be kept to a minimum to avoid erosion and the biasing of certain types of measurements.
Dominant weather conditions determine what levels of seasonal access are available, what structural designs should be used, and what sort of equipment should be purchased. Extreme temperatures, tropical storms, lightning, snow depth, riming/ice, UV exposure, high humidity, wind speeds, saltwater exposure, flooding, and stream depth variation are all examples of conditions which will influence design and deployment plans.
Wildlife can pose hazards to, or be affected by, proposed deployments. Bird perching and flight paths, cattle, soil invertebrates, rodents, and large mammals can all disturb or be affected by sensors and equipment installed in the field. Landowners will have regulations or preferences concerning these factors, and proactive steps are necessary on the part of the science team to minimize these hazards.
Sensitivity to local political and social issues needs to be considered, as objective science data should constructively serve the local populations as well as the scientists and funding agencies.
Site security is a primary concern when planning to deploy sensors and equipment into the field. Human theft/vandalism is a potential cause of sensor disturbance or failure. While remote deployments are nearly impossible to secure physically, measures such as camouflaging, informative signs, fencing, and lock boxes may be employed to deter hostile or irresponsible passers-by.
Hazards to sensors include natural disturbances/disasters such as wildfire, flooding, extreme winds, and mass wasting. Planners should be aware of all these possibilities and at least examine the likelihood of these events at sites which have been evaluated from the scientific point of view.
Site accessibility
Fig. 3 Seasonal access may vary highly depending on location, limiting the types of maintenance possible at any given time.
Locations for in-situ sensing must be accessed for data collection, survey, construction, and maintenance over the life of the project. Seasonal conditions, roads, and topography determine what types of access may be used during different times of the year. Categorical considerations include:
Vehicular access. Commercial vehicle/equipment, 2WD auto, 4x4 truck, ATV, snow machine, boat, helicopter.
Non-motorized access. Hiking, skiing, pack animals, snowshoeing.
Access improvements. Road building, trail building, trail demarcation, safety rails, harness anchor points.
Seasonal access. Define access by spring/summer/fall/winter seasons. This is directly related to local weather/topographical conditions.
Construction access. Heavy equipment, special equipment, heavy loads, and heavy foot traffic are all likely possibilities depending on monitoring design.
Minimal impact considerations. Can traffic/access be directed in a way to minimize environmental impact (e.g. erosion, vegetation)? Solutions include boardwalks, bridges, raised steps, delineated pathways.
Science platform selection
Fig. 4 Science instrumentation specification must be driven by science questions and environmental/logistical constraints.
Once the science questions have been established and site conditions are known, an itemized list of sensor and support system platforms/hardware may be assembled that best fits the application and budget. Primary considerations include reliability, comparability with other similar field systems, technological (e.g. programming) requirements, budget, and system flexibility (e.g. upgrades, expansion, telemetry options). Accuracy, precision, and expected period of use prior to calibration or replacement may also be a consideration. In some fields of study, there are only one or two alternatives to choose from in terms of scientific instrumentation, whereas in others there can be many choices. Options can be narrowed by researching what equipment/standards are used by existing installations to which comparability is desired. Once a data acquisition platform and sensor array is chosen, remaining support systems are then designed around this core equipment.
Support system specification
The subsystems of infrastructure, electrical power supply, and data communications should all be designed to best support the science platforms in all seasons over the long term. While some vendors offer “all-in-one” packages supplied with standard instrumentation, it is best for the research team to assess whether these solutions are adequate for their chosen site and objective. Quite often several science questions are being addressed in larger deployments, and multiple hardware solutions from several vendors must be combined into one deployment. The support systems should be specified and scaled appropriately.
Physical infrastructures – these are the building-blocks of any remote data acquisition site, including tripods, towers, poles, buoys, solar panel racks, storage boxes, fencing, concrete pads, and the like. Quite often a single tripod or tower does not have adequate space or structural integrity to support all of the sensors, antennas, solar panels, batteries, and other items, so a typical site design incorporates multiple structural components.
Power generation and storage – for sustaining long-term reliable data streams, power independence is critical. Stations should be capable of generating and storing their own power locally, as well as taking advantage of any grid or other available power that is within budget and design criteria. Because the majority of related electronics are ultimately powered by DC voltage, having a power generation system and DC battery bank for every site (and sometimes discrete subsystems) is recommended to minimize the loss of power and the resulting data gaps. Independent generation sources are most commonly PV arrays (solar), wind, or water turbines. For reasons of cost, reliability, and maintenance issues, PV (solar) is recommended as the primary on-site generation source if environmental conditions allow. Incorporating simplicity, redundancy, and excess capacity is important for long-term reliability.
Data communications – Use of real-time communication (in addition to local storage capacity) is desirable in order to transmit data, monitor system health performance, troubleshoot problems, and minimize data gaps. This is usually performed using radio communications (whether vendor-specific or building a general-purpose field IP network). Communications systems need to be robust, secure, and should have low power requirements (refer to “Data Acquisition and Transmission” Best Practices for further detail).
Construction details – When selecting and designing the sensor and support systems, many details need to be considered when generating specifications and purchasing hardware. Wires should be protected in conduit and storage enclosures to avoid exposure to damage and seasonal degradation. Wire lengths, enclosure sizes, and mounting locations should be planned for accordingly. Anchoring for support structure should be designed to withstand worst-case weather/environmental conditions. Use of corrosion-resistant metals for structure and hardware such as galvanized steel and aluminum will greatly reduce failure or ongoing maintenance problems.
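The recommendation for a DC battery bank with excess capacity can be made concrete with a rough sizing calculation. The following is a minimal sketch, not a design tool: the function name, the 50% depth-of-discharge limit, the 4 peak sun hours, and the 70% overall system efficiency are illustrative assumptions that are not taken from this document and should be replaced with site-specific figures.

```python
def size_power_system(load_watts, autonomy_days, system_volts=12.0,
                      max_depth_of_discharge=0.5, peak_sun_hours=4.0,
                      system_efficiency=0.7):
    """Rough sizing of a battery bank and PV array for a DC-powered station.

    All default values are illustrative assumptions, not recommendations.
    Returns (daily_wh, battery_ah, pv_watts).
    """
    inputs = (load_watts, autonomy_days, system_volts,
              max_depth_of_discharge, peak_sun_hours, system_efficiency)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")

    # Energy the station consumes in one day (watt-hours).
    daily_wh = load_watts * 24.0

    # Battery capacity needed to run for `autonomy_days` without any
    # generation, while never discharging below the allowed depth.
    battery_ah = daily_wh * autonomy_days / (system_volts * max_depth_of_discharge)

    # PV array rating needed to replace one day of consumption, given the
    # assumed peak sun hours and overall system losses.
    pv_watts = daily_wh / (peak_sun_hours * system_efficiency)

    return daily_wh, battery_ah, pv_watts


if __name__ == "__main__":
    daily_wh, battery_ah, pv_watts = size_power_system(load_watts=5.0, autonomy_days=5)
    print(f"{daily_wh:.0f} Wh/day, {battery_ah:.0f} Ah battery, {pv_watts:.0f} W PV")
```

In line with the advice to incorporate excess capacity, the results of such a calculation are best treated as lower bounds and rounded up generously.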
Site layout
Fig. 5 Carefully planning a site layout in advance can prevent surprises and setbacks during installation.
Site layout at first might seem trivial, but is very important when considering interactions of the various subsystems that can influence sensor/equipment reliability and data quality. Science questions/objectives should drive the placement/separation of sensors to optimize measurement quality, followed by placement of support systems and additional structure. Solar arrays need to be angled for sun exposure, minimal shading, and snow shedding. The impacts of site structure on measurements such as wind eddies, incoming/outgoing radiation, camera viewsheds, or precipitation catch zones need to be considered, as well as aesthetic impacts if located in a region that is frequently visited by the public. Power and data cable runs should be protected and kept as short as possible; voltage drop over long runs can be a consideration in layout and design. Stipulations in site permits may be drivers of site layout and construction. Once the site layout is designed and mapped, specification of construction materials, sensor cables, and other supplies may be optimized.
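The voltage drop over a long DC cable run mentioned above can be estimated with Ohm's law. The sketch below assumes copper conductors with a resistivity of roughly 1.68e-8 ohm-metres near 20 °C and a simple two-conductor run (out and back); the function name and example figures are hypothetical, and temperature effects and connector losses are ignored.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # approximate value near 20 °C (assumption)


def voltage_drop(current_amps, one_way_length_m, conductor_area_mm2):
    """Estimate the DC voltage drop over a two-conductor copper cable run."""
    if one_way_length_m < 0 or conductor_area_mm2 <= 0:
        raise ValueError("length must be >= 0 and area must be > 0")

    area_m2 = conductor_area_mm2 * 1e-6
    # Current flows out and back, so the conducting path is twice the run length.
    loop_resistance_ohm = COPPER_RESISTIVITY_OHM_M * (2.0 * one_way_length_m) / area_m2
    return current_amps * loop_resistance_ohm


if __name__ == "__main__":
    supply_volts = 12.0
    drop = voltage_drop(current_amps=2.0, one_way_length_m=50.0, conductor_area_mm2=2.5)
    print(f"drop: {drop:.2f} V ({100.0 * drop / supply_volts:.1f}% of {supply_volts:.0f} V)")
```

Comparing the estimated drop against the minimum operating voltage of the equipment at the far end of the run indicates whether a shorter run or a heavier conductor is needed.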
Best Practices
Fig. 6 The approach to deployment should be as durable, reliable, and flexible as possible to accommodate unforeseen conditions and changing science questions or technology improvements over the long term.
Selection of deployment sites, sensor packages, and support systems are interacting processes which can require some iteration before arriving at the final determinations. Unless the science questions are extremely narrow or exceptional in nature, it is unlikely that any one of these decisions can be made in a vacuum without considering the others. With this in mind, the following overarching recommendations should be emphasized:
P.I. consultation with system/hardware/construction specialists while in the proposal phase will minimize budget surprises or platform compromise later in the process.
Data quality and longevity should be the ultimate goals when designing the deployment. Making choices for more robust and widely-used core systems and sensors will ensure that data comparability is maximized and hardware problems corrupting data or creating gaps are minimized. Purchase of reliable and known equipment is not as expensive as repairing/replacing equipment halfway through the study or losing valuable data.
When data quality and continuity are paramount, use of replicate sensors or stations may be required.
Planning for real-time connectivity is crucial for reducing field maintenance time and data gaps.
Optimal site selection to answer science questions can often be impeded by permit requirements and landowner preferences. Starting the conversation with landowners early in the process may improve the chance of getting the locations/deployment types that are desired.
Standardizing sensor and support hardware, software/programming, and structural designs across multiple sites minimizes maintenance issues as well as construction costs and design time.
Assessing access capabilities to the sites will allow for planning of emergency maintenance access, procedures, and costs.
Overbuilding structure, power capacity, and site infrastructure (e.g. cabling, networking) will prevent problems in the case of unforeseen events or site expansion.
Common Points of Failure
Power problems are one of the most frequent causes of total system failure. Battery fatigue, loose connections, and electrical shorts need to be anticipated and prevented where possible. Power systems need to be protected, over-engineered, and replicated wherever possible.
Temperature extremes of heat or cold can cause electronic or mechanical failure of individual sensors and systems. Insulating enclosures, ventilating enclosures (active or passive), and placement of equipment in sheltered zones can help alleviate these problems.
Humidity and condensation can be a serious issue for electronics longevity and circuit performance (including accuracy). In zones of high average humidity, sealing enclosures and providing some means of reducing humidity (e.g. desiccant packets) is desirable.
Sensors can be disrupted by wildlife. Hardening of sensor systems (e.g., armoring cables, fences) can help with some problems. Near-real-time data feeds allow rapid detection of the problems that do occur.
Lightning strikes or near-misses are a common problem at exposed or mountainous sites. Extensive grounding (e.g. exposed copper wire network) and use of surge protection throughout the power system and at the ends of long power and data cable runs will compartmentalize the site electrically and protect as many components as possible.
Lack of data storage replication can cause loss of data. Incorporating high-capacity storage on-site (datalogger) as well as off-site (database) can mitigate this problem.
Personnel turnover coupled with lack of process and hardware documentation can lead to data discontinuity or equipment failure (see Sensor Management and Tracking for additional details).
Case Studies
NevCAN Transects or Walker Basin (Scotty): to be completed; will include a station design and systems, maintenance/access plan, data flow, and some photos/diagrams.
Andrews Research Sites (Adam): to be completed.
Sevilleta: Renee to complete with multiple case study examples.
Data Acquisition and Transmission
Concerns the acquisition of sensor data from the field and
remote control of the system, while ensuring the integrity
of those data.
Contents
1 Overview
2 Introduction
2.1 Considerations
2.1.1 Collection Frequency
2.1.2 Bandwidth
2.1.3 Protocols
2.1.3.1 Hardware
2.1.3.2 Network
2.1.4 Line-of-sight
2.1.5 Power
2.1.6 Security
2.1.6.1 Physical Security
2.1.6.2 Network Security
2.1.7 Reliability and Redundancy
2.1.8 Expertise
2.1.9 Budget
3 Methods
3.1 Manual
3.2 Unidirectional
3.2.1 Geostationary Operational Environmental Satellite (GOES)
3.2.2 Meteor Burst Radio
3.2.3 Iridium Satellite service
3.3 Bidirectional
3.3.1 ISM band radio network
3.3.2 Cellular
3.3.3 Vendor-specific radio network
3.3.4 Satellite internet
3.3.5 Licensed radio
3.3.6 Mesh Networks
3.3.7 Wired
4 Best Practices
5 Case Studies
6 Resources
6.1 GOES
7 References
Overview
Traditionally, environmental sensor data from remote field sites were manually retrieved during infrequent site visits. However, with today's technology, these data can now be acquired in real-time. Indeed, there are several methods of automating data acquisition from remote sites, but there is insufficient knowledge among the environmental sensor community about their availability and functionality. Moreover, there are several factors that should be taken into consideration when choosing a remote data acquisition method, including desired data collection frequency, bandwidth requirements, hardware and network protocols, line-of-sight, power consumption, security, reliability and redundancy, expertise, and budget. Here, we provide an overview of these methods and recommend best practices for their implementation.
Introduction
The classic method of acquiring environmental sensor data from remote field sites involves routine site visits by a technician, who connects a laptop to a datalogger (an electronic device that records sensor data over time) and manually downloads the data recorded since the last site visit. Once back at the lab, the technician is then responsible for manually uploading these data to a server for later processing and archival.
While manual acquisition methods are generally effective, there are many reasons to automate environmental sensor data acquisition. First, if the site is not visited frequently enough, the datalogger memory can become full and, depending on how the datalogger is programmed, sensor data will either overwrite itself or stop recording entirely. This situation often occurs at remote sites that become periodically inaccessible due to environmental conditions, such as heavy winter snowpack. Second, the burden of responsibility for not only the successful retrieval of the sensor data, but also the subsequent upload to a server for safekeeping, lies solely on the technician. Moreover, with any instrumented site, there is the inherent potential for sensor or power failure. Automated data acquisition systems allow technicians to learn of such issues prior to visiting the field site, reducing the potential for data loss. Finally, automated data acquisition methods save hundreds of person hours and vehicle miles that would have otherwise been spent manually acquiring data or troubleshooting unanticipated problems, thus improving the overall quality of the data.
Bidirectional communication methods have the additional advantages of allowing technicians to remotely change system settings, test configurations, and troubleshoot problems. These methods also open the field to a wide variety of devices that may be deployed at a remote field site, such as controllable cameras, on-site wireless hotspots, and IP-enabled control or automation equipment.
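The risk of datalogger memory filling between site visits, described above, can be estimated in advance. The following is a minimal sketch under stated assumptions: the memory size, record size, and recording interval in the example are hypothetical values, not specifications of any particular datalogger, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86400.0


def days_until_full(memory_bytes, bytes_per_record, record_interval_s):
    """Estimate how many days of records fit in datalogger memory."""
    if memory_bytes <= 0 or bytes_per_record <= 0 or record_interval_s <= 0:
        raise ValueError("all inputs must be positive")
    records_per_day = SECONDS_PER_DAY / record_interval_s
    return memory_bytes / (bytes_per_record * records_per_day)


if __name__ == "__main__":
    # Hypothetical logger: 4 MB of storage, 100-byte records, one record per minute.
    capacity_days = days_until_full(4_000_000, 100, 60)
    visit_interval_days = 45  # hypothetical gap between site visits
    print(f"memory lasts about {capacity_days:.1f} days")
    if capacity_days < visit_interval_days:
        print("memory would fill before the next planned visit")
```

If the estimated capacity is shorter than the longest plausible gap between visits (for example, a winter closure), either the recording interval, the storage capacity, or the acquisition method needs to change.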
Considerations
The decision of which sensor data acquisition method to use at a given site requires the careful consideration of many factors, for which we provide an overview here.
Collection Frequency
What is the desired collection frequency? How important is real-time accessibility? For instance, the data could be retrieved in near real-time (every few minutes to every few hours) or just once or twice per day. High frequency datasets or images should be collected more frequently.
Bandwidth
Bandwidth can be an important consideration, particularly when high frequency data are being collected. Will cameras be utilized at the site? Where is the broadband point of presence (POP) located? Does the equipment work with the required bandwidth? More frequent collection intervals require less bandwidth per transmission and are recommended for high frequency datasets or for images.
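The relationship between collection interval and the size of each transmission can be illustrated with simple arithmetic. This is a rough sketch: the logging rate and link speed in the example are hypothetical, the function name is illustrative, and protocol overhead, retries, and compression are deliberately ignored.

```python
def transmission_seconds(logged_bytes_per_second, collection_interval_s,
                         link_bits_per_second):
    """Estimate the time needed to send the data accumulated in one interval."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    # Data accumulates on the logger between collections.
    payload_bytes = logged_bytes_per_second * collection_interval_s
    return payload_bytes * 8.0 / link_bits_per_second


if __name__ == "__main__":
    # Hypothetical site logging 40 bytes per second over a 9600 bit/s link.
    for interval_s in (300, 3600, 86400):
        t = transmission_seconds(40, interval_s, 9600)
        print(f"collect every {interval_s:>5} s -> about {t:.0f} s on air per collection")
```

The total volume sent per day is the same in each case; only the size of each individual transfer changes, which matters for links that are slow, intermittent, or power-limited.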
Protocols
Hardware
Many dataloggers only have serial (RS232) ports, therefore requiring a serial-to-ethernet converter to interface with automated acquisition instrumentation. USB.
Network
Public IP networks are advantageous over private IP networks in many cases because they can be managed from anywhere there is a connection to the Internet. Remote access to private IP networks requires advanced network expertise to provision port forwarding in firewalls and/or VPN.
Line-of-sight
Fig. An example of a near-Line-of-Sight (nLoS) condition, where intervening terrain and/or vegetation can interfere with the radio signal. In this case, the antenna heights at both ends are actually at the 8 m level, mitigating the effect somewhat. The link is operational, albeit with a reduced Received Signal Strength Indication (RSSI) due to the presence of an obstruction in the link's Fresnel zones.
Line-of-sight depends on the environment, topography, and vegetation. It can be initially assessed using LOS calculators, which use digital elevation models (DEMs), but must be ground-truthed. A repeater infrastructure is often required, and the distance to the repeater is a factor. Choosing repeater locations involves many of the same site selection considerations discussed in Sensor Site and Platform Selection.
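The figure caption above refers to obstructions in a link's Fresnel zones. The radius of the n-th Fresnel zone at a point along the path follows the standard formula r = sqrt(n * wavelength * d1 * d2 / (d1 + d2)), where d1 and d2 are the distances from that point to each antenna. The sketch below applies it; the 2.4 GHz frequency and 10 km path are hypothetical example values, and the 60% clearance figure is a commonly quoted rule of thumb rather than a recommendation from this document.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def fresnel_radius_m(frequency_hz, d1_m, d2_m, zone=1):
    """Radius of the n-th Fresnel zone at a point d1 and d2 metres from the antennas."""
    if frequency_hz <= 0 or d1_m <= 0 or d2_m <= 0 or zone < 1:
        raise ValueError("frequency and distances must be positive; zone must be >= 1")
    wavelength_m = SPEED_OF_LIGHT_M_S / frequency_hz
    return math.sqrt(zone * wavelength_m * d1_m * d2_m / (d1_m + d2_m))


if __name__ == "__main__":
    # Hypothetical 10 km link at 2.4 GHz, evaluated at the midpoint where the zone is widest.
    radius = fresnel_radius_m(2.4e9, 5_000.0, 5_000.0)
    print(f"first Fresnel zone radius at midpoint: {radius:.1f} m")
    print(f"60% clearance (rule of thumb): {0.6 * radius:.1f} m")
```

Comparing this radius with the clearance above terrain and vegetation along a DEM profile gives a first indication of whether raising antennas or adding a repeater is worth considering before ground-truthing.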
Power
Howimportantisreal-timeaccessibility?(e.g.,whatisdesiredcollectionfrequency?).Whatarethetransmissiontypepowerrequirements,onsitebuffersize.Redundancyispreferred,especiallyinveryremotesites.Ifpowerisdisrupted,willsystemresumeoperations?
Security
Physical Security
For physical security considerations, refer to Sensor Site and Platform Selection.
Network Security
It is recommended that encryption keys, such as WPA2 encryption, be configured to prevent unauthorized access of data acquisition equipment or sensor data. A private IP network can further help to prevent unwanted access, but also prevents easy remote management by network administrators unless a VPN is installed.
Reliability and Redundancy
Of transmission mode and of equipment. Also, network infrastructure.
Expertise
Some acquisition methods are plug-n-play with substantial vendor and/or community support, while others require a fair amount of hardware and network expertise to configure and maintain. All acquisition methods require fundamental knowledge of IP networking along with basic electronics, radio, and antenna theory.
Budget
Costs of implementing a data acquisition and transmission method depend on existing infrastructure, initial setup costs (including personnel), ongoing personnel costs (specifically technician maintenance), and recurring costs, such as monthly fees for cellular transmission.
Methods
There are three general categories of remote sensor data acquisition methods: manual, unidirectional telemetry, and bidirectional telemetry. Each has advantages and disadvantages in terms of infrastructure, cost, reliability, required expertise, and power consumption.
Manual
This method involves scheduled visits to the site by a field technician, who uses a serial-to-computer connection and/or flash memory transfer of environmental sensor data to their laptop or similar device. Upon returning from the field, the technician is responsible for manually uploading these data to a server. This acquisition method is simple and may be the only option when site instrumentation generates large data files. However, this method provides no real-time data access and therefore, no knowledge of instrumentation failures. Moreover, the reliability of this method is completely dependent on the technician.
Unidirectional
Unidirectional sensor data acquisition methods involve regularly scheduled wireless data transmission from a remote site to a server, with no offsite ability to control or change sensor settings. These include...
Geostationary Operational Environmental Satellite (GOES)
Fig. A typical circular-polarized GOES antenna for one-way burst transmission of limited data
This method is preferred in very remote and potentially rugged areas where other automated transmission methods would not work. While it does not require line-of-sight to a repeater like most other transmission methods, it does require a view to the southern sky. Additionally, the GOES method has a low power requirement. However, GOES has several disadvantages, including a high initial investment (<$5K) and training and licensing requirements. Moreover, fewer than 100 values can be transferred per hour, making it disadvantageous for sites that sample at high frequencies.
Data transfer speed for GOES systems is typically limited to 1200 bits per second with 10 second transfer assignments occurring once every hour. During each 10 second period, one can transfer up to 1500 bytes of data (12,000 bits / 8) including the 53 byte GOES header string. In other words, a maximum of 1447 bytes with time stamps and measured values can be transferred to the satellite during one transmission interval. Most often, GOES messages are organized in a time ordered format similar to the following example:
0105E59013190131824G30+1NN196WXW00517
0 13:00:00 23.7,43,5,245,-55.1,5,245,23.7,23.7,12.8
1 12:30:00 23.7,43,-55.1,204,1011.09,0.000,0.0,24.7,0.270,-0.456,-0.997,-0.416,-2.687,23.5,0.00,214.81,0.00,5,245
1 12:45:00 23.7,43,-55.1,204,1011.11,0.000,0.0,24.7,0.249,-0.468,-0.994,-0.436,-2.650,23.5,0.00,214.82,0.00,5,245
Here, the first line represents the GOES header string that includes the address, date and UTC time of the transfer (13:18:24), signal information, satellite information, message length and some other characters. In the example above, the lines that follow carry the time stamp and value information from the sensor sets 0 and 1. As the length of each character in the sensor set string is 1 byte, we can see that our GOES message uses approximately 280 bytes of the 1447 bytes that are the theoretical maximum for the transfer. However, in order to accommodate the possible differences between the station sending time, decoders, and scheduled reception time, we never want to reach this value.
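The byte budget described above can be checked before a datalogger program is deployed. The following is a minimal sketch using the figures quoted in this section (1200 bits per second, a 10 second window, a 53 byte header); the 80 percent safety margin and the function names are assumptions made for illustration, not NESDIS requirements.

```python
# Figures taken from the text: 1200 bit/s, 10 s window, 53-byte header.
BITS_PER_SECOND = 1200
WINDOW_SECONDS = 10
HEADER_BYTES = 53


def max_payload_bytes(bps=BITS_PER_SECOND, window_s=WINDOW_SECONDS,
                      header_bytes=HEADER_BYTES):
    """Bytes left for time stamps and values in one transmission window."""
    return (bps * window_s) // 8 - header_bytes


def fits_in_window(payload, safety_fraction=0.8):
    """True if the payload stays under a chosen fraction of the maximum.

    safety_fraction is an assumed margin, chosen here only for illustration.
    """
    size_bytes = len(payload.encode("ascii"))  # 1 byte per ASCII character
    return size_bytes <= safety_fraction * max_payload_bytes()


# One sensor-set line copied from the example message above.
sample_line = "0 13:00:00 23.7,43,5,245,-55.1,5,245,23.7,23.7,12.8"
```

Calling `fits_in_window` on the full message body each time sensors are added gives an early warning before the hourly window is exceeded.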
Prospective users of the GOES system must fill out the System Use Agreement (SUA) form and, upon approval, receive and sign the Memorandum of Agreement (MOA) from NOAA's Satellite and Information Service (NESDIS). After the MOA is approved, NESDIS will issue a channel assignment and an ID address code to the applying organization. Non-U.S. government and research organizations must be sponsored by a U.S. government agency in order to apply for this permission. Upon approval, all users must purchase equipment that has been certified to be compatible with the GOES Data Collection System. As of May 2013, GOES transmitters must conform to the certification standards version 2 (also known as CS2). This change was implemented to double the number of GOES channels on the same bandwidth. As a result, old GOES transmitters that are only compatible with the CS1 standard cannot be used for new NESDIS assignments. For assignments obtained prior to May 2012, CS1 transmitters will be supported until 2023. If you consider buying used equipment for GOES transmission, make sure the transmitters are compliant with the CS2 standard.
Meteor Burst Radio
Like GOES, this method does not require line-of-sight and has a low power requirement. However, it requires a large antenna and arrangement of service, and has a very slow transmission rate. It works by reflecting VHF radio signals at a steep angle off the band of ionized meteor trails that exist 50 to 75 miles above the Earth. See SNOTEL and ITU Case Studies for more information.
Iridium Satellite service
Iridium provides the only complete global satellite coverage. The new Iridium Pilot is available until 2016. The next generation of Iridium is expected to be implemented around that time frame. The Pilot is very easy to install and maintain with a waterproof body and USB interface. With this simple interface a laptop can be connected and surfing the web within minutes. Recently, the cost has become more affordable with a per data usage cost structure. Since Iridium operates in the L band it is nearly impervious to weather. Iridium is used primarily for marine communication.
Bidirectional
This method involves bidirectional (and typically wireless) transmission of data from a remote site to a server, with the ability to modify datalogger programs and/or sensor settings remotely. These methods generally require line-of-sight and security considerations (both network and physical). Sometimes service can be purchased from an Internet Service Provider (ISP) if there is commercial coverage in the area; otherwise it can be manually installed in remote areas. Often, connectivity can be extended to computers on site. A combination of several methods may be required in certain situations.
ISM band radio network
Fig. Three different antenna types used for bi-directional microwave band communication: a) 5.x GHz 24" dual-polarity dish, 30 dBi gain; b) 2.4 GHz single-polarity grid, 24 dBi gain; and c) 5.x GHz dual-polarity panel, 23 dBi gain. The higher the gain value, the narrower the antenna directivity, increasing signal strength in the desired direction and rejecting adjacent interference. Each of these designs has pros and cons, depending on the application.
(unlicensed, 900 MHz, 2.4 GHz, 5.x GHz): The ISM band radios are commonly referred to as "WiFi" radios (even though these are generally used as backhauls and not wide-area access points) and come in a variety of frequencies. This method has many advantages in that it is nonproprietary, has no recurring costs, uses public radio frequencies, allows transmission of large datasets, utilizes inexpensive hardware, is not restricted to a single vendor or device type, and has increasing compatibility with many devices. However, it requires line-of-sight (LoS) or near-line-of-sight (nLoS), a network interface on loggers and devices, and basic to advanced network administration skills. These radios are also subject to interference, particularly in more populated areas, and can have higher power requirements than other transmission methods.
Cellular
This method has prolific coverage and minimal ongoing maintenance. However, it requires that a reliable cellular network be present and comes with monthly recurring costs. Occasionally a contract may be required unless service can be negotiated through a university or organization.
Vendor-specific radio network
Vendor-specific radio networks use proprietary protocols and are typically more expensive than some other acquisition methods, but have the advantage of being relatively easy to set up and maintain. For example, Freewave.
Satellite internet
This method can get limited 2-way connectivity into a remote site, albeit at high monetary costs and significant power consumption. It has slow uplink speeds and high latency, requires a subscription, and requires on-site vendor setup.
Licensed radio
This method is expensive and requires the purchase of a licensed frequency.
Mesh Networks
A mesh network is a network topology in which each node relays data throughout the network. A mesh network whose nodes are all connected to each other is a fully connected network. Due to the inherent redundancy in mesh network design, mesh networks are typically quite reliable, as there is often more than one path between a source and a destination in the network. Mesh networks are typically wireless, but can be wired. Fully connected mesh networks are not very common, especially at large spatial scales, since every device must be connected to every other device. The initial investment to build such a network is considerably higher than for other acquisition methods. Mesh networks, either partially or fully connected, are most commonly used in distributed sensor networks.
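The cost of full connectivity grows quickly with the number of nodes, which is one way to see why fully connected meshes are rare at large scales. The short sketch below counts the point-to-point links needed; the function names are illustrative only.

```python
def full_mesh_links(node_count):
    """Number of links when every node connects to every other node."""
    return node_count * (node_count - 1) // 2


def star_links(node_count):
    """Number of links when every node connects only to one central hub."""
    return max(node_count - 1, 0)


# Compare the two topologies for a hypothetical 10-node network.
comparison = (full_mesh_links(10), star_links(10))
```

For ten nodes the full mesh needs 45 links against 9 for a hub-and-spoke layout, so a partial mesh is often the practical compromise between redundancy and cost.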
Wired
While all methods discussed utilize wireless transmission protocols, wired bidirectional transmission may be possible via in-ground or aerial copper or fiber optics.
Best Practices
Think about data acquisition as part of site design. It is more expensive to add telemetry to a preexisting site than to integrate with initial site construction.
Make sure to include acquisition method power consumption in the site power budget, or a separate power system will be required.
Use software tools with radio or a handheld spectrum analyzer to survey RF conditions on-site. For instance, urban areas are typically noisier with respect to RF interference, and for Wi-Fi transmission methods, 5 GHz frequencies are preferred.
Use a bidirectional transmission method to provide more control and flexibility.
Over-engineer the power system, especially when powering repeaters and other sites in hard to reach areas.
Use equipment that can conserve power (sleep mode).
Provide adequate local storage for disrupted transmissions. Adequate “off logger” local storage is recommended to avoid losing data when/if the logger is reset.
Provide redundancy, such that when one link goes down, the site is still remotely accessible. This is related to network architecture planning: multiple geographic/hardware paths along backhaul routes to field hubs are highly desirable. Examples include parallel backhauls, multiple internet points of access, and "failover" paths. Having a "backdoor" into the network, even over reduced speed links, can allow a tech to remotely troubleshoot problems on the main links.
Standardize transmission protocol across all sites to provide easier network management.
Match radio band, power, antenna, and bandwidth to application. For instance, when a site generates high frequency sensor data, high bandwidth and high data collection frequency are recommended.
Use a narrow bandwidth for your RF devices and coordinate frequencies between radio systems.
Thoroughly document all site coordinates, IP addresses, maps, radio azimuth, zenith.
When using an IP based acquisition method, use public IP addresses for easier remote management of devices.
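Including the telemetry equipment in the site power budget can start with simple arithmetic. The sketch below sums daily energy use and sizes a battery bank for a chosen number of days without charging; all load values, the 12 V system voltage, and the 50 percent depth-of-discharge limit are hypothetical numbers chosen for illustration.

```python
def daily_energy_wh(loads):
    """Sum daily energy use from (average_watts, hours_per_day) pairs."""
    return sum(watts * hours for watts, hours in loads)


def battery_ah_needed(daily_wh, system_voltage, autonomy_days,
                      max_depth_of_discharge):
    """Battery capacity (Ah) needed to run for autonomy_days without
    charging while staying above the allowed depth of discharge."""
    return daily_wh * autonomy_days / (system_voltage * max_depth_of_discharge)


# Hypothetical site: a 4 W radio and a 0.5 W datalogger, both always on.
site_loads = [(4.0, 24), (0.5, 24)]
site_daily_wh = daily_energy_wh(site_loads)            # 108 Wh per day
site_battery_ah = battery_ah_needed(site_daily_wh, 12.0, 3, 0.5)
```

Increasing `autonomy_days` for repeaters in hard to reach areas is one simple way to express the advice to over-engineer the power system.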
Case Studies
NevCAN: Nevada Climate-ecohydrological Assessment Network - University of Nevada, Reno (UNR); Desert Research Institute (DRI), University of Nevada, Las Vegas (UNLV)
Sevilleta Wireless Network - Sevilleta Long Term Ecological Research (LTER) Program and Sevilleta Field Station; Department of Biology; University of New Mexico (UNM), Albuquerque, New Mexico, USA
Virginia Coast Reserve LTER Wireless Network - Virginia Coast Reserve Long Term Ecological Research (LTER) Program
SNOTEL
ITU
Resources
GOES
New NESDIS Assignments
CS2 Standard Compliance
References
Sensor Management Tracking and Documentation
Outlines the importance of communication between field
and data management personnel as field events may alter
the data streams and need to be documented.
Fig. Documentation of sensor installation, maintenance, and related systems is critical to long-term data usability.
Contents
1 Overview
2 Introduction
3 Methods
3.1 What should be tracked
3.1.1 Documentation at setup time
3.1.2 Infrastructure events to track
3.1.3 Site level events to track
3.1.4 Sub-component events to track
3.2 How to track the information
4 Best Practices
4.1 Document specific information during normal operations
4.2 Maintaining the records and linking to affected data streams
4.3 Managing sensor configurations
5 Case Studies
Overview
Automated observation systems need to be managed for optimal performance. Maintenance of the overall sensor system includes anything from repairs, replacements, and changes to the general infrastructure, to deployment and operation of individual sensors, and seasonal or event driven site clean up activities. Any of these activities in the field may affect the data being collected. Therefore, consistent and uniform records of maintenance, service, and changes to field instrumentation and supporting infrastructure serve as metadata for long term quality control and evaluation of the sensor data.
In this chapter, we describe the types of management records that should be kept and the various methods for collecting, maintaining, communicating, and connecting this information to the data. It is important to create tracking and documentation protocols early on because these protocols will support and guide communications and work between field and data management personnel.
Real time monitoring of system health and alerting systems are discussed in the middleware, quality control, and transmission sections of this document. Although some of these parameters do not affect the actual data quality, tracking of these system performance diagnostic data may be helpful to detect patterns and prevent future data loss, intervene remotely, and schedule site visits more effectively. Calibration procedures and schedules, maintenance activities, and replacement schedules are hardware specific and will not be covered here in detail.
Introduction
Data are collected to detect changes in the environment, effects of treatments, disturbances etc., and in all data collection great care is taken to not mask the signature of events of interest with impacts from unavoidable, sampling related disturbances. Field notes are usually associated with the raw data to be able to discern a natural event of interest from a management event. Data collection approaches using automated sensing networks are becoming more complex with many people involved in the data gathering, management, and interpretation activities, and communication among all involved parties is becoming more important and more challenging. Field notes can be a useful vehicle for this communication. Everyone using older long-term data knows the value of field notebooks to help understand and interpret a dataset. Field notes are equally valuable to future users for a sensor data stream, particularly if the notes are interpreted such that information is integrated with the data via data qualifying flags and method description codes.
Currently there are no standards for flag code sets or for defining which events should be flagged and how to efficiently communicate with data users. Here we attempt to present a list of events that are useful to track and that have been helpful in the past to guide data users in the interpretation and evaluation of the data. To manage this information the concepts of a ‘logical sensor’, a ‘physical sensor’, a ‘method’ and ‘event codes’ have proven useful.
A ‘logical sensor’ or a sensor data stream can be defined by a location, height/depth, and measurement parameter, regardless of what exact physical sensor or hardware is used to log measurements. An example would be ‘air temperature at 3 m above the ground at site A’. However, over time the ‘physical sensor’ will have to be calibrated, eventually replaced, and a new type of sensor may be chosen to provide more accurate measurements. If hardware is swapped out for technical reasons, the data stream still represents the site location for that measurement, and the notion of a ‘logical sensor’ allows identification of a consistent data stream over time.
Changes in the type of sensor or ‘method’ might be tracked with a method code associated with the logical sensor. Of course should a replacement sensor be significantly different such that the past and new data stream are not comparable, then a new logical sensor stream should be initiated. Events such as routine calibration might be flagged with an ‘event code’ rather than a change in ‘method’, even if this event has lasting effects on the data, i.e., more accurate data. An event code may serve as a means to link to individual field notes for the event. ‘Physical sensors’ should also be individually identifiable by location and tracked through a calibration or replacement schedule.
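The distinction between a logical sensor and the physical sensors that serve it over time can be expressed as a small data structure. The sketch below is one possible arrangement, assuming hypothetical class names, serial numbers, and method codes; it is not taken from any existing data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Deployment:
    """One physical sensor serving a logical sensor for a period of time."""
    serial_number: str
    method_code: str
    start: datetime
    end: Optional[datetime] = None  # None means still deployed


@dataclass
class LogicalSensor:
    """A data stream defined by site, parameter, and height/depth."""
    site: str
    parameter: str
    height_m: float
    deployments: List[Deployment] = field(default_factory=list)

    def deployment_at(self, when):
        """Return the deployment active at `when`, or None if there is none."""
        for dep in self.deployments:
            if dep.start <= when and (dep.end is None or when < dep.end):
                return dep
        return None


# Hypothetical stream: air temperature at 3 m at site A, one sensor swap.
air_temp = LogicalSensor("A", "air_temperature", 3.0, [
    Deployment("SN-001", "thermistor_v1", datetime(2012, 1, 1), datetime(2013, 6, 1)),
    Deployment("SN-042", "thermistor_v2", datetime(2013, 6, 1)),
])
```

Looking up `air_temp.deployment_at(timestamp)` for any data value then recovers both the physical sensor and the method code in force at that time.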
Methods
What should be tracked
Basic information on the site and hardware configuration needs to be recorded at installation time. During normal operations event tracking can be done at several levels of granularity with respect to a research program. For example, it may be done at the level of the entire infrastructure, at a site, or at a sub-component of a site. The information about each event needs to be propagated or connected to all relevant data streams. Following are examples of what should be tracked at each of the above levels, in terms of impact on the recorded data:
Documentation at setup time
Location: lat, long, elevation (and/or depth), direction (e.g. camera facing north)
Location from a certain reference point (e.g. tower base)
Site description
Site photos with metadata, photos of procedures (how do you change ...), photo of sensor (so others can easily recognize)
Manufacturer's specs and ID of instruments (make, model, serial number, range, precision, detection limit, calibration coefficient)
Instrumentation (e.g. datalogger, multiplexer, sensor) wiring diagrams (this should be part of the logger program comments, a header section with the wiring description channel by channel)
Power wiring diagrams (e.g. how many solar panels, are they hooked up in series or parallel, etc.)
Network topology and IP addresses
Software used for calculating measurements (other than datalogger)
Instrumentation deployment date (the “go live” date)
Infrastructure events to track
Changes to dataloggers, multiplexers, or datalogger programs (datalogger programs may be archived)
Power problems, including battery voltage
Enclosure temperature and humidity
Platform maintenance (e.g., tower inspection, tram line leveling, etc.)
Sampling protocol changes (e.g., timing, routine changing or upgrading of sensor parts, instrument change or replacement)
RF/network performance degradation (prevents some/all data from being transmitted; track health/status of IP network devices using SNMP streams to Nagios, etc.)
Site level events to track
Site disturbance (e.g., animal, human, weather caused)
Site visits (presence of people may change measurements)
Site maintenance (e.g., cutting brush, cutting trees, etc.)
Changes to sensor network design, including additions or deletions of sensors
Sub-component events to track
Here, we include components like individual telemetry, power systems, instruments, sensor components, etc. While each component doesn’t affect the whole system, they still may influence the interpretation of the measurements. To track individual components a system of IDs may be developed for all components and supported by Barcodes, Geo-Location Tags and Microchip Encoded Sensors.
Sensor failures
Sensor calibrations
Sensor removal
Sub-sensor addition, removal, or change (pluggable sub-sensor positions within the main sensor need to be noted and kept consistent)
Sensor installation (replacement)
Sensor maintenance (cleaning, change of parts)
Sensor firmware upgrades
Enclosure temperature and humidity
Repositioning of sensor (e.g., move up during winter to be above snow line)
Normal (non extreme) disturbances as they are noted and removed (e.g., sticks in weirs)
Methodology changes (e.g., temperature radiation shield change)
How to track the information
Minimally documenting or logging site events or problems might be in a table structure such as:
Site ID | Datalogger ID | Sensor ID | datetime begin | datetime end | category (controlled vocabulary) | notes | person
However, usually a lot more is recorded at each site visit - see use cases. A controlled vocabulary is very important to categorise the event for later interpretation and flagging in the dataset and should be established as early as possible with project specific terms. Several database structures to maintain this information and connect to the actual data are currently being proposed and are discussed below in use cases.
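A minimal event log of this kind can enforce the controlled vocabulary at entry time. The sketch below mirrors the columns listed above; the category terms in `CATEGORIES` and the example identifiers are hypothetical and would be replaced with project specific terms.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical controlled vocabulary; each project defines its own terms.
CATEGORIES = {"site_visit", "calibration", "sensor_replacement",
              "disturbance", "power_problem"}


@dataclass
class SiteEvent:
    """One row of the minimal event log table described above."""
    site_id: str
    datalogger_id: str
    sensor_id: Optional[str]
    datetime_begin: datetime
    datetime_end: Optional[datetime]
    category: str
    notes: str
    person: str

    def __post_init__(self):
        # Reject free-text categories so events can be grouped later.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.datetime_end is not None and self.datetime_end < self.datetime_begin:
            raise ValueError("datetime_end precedes datetime_begin")


example_event = SiteEvent("A", "DL-01", "SN-042", datetime(2013, 6, 1, 10, 0),
                          datetime(2013, 6, 1, 10, 30), "calibration",
                          "two-point calibration", "field tech")
```

Rejecting unknown categories early keeps the log consistent enough to drive automated flagging of the data later on.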
Best Practices
Establish and document procedures and protocols for site visits, installation of new sensors, maintenance activities, calibrations, communication between field and data personnel. Such protocols may include pre-designed field sheets or software applications on field data entry devices, both of which should be synchronized with a central storage system to which all parties have access. Observations in the field may also be made and recorded by researchers and field personnel not directly involved in the sensor system maintenance, and provisions should be made to capture that information and communicate it to responsible staff members.
In addition to capturing the field events mentioned above it is good practice for the data management staff to regularly monitor the data and confer with the field crew when anomalies are noticed. This frequently will bring up additional information that needs to be recorded in the field. It is also good practice to have the data management staff visit the site, periodically assist with field maintenance activities to better understand and interpret field notes and generally interact with the field staff.
All physical sensors should be uniquely identifiable. This may be achieved by recording a serial number, attaching a barcode, or using intelligent sensors which are capable of storing their own metadata and which can be accessed upon connection. This is particularly important for sensors that are moved around or are pulled for mass calibration and redeployed. Sensor location and calibration schedules should be tracked for each sensor by its ID.
Document specific information during normal operations
Either a pre-designed field sheet or a data entry app on a field device (tablet, laptop, etc.) helps remember every detail to record. It is also helpful to define a list of terms to describe the most common problems in a consistent way for later analysis.
Document site ID, date, time, person(s), site conditions, tasks performed every time a site is visited.
When updating datalogger programs, use a new program name for every change. It is advisable to save old datalogger programs.
Use a change log section in a datalogger program comment header to note date, author, and description of differences from last datalogger program, i.e. versioning/revision control.
For sensor specific events note the sensor ID (Bar Codes, Geo-Location Tags, Microchip Encoded Sensors (NEON 'Grape'), or intelligent sensors that store and provide their own metadata upon connection).
Maintaining the records and linking to affected data streams
As mentioned earlier, this record keeping is an effort in communication between field and data personnel as well as communicating events to future data users. Hence a good practice is to permanently link this information to the dataset. This may be achieved on different levels - a description in a metadata document, an indicator of a method for a data series or each data value, a flag indicating a one time event at a certain data value. As a minimum affected data should be flagged in a different column within the data table.
Following the concept of a logical sensor, certain events should trigger the start of a new ‘method’ description when the data stream is affected more than regular corrections can accommodate (e.g., new sensor using a different method of measurement). In this case it is good practice to run the old and the new sensor side by side for a while to compare. No hard and fast guidelines are available for deciding when a method change occurs and when a whole new logical sensor stream (i.e., different dataset or data table) should be started. These concepts are well implemented in the CUAHSI ODM; please see those documents for further discussion.
Most events, however, can be handled by well documented flags (sensor calibration, site maintenance activities, disturbances, etc.). For documentation, flags in the data file should link to a database with more extensive explanations of the events.
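Flagging affected values in a separate column can be automated once events carry a begin and end time. The following sketch attaches event codes to every record that falls inside an event window; the tuple layouts and the code "CAL" are assumptions for illustration.

```python
from datetime import datetime


def flag_records(records, events):
    """Return (timestamp, value, flags) rows; flags is a comma-joined string.

    records: list of (timestamp, value)
    events:  list of (begin, end, flag_code); end may be None for an
             event that is still ongoing
    """
    flagged = []
    for timestamp, value in records:
        codes = [
            code
            for begin, end, code in events
            if begin <= timestamp and (end is None or timestamp <= end)
        ]
        flagged.append((timestamp, value, ",".join(codes)))
    return flagged


# Hypothetical data: a calibration visit overlaps the second record only.
demo_records = [(datetime(2013, 6, 1, 12, 0), 23.7),
                (datetime(2013, 6, 1, 12, 15), 31.2)]
demo_events = [(datetime(2013, 6, 1, 12, 10), datetime(2013, 6, 1, 12, 20), "CAL")]
demo_flagged = flag_records(demo_records, demo_events)
```

The flag codes written to the extra column can then serve as keys into the event database that holds the full field notes.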
Managing sensor configurations
A number of sensors provide core measurements, but will also provide the ability to expand the sensor via one or more pluggable ports. When a sub-sensor is connected, the data from the sub-sensor are usually added to the main data stream as a voltage measurement that gets converted to the measurement parameter units post-transmission. Track both the number of sub-sensors and their port positions, since a change to either may cause problems in processing the data stream in middleware applications. For instance, a water sampler like a CTD may provide ports to connect sub-sensors for dissolved oxygen or turbidity measurements. Note that the DO sub-sensor should always be connected to, say, voltage port 1, and the turbidity sensor is always connected to voltage port 2, and voltage port 3 is empty.
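A recorded port layout can be compared against what a technician reports after a site visit. The sketch below encodes the CTD example from the paragraph above as a mapping from voltage port to sub-sensor; the mapping name and the sub-sensor labels are hypothetical.

```python
# Expected layout from the example above: DO on port 1, turbidity on
# port 2, port 3 empty (None).
EXPECTED_PORTS = {1: "dissolved_oxygen", 2: "turbidity", 3: None}


def port_mismatches(observed, expected=EXPECTED_PORTS):
    """List (port, expected, observed) for every port that differs."""
    ports = sorted(set(expected) | set(observed))
    return [
        (port, expected.get(port), observed.get(port))
        for port in ports
        if expected.get(port) != observed.get(port)
    ]


# A swap of the two sub-sensors would be reported on both ports.
swapped = port_mismatches({1: "turbidity", 2: "dissolved_oxygen", 3: None})
```

Any non-empty result is a signal that the middleware conversion from voltage to parameter units needs to be checked before the data are processed.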
See also middleware capabilities and QA/QC procedure documentation in those respective sections.
Case Studies
Case study: Data model for tracking sensors and sensor maintenance at the Utah Water Research Laboratory (J. Horsburgh, September 2013)
The database design diagram depicts the data model as it is used at the Utah Water Research Laboratory, Utah State University. It was developed by J. Horsburgh and his research team. Currently efforts are underway to extend the CUAHSI ODM to store this kind of metadata based on the experience with this data model.
Case study: Two example field sheets from the HJ Andrews Experimental Forest in Oregon.
HJ Andrews stream gage check sheet
HJ Andrews watershed check sheet
Streaming Data Management Middleware
Discusses software features for managing streaming
sensor data.
Fig. 1 The position of middleware in a generic sensor data management system.
Contents
1 Overview
2 Introduction
2.1 Research Agenda
2.2 Technological Requirements
2.3 Personnel Skills
3 Middleware Classifications
3.1 Classification by functionality
3.2 Classification by propriety and type
4 Best Practices
5 Case Studies
5.1 Marmot Creek Research Site, Rocky Mountains, Canada
5.2 University of Texas at El Paso's System Ecology Lab, Jornada research site, NM
6 Middleware Package Descriptions
7 References
Overview
Middleware are software packages and procedures that reside virtually between data collectors, such as automated sensors, and data ‘consumers’, such as data repositories, websites, or other software applications. Middleware can be used to perform tasks such as streaming data from dataloggers to servers, archiving data, analyzing data, or generating visualizations.
Many middleware packages are available for developing a comprehensive, reliable, and cost-effective environmental information management system. Each middleware option can have a unique set of requirements or capabilities, and costs can vary widely. A single middleware package may be used if it includes all of the user requirements, or multiple middleware may be bundled into a data management system if they are compatible or interoperable with each other and the rest of the data collection and management system.
This section describes multiple middleware packages that are currently available, and provides examples of how different software and procedures are being used to collect, analyze, visualize, and disseminate sensor-supplied environmental data.
Introduction
There are multiple factors that may affect the choice, use, and performance of middleware. These factors may be classified according to a group’s research agenda, technological requirements, and personnel skill sets.
Research Agenda
The research agenda of a group is a major determinant of the type of middleware system needed. A group focused on only one or a few narrowly focused research questions may need fewer types of sensors and consequently, fewer software modules may be adequate to streamline data processing from collection to the end goal. A team that investigates multiple questions spanning multiple research domains is likely to use more diverse and/or larger sets of sensors. There may not be a single middleware package that can meet all of the needs of a research group. In this case, multiple packages will need to be linked into a workflow.
Technological Requirements
The technological requirements of a research program may vary from simple to complex. If the research can be done with sensors from a single, well-managed company, the proprietary software packaged with the purchased sensor network may be adequate for at least a major portion of the information management system. For example, for Campbell Scientific dataloggers, their “LoggerNet” software integrates communication, data download, display and graphics functions. However, some dataloggers and sensors (particularly innovative, custom-built ones) may need custom-written software. It is important to plan time and budgets for required software upgrades, licensing, additional packages, support, and maintenance. Systems that cost less at the outset may not always be cheaper over the long run. It is also important to consider how to best meet infrastructure and bandwidth requirements, while deploying middleware on a variety of servers or laptop computers in the field or lab setting. Depending on the data and hardware infrastructure characteristics, each middleware option can introduce benefits or drawbacks to the overall system functionality.
Personnel Skills
Another key factor to consider is the skill set of the personnel. A complex data management system may require multiple people, each with a unique skill set such as database design, system architecture, web programming, etc. It is important to correctly identify each person’s skill set and role in data management tasks. It may also be necessary to plan for additional hires or job training to address various scenarios and solutions, to identify appropriate salaries, and to budget enough time for software development and system administration. More details about the personnel roles and skills can be found in the “Roles and required skill sets” section.
Middleware Classifications
Classification by functionality
Middleware can be classified with respect to the functionality they provide, such as:
Controlling instrumentation and data collection: Modules may be used to control sampling intervals, manage the event-triggered (burst) or continuous sampling regimes, communicate and transfer data between the instrumentation and other system components.
Data monitoring, processing, and analysis: Modules may provide alarm management, perform automated QA/QC on data streams, or run derivative calculations including averages, aggregation and accumulation, data shifting and transformation, filtering of time series records with respect to the dates, value range, location, station/variable type, or other criteria.
Export and publishing of data: Modules may provide functionality to export sensor data to different formats (e.g., ASCII, binary, or xml), different archives, make data discoverable through geospatial catalogues, or publish the data through web services.
Data visualization: Modules may provide visualization (e.g., tables, graphs, sonograms) of geospatial and/or time series data from sensor arrays or workflow structures.
Documentation: Modules may be used to document field events through paperless collection of field data, integrate sensor data and documentation (see sensor tracking & documentation section), or handle sensor calibration records.
Other supported functionality: Modules may be used to provide access to external data (e.g., ODBC, JDBC, OLEDB), to connect or chain other middleware components, or to implement mobile applications.
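Two of the functions listed above, value range filtering and averaging, can be illustrated with a few lines of code. This is a minimal sketch of what such a processing module might do, not the behaviour of any particular middleware package; the flag names and thresholds are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def range_check(records, low, high):
    """Attach a simple QC flag to each (timestamp, value) record."""
    return [(t, v, "ok" if low <= v <= high else "out_of_range")
            for t, v in records]


def hourly_means(checked):
    """Average the values flagged 'ok' within each clock hour."""
    buckets = defaultdict(list)
    for t, v, flag in checked:
        if flag == "ok":
            buckets[t.replace(minute=0, second=0, microsecond=0)].append(v)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical air temperature readings with one obvious spike.
readings = [(datetime(2013, 7, 9, 12, 0), 10.0),
            (datetime(2013, 7, 9, 12, 30), 20.0),
            (datetime(2013, 7, 9, 12, 45), 999.0),
            (datetime(2013, 7, 9, 13, 0), 30.0)]
means = hourly_means(range_check(readings, low=-40.0, high=60.0))
```

Keeping the flagging step separate from the aggregation step means the raw values and their flags remain available for later review.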
Classification by propriety and type
Middleware can also be classified by software proprietary rights and whether they are considered applications or platforms. Accordingly, we can identify different groups of middleware:
Proprietary data management applications and platforms
Proprietary research applications
Limited open source applications (free packages that can be used with proprietary solutions)
Open source data management applications and platforms
Open source research applications and programming languages
Some of the applications and platforms listed above are often identified as a software of choice for many different organizations. More details about each of these components are provided in the next section of this document.
Best Practices
Choosing the middleware components that will best fit the tasks and work environment can be challenging. In addition to the personnel roles and skills, budget, and infrastructure considerations already discussed in the Introduction and other chapters of this best practices guide, it is important to be aware of the whole sensor management process in order to identify the suitable middleware components. In some cases, a proprietary middleware software will be required as part of the information management system if the instrumentation only outputs data in a proprietary format. In other cases, multiple open source software packages may be suitable for chaining into a comprehensive system that manages data from collection to final archiving and sharing.
Some steps in selecting middleware are:
Fig 2. Sensor management workflow. Simple sensor management configuration is presented in blue; optional system components are shown in grey.
1. Identify your objectives. What do you want the middleware to do?
2. Assemble a list of candidate software.
3. Rate the candidates based on capabilities, cost (keeping in mind that a simple-to-use but expensive package may cut costs in the long-term), stability, and ease of use with respect to the personnel skills available on your team.
4. If no single software product can meet all the objectives, test to see how well different candidate software integrate with one another to perform the needed functions.
During this planning stage, consider the following recommendations:
Identify workflow components and describe their functional requirements from the instrumentation to the archive level of organization (see Figure 2). Some components can be optional or part of the more complex solutions.
Plan for robust execution and choose software and hardware components that can handle the loss of connectivity, power, or other failures related to harsh environmental or operational conditions.
Choose reusable/sharable components.
Keep field deployment of middleware as simple as possible (keep out of field if possible).
Use as few middleware components as possible based on research group requirements.
Document and diagram the entire workflow and update as needed.
Case Studies
We present several real world case studies that vary widely in the types of ecosystems that sensors are deployed in and in the complexity of the information management system. Some case studies include proprietary software only, some include free or open-source software, and some include both.
Marmot Creek Research Site, Rocky Mountains, Canada
Fig. 3. Marmot Creek research site's PakBus Network with mixed dataloggers and Raven to RF401 Base
Introduction
Marmot Creek research site is located on the eastern slopes of the Rocky Mountains in Alberta, Canada. The site is dominated by needleleaf vegetation and poorly developed mountain soils. Precipitation, snow depth, soil moisture, soil temperature, short and long wave radiation, air temperature, humidity, wind speed, and turbulent fluxes of heat and water vapour datasets are collected and used for the hydrological modelling of the Marmot Creek Basin. Time series records are obtained at Hay Meadow, Upper Clearing, Vista View, Fisera Ridge, and Centennial Ridge hydro-meteorological stations equipped with different sensor configurations and Campbell Scientific dataloggers.
CommunicationequipmentandmethodsThetelemetrynetworkconsistsofoneRavenCDMAcellularmodemandRF401spreadspectrumradiomodemlocatedattheUpperClearingbasestation,fouradditionalRF401modemslocatedateachoftheMeteorologicalstationsservicedbytelemetry,andthedesktopcomputerlocatedattheUniversityofSaskatchewan.Theradiosconnectedtothedataloggersateachofthemeteorologicalstationstalktothebasestationonanongoingbasis.AllofthedataloggersandRF401radioshavePacBusaddressesandtheyoperateasPacBusNodes.Also,dataloggersaresettooperateasroutersenablingroutinginsidethisnetworkthroughthevariouspaths.ThetelemetrynetworkconfigurationispresentedinFigure3.
DatacollectionandprocessingAttheintervalsprescribedwithintheLoggerNetapplicationrunningonadesktopcomputer,dataiscollectedfromthemeteorologicalstations.TheRavenCDMAtransfersdatautilizingadynamicIPaddressanditsstaticaliasassociatedthroughtheAirlinkIPmanagersoftware.TheuniquePakBusaddressisassignedtoeachofthedataloggersinthistelemetrynetwork.Inmostcases,loggerdatafilesattheoff-sitelocationwillbeappendedonadaily,four-hourlyandhourlybasis.Inadditiontothescheduledintervals,fielddatacanbedownloadedondemandthroughtheLoggerNetapplication.
The LoggerNet “Task Master” utility is used to execute custom programs after each successful collection of field data. The utility can also be used to start scheduled executions of different programs and operations. For Marmot Creek records, Task Master is used to rename the collected data logger files and upload them to the FTP server.
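The kind of post-collection program that a task scheduler could launch might look like the following minimal Python sketch. It is not the Marmot Creek script: the naming pattern, the function names, and the FTP host and credentials are all hypothetical, and it assumes a plain (unencrypted) FTP server reachable from the collection computer.

```python
from datetime import datetime
from ftplib import FTP
from pathlib import Path


def timestamped_name(original: str, collected_at: datetime) -> str:
    """Build a new file name that embeds the collection time (hypothetical pattern)."""
    p = Path(original)
    return f"{p.stem}_{collected_at:%Y%m%dT%H%M}{p.suffix}"


def rename_and_upload(local_file: Path, collected_at: datetime,
                      host: str, user: str, password: str) -> Path:
    """Rename a collected logger file, then upload it to an FTP server."""
    renamed = local_file.with_name(timestamped_name(local_file.name, collected_at))
    local_file.rename(renamed)
    with FTP(host) as ftp:  # host and credentials are supplied by the caller
        ftp.login(user=user, passwd=password)
        with renamed.open("rb") as fh:
            ftp.storbinary(f"STOR {renamed.name}", fh)
    return renamed


# Only the naming step is exercised here; the upload needs a real server.
example = timestamped_name("UpperClearing.dat", datetime(2014, 1, 15, 6, 0))
```

Embedding the collection time in the file name keeps successive uploads from overwriting one another on the server.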
Data publishing
Field data downloaded to the off-site computer are accessed by the RTMC Pro LoggerNet utility. The last measured values are mapped to specified locations on a web page. The web server hosts different RTMC files for daily summary information, station data tables, alarms, and other records. The main interface contains individual windows for the main screen web page as well as screens for individual stations, weekly data graphs, and site information. RTMC files interface with the web page via the RTMC Web Server desktop utility.
Reference
Centre for Hydrology, University of Saskatchewan. University of Saskatchewan Hydrology Field Data Retrieval and Management Manual. 2009. PDF file.
University of Texas at El Paso's Systems Ecology Lab, Jornada research site, NM
Middleware used: Hobolink, MySQL, R, ArcGIS, HTML5/JavaScript website
Introduction
The Systems Ecology Lab (SEL) at the University of Texas at El Paso studies patterns and controls of land-atmosphere water, energy, and carbon fluxes in both arctic and desert biomes. At the USDA-ARS Jornada Experimental Range in southern New Mexico, SEL’s research site collects data using more than 100 automated sensors (made by Campbell, Onset, Decagon, PP Systems, and others) and manual field observations. Sensors are mounted on an eddy covariance tower, eight connected mini-towers (which together form a wireless sensor network), and a cart mounted on a 110 m long tramline. More than 4 GB of data are collected per week from micrometeorological, hyperspectral, and gas flux sensors, as well as from cameras (used to detect changes in phenology). This research site is also used to help develop and test new cyberinfrastructure and information management concepts and tools. For this case study, we focus solely on measurements made by the 8-node wireless sensor network.
Fig. 4. SEL-Jornada research information system framework in terms of data flow (symbolized by arrows). Web services are used by web-based applications to query data from databases.
Data Collection and Processing
The 8-node wireless sensor network is composed exclusively of Onset’s Hobo data loggers (8) and sensors (62). Each data logger is powered by its own solar panel. Sensors measure precipitation, leaf wetness, PAR, solar radiation, and soil moisture. Data are relayed to the Jornada Headquarters and sent to Onset. The data are available for visualization and downloading via Hobolink (http://www.hobolink.com). The online service allows users to set up alerts for system malfunctions and automated reporting of data. Our team developed a database schema (implemented in MySQL) that organizes data around a core concept common to all datasets: a measurement on a focal entity by an observer at a specific location and time. Around this core, other tables store metadata such as project information, maintenance records, and related files. Data from the wireless sensor network are imported into the database via custom SQL scripts. Within the database, scheduled queries perform basic data quality checking and flagging, and generate tables of summarized data (e.g., daily means) that can be accessed shortly after the raw data are imported.
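The core concept described above (a measurement on a focal entity by an observer at a location and time) and the scheduled flagging and summary queries can be illustrated with a small sketch. This is not the SEL schema: the table and column names, the "Q" flag, and the soil moisture bounds are assumptions, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    id          INTEGER PRIMARY KEY,
    entity      TEXT NOT NULL,   -- focal entity being measured, e.g. 'soil'
    variable    TEXT NOT NULL,   -- what was measured
    observer    TEXT NOT NULL,   -- sensor (or person) making the observation
    location    TEXT NOT NULL,
    measured_at TEXT NOT NULL,   -- ISO 8601 timestamp
    value       REAL,
    flag        TEXT NOT NULL DEFAULT ''
);
""")

rows = [
    ("soil", "soil_moisture", "node1_probe", "node1", "2014-06-01T00:00:00", 0.10),
    ("soil", "soil_moisture", "node1_probe", "node1", "2014-06-01T12:00:00", 0.20),
    ("soil", "soil_moisture", "node1_probe", "node1", "2014-06-02T00:00:00", 1.70),
]
conn.executemany(
    "INSERT INTO measurement (entity, variable, observer, location, measured_at, value) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Basic quality check: volumetric soil moisture outside 0..1 is flagged questionable.
conn.execute(
    "UPDATE measurement SET flag = 'Q' "
    "WHERE variable = 'soil_moisture' AND (value < 0 OR value > 1)"
)

# Summary query: daily means computed from unflagged values only.
daily = conn.execute("""
    SELECT location, variable, date(measured_at) AS day, AVG(value) AS mean_value
    FROM measurement
    WHERE flag = ''
    GROUP BY location, variable, day
    ORDER BY day
""").fetchall()
```

In a production database the two statements at the end would typically run on a schedule after each import, with the summary written to its own table.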
Data Publishing
Our team wrote web services in Python to query the summarized data and deliver the results (in JSON format) to a JavaScript/HTML5 website, in which AmCharts’ JavaScript Charts library (http://www.amcharts.com/; free for non-commercial use) is used to plot the data and generate reports including the data and pertinent metadata. This website also renders a map of the site and sensors via an ESRI ArcSDE geodatabase and ArcGIS Server.
Middleware Package Descriptions
In this section, several middleware packages are briefly introduced. No specific endorsement or criticism is implied, and it is important to keep in mind that software packages are often revised.
Aquatic Informatics Aquarius: AQUARIUS is a software package for water time series data management that provides functionality to correct and quality control time series data, build rating curves, transform and visualize hydrological data, and publish field data in real time. To accomplish these tasks, AQUARIUS uses three main components. The Data Acquisition System enables access to real-time data from field instrumentation, either as an extension to EnviroSCADA or through hot folders. AQUARIUS Server is a web-controlled data management platform that enables centralized access to datasets stored in the database. The server is also used to publish the data, either through a public web portal or as a Representational State Transfer (REST) web service that supports the WaterML representation of the data. Finally, AQUARIUS Workstation provides a set of data processing tools to import and process the data, create rating curves, and apply QA/QC procedures to the time series records.
Aquarius system components
Aquarius modeling approaches
Campbell Scientific LoggerNet: LoggerNet is the main Campbell Scientific software application used to set up, operate, and manage a sensor network that uses Campbell Scientific equipment. LoggerNet uses serial ports, telephony drivers, and Ethernet hardware to communicate with data loggers via cellular and phone modems, RF devices, and other peripherals. LoggerNet also includes a suite of tools such as text editors for creating Campbell Scientific data logger programs, and methods for real-time monitoring, automated data retrieval, data post-processing, data visualization and monitoring of retrieved information, and data publishing options. More advanced features, such as export to MySQL or SQL Server databases, are offered through additional LoggerNet applications not included in the standard version (LoggerNet Database, LoggerNet Admin, LoggerNet Remote, etc.).
LoggerNet 4.1 Instruction Manual
CUAHSI HIS: The Consortium of Universities for the Advancement of Hydrologic Science, Inc.'s Hydrologic Information System (CUAHSI HIS) is an advanced web service based system created to share hydrologic data. The system is composed of hydrologic databases running the CUAHSI Observations Data Model (ODM) and servers hosted by different organizations that are connected through web services. Centralized HIS modules are used for data publication, access, and discovery, while local (and central) modules provide tools for data analysis and visualization. Overall, CUAHSI HIS is used to store observation data in a relational data model (ODM), access the data through internet-based Water Data Services that publish the observations and metadata using a consistent Water Markup Language (WaterML), index the data through a National Water Metadata Catalog, and provide discovery of data through a map and keyword search system.
CUAHSI HIS components
CUAHSI HIS list of publications
Development of a Community Hydrologic Information System
DataTurbine Initiative: DataTurbine is a real-time streaming data engine that acts as a black box to which data providers (sources) send data and from which consumers (sinks) receive data. DataTurbine is implemented as a multi-tier Java application, with servers accepting and serving up the data, sources loading the data onto the servers, and sinks pulling the data for visualization and analysis purposes. Each of these components can be located on the same machine or on different computers, and they can communicate with each other over the internet. The data can be heterogeneous, and sinks can access any type of data seamlessly. As new data are loaded onto the server(s), the oldest data are erased to free the receiving buffers.
DataTurbine – Sensor Networks Workshop
Understanding DataTurbine
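The buffering behavior described above, in which the oldest data are discarded as new data arrive, is that of a ring buffer. The following Python sketch illustrates the idea only; it is not DataTurbine's API (DataTurbine is a Java application), and the class and method names are hypothetical.

```python
from collections import deque


class RingBufferChannel:
    """Fixed-capacity channel: a source appends frames, sinks read what remains."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        """Called by a source to load one (timestamp, value) frame."""
        self._frames.append((timestamp, value))

    def fetch(self, since=None):
        """Called by a sink; returns buffered frames at or after `since`."""
        return [f for f in self._frames if since is None or f[0] >= since]


channel = RingBufferChannel(capacity=3)
for t in range(5):
    channel.put(t, t * 10.0)

remaining = channel.fetch()       # frames 0 and 1 have been overwritten
recent = channel.fetch(since=3)
```

A consequence of this design is that a sink which falls too far behind loses data, so anything that must be kept is usually written to a permanent archive by a dedicated sink.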
GCE Data Toolbox: The Georgia Coastal Ecosystems (GCE) Data Toolbox is a software library for metadata-based processing, quality control, and analysis of environmental data. It is designed and maintained by Wade M. Sheldon, Jr. of the Georgia Coastal Ecosystems LTER and is available free of charge, but it requires a MATLAB license. The Toolbox can be used for a wide variety of environmental data management tasks, such as:
importing raw data from environmental sensors for post-processing and analysis;
performing quality control analysis using rule-based and interactive flagging tools;
gap-filling and correcting data using gated interpolation, drift correction and custom algorithms/models;
visualizing data using frequency histograms, line/scatter plots and map plots;
summarizing and re-sampling datasets using aggregation, binning, and date/time scaling tools;
synthesizing data by combining multiple datasets using join and merge tools;
mining near-real-time or historic data from the USGS NWIS, NOAA NCDC, NOAA HADS or LTER ClimDB servers;
harvesting and integrating channel data from DataTurbine servers.
This software is highly modular and can be used as a complete, lightweight solution for environmental data and metadata management, or in conjunction with other cyberinfrastructure. For example, newly acquired data can be retrieved from DataTurbine or Campbell LoggerNet Database servers for quality control and processing, then transformed to CUAHSI Observations Data Model format and uploaded to a HydroServer for distribution through the CUAHSI Hydrologic Information System.
GCE Toolbox overview (Georgia Coastal Ecosystems LTER)
Kisters WISKI: The WISKI software package is a tool for hydrological data management. WISKI is a Windows-based client/server system hosted on MS SQL or Oracle databases. The software combines data management features with tools to collect, store, analyze, visualize, and publish observation data. Typical data input sources are remote data collected from field data loggers, data imported from third parties via input files in different formats, records obtained from digitization of graphical charts, and manual inputs. The main WISKI module incorporates data management functionality as well as discharge and rating curve tools, and works closely with other Kisters software components, including KiWQM (water quality), KiWIS (data publishing through web services), SODA (telemetry hardware module for remote data collection), KiDSM (task scheduler), modeling apps (Link-and-Node and statistical forecast), ArcGIS extensions, and WebPublic and WebPro (web server publishing applications).
WISKI system overview
WISKI modules
NexSens iChart: NexSens iChart is a Windows-based data acquisition package designed for environmental monitoring applications. iChart supports interfacing both locally (direct connect) and remotely (through telemetry) with many popular environmental products, such as YSI, OTT, and ISCO sensors. It can also interface with NexSens iSIC and submersible data loggers. The software simplifies and automates many of the tasks associated with acquiring, processing, and publishing environmental data.
NexSens WQData and iChart software overview (http://nexsens.com/pdf/nexsens_wqdata_spec.pdf)
iChart software product spotlight (Lake Scientist)
NexSens data website (Bucknell University)
iChart quick start guide
Onset Hobolink and Hoboware: Onset has two main software applications to support its Hobo data loggers and sensors. Hobolink is an online service that provides 5-minute data from its data loggers, multiple graphs of data streams, a customizable interface, settings for automated alerts for sensor malfunction, and customizable data reporting features. Hoboware is a downloadable package that provides more functionality, such as line charts for more than one data stream and chart types that are unavailable in Hobolink.
HOBOware Pro vs. HOBOware Lite List of features
HOBOware® User’s Guide (Data visualization and analysis)
HOBOlink® User’s Guide (Data access and control of HOBO devices)
Vista Data Vision: VDV is a data management system with tools to store and organize data collected from a variety of data logger “dat” files. The software offers different visualization, alarming, reporting, and web publishing features. Data logger files are parsed, imported, and stored in a MySQL relational database, from which the data can be queried, exported, or published on a web server. Numerous access control options are available, so VDV users can have customized access to specific station or sensor data.
Vista Data Vision brochures and manuals
Vista Data Vision version comparison
Vista Data Vision Review (LTER)
YSI EcoNet: EcoNet software works with YSI monitoring instrumentation. The software delivers data from the field directly to the YSI web server. No desktop applications are used, and all data are stored on the remote YSI computer. System users can access visualization, reports, alarms, and email notification tools directly on the YSI server.
EcoNet system overview
Embedding EcoNet data
The following tables describe features of middleware packages known to the authors of the wiki. These tables do not imply endorsement or criticism of any given product, and may reflect older versions of products than currently exist.
Basic: The software has built-in but basic features compared to the overall market.
Standard: The software has built-in features that are standard in comparison to the overall market.
Advanced: The software has built-in advanced features compared to the overall market.
Custom: The software doesn't have built-in features, but a programmer can develop them.
None: The software doesn't have the feature, and it cannot be custom-developed.
Has: The software has built-in features, but the level compared with the overall market is currently unknown.
Unknown: The capacity of the software is unknown.
Table 1. Middleware basic features: licensing, cost, input and export data formats, and required level of programming expertise.
Program | Licensing | Cost | Input data format | Export data format | Needed programming expertise
Antelope Orb | Proprietary | Pay | ASCII, Binary | ASCII, Binary | Advanced
Aquarius | Proprietary | Pay | | | Advanced
ArcGIS | Proprietary | Pay | ASCII, shapefiles | ASCII, shapefiles | Advanced
B3 | Open source | Free | ASCII | ASCII | None to Basic
BigSense and LtSense | Open source | Free | Binary | CSV, JSON, TXT, XML | Advanced
Cosm | | | | |
CUAHSI HIS | Open source | Free | ASCII | XML, WaterML | Standard
DataTurbine | Open source | Free | ASCII, Binary | ASCII, Binary | Advanced
EddyPro | Proprietary | Pay | Binary | ASCII, Binary | Standard
GCE Toolbox | MATLAB is proprietary, Toolbox is open source | MATLAB is pay, Toolbox is free | ASCII, Binary, database | ASCII, Binary, .mat, database | Toolbox is Standard, MATLAB is Advanced
Hobolink (Onset) | Proprietary | Free | Proprietary | ASCII, Proprietary | None
Hoboware (Onset) | Proprietary | Pay | Proprietary | ASCII, Proprietary | None
Kepler | Open source | Free | ASCII, Binary | ASCII, Binary | Basic to Advanced
Lake Analyzer | Proprietary/Open source | Free | ASCII | ASCII | Basic
LoggerNet (Campbell) | Proprietary | Pay | Proprietary | ASCII, database | Standard
NexSens Technology | Proprietary | Pay | Unknown | Unknown | Unknown
Pandas | Python is Free, Pandas is Free and Open Source | | Binary, encoded, np.array, database, markup | Binary, encoded, np.array, database, markup | Advanced and Custom
Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown
R | Open source | Free | ASCII, Binary, database | ASCII, Binary, database | Standard to Advanced
SAS | Proprietary | Pay | ASCII, Binary, database | ASCII, Binary, database | Standard to Advanced
Taverna | Open source | Free | Unknown | Unknown | Standard to Advanced
Vista Data Vision | Proprietary | Pay | ASCII | ASCII | Unknown
VizTrails | Open source | Free | ASCII | ASCII | Basic to Advanced
WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown
WISKI | Proprietary | Pay | ASCII | ASCII | Advanced
YSI EcoNet | Proprietary | Pay | Unknown | Unknown | Unknown
Table 2. Middleware data handling features: hardware communication, ability to do quality assurance and control (QA/QC), ability to stream data to archives, data visualization, data transformation and analysis, and ability to generate custom SQL queries or other scripting.
Program | Hardware communication | QA/QC capacity | Capacity to stream to archive | Data transformation and analysis | Data visualization | Custom SQL queries/Scripting
Antelope Orb | Custom | Custom | Custom | Custom | Custom | Custom
Aquarius | Has | Advanced | Advanced | Advanced | Advanced | Advanced
ArcGIS | Unknown | Advanced | Unknown | Advanced | Advanced | Standard to Advanced
B3 | None | Advanced | None | Has | Has | None
BigSense and LtSense | Custom | Custom | Has | Has | Unknown | Unknown
Cosm | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown
CUAHSI HIS | Custom | Advanced | Advanced (ODM) | Advanced (HydroDesktop, ODM Tools, TSA) | Advanced (HydroServer TSA, HydroDesktop, external programs) | Advanced (HydroDesktop, ODM Tools)
DataTurbine | Custom | Custom | Custom | Basic (NEES RDV) | Basic (NEES RDV) | Has
DataFrames.jl | Through C and Python libs | Has, stats.jl, numpy | Has through code.native | Has through Gadfly, Matplotlib, D3, or Winston | Has | Has
EddyPro | Unknown | Has | Unknown | Has | Has | Unknown
GCE Matlab | Custom | Advanced | Standard to Advanced | Advanced (with Matlab) | Advanced | Advanced
Hobolink (Onset) | Basic | None | Basic | Standard | None | None
Hoboware (Onset) | Advanced | Has | Unknown | Advanced | Standard | None
Kepler | Custom | Custom | Custom | Custom | Custom | Custom
Lake Analyzer | None | Basic | None | Has | Has | None
LoggerNet (Campbell) | Advanced | Basic | Basic | Basic | None | None
NexSens Technology | Has | None | Basic | Basic | None | None
Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown
R | Custom with open-source tech | Custom | Custom | Advanced/Custom | Advanced/Custom | Custom
SAS | None | Custom | Custom | Advanced | Advanced | Custom
Taverna | Unknown | Custom | Custom | Custom | Custom | Custom
Vista Data Vision | None | Basic | Standard | Standard | Standard | Basic
VizTrails | Unknown | Custom | Unknown | Custom | Custom | Custom
WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown
WISKI | Has | Advanced | Advanced | Advanced | Advanced | Advanced
YSI EcoNet | Has | None | Basic | Basic | None | None
Table 3. Middleware other features: task automation, capacity for multi-tier architecture, website publishing, streaming through web services, support for modeling.
Program | Task automation | Multi-tier architecture | Website publishing | Streaming through web service | Support for modeling
Antelope Orb | Has | Standard | Custom | Custom | Unknown
Aquarius | None | Advanced | Advanced | Unknown | Unknown
ArcGIS | Advanced | Unknown | Advanced | Advanced | Advanced
B3 | Unknown | None | None | None | Has
BigSense and LtSense | Has | Has | Has (via RESTful services) | Has | Unknown
Cosm | Unknown | Unknown | Unknown | Unknown | Unknown
CUAHSI HIS | Advanced (ODM SDL) | Advanced | Advanced (HydroServer, Website, HydroSeek) | Advanced | Has (through external programs)
DataTurbine | Has | Standard | Basic | Standard | None
EddyPro | Has | Unknown | Unknown | Unknown | Unknown
GCE Matlab | Advanced | None | Advanced | None | Advanced (with Matlab)
Hobolink (Onset) | Basic | None | Advanced | Has | None
Hoboware (Onset) | Basic | None | None | None | None
Kepler | Custom | Unknown | Custom | Custom | Advanced/Custom
Lake Analyzer | Custom (Matlab) | None | None | None | Basic (link to GLM model)
LoggerNet (Campbell) | Standard | Basic | Basic | None | None
NexSens Technology | Basic | Basic | Basic | None | None
Pegasus | Unknown | Unknown | Unknown | Unknown | Unknown
R | Custom | None | Custom (shiny package) | Custom | Advanced/Custom
SAS | Custom | None | Custom | Custom | Advanced/Custom
Taverna | Custom | None | Unknown | Custom | Advanced/Custom
Vista Data Vision | None | Standard | Advanced | None | None
VizTrails | Custom | None | Custom | Custom | Advanced/Custom
WaterML support | Unknown | Unknown | Unknown | Unknown | Unknown
WISKI | Advanced | Advanced | Advanced | Advanced | Has
YSI EcoNet | Basic | None | Basic | None | None
References
“OGC WaterML Standard Recommended for Adoption as Joint WMO/ISO Standard.” Open Geospatial Consortium, 10 Dec. 2012. Web. 14 May 2013.
Sensor Data Quality
Discusses different ways sensor data may be
compromised and how to control for it in the data stream.
Contents
1 Contacts
2 Overview
3 Introduction
4 Methods
4.1 Sensor Quality Assurance (QA)
4.2 Quality Control (QC) on data streams
4.2.1 Data qualifiers (data flags)
4.2.2 Data quality level
4.2.3 Data collection interval
4.3 Data Management
5 Best Practices
6 Case Studies
7 References
8 Resources
Contacts
The primary editors for this page may be contacted for questions, comments, or help with content additions.
Don Henshaw – U.S. Forest Service Research, Pacific Northwest Research Station – don.henshaw at oregonstate.edu
Mary Martin – Hubbard Brook LTER, University of New Hampshire – mary.martin at unh.edu
Overview
A new generation of environmental sensors and recent major technological advances in the acquisition and real-time transmission of continuously monitored environmental data pose a major challenge for quality assurance (QA) and quality control (QC) of high-throughput data streams. Deployments of sensor networks are becoming increasingly common at environmental research locations, and there is a growing need to access these large volumes of data in near real-time. However, the direct release of streaming sensor data raises the likelihood that incorrect or misleading data will be made available. Additionally, as research applications begin to rely on real-time data streams, the continual and consistent delivery of this information will be essential. This increasing access to and use of environmental sensor data demands the development of strategies to assure data quality, the immediate application of quality control methods, and a description of any QA/QC procedures applied to the data.
Traditional QC systems tend to operate on file-based collections of environmental data from field sheets, field recorders or computers, or downloaded data logger files. Manually applied tools and techniques such as graphical comparisons are used to provide data validation. Documentation is typically not well organized and not directly associated with data values. The application of these systems must balance the need to release data without months or years of delay against the delivery of well-documented, high quality data. However, with increasing deployment of sensor networks, these older systems fail to scale or keep pace with user needs associated with high volumes of streaming data. Comprehensive and responsive QC systems are needed that are designed to reduce potential problems and can more quickly produce high quality data and metadata. Methods described here for building a QC system will include identification of:
preventative measures to be taken in the field
quality checks that can be performed in near real-time
necessary data management practices
Introduction
A team approach is necessary to build a QC system, and multiple skills and personnel are needed. The QC system will begin with system design and preventative measures taken in the field and continue through data quality checking and data publishing. A lead scientist will propose research questions and describe the types of data needed and their necessary quality. Expertise in field logistics, sensor systems and wireless communications will play a role in site design and construction. A sensor system expert will provide knowledge of specific sensors and the programming skills to establish quality control checking. Field technicians with communication skills and strong knowledge of the overall scientific goals can help to articulate issues and discover solutions. A data manager will be needed to guide delivery and archival of documented data products. Communication among all parties is necessary for the most timely delivery of well-documented and high quality data.
All team members will be needed to define a QC workflow that describes procedures and personnel responsibilities as the data flow from field sensors to published data streams. A QC system must allow for an iterative quality management cycle to accommodate feedback to policies, procedures, and system design as data collection continues over time. The system will depend on communication among team members to assure that issues noted in sensor data collection and transport are addressed quickly and documented in the data stream. An active, well-documented QC system will help to establish user confidence in data products.
Automated or semi-automated QC systems are needed that can adequately review and screen source data and still provide for its timely release. Automated quality control processes such as range checking can be performed in near real-time, and a system can assign data qualifier codes, or flags, to any sensor value when problems or uncertainty occur in the data stream. However, these processes can often only indicate potential problems in the data stream that still require manual review. A comprehensive QC system is only achievable as a hybrid system that combines automated QC checks with manual intervention to assure the highest data quality.
For this chapter we define quality assurance (QA) as those preventative processes or steps taken to reduce problems and inaccuracies in the streaming data. These include sensor network design, protocol development for routine maintenance and sensor calibration, and best practice procedures for field activities and data management. Quality control (QC) primarily refers to the tests applied to check data quality and the assignment of data flags and other notations to qualify issues and describe problems. QC system refers to this complete set of QA/QC preventative and product-oriented processes.
Methods
Sensor Quality Assurance (QA)
Quality assurance (QA) refers to preventative measures and activities used to minimize inaccuracies in the data. For example, scheduling regular site visits and maintenance procedures, or continuously monitoring and evaluating site sensor behavior, can prevent sensor failures or lead to early detection of problems. Designing networks with redundant sensor measurements provides an additional means to quality check sensor data and assure continuity of measurement. Of course, the time and expense to conduct high-level maintenance procedures or implement efficient and redundant designs may be limited by project budgets, but may be warranted by the importance of the data. Here we describe QA measures categorized by design, maintenance, and practices:
1. Design
   a. Design for replicate sensors. Co-located sensors that are independent of the data logger and included in the data flow can provide useful checks. For example, check temperature measurements might be made alongside a Campbell thermistor with a HOBO pendant, SDI-12 temperature sensor, or analog thermocouple. Ideally, three replicate sensors are used so that sensor drift can be detected (with two sensors it may not be obvious which sensor is drifting).
   b. Assure an adequate power supply. Power considerations might include adding a low-voltage disconnect (LVD) to prevent logger “brown-out”, or adding power accessories with a switched power supply (e.g., CSI logger, IP relay) to programmatically control optional devices (radios, power-cycle loggers).
   c. Protect all instrumentation and wiring from UV light, animals, human disturbance, etc., for example with flex conduit or enclosures.
   d. Implement an automated alert system to warn about potential sensor network issues or certain events, e.g., extreme storms. For example, automated alerts might signal low battery power, indicate sensor calibration is needed, or indicate high winds or precipitation.
   e. Add on-site cameras or webcams. Webcams can be used to record weather or site conditions, animal disturbance or human access.
2. Maintenance
   a. Schedule routine sensor maintenance. Routine site visits following standard protocols can assure proper maintenance activities.
   b. Standardize field notebooks, check sheets or field computer applications to lead field technicians through a standard set of procedures and assure that all necessary tasks are conducted. These notebooks or applications can serve as an entry point for technical observations regarding potential problems or sensor failures.
   c. Schedule routine calibration of instruments and sensors based on manufacturer specifications. Maintaining additional calibrated sensors of the same make/model can allow immediate replacement of sensors removed for calibration to avoid data loss. Otherwise, sensor calibrations can be scheduled at non-critical times or staggered such that a nearby sensor can be used as a proxy to fill gaps.
   d. Anticipate common repairs and maintain an inventory of replacement parts. Sensors can be replaced before failure where sensor lifetimes are known or can be estimated.
   e. Assure proper installation of sensors (correct orientation, clean wiring, solid connections and mounting, etc.). Protocols for installing new sensors will also assure that key information is logged regarding a sensor’s establishment (see Management section).
3. Practices
   a. Maintain an appropriate level of human inspection. Develop the capability to easily view real-time data and examine it regularly (daily/weekly). Regular inspection can help identify sensor problems quickly and might allow for fewer site visits. Certain problems such as extreme spikes, intermittent values, or repetitive values are easy to see in raw data plots.
   b. Spot-check measurements with a reference sensor. This can be done routinely for some measurements, e.g., temperature and snow depth, to verify the performance of in situ sensors.
   c. A portable instrument package that can be rotated among sensor sites can be useful in identifying problems. The portable package might run alongside installed sensors over a fixed period (daily or longer cycle) to inspect for drifting or failing sensors. This type of co-location might be done to audit sensor performance on an annual or periodic basis.
   d. Record the date and time of known events that may impact measurements (see Management section). Ideally, these notes can be entered or captured for automated access. For example, sensors are known to behave differently during site visits or maintenance activities, and light or trip sensors might be used to record sensor access.
   e. Routinely synchronize the clock on data loggers with a public Network Time Protocol (NTP) server (http://www.ntp.org/).
   f. Provide a reference time zone and avoid changing data logger timestamps for daylight saving time. Many would argue the best practice is to output data in Coordinated Universal Time (UTC), which is particularly useful when data span multiple time zones. However, most local users of the data prefer seeing output in local standard time because it corresponds to local ecological conditions, e.g., ocean tides or solar noon, and may ease troubleshooting or field-based checking. Another strategy is to provide the local offset from UTC within the data stream to allow simple conversion to UTC, or to allow users to query the data and choose the time zone in which they would like to receive the data. ISO 8601 (http://www.iso.org/iso/home/standards/iso8601.htm) is an international standard covering the exchange of date and time-related data, and it provides time zone support. For example, 2013-09-17T07:56:32-0500 includes the offset from UTC for the EST time zone; however, lack of support in many instruments and software packages is a drawback to its use. More recently, REST services have been constructed to return date-time values with an implicit time zone offset, enabling convenient sharing of data with timestamp flexibility.
   g. Ensure that files stored on the logger are transmitted error-free to the data center for import (use error-corrected protocols like FTP, Ymodem and HTTP). Schedule manual file download and post-import checks if non-error-corrected protocols are used as an interim measure.
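The time zone strategy in item f above can be sketched with Python's standard library. The example parses the ISO 8601 timestamp quoted in the text, converts it to UTC, and then renders it in another fixed offset of the user's choosing. The helper name to_fixed_offset is hypothetical; fixed offsets are used deliberately so that no daylight saving shift is applied.

```python
from datetime import datetime, timedelta, timezone

stamp = "2013-09-17T07:56:32-0500"  # local standard time with its UTC offset

# %z parses the trailing "-0500" into a timezone-aware datetime.
local = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

# Conversion to UTC is simple arithmetic once the offset is carried in the data.
utc = local.astimezone(timezone.utc)


def to_fixed_offset(dt: datetime, hours: int) -> datetime:
    """Render an aware datetime in a fixed UTC offset (no daylight saving rules)."""
    return dt.astimezone(timezone(timedelta(hours=hours)))


pacific_standard = to_fixed_offset(utc, -8)

utc_text = utc.isoformat()
pst_text = pacific_standard.isoformat()
```

Storing either UTC or the local time together with its offset keeps the two representations interchangeable, which is the property the recommendation relies on.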
Quality Control (QC) on data streams
Quality control of data streams involves automated or semi-automated processes whereby values and associated timestamps are cross-checked against predetermined standards and separate concurrently collected data streams. QC takes place post-collection, either during the streaming process or after data are assimilated into a central database. Some processes can be performed in “near real-time”, or at the time the data streams are brought into the database, and data can be released as “provisional” after this initial inspection to satisfy immediate user needs. Other processes, such as trend analysis for sensor drift detection, may require some delay. Results of these tests are typically recorded in a data qualifier flag for each value. Manual inspection and resolution of suspect or problem data is also a necessary step before data are released with “provisional” tags removed. Revised or corrected data versions can be published at a later date, and it is important to provide documentation on the types of quality checks conducted with each release of these data.
Three categories of automated or semi-automated QC processes can be described:
1. independent evaluation, whereby a single data point is checked against predetermined standards (such as range checks)
2. point-to-point evaluation, whereby a single data point is compared to other concurrently observed data points (such as replicate sensors)
3. many-point, or trend analysis, where some time frame of observations is examined statistically or against other data trends. The first two are essentially near real-time checks, whereas the third can involve time frames several orders of magnitude longer than the measurement interval.
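As an illustration of the second category, point-to-point evaluation, the sketch below compares concurrent readings from replicate sensors against their median. With three replicates the median identifies which sensor departs from the others, which is why the text recommends three rather than two. The function name, the "Q" flag, the sensor names, and the tolerance are assumptions chosen for the example.

```python
from statistics import median


def replicate_check(readings: dict, tolerance: float) -> dict:
    """Flag 'Q' for any replicate that departs from the median by more than tolerance."""
    center = median(readings.values())
    return {
        name: ("Q" if abs(value - center) > tolerance else "")
        for name, value in readings.items()
    }


# Three co-located air temperature sensors read at the same time step (deg C).
flags = replicate_check(
    {"thermistor": 12.1, "hobo_pendant": 12.3, "thermocouple": 15.0},
    tolerance=1.0,
)
```

With only two sensors the median lies midway between them, so both would be flagged or neither would, and the check could not say which one is drifting.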
Near real-time processing involves automated checking of each data point and its associated date and time. Data qualifier codes, or data flags, are assigned based on these checks. These automated checks and flag assignments are essential in processing the large volumes of data streaming from sensor networks, but they are not sufficient. Human inspection of data is critical and might focus particularly on data points that are flagged by an automated system. The following terminology corresponds with quality control tests listed in Campbell et al. 2013.
The most common and simplest checks to implement:
1. Timestamp integrity checks – ensure that each date-time pair is sequential. With fixed-interval data it is possible to cross-check the recorded and expected timestamps.
2. Range checks – ensure that all values fall within established upper and lower bounds. Bounds can be established based on the specific sensor limitations, or can be based on historical seasonal or finer time-scale ranges determined for that location. Separate flags might be assigned to qualify impossible values (based on sensor characteristics) versus extreme values that are outside of the historic norms but within the sensor operating range.
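The two checks above can be sketched in a few lines of Python. The flag codes ("T" for a timestamp problem, "I" for an impossible value, "X" for an extreme value) and the numeric bounds are hypothetical and would be replaced by a site's own vocabulary and limits.

```python
from datetime import datetime, timedelta


def timestamp_flags(stamps, interval):
    """Flag 'T' where a record does not follow its predecessor by exactly `interval`."""
    if not stamps:
        return []
    flags = [""]  # the first record has no predecessor to compare against
    for prev, cur in zip(stamps, stamps[1:]):
        flags.append("" if cur - prev == interval else "T")
    return flags


def range_flag(value, sensor_min, sensor_max, hist_min, hist_max):
    """Separate impossible values (sensor limits) from extreme ones (historic norms)."""
    if value < sensor_min or value > sensor_max:
        return "I"
    if value < hist_min or value > hist_max:
        return "X"
    return ""


t0 = datetime(2014, 7, 1, 0, 0)
stamps = [t0,
          t0 + timedelta(minutes=15),
          t0 + timedelta(minutes=45),   # one record missing before this one
          t0 + timedelta(minutes=45)]   # duplicate timestamp
time_flags = timestamp_flags(stamps, timedelta(minutes=15))

# Air temperature in deg C: sensor range -40..60, historic range -25..38 (assumed).
value_flags = [range_flag(v, -40, 60, -25, 38) for v in (10.0, 41.0, -60.0)]
```

Keeping the two range flags distinct lets a reviewer discard impossible values outright while still examining extreme ones, which may be real events.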
Other checks can be employed for near real-time or in post-streaming QC
1. Persistence - checks for repeated, unchanging values in measures where constant change is expected.
2. Spike detection - checks for sharp increases or decreases from the expected value in a short time interval, such as a spike or step function. These tests often employ statistical measures, such as the standard deviation of the preceding values, to detect outliers or spikes that exceed 2-3 sigma (standard deviations) from what is expected. An alternative algorithm is to check that the median value of points t, t+1 and t-1 is not more than a fixed magnitude from point t.
3. Internal consistency - plausibility checks for consistency between related measurements, such as that the maximum value is greater than the minimum value, or that snow depth is greater than its snow water equivalent. These checks may also examine values that are not possible under known conditions, such as incoming solar radiation recorded during nighttime.
4. Spatial consistency - checks for sensor drift or failure based on intersite comparisons of nearby identical sensors. The integration of several data streams may be possible in post-processing, and drift may be detected based on known correlations or prior conditioning with redundant or nearby sensors.
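As a rough illustration of the persistence and spike checks listed above, the following Python sketch flags long runs of identical values and applies the median-of-three test described under spike detection. The flag codes ("P", "S"), the function names and the thresholds are assumptions chosen for the example; appropriate thresholds depend on the sensor and the site.

```python
from statistics import median

def persistence_flags(values, max_repeats=6):
    """Flag ("P") every value that belongs to a run of more than
    `max_repeats` identical consecutive values."""
    flags = [""] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start > max_repeats:
                for j in range(start, i):
                    flags[j] = "P"
            start = i
    return flags

def spike_flags(values, max_step):
    """Flag ("S") point t when it differs from the median of points
    t-1, t and t+1 by more than `max_step`. End points are not tested."""
    flags = [""] * len(values)
    for t in range(1, len(values) - 1):
        if abs(values[t] - median(values[t - 1:t + 2])) > max_step:
            flags[t] = "S"
    return flags
```

The median-of-three test isolates a single-point spike without flagging its neighbors, but it will not detect a step change or a spike that lasts two or more records; a standard-deviation test over a longer window would be needed for those cases.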
Data qualifiers (data flags)
The QC system must be able to assign one or more codes to each data point based on the result of QC tests or other available information. Data flags may be assigned during the initial QC tests that are intended to guide local review in identifying erroneous or problematic data (e.g., invalid values out of range or below detection level), or might be flags that indicate site-specific events (e.g., low battery voltage, an icing or other event or site condition, or notification of a due date for sensor calibration). These internal flags may use a richer vocabulary of fine-grained flags than what is necessary to share publicly. Reviewing internal flags is necessary to resolve issues that may be evident in the data before these data are made available in final published versions. Some systems might employ a “rejected” flag as a means of preserving an original value while allowing that value to be withheld from public use.
External flags provided in published data will likely be a more general, simpler suite of flags better suited for public consumption. Multiple internal flags would be mapped into this more general flag set. While many vocabularies are in use, an example suite of external flags follows:
A: Accepted
E: Estimated
M: Missing
Q: Questionable
Specification of uncertainty
The “Accepted” flag should be assigned to values where no apparent problems are discovered, but the QC tests that were applied should be described. The “Accepted” flag is likely less commonly used than simply leaving the flag blank. If the blank flag is used it should be included in the list of flags and defined, e.g., “no QC tests were applied” or “no recognizable problems” or “provisional data”. A blank flag can be included in an enumerated listing of valid flags but may not be the best practice within some metadata standards. A “Provisional” flag is not listed here but may be appropriate. Alternatively, “provisional” data might be indicated within a “quality level” attribute on the record level or file level rather than associated with an individual measurement (see Data quality level section below).
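The mapping of several fine-grained internal flags onto the simpler external suite (A, E, M, Q) might look like the following Python sketch. The internal flag names, the mapping and the severity order are all hypothetical; each site would define its own vocabulary and rules.

```python
# Hypothetical internal vocabulary mapped to the external flags
# A (Accepted), E (Estimated), M (Missing), Q (Questionable).
INTERNAL_TO_EXTERNAL = {
    "impossible_value": "M",
    "sensor_offline": "M",
    "extreme_value": "Q",
    "low_battery": "Q",
    "icing": "Q",
    "gap_filled": "E",
}

# Assumed order of severity, most severe first.
EXTERNAL_PRIORITY = ["M", "Q", "E", "A"]

def external_flag(internal_flags):
    """Collapse all internal flags on one value into a single external flag.

    Unknown internal flags are treated as "Q" so that nothing unreviewed
    is published as accepted. A value with no internal flags is "A".
    """
    mapped = {INTERNAL_TO_EXTERNAL.get(flag, "Q") for flag in internal_flags}
    for code in EXTERNAL_PRIORITY:
        if code in mapped:
            return code
    return "A"
```

Keeping the mapping in a single table makes it easy to document in the metadata and to re-apply when the internal vocabulary grows.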
Examples of Quality Flag Sets (listed codes may only represent a subset of each flag set)
Andrews LTER                                    | WISKI (Univ. of Saskatchewan) | HFR LTER         | VCR LTER         | SeaDataNet
A - Accepted                                    | 10 - Rejected                 | M - Missing      | Blank - OK       | 0 - no QC
E - Estimated                                   | 15 - Disregard                | E - Estimated    | Q - Questionable | 1 - Good value
M - Missing                                     | 20 - Manually edited          | Q - Questionable | M - Missing      | 2 - Probably good
Q - Questionable                                | 25 - Simulated                | R - Range Error  |                  | 4 - Bad value
Measurement specific, e.g., B - Below detection | 30 - Filled                   | S - Data Spike   |                  | 6 - Below Detection
The evaluation of extreme values may benefit from “expert inspection” that can be built into the QC system. Historical ranges can be developed for sites with long-term sensor measurements at annual, seasonal or finer time scales. For remote sites that are data sparse these ranges may be a primary tool for ascertaining data quality, and, for example, a QC system may flag values that fall outside of two standard deviations of long-term means. Where other nearby in situ measurements are available or where national surface station networks are available, quality checks may be improved through comparison of values. Access to multiple climate elements may provide the ability to create relationships among stations and allow specification of uncertainty for all values. Evaluation of a QC system’s performance in determining uncertainty or in estimating values will be important in making system improvements and potentially allowing a retrospective re-application of quality control (Daly et al. 2005).
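The two-standard-deviation test mentioned above can be sketched as follows. The function names, the "Q" flag code and the idea of passing in one list of historical values per season or time of day are assumptions made for illustration; the sample standard deviation is used, which requires at least two historical values.

```python
from statistics import mean, stdev

def historic_bounds(history, n_sigma=2.0):
    """Lower and upper bounds at `n_sigma` sample standard deviations
    around the long-term mean of `history`."""
    center = mean(history)
    spread = stdev(history)
    return center - n_sigma * spread, center + n_sigma * spread

def flag_against_history(value, history, n_sigma=2.0):
    """Return "Q" when `value` falls outside the historic bounds, else ""."""
    lower, upper = historic_bounds(history, n_sigma)
    return "" if lower <= value <= upper else "Q"
```

In practice `history` would hold prior observations for the same site and the same season or hour, so that the bounds reflect the conditions under which the new value was recorded.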
Where specifications of uncertainty cannot be determined, values may be deemed “Questionable” by an automated system. Ultimately, manual evaluation may be required and a decision made as to whether a data point can be released as “Accepted”, removed from the data stream and listed as “Missing”, or left flagged as “Questionable”. As Daly et al. 2005 point out, “in the end, the fundamental dilemma with nearly all quality control is a tension between the relative merits and costs of accidentally rejecting good data, or accidentally accepting bad data, and a tradeoff is usually involved”.
Where data are missing, an option might be to fill gaps with “Estimated” data. From Campbell et al. 2013, “filling these gaps may enhance the data’s fitness for use but can possibly lead to misinterpretation or inappropriate use, and can be a complex endeavor. The decision about whether to fill gaps and the selection of the method with which to do so are subjective and depend on factors such as the length of the gap, the level of confidence in the estimated value, and how the data are being used”.
Data quality level
The level of QC testing applied to a set of data should be well described and transparent to the data user. Publishing of data is independent of data quality, and users need to be able to quickly identify its quality level, for example, to discern whether the data are unchecked raw data or have been thoroughly inspected and reviewed. Groups such as NEON and CUAHSI have assigned a quality level to data products including original raw data, initially inspected and flagged raw data, published raw data, and estimated, gap-filled or other synthetic products involving model-based or scientific interpretation (see references in data_quality_level.pdf). While these groups do not necessarily agree on the actual level assignment, there are some general concepts of quality level that can be agreed upon and are represented here:
Level 0 (raw) - Unfiltered, raw data, with no QC tests applied and no data qualifiers (flags) applied - Typically, these are original data streams that are not published but that should be preserved. Data quality flags are not assigned. Conversion of raw measurement values to more meaningful units may be acceptable, e.g., thermocouple table conversions of millivolts to degrees C.
Level 1 (provisional) - Provisional data released in near real-time with initial QC testing applied - Preliminary QC tests or data calibration are applied, potentially in near real-time through automated scripts. Data qualifiers are assigned and may be for internal use intended to guide further review of the data (see Data qualifiers subsection). All data qualifiers should be well defined. Range and date-time checking are commonly applied to this provisional level. The QC tests applied should be well described.
Level 1 (published) - Published data with a delayed release after automated and manual review - QC testing is complete and suspect data have been inspected and flagged appropriately. Each value is assigned a data qualifier, and the set of flags may be a simpler set devised for public use of the data. Impossible or missing values would be assigned an appropriate missing value code and a data flag of “Missing”. Data would no longer be considered provisional and would be unlikely to change.
Level 2 (gap-filled) - Gap-filled or estimated data involving interpretation - These are quality-enhanced data where careful attention has been applied to estimate or fill gaps in data or to otherwise build derived data to accommodate data user needs, for example estimating gaps in a sensor stream using a nearby sensor. As gap-filling typically involves interpretation and may employ multiple models or algorithms, other versions of level 2 data may be used in practice. Methods employed in gap-filling or deriving data should be well described.
Aggregating data from one time step to another, e.g., creating daily summary data from 10 minute data, does not involve interpretation when only simple means, maximums, and minimums are determined, and so would not necessarily alter the quality level. That is, mean daily temperature determined from level 1 (published) data would still retain a quality level 1. However, interpretation may be involved when determining an appropriate qualifier flag for the daily mean. For example, if some of the 10 minute observations are missing, at what point does the daily mean also become missing (e.g., more than 20% are missing) or become questionable (e.g., more than 5% are missing)? This type of processing may yield daily mean values that are best described as Level 2, as interpretation is involved.
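The flagging decision described above can be made explicit in code. The sketch below uses the example thresholds from the text (more than 20% missing yields a missing daily mean, more than 5% a questionable one); the function name, the use of None for missing observations and the default of 144 expected 10 minute records per day are assumptions for illustration.

```python
def daily_mean(values, expected=144, missing_limit=0.20, questionable_limit=0.05):
    """Aggregate one day of 10 minute values into a mean and a flag.

    `values` may contain None for missing observations and may be shorter
    than `expected` if records were never logged.
    Returns (mean or None, flag) with flag in {"A", "Q", "M"}.
    """
    present = [v for v in values if v is not None]
    fraction_missing = 1 - len(present) / expected
    if not present or fraction_missing > missing_limit:
        return None, "M"
    flag = "Q" if fraction_missing > questionable_limit else "A"
    return sum(present) / len(present), flag
```

Because the thresholds are parameters, they can be recorded in the methods metadata alongside the data quality level assigned to the aggregated product.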
Data collection interval
Data loggers offer the capability to easily output mean data values at multiple time steps, e.g., 10 minutes, hourly, daily. Saving values at multiple time steps may present an extra complication in the QC process as separate tables are usually stored for each time step. When a single sensor measurement is reported at separate time steps, conflicting QC results may occur if both streams are QC’d independently. One strategy to simplify this problem is to output most or all data in the shortest common time step and use post-processing to statistically aggregate the data at longer time steps. For example, a system might QC and output the 10 minute data and then aggregate hourly and daily values from this finer resolution 10 minute data stream. Data loggers might typically calculate and output daily (24-hour) data streams, but accurate QC may be impossible as the exact values used in this aggregation are unknown, and the aggregation may represent only a subset of values, e.g., if there was a power discontinuity to the logger. However, there may be cases where the output of daily values by the logger is important. For example, an instantaneous maximum or minimum value based on a single logger sample would not be captured through this aggregation, and a daily minimum or maximum based on a 10 minute or hourly mean output may differ significantly from the instantaneous value.
Data Management
Timing of QC system processes
Automated QC system procedures provide the most timely and efficient processing of streaming data. The use of system procedures provides consistent assignment of data flags and removes much of the subjectivity inherent in manual assignment. Ideally, the QC system will be employed every time data are acquired, e.g., every 10 minutes, and secondarily operate on hourly or daily time periods. More comprehensive visual or programmatic checks or the assignment of uncertainty using nearby or other related sites might occur at a later time. The frequency and timing of manual or visual review processes will depend on the data flow at the site, the software stack, and data processing capabilities. The necessary time frame for data delivery of provisional versus fully processed data should be considered.
Documentation of the QC processes
The documentation of QC processes should identify the near real-time streaming QC methods, including assumptions and thresholds, and additional algorithms or visual methods applied. If no QC is applied, that should be made apparent. Descriptions of data processing and QC workflows are also useful in describing data provenance, and all workflow versions should be retained (see example workflow). Data measurement attributes and qualifier flags should be defined.
Fig. 1 Example of a general monitoring data flow model from Nevada Research Data Center, Scotty Strachan, University of Nevada, 2014
The application of the QC tests employed or any algorithms applied to aggregate, estimate or gap-fill data should be described for all data levels, and data levels can potentially be defined in conjunction with a data release policy. Ideally, data at each level should be locally archived. Level 0 raw data should be retained locally in its original, unmanipulated state. Level 1 (published) or level 2 data may be the best candidates for more formal archiving. Data sets should be transparently tagged with a data quality level as data are released.
Sensor data documentation
Develop and use a common vocabulary and syntax for sensor measurement attribute names and file naming conventions. Research organizations with multiple sensor sites measuring common sets of parameters can greatly improve efficiency and more easily employ automated methods when a common vocabulary is employed. These naming conventions should be planned from the outset into data logger programs and other software employed within the data flow.
Data qualifier flags provide documentation for each measured value and should be placed alongside the value as data files are produced for archival storage. An additional attribute or method code may also be added to note shifts in method or instrumentation or other key changes in collection procedures. Inclusion of a method code directly within the data file places key documentation close to the data value and is more visible to the data user. In long-term data streams where the quality level may change over time, e.g., periods of time where gap-filling is employed, a data quality attribute might be used to assign data quality at the record or measurement level.
Best Practices
Reorganized from: Campbell et al. 2013.
Sensor Quality Assurance (QA)
- Maintain an appropriate level of human inspection
- Replicate sensors, n=3 is optimal
- Schedule maintenance and repairs to minimize data loss
- Have ready access to replacement parts
- Record the date, time, and time zone of known events that may impact measurements
- Implement an automated alert system to warn about potential sensor network issues
Quality Control (QC) on data streams
- Ensure that data are collected sequentially
- Perform range checks on numerical data
- Perform domain checks on categorical data
- Perform slope and persistence checks on continuous data
- Compare data with data from related sensors
- Use flags to convey information about the data
- Estimate uncertainty in the value, if feasible
- Correct data or fill gaps if it is prudent
Data management
- Automate QA/QC procedures
- Retain the original unmanipulated data
- Indicate data quality level with each release of the data
- Provide complete metadata
- Document all QA/QC procedures that were applied and indicate data quality level
- Document all data processing (e.g., correction for sensor drift)
- Retain all versions of workflows and metadata (data provenance).
Case Studies
We are looking for case studies that will describe some complete QC systems, QC processing and general setup (e.g., number and type of sensors, data loggers, telemetry, etc.)
- Examples using GCE Toolbox, Vista Data Vision, R, etc. would be useful
- General workflow example from Nevada Research Data Center
References
Campbell, JL, Rustad, LE, Porter, JH, Taylor, JR, Dereszynski, EW, Shanley, JB, Gries, C, Henshaw, DL, Martin, ME, Sheldon, WM, Boose, ER. 2013. Quantity is nothing without quality: Automated QA/QC for streaming sensor networks. BioScience. 63(7): 574-585. http://www.treesearch.fs.fed.us/pubs/43678
Taylor, JR and Loescher, HL. 2013. Automated quality control methods for sensor data: a novel observatory approach. Biogeosciences, 10, 4957-4971.
Daly, C, Redmond, K, Gibson, W, Doggett, M, Smith, J, Taylor, G, Pasteris, P, Johnson, G. 15th AMS Conf. on Applied Climatology, American Meteorological Soc. Savannah, GA, June 20-23, 2005.
Resources
QC Resources
- Campbell et al. 2013. BioScience: http://www.treesearch.fs.fed.us/pubs/43678
- Taylor and Loescher 2013. Biogeosciences: http://www.biogeosciences.net/10/4957/2013/bg-10-4957-2013.pdf
- Daly et al. 2005. 15th AMS Conf. on Applied Climatology, Amer. Meteorological Soc.: http://ams.confex.com/ams/pdfpapers/94199.pdf
- CUAHSI: http://wdc.cuahsi.org/wdc/Docs/ODM1.1DesignSpecifications.pdf
- NOAA Satellite and Information Service (National Climatic Data Center): http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/
- Carbon Dioxide Information Analysis Center: http://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.html
- SeaDataNet: http://www.seadatanet.org/Standards-Software/Data-Quality-Control
- Data Quality Assessment: Statistical Methods for Practitioners: http://www.epa.gov/quality/qs-docs/g9s-final.pdf
Flag set examples
- NOAA National Climatic Data Center: http://www.ncdc.noaa.gov/oa/hofn/coop/coop-flags.html
- Campbell et al. 2013. BioScience (see p. 580): http://www.treesearch.fs.fed.us/pubs/43678
Data quality level
- NEON: http://www.neoninc.org/documents/513
- CUAHSI: http://his.cuahsi.org/documents/ODM1.1DesignSpecifications.pdf, pp. 19-20, 57-58
- Ameriflux: http://public.ornl.gov/ameriflux/available.shtml
- Earth Science Reference Handbook: http://eospso.gsfc.nasa.gov/sites/default/files/publications/2006ReferenceHandbook.pdf (p. 31)
- ILRS Data products (CODMAC - Committee on Data Management, Archiving and Computing): http://ilrs.gsfc.nasa.gov/about/reports/9809_attach7b.html
Sensor Data Archiving
Introduces different approaches and repositories for
archiving and publishing data sets of sensor data.
Contents
1 Overview
2 Introduction
3 Methods
3.1 Publishing of Snapshots
3.2 Persistent Identifiers
3.3 Versioning
3.4 Data Storage Formats
3.5 Data Storage Strategies
4 Best Practices
5 Case Studies
5.1 1. LTER NIS
5.2 2. KNB
5.3 3. GIWS, University of Saskatchewan WISKI data archive
6 Resources
7 References
Overview
Archiving data snapshots and using appropriate metadata and packaging standards can increase the longevity and discovery of data immensely. However, these local curation techniques are still susceptible to threats to the projects or institutions that maintain the local archive. People in critical technology positions that maintain archives may change careers or retire, projects can lose funding, and institutions that seem solid can dissolve due to changes in political climate. For these reasons, partnering across institutions to provide archival services of data can greatly increase the probability that data will remain accessible for decades or into the next century.
In this chapter, we discuss techniques and issues involved with archiving data on a multi-decadal scale. For sensor data, we promote the use of periodic data snapshots, persistent identifiers, versioning of data and metadata, and data storage formats and strategies that can increase the likelihood that data will not only be accessible into the future, but will also be understandable to future researchers.
Introduction
A data archive is a location that provides reasonable assurance that data and the contextual information needed to interpret the data can be recovered and accessed after significant events, and ultimately after decades. Data archives should be maintained through backup strategies such as redundancy and offsite backup, in multiple locations and through institutional partnerships. Archiving activities should have institutional commitment, and ideally cross-institutional commitment. Archives may be locally maintained, may be part of a national or network-wide archive initiative, or both. For raw data, an archive can be a local or regional facility, whereas quality controlled, ‘published’ data should be archived in a community-supported network archive and available online.
Environmental research scientists need access to streaming data from sensor networks provisionally in near real-time, after QA/QC processing, and in final form for long-term studies. Without appropriate archiving strategies, data are at great risk of total loss over time due to institutional memory loss, institutional funding loss, natural disasters, and other accidents. These typically include near-term accidents and long-term data entropy due to career and life changes for the original investigator(s) [Michener 1997]. Data, and the methods used to generate and process them, are often insufficiently documented, which may result in misinterpretation of the data or may render the data unusable in later research. Likewise, the lack of version control or of persistent identifiers for all files causes downstream confusion and hinders reproducible science.
Data managers are increasingly asked both to preserve raw data streams and to provide automated, near real-time quality control and access to provisional data from these sensors. Typically, these provisional data streams undergo further visual and other quality checking, and final data sets are published. Commonly, further interpretation occurs where some missing data are gap-filled through imputation procedures, or faulty data are removed. There is a strong need to archive these data streams and provide continued access, which ultimately safeguards the investment of both time and money dedicated to collecting the data in the first place. There are a number of organizational, storage, formatting, and delivery issues to consider. However, four main archiving strategies should be used: creating well documented data snapshots, assigning unique, persistent identifiers, maintaining data and metadata versioning, and storing data in text-based formats. These practices, described below, will increase the longevity and interoperability of the data, and will promote their usefulness to current and future researchers.
Methods
Publishing of Snapshots
Generating periodic snapshots of near real-time sensor streams allows the data to be stored and described in a deterministic manner. The rate at which snapshots are produced depends on the needs of the community using the data, but typically snapshot files are organized as hourly, daily, weekly, monthly, or annual data sets. It also depends on the sample rate and sample size. Producing thousands of tiny data files, or one file with gigabytes of data, would decrease the usefulness of the data from a transfer and handling perspective. Make it easy on the researchers using the data, and size the snapshots appropriately.
Without detailed documentation of the contextual information needed to interpret individual measurements, even well-archived data will be rendered unusable. Develop metadata files to accompany the data using a machine-readable metadata standard appropriate to the community using the data. Common standards include the ISO 19115 Geographic Information Metadata [ISO/TC 211, 2003], the Content Standard for Digital Geospatial Metadata (CSDGM) [FGDC, 1998], the Biological Profile of the CSDGM [FGDC, 1999], and the Ecological Metadata Language [Fegraus et al., 2001]. Also consider documenting detailed sensor deployment settings and processes with SensorML [OGC, 2000].
Likewise, snapshots of data that represent a time series should be documented and packaged appropriately such that the relationships among files are clear. Many of the above metadata standards have their own means of linking data with metadata; however, they are all implemented differently. Federated archiving efforts such as DataONE have adopted ‘resource maps’ [Lagoze, 2008] to describe relationships between metadata and data files in a language-agnostic manner (see DataONE packaging in the Resources section, and the Open Archives Initiative ORE primer). Consider publishing resource maps of your data and metadata relationships to improve interoperability across archive repositories.
Once data collections are sufficiently described, delivery can also be a challenge. While providing resolvable links directly to the metadata and data files is a good practice, scientists often would like to be able to download full collections. Providing a service that packages files into a downloadable zip file is commonplace, but relationships between data and metadata can be lost. Consider using the BagIt specification (see BagIt in the Resources section) [Boyko, 2009], which provides simple additions to zip files such as a manifest file that maintains the machine-readable relationships between the items in the collection, while still allowing researchers to download data packages directly to their desktop.
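To make the packaging idea concrete, the following Python sketch lays out a minimal bag by hand using only the standard library. BagIt defines a directory layout (a bagit.txt declaration, a data/ payload directory and a checksum manifest) that can then be zipped for download. The function name and file names are hypothetical, the sketch writes only the required files, and a production system would more likely rely on an existing BagIt implementation and add descriptive tag files.

```python
import hashlib
import shutil
from pathlib import Path

def make_bag(files, bag_dir):
    """Copy `files` into a minimal bag at `bag_dir`.

    Creates data/ (the payload), bagit.txt (the bag declaration) and
    manifest-sha256.txt (one "checksum  path" line per payload file).
    """
    bag = Path(bag_dir)
    payload = bag / "data"
    payload.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for source in map(Path, files):
        target = payload / source.name
        shutil.copyfile(source, target)
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{source.name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

A recipient can recompute the checksums listed in the manifest to confirm that every data and metadata file in the package arrived intact.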
Persistent Identifiers
The above snapshot archiving strategies hinge on the ability to identify each file or component of a package uniquely and unambiguously. File names can often collide, particularly across unrelated projects, so assigning unique, persistent identifiers to each file, and to the originating sensor stream, is paramount to successful archiving. A persistent identifier is usually a text-based string that represents an unchanging set of bytes stored on a computer. Persistent identifiers should be assigned to science data objects, science metadata objects, and other files that associate the data and metadata together, such as resource maps. Opaque identifiers (like UUIDs) tend to be best for persistence and uniqueness, but can be less memorable. Systems such as the Digital Object Identifier service (DOI) and EZID can help in maintaining unique, resolvable identifiers (see UUIDs, DOIs, and EZID in the Resources section). Each version of a file, or of products derived from files (see versioning below), should also have a persistent identifier. If snapshots of data are being extended with new data, a new version of the data set needs to be published. Shorter identifiers are best; avoid using spaces and other special characters in identifiers to increase compatibility with file systems and URLs. Ultimately, the use of persistent identifiers allows associated metadata to track the provenance of cleaned, quality assured data or other derived products, and promotes reproducible science and citable data.
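As a small illustration of an opaque identifier bound to an unchanging set of bytes, the sketch below assigns a random UUID to each object and records a checksum of its content. The in-memory dictionary used as a registry and the function names are assumptions; a real archive would use a resolvable identifier service such as DOI or EZID and a durable catalog.

```python
import hashlib
import uuid

def register_object(content, registry):
    """Assign a new opaque identifier to `content` (bytes) and record its
    SHA-256 checksum in `registry`. Returns the identifier."""
    identifier = f"urn:uuid:{uuid.uuid4()}"
    registry[identifier] = hashlib.sha256(content).hexdigest()
    return identifier

def verify_object(identifier, content, registry):
    """True only if `content` is byte-for-byte what was registered."""
    return registry.get(identifier) == hashlib.sha256(content).hexdigest()
```

Any change to the bytes, including the addition of flags or new records, causes verification to fail, which signals that a new identifier should be issued.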
Versioning
Data from sensor streams are usually considered ‘provisional’ until they have been processed for quality control, and multiple versions of the data may be generated. However, provisional data are often used in publications and are cited as such. In order to support reproducible science using sensor data, each version should therefore be maintained with its own citable identifier. Overwriting files or database records with new values or with annotated flags will ultimately change the underlying bytes, and effectively break the ‘persistence’ of the identifier pointing to the data. This applies to metadata or packaging versions as well, and so care must be taken to plan versioning into your storage system. Your versioning strategies for raw data will depend on your snapshot strategies (e.g., appending to hourly files, then snapshotting and updating metadata files, or alternatively, say, producing daily, weekly, monthly, or annual packages that include data files and metadata files for the time period covered). By making citable versions, researchers will be able to access the exact bytes that were used in a journal publication, and peer review of studies involving sensor data streams will be more robust and deterministic.
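One way to avoid overwriting is to treat every version as a new, immutable object that points back to the version it replaces. The sketch below is a minimal, hypothetical illustration: the in-memory catalog, the "obsoletes" field name and the function names are assumptions and do not describe the API of any particular archive.

```python
import uuid

def publish_version(catalog, content, obsoletes=None):
    """Store `content` (bytes) as a new version with its own identifier.

    Earlier versions are never modified; the new entry records which
    identifier, if any, it obsoletes.
    """
    if obsoletes is not None and obsoletes not in catalog:
        raise KeyError(f"unknown identifier: {obsoletes}")
    identifier = f"urn:uuid:{uuid.uuid4()}"
    catalog[identifier] = {"content": bytes(content), "obsoletes": obsoletes}
    return identifier

def version_chain(catalog, identifier):
    """Identifiers from `identifier` back to the first version."""
    chain = []
    while identifier is not None:
        chain.append(identifier)
        identifier = catalog[identifier]["obsoletes"]
    return chain
```

A publication that cites the provisional version continues to resolve to the original bytes, while the chain lets a reader discover that a quality controlled version now exists.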
Data Storage Formats
Sensor data may be stored in different structures, each with its own advantages and disadvantages. A suite of variables from one station, collected at the same temporal resolution, may be stored within one wide table with a column for each variable, each time being one record of several variables. Alternatives might be a table for each variable, or one table of the format [time, location, variable, value]. This latter system may be value centric, with metadata attached to each value, or series centric, with metadata attached to a certain time interval for one variable (e.g., a time series of air temperature between calibrations). No matter how you organize your data, long-term, archival storage file formats need to be considered. In the digital age, thousands of file formats exist that are readable by current software applications. However, some formats will be more readable into the future than others. As an example, Microsoft Excel 1.0 files (circa 1985) are not readable by Microsoft Excel 2012 since the binary format has changed over time in a backward-incompatible manner. Therefore, unless these files are continually updated year after year, they will be rendered unusable. The same is true for database system files (.dbf) that hold the relational table structures in commonly used databases such as Microsoft SQL Server, Oracle, PostgreSQL, and MySQL. Database files must be upgraded with every new database version so they do not become obsolete. A good rule of thumb is to archive data in formats that are ubiquitous and are not tied to a given company’s software. Archive data in ASCII (or UTF-8) text files preferably, since this format is universally readable across operating systems and software applications. If ASCII encoding would cause major increases in file sizes for massive data sets, consider using a binary format that is community-supported such as NetCDF. NetCDF is an open binary specification developed at the University Corporation for Atmospheric Research (UCAR), and provides open programming interfaces in multiple languages (C, Java, Python, etc.) that are supported by many scientific analysis packages (Matlab, IDL, R, etc.). By choosing an archive storage format that isn’t tied to a specific vendor, data files will be readable in decades to come even when institutional support for maintaining more complex database systems falls short.
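The wide and long table layouts described above, and the recommendation to archive as plain text, can be illustrated with the Python standard library. The column names, the "time" key and the function names below are assumptions for the example only.

```python
import csv

LONG_FIELDS = ["time", "location", "variable", "value"]

def wide_to_long(rows, location):
    """Turn wide records (one column per variable) into long records of
    the form [time, location, variable, value]."""
    for row in rows:
        for variable, value in row.items():
            if variable != "time":
                yield {
                    "time": row["time"],
                    "location": location,
                    "variable": variable,
                    "value": value,
                }

def write_long_csv(records, handle):
    """Write long-format records as delimited text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=LONG_FIELDS)
    writer.writeheader()
    writer.writerows(records)
```

When writing to disk, opening the file with `open(path, "w", newline="", encoding="utf-8")` keeps the output in a plain UTF-8 text format that does not depend on any vendor's software.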
Data Storage Strategies
For local archives, the most common storage strategy is to store files directly in a hierarchical manner on a file system. Many data managers use a mix of location, instrument, and time-based hierarchy to store files in folders (e.g., /Data/LakeOneida/CTD01/2013/06/22/file.txt). This is a very simple, reliable strategy, and may be employed by groups with few resources for installing and managing database software. However, file system-based archives may be difficult to manage when volumes are large, or when the number of instruments or variables is growing and doesn’t fit a straight hierarchical model.
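A location, instrument and date hierarchy like the one in the example path can be generated consistently with a small helper. This sketch assumes POSIX-style paths and a hypothetical function name; the directory levels simply mirror the example above.

```python
from datetime import date
from pathlib import PurePosixPath

def archive_path(root, location, instrument, day, filename):
    """Build root/location/instrument/YYYY/MM/DD/filename."""
    return PurePosixPath(
        root, location, instrument,
        f"{day:%Y}", f"{day:%m}", f"{day:%d}",
        filename,
    )
```

Generating paths in one place, rather than by hand, keeps zero-padding and directory order consistent across loggers and years.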
Many groups use relational databases to manage sensor data as they stream into a site’s acquisition system. Well known vendor-based solutions like Oracle Database and Microsoft SQL Server are often used, as well as open source solutions such as MySQL and PostgreSQL. These systems provide a means of data organization that can promote good quality control and fast searching and subsetting based on many factors, beyond the typical location/instrument/date hierarchy described above. Databases also provide standardized programming interfaces for accessing the stored data in standard ways across multiple programming languages. From an archive perspective, the use of databases may help in managing data locally, but should be seen as one component of a workflow to get data into archival formats.
Local databases used for managing data should be backed up regularly, ideally to an offsite location, and the underlying binary database file formats should regularly be upgraded to the newest supported versions of the database software. Ultimately, data stored in databases should be periodically snapshotted and stored in archival file formats, described above, with complete metadata descriptions to enhance their longevity.
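A periodic snapshot from a database into an archival text format can be quite small. The sketch below uses SQLite from the Python standard library purely as a stand-in for whichever database a site operates; the table layout, the text "time" column holding ISO 8601 UTC timestamps and the function name are assumptions. The table name is interpolated into the SQL and must therefore come from trusted configuration, never from user input.

```python
import csv
import sqlite3

def snapshot_table(conn, table, start, end, out_path):
    """Write all rows with start <= time < end to a UTF-8 CSV file.

    Returns the number of data rows written. `table` must be a trusted
    name because it is placed directly into the query text.
    """
    cursor = conn.execute(
        f'SELECT * FROM "{table}" WHERE time >= ? AND time < ? ORDER BY time',
        (start, end),
    )
    header = [column[0] for column in cursor.description]
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```

Each snapshot file produced this way can then be described with metadata, assigned a persistent identifier and deposited in the archive.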
Although local file systems and databases are the most practical means of managing data, they are often at risk of being destroyed or unmaintained over decadal scales. Natural disasters, computer failure, staff turnover, lack of continued program funding, and other risks should be addressed when deciding how to archive data for the long term. One of the best strategies for data protection is cross-institution collaborations that provide storage services for their participants. These sorts of arrangements can guard against institutional or program dissolution, lack of funding, etc. Consider partnering with community supported archives such as the LTER NIS, or federated archive initiatives such as DataONE, to archive snapshots of streaming sensor data (see both in the Resources section).
Best Practices
The following list of best practices is taken from the above recommendations, together with additional considerations for archiving sensor-derived data.
- Develop and maintain an archival data management plan such that personnel changes don’t compromise access to or interpretation of data archives (potentially through University Library programs)
- Employ a sound data backup plan. Archived data should be backed up to at least two spatially different locations, far enough apart that they won’t be affected by the same destructive events (natural disasters, power or infrastructure issues). Perform daily incremental backups and weekly complete backups that may be replaced periodically, and annual backups that won’t change. (e.g., CrashPlan, Acronis)
- Generate periodic snapshots of near real-time sensor streams (e.g., Acronis)
- Develop metadata files to accompany the data using a machine-readable metadata standard
- Assign persistent identifiers to science data objects, science metadata objects, and other files that associate the data and metadata together
- Maintain versioned files with their own citable identifier
- Preferably archive data in ASCII (or UTF-8) for text files, or community supported formats like NetCDF for binary format
- Archive all raw data, but all raw data do not necessarily need to be available online. However, assign a persistent identifier to each raw data file to be able to document provenance of the published, quality controlled data.
- Partner across institutions to provide archival services to mitigate programmatic losses
- Preferably make publicly available those data that have had appropriate QA/QC procedures applied.
- Assign a different persistent identifier to published data sets of different QC levels in an archive. In the methods metadata, document the provenance and quality control procedures applied.
- Document contextual information for each data point, i.e., in addition to assigning a quality flag, assign a methods flag which documents field events like calibrations, small changes, sensor maintenance, sensor changes, etc. Include notes that handle unusual field events (e.g., animal disturbance).
- Encode metadata for sensor-derived data using community and/or nationally accepted standards.
- Ensure the time zone for all timestamps is captured. Data loggers are often set manually to a particular time. Consider daylight saving time.
- Establish meaningful naming conventions for your variables, taking into account the type of observation that is archived, adjectives describing the location, instrument type, and other necessary variable determinants.
- Determine the precision for your observation values in advance
- Preferably follow a naming convention or controlled vocabulary for variables (see the Resources section)
- Avoid using databases for archival storage, but use them for management and quality control. However, if databases are used for managing sensor data, then periodic snapshots into ASCII or open binary data formats are recommended.
- Track changes to data files within metadata files to maintain an audit trail
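The time zone recommendation in the list above can be applied at archive time by converting logger timestamps to UTC with an explicit offset. The sketch below assumes the logger clock is kept on a fixed offset from UTC all year (that is, it does not follow daylight saving time); the function name and the example offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def logger_time_to_utc(naive_local, utc_offset_hours):
    """Attach a fixed UTC offset to a naive logger timestamp and return
    the equivalent timezone-aware UTC datetime."""
    logger_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=logger_zone).astimezone(timezone.utc)
```

Writing the result with `isoformat()` stores the offset alongside the value, so a future user does not have to guess which clock the logger was set to.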
Case Studies
1. LTER NIS
The NSF Long-Term Ecological Research Network Information System (LTER NIS) is the central data archive for all data generated by LTER research and related projects. All data, including sensor data, are submitted with metadata in the Ecological Metadata Language (EML). Data are publicly available through this portal and through DataONE, of which the LTER NIS is a member. Specific approaches to archiving streaming sensor data follow the best practice recommendations given in this document: data sets are submitted as snapshots in time, and it is up to the site information managers to decide the length of time in each snapshot, i.e., how frequently a new data set is submitted. Minimally quality controlled data sets are submitted, while the raw data are archived at each site. As of this writing no standards for quality control levels or data flagging have been adopted by the LTER community.
Features of the LTER NIS include:
- Public availability of data and metadata
- Congruence check - quality check of how well the metadata describe the structure of the data
- Use of persistent identifiers
- Strong versioning of metadata and data files in the system
- Member Node of DataONE
- Support of LTER and related projects data storage using access control rules
- Replication of data and metadata across geographically dispersed servers
2.KNB
TheKnowledgeNetworkforBiocomplexity(KNB)isaninternationalnetworkthatfacilitatesecologicalandenvironmentalresearchonbiocomplexity.Forscientists,theKNBisanefficientwaytodiscover,access,interpret,integrateandanalyzecomplexecologicaldatafromahighly-distributedsetoffieldstations,laboratories,researchsites,andindividualresearchers.TheKNBrepositoryhasbeenstoringandservingdataforoveradecade,andstoresover25,000datasets.
FeaturesoftheKNBinclude:
PublicavailabilityMetacat,anopensourcedatamanagementsystemMorpho,anopensource,desktopmetadataeditorSupportforanyXML-basedmetadatalanguage,butoptimizedfortheEcologicalMetadataLanguageUseofpersistentidentifiersStrongversioningofallfilesinthesystemSupportforcross-metadatapackagingusingresourcemapsCross-institutionalpartneringwiththeLTERandDataONESupportforbothpublicandprivatedatastorageusingaccesscontrolrulesReplicationofdataandmetadataacrossgeographicallydispersedserversInternationalparticipation,andsupportformulti-languagemetadatadescriptions
Recent developments of the KNB include support for the DataONE programming interface (API) in both the Metacat and Morpho software products. This API promotes interoperability of archival repositories, and enables federated access to environmental data. Since the KNB products support this open API, anyone can create their own web or desktop applications that are optimized for their research community.
3. GIWS, University of Saskatchewan WISKI data archive
The Global Institute for Water Security (GIWS) at the University of Saskatchewan is directly involved in the collection of field data from different research areas including the Rocky Mountains, Boreal Forest, Prairie, and others.
In addition to the “in-house” managed data, GIWS uses external datasets from organizations such as Environment Canada and Alberta Environment. The data management platform on which GIWS currently operates is the Water Information System Kisters (WISKI). This system is used together with Campbell Scientific LoggerNet software and custom .NET modules in automated tasks that handle data collection, centralized data processing, storing, and reporting. After processing, the environmental datasets are published and made available to specific groups of users through the Kisters WISKI Web Pro web interface and the KiWIS web service. Both applications can query the centralized database and return data in formats that are used for visualization or for further processing and dissemination.
Features of the GIWS system include public availability, use of persistent identifiers, support for cross-institutional partnering, data access control for different groups of users, and support for the OGC WaterML 2 data format. [See GIWS in the Resources Section]
Resources
BagIt Zip file format: https://wiki.ucop.edu/display/Curation/BagIt
DataONE: http://www.dataone.org/what-dataone and http://www.dataone.org/participate
DataONE Packaging: http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html
Digital Object Identifier (DOI) System: http://doi.org
Ecological Metadata Language: http://knb.ecoinformatics.org/software/eml/
FGDC Metadata Standards: http://www.fgdc.gov/metadata/geospatial-metadata-standards
GIWS: http://giws.usask.ca/documentation/system/GIWS_WISKI.pdf
ISO 19115 Metadata Standard - Geographic Information: http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798
EZID Identifier Service: http://n2t.net/ezid
LTER Network Information System: http://nis.lternet.edu
Open Archives Initiative Object Reuse and Exchange (Resource Maps) Primer: http://www.openarchives.org/ore/1.0/primer.html
Universally Unique Identifiers (UUID): http://en.wikipedia.org/wiki/Universally_unique_identifier
CF Metadata: http://cf-convention.github.io/
References
Biological Data Working Group, Federal Geographic Data Committee and USGS Biological Resources Division. 1999. CONTENT STANDARD FOR DIGITAL GEOSPATIAL METADATA, PART 1: BIOLOGICAL DATA PROFILE. FGDC-STD-001.1-1999. https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/biodatap.pdf
Boyko, A., Kunze, J., Littman, J., Madden, L., Vargas, B. (2009). The BagIt File Packaging Format (V0.96). Retrieved May 8, 2013, from http://www.ietf.org/id/draft-kunze-bagit-09.txt
Fegraus, E. H., Andelman, S., Jones, M. B., & Schildhauer, M. (2005). Maximizing the value of ecological data with structured metadata: an introduction to Ecological Metadata Language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America, 86(3), 158-168. http://dx.doi.org/10.1890/0012-9623(2005)86%5B158:MTVOED%5D2.0.CO;2
Lagoze, Carl; Van de Sompel, Herbert; Nelson, Michael L.; Warner, Simeon; Sanderson, Robert; Johnston, Pete (2008-04-14), "Object Re-Use & Exchange: A Resource-Centric Approach", arXiv:0804.2273v1 [cs.DL], arXiv:0804.2273
Metadata Ad Hoc Working Group, Federal Geographic Data Committee. 1998. CONTENT STANDARD FOR DIGITAL GEOSPATIAL METADATA. FGDC-STD-001-1998. http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf
Michener, William K., James W. Brunt, John J. Helly, Thomas B. Kirchner, and Susan G. Stafford. 1997. NONGEOSPATIAL METADATA FOR THE ECOLOGICAL SCIENCES. Ecological Applications 7:330–342. http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
Open Geospatial Consortium, Inc. (OGC). Url: http://www.opengeospatial.org (2010).