Integrated Data Management System
for Critical Zone Observatories CZOData II
Mark Williams, UC-Boulder.
Anthony K. Aufdenkampe, SWRC.
Kerstin Lehnert, IEDA/Columbia.
Ilya Zaslavsky, SDSC.
David Tarboton, USU
Jeff Horsburgh, USU.
Emilio Mayorga, UW-APL
Goals for CZOData II
• extensive and iterative interaction and feedback from the
community of CZO PIs, scientists and data managers
• uniform web portal appearance for the CZO sites and the
national CZO program
• development of a consistent metadata strategy for CZO data,
supported by a respective collection of data submission forms
and tools
• enhancing publication and data discovery workflows for
geochemical, hydrologic, spatial and other data
• creating a uniform data discovery portal
• ensuring that the data descriptions follow consistent semantics
• integrating with the EarthChem system
• developing a consistent online data visualization interface for
CZO time series data
CZOData II Architecture
Local CZOs
Local CZO Web Site
CZO Display Files
Standards-based Web Service Clients
EarthChem
CZO Main Web Portal
CZO Main Web Site
OpenTopography (LiDAR)
CUAHSI HIS
CZO Central Data Management System
CZO Central Harvester
Local CZOs
CZO Central Coordination Functions
CZO Central Data Repositories
Non-CZO Integrated Data & Discovery Sites
Clients
CZO-IGSN Registration System
Shared Vocabulary System
CZchemDB System (w/ EarthChem Service Interface)
CZO Central Hydro System (w/ WFS & CUAHSI HIS Service Interface)
DataONE
DataONE Interface
CZO Metadata Catalog (w/ CSW Service Interface)
SESAR
Data Management Tools
CZO Data Discovery Portal
Time Series Data Display & Access Tool
CZchemDB Data Access Tool
Community Involvement
• Instigate and Support an Information Management
Committee (IMC)
• 1-2 investigators/CZO + site data managers
• Monthly telecon & annual face to face meeting w/ CZOData
developers
• Feedback to CZOData team
• Data use scenarios
• Meta-data requirements, shared vocabulary, etc.
• Web-based information events and workshops (>2/year);
mailing lists
• Subdiscipline workshops (three workshops)
• Hydrology (all sensor-based data)
• Geochemistry (all sample-based data)
• Geospatial data
• Synthesis working group (two workshops)
Get Started Now
• Form IMC and set first workshop date
• Content for new website
• LIDAR to OpenTopography
• Start registering samples with SESAR
• Start registering datasets with IEDA
Challenges to CZO Data Management
Atmosphere
Biosphere
Hydrosphere
Lithosphere
Many Object & Data Types!
• Diverse media
• Sensor-based
• Stationary
• Mobile
• Spectra/photos
• Sample-based
• Sub-samples
• Preparations/Fractions
• Numeric & Categorical
Hillslope Catchment Watershed
Minutes
Decades
Millennia
Eons
Examples from Different Disciplines
• Climate & Hydrology
• Point observation (sensor) time series
• Raster observation (remote sensing) time series
• Vector networks for water routing
• Geochemistry
• Sample-based lab analyses
• Geophysics
• Seismic and other subsurface profiles
• Biology
• Phylogenetic trees
Sensor- vs. Sample-Centric Data
• Sensor-centric Data Models (e.g. ODM)
• Site → DataValue
• Sample-centric Data Models (e.g. EarthChem)
• Site → Sample/Subsample → Prep/Batch → DataValue
GeoChemical Data Model
observed value
publication data
source
method/DQ
sample feature of interest
collection,
geospatial
analysis
material
preparation,
obs. point
CZO Chemistry Database Schema
08_Precision
(PK) precisionID (FK) MediumID (FK) variableCode (FK) methodID detectLimit stDeviation (FK) unitID precNote
31_SampleMedium
(PK) mediumID mediumName mediumNote
CZO_CHEM_DB_SCHEMA V4
PK – Primary Key FK – Foreign Key
Lookup tables
Main data
Meta data
1 : 1
1 : n
LEGEND:
Note: All contactIDs, authorID, and scientistID are linked to the personID in the table “Person”
91_ReferenceGroup
(FK) sourceID (PK) refGroupID (FK) projectID (FK) referenceID (FK) contactID refGroupNote
93_Project
(PK) projectID projTitleAbbrv projTitleFull citationFull projSponsorID projStartYr projEndYr (FK) contactID projNote
93_ProjScientist
(FK) projectID (FK) scientistID scientistRole
09_Source
(PK) sourceID (FK) contributorID sourceNote
92_Reference
(PK) referenceID (FK) corAuthorID yearPub articleTitle journalName bookTitle bookeditor bookPublisher jourVolume jourIssue jourPages citationFull refWebURL refNote
PSU-EESI
Feb. 16, 2010
71_MethodType
(PK) methdTypeID mthdTypeName mthdTypeNote
11_State
(FK) countryCode (PK) stateCode stateAlphaCode stateNumericCode stateName stateCategory
11_Country
(PK) countryCode countryName countryNumericCode countryAlpha2 countryNameFull
02_Site
(FK) locationID (PK) siteID siteName longitudeDeg latitudeDeg elevation_m slopeDeg aspect landscapePosition landUse vegSpecies parentLithology exposureAge erosionRate depthToRock_m soilTaxonomy (FK) SSURGO_ID siteNote
01_Location
(FK) stateCode (PK) locationID locNameFull locNameAbbrv annlPrecip_mm anlMeanTemp_oC (FK) contactID locNote
04_Preparation
(FK) subSampleID (PK) prepID (FK) methodID (FK) contactID prepNote
03_Sample
(FK) siteID (PK) sampleID (FK) smplMediumID depthTop_cm depthBot_cm waterTemp_oC samplingDate smplLocalTime smplUTCTime (FK) methodID (FK) contactID sampleNote
05_Analysis
(FK) prepID (PK) analysisID labName analysisDate (FK) sourceID (FK) methodID (FK) contactID analyNote
03_SubSample
(FK) sampleID (PK) subSampleID splitNumber (FK) methodID (FK) contactID subsmplNote
06_DataValue
(PK) dataID (FK) analysisID (FK) variableCode dataValue (FK) unitID dataNote
61_VariableLookup
(PK) variableCode variableName (FK) varTypeID varNote
62_VariableType
(PK) varTypeID varTypeName varTypeNote
64_Units
(PK) unitID unitCode unitName unitNote
72_Standard
(PK) methdStdID (FK) methodID mthdStdNote ???
10_Person
(PK) personID lastName firstName (FK) instituteID departmentName eMail phoneNumber faxNumber persnAddress persnTitle persnNote
10_Institute
(PK) instituteID instName instNameAbbrv (FK) countryCode (FK) stateCode instCity instZipCode instAddress instPhone instWebURL (FK) contactID instNote
07_Method
(PK) methodID methodName mthdNameAbbrv mthdDescription equipmentName (FK) mthdTypeID (FK) contactID mthdNote
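The sample-centric chain in this schema (Site, Sample, SubSample, Preparation, Analysis, DataValue) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the actual CZO database: only a few columns from each table are kept, the numeric table-name prefixes are dropped, and all rows are hypothetical.

```python
import sqlite3

# Simplified subset of the chemistry schema; most columns are omitted
# and the inserted rows are made up for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Site        (siteID INTEGER PRIMARY KEY, siteName TEXT);
CREATE TABLE Sample      (sampleID INTEGER PRIMARY KEY,
                          siteID INTEGER REFERENCES Site(siteID),
                          depthTop_cm REAL, depthBot_cm REAL);
CREATE TABLE SubSample   (subSampleID INTEGER PRIMARY KEY,
                          sampleID INTEGER REFERENCES Sample(sampleID),
                          splitNumber INTEGER);
CREATE TABLE Preparation (prepID INTEGER PRIMARY KEY,
                          subSampleID INTEGER REFERENCES SubSample(subSampleID),
                          prepNote TEXT);
CREATE TABLE Analysis    (analysisID INTEGER PRIMARY KEY,
                          prepID INTEGER REFERENCES Preparation(prepID),
                          labName TEXT);
CREATE TABLE DataValue   (dataID INTEGER PRIMARY KEY,
                          analysisID INTEGER REFERENCES Analysis(analysisID),
                          variableCode TEXT, dataValue REAL);
""")

con.execute("INSERT INTO Site VALUES (1, 'Example soil pit')")
con.execute("INSERT INTO Sample VALUES (1, 1, 0, 10)")
con.execute("INSERT INTO SubSample VALUES (1, 1, 1)")
con.execute("INSERT INTO Preparation VALUES (1, 1, 'dry sieved, <2 mm')")
con.execute("INSERT INTO Analysis VALUES (1, 1, 'Example lab')")
con.executemany("INSERT INTO DataValue VALUES (?, ?, ?, ?)",
                [(1, 1, "Al", 7.1), (2, 1, "Si", 30.2)])

# Walk from each data value back up to its site through every level.
rows = con.execute("""
    SELECT st.siteName, sm.depthTop_cm, sm.depthBot_cm,
           dv.variableCode, dv.dataValue
    FROM DataValue dv
    JOIN Analysis a    ON a.analysisID   = dv.analysisID
    JOIN Preparation p ON p.prepID       = a.prepID
    JOIN SubSample ss  ON ss.subSampleID = p.subSampleID
    JOIN Sample sm     ON sm.sampleID    = ss.sampleID
    JOIN Site st       ON st.siteID      = sm.siteID
    ORDER BY dv.dataID
""").fetchall()

for row in rows:
    print(row)
```

Every value has to be joined through four intermediate tables to reach its site, which is the cost of keeping subsample and preparation history explicit.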
Sample Fractions for Soil Geochemistry
EA-IRMS
FTIR
SA
EA-IRMS
FTIR
EA-IRMS
FTIR
Ziplock (~500 g)
Bulk soil horizon or depth increment
Al can (~70 g) for gamma counting 137Cs
DRY SIEVE, 2 mm
glass vial: <2 mm fines, dry sieved
(1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?)
>2 mm:
glass vial: plant detritus, milled
(2) Remaining pebbles & rocks, hard grind
glass vial: pebbles, hard ground
<2 mm
ICP-MS after Li-borate fusion
XRD?
WET SIEVE, or DENSITY, or SETTLING (with or without sonication)
glass vial: sand + small detritus
glass vial: silt + clay
The choice here is important. Do we want aggregates or not?
EA-IRMS
FTIR
ICP-MS after Li-borate fusion
XRD
CEC
SPEX mill
EA-IRMS
FTIR
ICP-MS after Li-borate fusion
SPEX mill
SA
XRD
CEC
SA
Extractions
Dithionite-Citrate extraction
Na pyrophosphate extraction
Ammonium oxalate extraction
Geoinformatics for Geochemistry
Core
Core Section 1
Core Section 3
Core Section 2
Sample 1
Sample 2
Sample 1
Sample 2
Sample 3
Sample 1
Sample 2
Sample 3
Rock powder
Mineral conc.
Leachate
Fossil separate
Microprobe mount
Parent Parent Child
Child Child Parent
IGSN:XXX000120
IGSN:XXX0065B3
IGSN:XXX9K23G6
IGSN:XXX07ST4K
IGSN:XYZ0G693M
IGSN:ABC0L98SW
IGSN:ABC0L53NW
IGSN:ABC0L653X
IGSN:ABC078HGB
Needed Capabilities for ODM
Sample table
• Optional direct link between Sample & Site
• Need to assign SampleID before data values exist
• Natural one-to-many hierarchy
• 1 site many samples, 1 sample many values
• Recursive parent-child relationships
• Sample metadata
• Medium, fraction, preservation, container, dilution, etc.
ODM v1.1 Suggested
ODM v1.1 Sample table:
SampleID (PK)
SampleType
LabSampleCode
LabMethodID (FK)
Suggested Sample table (field: table note):
SampleID (PK)
SampleCode: alpha-numeric, ~20 char
SampleNote: ~200 char
IGSN: Intl. Geo-Sample Number
FieldSampleFlag: Y/N
LocalDateTime: creation, when container filled
UTCOffset
MediumTypeID (FK): e.g. surface water, soil gas, soil solid
FractionTypeID (FK): e.g. whole sample, >63 um, acid extract
MethodID (FK): types are collection, fractionation, prep.
SourceID (FK): who performed method above
ParentSample
ParentSampleID (FK): FK to SampleID
ChildSampleID (FK): FK to SampleID
SiteSample
SiteID (FK)
SampleID (FK)
Notes:
• Method Type should not include analysis method, b/c it's in the values table.
• LIMS info is recorded in Values table (e.g. sample amount, budget #, dilution ratio, sample location, container type)
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples
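The recursive parent-child relationship can be sketched as a cross-reference table queried with a recursive common table expression. The example below uses Python's sqlite3 module with a reduced Sample table. The sample codes, the `descendants` helper, and the choice of SQLite are assumptions made for illustration only. IGSN values are left empty rather than invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sample (
    SampleID        INTEGER PRIMARY KEY,
    SampleCode      TEXT NOT NULL,
    IGSN            TEXT,
    FieldSampleFlag TEXT CHECK (FieldSampleFlag IN ('Y', 'N'))
);
-- One row per parent-child link; a child with several parent rows
-- is a composite sample.
CREATE TABLE ParentSample (
    ParentSampleID INTEGER NOT NULL REFERENCES Sample(SampleID),
    ChildSampleID  INTEGER NOT NULL REFERENCES Sample(SampleID)
);
""")

# Hypothetical hierarchy: core -> section -> sample -> leachate.
con.executemany("INSERT INTO Sample VALUES (?, ?, ?, ?)", [
    (1, "CORE-1", None, "Y"),
    (2, "CORE-1-SEC1", None, "N"),
    (3, "CORE-1-SEC1-S1", None, "N"),
    (4, "CORE-1-SEC1-S1-LEACH", None, "N"),
    (5, "CORE-2", None, "Y"),
])
con.executemany("INSERT INTO ParentSample VALUES (?, ?)",
                [(1, 2), (2, 3), (3, 4)])


def descendants(con, sample_id):
    """Return the SampleCodes of all children, grandchildren, etc."""
    rows = con.execute("""
        WITH RECURSIVE tree(SampleID) AS (
            SELECT ChildSampleID FROM ParentSample
            WHERE ParentSampleID = ?
            UNION
            SELECT p.ChildSampleID
            FROM ParentSample p
            JOIN tree t ON p.ParentSampleID = t.SampleID
        )
        SELECT s.SampleCode
        FROM tree
        JOIN Sample s ON s.SampleID = tree.SampleID
        ORDER BY s.SampleID
    """, (sample_id,)).fetchall()
    return [r[0] for r in rows]


print(descendants(con, 1))
print(descendants(con, 5))
```

Because links live in a separate table rather than in a single ParentID column on Sample, the same structure covers both splitting (one parent, many children) and compositing (many parents, one child).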
ODM v1.1 Suggested
ODM v1.1 tables:
GroupDescriptions
GroupID (PK)
GroupDescription
Groups
GroupID (FK)
ValueID (FK)
Suggested tables (field: table note):
Groups
GroupID (PK)
GroupCode: alpha-numeric, ~20 char
GroupNote: ~200 char
GroupTypeID (FK): types are Value, Sample, Site, Person?, etc.
SiteGroups
SiteID (FK)
GroupID (FK)
SampleGroups
SampleID (FK)
GroupID (FK)
ValueGroups
ValueID (FK)
GroupID (FK)
Notes:
• Sample Groups: Analysis Batch, Profile, etc.
• Site Groups: Transect, station, observatory, etc.
• Value Groups: ???
• Person Groups: ?Research Teams, etc.?
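A rough sketch of the suggested grouping tables, again with Python's sqlite3 module. It treats an analysis batch and a soil profile as two sample groups that share a sample. To keep it short, the group type is stored as plain text instead of a GroupTypeID foreign key. The group codes, sample IDs, and helper functions are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Groups (
    GroupID   INTEGER PRIMARY KEY,
    GroupCode TEXT NOT NULL,
    GroupNote TEXT,
    GroupType TEXT  -- simplification of GroupTypeID (FK)
);
CREATE TABLE SampleGroups (
    SampleID INTEGER NOT NULL,
    GroupID  INTEGER NOT NULL REFERENCES Groups(GroupID),
    PRIMARY KEY (SampleID, GroupID)
);
""")

con.executemany("INSERT INTO Groups VALUES (?, ?, ?, ?)", [
    (1, "BATCH-01", "analysis batch", "Sample"),
    (2, "PROFILE-A", "soil profile", "Sample"),
])
# Sample 101 belongs to both the batch and the profile.
con.executemany("INSERT INTO SampleGroups VALUES (?, ?)",
                [(101, 1), (102, 1), (101, 2)])


def samples_in_group(con, group_code):
    """List the SampleIDs assigned to one group."""
    rows = con.execute("""
        SELECT sg.SampleID
        FROM SampleGroups sg
        JOIN Groups g ON g.GroupID = sg.GroupID
        WHERE g.GroupCode = ?
        ORDER BY sg.SampleID
    """, (group_code,)).fetchall()
    return [r[0] for r in rows]


def groups_of_sample(con, sample_id):
    """List the GroupCodes a sample belongs to."""
    rows = con.execute("""
        SELECT g.GroupCode
        FROM SampleGroups sg
        JOIN Groups g ON g.GroupID = sg.GroupID
        WHERE sg.SampleID = ?
        ORDER BY g.GroupCode
    """, (sample_id,)).fetchall()
    return [r[0] for r in rows]


print(samples_in_group(con, "BATCH-01"))
print(groups_of_sample(con, 101))
```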
ODM v1.1 Suggested
Sources/Institution → PersonInstitution
Soil/Sed intervals → OffsetValueMin & OffsetValueMax
Only one offset value → add horizontal offsets?
Horizon Descriptions? → Add DataValueNote field to Data Values
Methods table insufficient → add MethodType, PersonID, etc.
CensorCode insufficient → Need value field (e.g. Method Detection Limit)
Other outstanding issues:
• Do spatial offsets also belong in samples table? [yes]
• Spectral data, photos?
• Dataset versioning
Importance of Sample/Site Tracking
• CZO scientists share samples!
• Data often needs to be merged at level of
subsamples
• SWRC’s biggest data management
headaches always come from merging data
from different instruments/labs by sample and
by site.
• International Geo-Sample Number (IGSN) is
the answer!
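As a minimal illustration of why a shared sample identifier matters, the Python sketch below merges results from two instruments by a common key. The identifiers are placeholders rather than real IGSNs, and the row layout, values, and the `merge_by_sample` helper are assumptions for illustration only.

```python
from collections import defaultdict


def merge_by_sample(*lab_tables):
    """Combine rows from several labs into one record per sample ID."""
    merged = defaultdict(dict)
    for table in lab_tables:
        for row in table:
            merged[row["igsn"]][row["variable"]] = row["value"]
    return dict(merged)


# Hypothetical results from two instruments for overlapping subsamples.
icp_ms = [
    {"igsn": "PLACEHOLDER-001", "variable": "Al", "value": 7.1},
    {"igsn": "PLACEHOLDER-002", "variable": "Al", "value": 6.4},
]
ea_irms = [
    {"igsn": "PLACEHOLDER-001", "variable": "d13C", "value": -26.3},
]

merged = merge_by_sample(icp_ms, ea_irms)
for igsn, values in sorted(merged.items()):
    print(igsn, values)
```

If each lab used its own local sample code instead, this step would first need a hand-maintained lookup between codes, which is where merge errors tend to creep in.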
Object Types in IGSN/SESAR
Existing
• Core
• Core half round
• Core quarter round
• Core piece
• Core section
• Core section half
• Core sub-piece
• Core whole round
• Cuttings
• Dredge
• Grab
• Hole
• Individual sample
• Oriented core
• Other
• Rock powder
• Terrestrial sample
• Trawl
Considering?
• Sampling events:
• holes, cores, dredges, stratigraphic sections
• Individual samples:
• Specimens, rocks, minerals, fossils, precipitates, synthetic material, etc.
• Fluid samples: seawater, hydrothermal fluids, groundwater, etc. (to be completed)
• Particulates: aerosols, suspended matter
• Soil pedons and samples thereof
• Sub-samples of any of above:
• processed samples such as mineral or fossil separates, leachates, thin sections, etc.
http://www.geosamples.org/sampletypes
CZO Geo-Object Types
• Site/Location (x, y; z treated via vertical offset)
• surface water station, well, lysimeter, piezometer, soil pit,
borehole, monument, meteorological station/tower, tree?
• Fluid Sample (Water Sample?)
• stream/river water, pond/lake water, wetland surface water,
groundwater, soil water (unsaturated), sediment porewater, sap?
• Gas Sample (also fluid?)
• atmospheric gas, dissolved gas, soil gas
• Soil/Lithology/Sediment Sample (need help with names)
• Surface grab, core, auger interval, pit interval, rock, saprolite?,
bedrock?, cuttings?
• Plant Sample
• Whole plant, tissue, ???
CZO Sample Fraction Types
• Subsample
• Duplicate or split that does not fractionate whole sample
• Size Fraction
• e.g. > 2 mm, 63-2000 um, <63 um
• Extracted Fraction
• Acid soluble, total lipid extract, dithionite-citrate-bicarbonate extract
• Extraction residue
• etc.
ODM v1.1 Suggested v2
ODM v1.1 DataValues table:
ValueID (PK)
DataValue
ValueAccuracy
LocalDateTime
UTCOffset
DateTimeUTC
SiteID (FK)
VariableID (FK)
OffsetValue
OffsetTypeID
CensorCode
QualifierID
MethodID (FK)
SourceID (FK)
SampleID (FK)
DerivedFromID (FK)
QualityControlLevelID
Suggested DataValues table:
ValueID (PK)
DataValue
LocalDateTime
UTCOffset
DateTimeUTC
SiteID (FK)
VariableID (FK)
CensorCode
MethodID (FK)
SourceID (FK)
QualityControlLevelID
SampleID (FK)
DataValuesExtension
DataValueExtensionID
DataValueID (FK)
AttributeID (FK)
DataValueAttributeValue
Example Attributes:
Offset, OffsetMin, OffsetMax, QualifierID, DerivedFromID, ValueAccuracy, InstrumentType, InstrumentID (FK), SensorID (FK), AnalysisNote, DataValueNote, ProjectName, CensorType, CensorLimitValue
Attributes
AttributeID (PK)
AttributeType (CV)
AttributeDescription
Units (FK)
AttributeTypes:
Correspond directly to table that is being extended.
i.e. Site, Sample, Value
Bold fields in tables are required
Non-bold fields are optional
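The extension-table idea is an entity-attribute-value pattern: optional metadata moves out of the core DataValues table into rows that point at an Attributes lookup. A minimal sketch using Python's sqlite3 module follows. It assumes the attribute name is stored in AttributeDescription and that attribute values are stored as text. Both are guesses, since the slide does not say. The rows and the `value_attributes` helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DataValues (
    ValueID   INTEGER PRIMARY KEY,
    DataValue REAL
);
CREATE TABLE Attributes (
    AttributeID          INTEGER PRIMARY KEY,
    AttributeType        TEXT,  -- table being extended: Site, Sample, Value
    AttributeDescription TEXT   -- assumed to hold the attribute name
);
CREATE TABLE DataValuesExtension (
    DataValueExtensionID    INTEGER PRIMARY KEY,
    DataValueID             INTEGER REFERENCES DataValues(ValueID),
    AttributeID             INTEGER REFERENCES Attributes(AttributeID),
    DataValueAttributeValue TEXT
);
""")

con.execute("INSERT INTO DataValues VALUES (1, 3.2)")
con.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", [
    (1, "Value", "OffsetMin"),
    (2, "Value", "OffsetMax"),
    (3, "Value", "DataValueNote"),
])
con.executemany("INSERT INTO DataValuesExtension VALUES (?, ?, ?, ?)", [
    (1, 1, 1, "10"),
    (2, 1, 2, "20"),
    (3, 1, 3, "B horizon"),
])


def value_attributes(con, value_id):
    """Collect the optional attributes of one data value into a dict."""
    rows = con.execute("""
        SELECT a.AttributeDescription, e.DataValueAttributeValue
        FROM DataValuesExtension e
        JOIN Attributes a ON a.AttributeID = e.AttributeID
        WHERE e.DataValueID = ?
    """, (value_id,)).fetchall()
    return dict(rows)


print(value_attributes(con, 1))
```

The trade-off is that the core table stays narrow and new attributes need no schema change, but typed columns are lost and every optional field costs a join.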
ODM v1.1 Suggested v2
ODM v1.1 Samples table:
SampleID (PK)
SampleType
LabSampleCode
LabMethodID (FK)
Suggested Samples table (field: table note):
SampleID (PK)
SampleCode: alpha-numeric, ~20 char
SampleNote: ~200 char
IGSN: Intl. Geo-Sample Number
IsFieldSample: Y/N, to distinguish ultimate parent
LocalDateTime: creation, when container filled
UTCOffset
ObjectTypeID (FK): corresponding to IGSN Object Types
FractionTypeID (FK): e.g. whole sample, >63 um, acid extract
MethodID (FK): types are collection or prep., not analysis
SourceID (FK): who performed method above
ParentSampleXRef
ParentSampleID (FK)
ChildSampleID (FK)
SiteSampleXRef
SiteID (FK)
SampleID (FK)
SamplesExtension
SampleExtensionID (PK)
SampleID (FK)
AttributeID (FK)
SampleAttributeValue
Example Attributes:
VerticalOffset, VerticalOffsetMin, VerticalOffsetMax, HorizontalOffset, HorizontalOffsetDirection (deg.), Medium, AlternateSampleCode, FieldCampaignName, Amount, StorageLocation, ContainerType, DilutionRatio, CollectionNote, PreparationNote, FractionNote, IsExperimentalSample, ExperimentID
Notes:
• Method Type should not include analysis method, b/c it's in the values table.
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples
ODM v1.1 Suggested v2
ODM v1.1 Sites table:
SiteID (PK)
SiteCode
SiteName
Latitude
Longitude
LatLongDatumID (FK)
Elevation_m
VerticalDatum
LocalX
LocalY
LocalProjectionID (FK)
PosAccuracy_m
State
Country
Comments
Notes:
• Method Type should not include analysis method, b/c it's in the values table.
• LIMS info is recorded in Values table (e.g. sample amount, budget #, dilution ratio, sample location, container type)
• Analysis “Batch” or “Run” is treated as a sample group
• ParentSample table allows for composite samples
SitesExtension
SiteExtensionID
SiteID (FK)
AttributeID (FK)
SiteAttributeValue
Example Attributes:
From ODM 1.1: SiteDescription, LocalX, LocalY, LocalProjectionID (FK), PosAccuracy_m (LatLongAccuracy_m?), City/Township, State/Province, Country, Comments
From IGSN/SESAR: Physiographic feature, Name of physiographic feature, Location description, Locality, Locality description, Field Program/Cruise, Platform type, Platform name
From Sue Brantley: annlPrecip_mm, anlMeanTemp_oC, slopeDeg, aspect, landscapePosition, landUse, vegSpecies, parentLithology, exposureAge, erosionRate, depthToRock_m, soilTaxonomy, SSURGO_ID (FK), siteNote, ContactName
From SWRC: AlternateSiteCode, WatershedName, HUC, ElevationAccuracy
Suggested Sites table:
SiteID (PK)
SiteCode
SiteName
Latitude
Longitude
LatLongDatumID (FK)
Elevation_m
VerticalDatum
ODM v1.1 Suggested v2
ODM v1.1 tables:
GroupDescriptions
GroupID (PK)
GroupDescription
Groups
GroupID (FK)
ValueID (FK)
Suggested tables (field: table note):
Groups
GroupID (PK)
GroupCode: alpha-numeric, ~20 char
GroupDescription: ~200 char
GroupTypeID (FK): types are Value, Sample, Site, Person?, etc.
SiteGroupsXRef
SiteID (FK)
GroupID (FK)
SampleGroupsXRef
SampleID (FK)
GroupID (FK)
ValueGroupsXRef
ValueID (FK)
GroupID (FK)
Notes:
• Sample Groups: Analysis Batch, Profile, Experiment, etc.
• Site Groups: Transect, station, observatory, etc.
• Value Groups: Profile, Analysis, Spectra
• Person Groups: ?Research Teams, etc.?
ODM v1.1 Suggested v2
ODM v1.1 Methods table:
MethodID (PK)
MethodDescription
Suggested Method table (field: table note):
MethodID (PK)
MethodCode: alpha-numeric, ~20 char
MethodDescription: ~200 char
MethodTypeID (CV): types are Collection, Preparation, Analysis
MethodLink: URL or DOI
SourceID: could be paper, report, person/lab
ODM v1.1 Sources table:
SourceID (PK)
Organization
SourceDescription
SourceLink
ContactName
Phone
Address
City
State
ZipCode
Citation
MetadataID (FK)
Persons
PersonID (PK)
LastName
FirstName
Phone
InstitutionID (FK)
PersonLink
Institutions
InstitutionID (PK)
InstitutionName
Department
Address
City
State
ZipCode
InstitutionLink
Suggested Sources table:
SourceID (PK)
SourceDescription
SourceLink
Corresponding PersonID (FK)
DataSeries: better sample integration?
• Uses DataSeries table modified from HydroDesktop (next page)
• DataSeries table can act as an XRef table, but requires creation
of a data series upon registration of the FieldSample
• A DataSeries from/for a single sample can be viewed as
equivalent to EarthChem’s Analysis table
• Joins do not require passing through huge DataValues table
Sites Samples
DataSeries
DataValues
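The point about joins can be sketched as follows: when DataSeries carries both SiteID and SampleID, finding the samples at a site needs no access to DataValues at all. The example uses Python's sqlite3 module with heavily reduced tables. The site and sample codes and the `samples_at_site` helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sites      (SiteID INTEGER PRIMARY KEY, SiteCode TEXT);
CREATE TABLE Samples    (SampleID INTEGER PRIMARY KEY, SampleCode TEXT);
CREATE TABLE DataSeries (SeriesID INTEGER PRIMARY KEY,
                         SiteID   INTEGER REFERENCES Sites(SiteID),
                         SampleID INTEGER REFERENCES Samples(SampleID));
CREATE TABLE DataValues (ValueID  INTEGER PRIMARY KEY,
                         SeriesID INTEGER REFERENCES DataSeries(SeriesID),
                         DataValue REAL);
""")

con.execute("INSERT INTO Sites VALUES (1, 'SITE-A')")
con.executemany("INSERT INTO Samples VALUES (?, ?)",
                [(10, "GRAB-1"), (11, "GRAB-2")])
# A series is registered per sample before any values are loaded.
con.executemany("INSERT INTO DataSeries VALUES (?, ?, ?)",
                [(100, 1, 10), (101, 1, 11)])


def samples_at_site(con, site_code):
    """Resolve site -> samples through DataSeries only."""
    rows = con.execute("""
        SELECT DISTINCT sm.SampleCode
        FROM Sites st
        JOIN DataSeries ds ON ds.SiteID   = st.SiteID
        JOIN Samples sm    ON sm.SampleID = ds.SampleID
        WHERE st.SiteCode = ?
        ORDER BY sm.SampleCode
    """, (site_code,)).fetchall()
    return [r[0] for r in rows]


site_samples = samples_at_site(con, "SITE-A")
print(site_samples)
```

The lookup succeeds even though DataValues is still empty, which also illustrates the stated requirement that a data series be created when the field sample is registered.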
HydroDesktop Suggested ODM v2
HydroDesktop DataSeries table:
SeriesID (PK)
SiteID (FK)
VariableID (FK)
IsCategorical
MethodID (FK)
SourceID (FK)
QualityControlLevelID
BeginDateTime
EndDateTime
BeginDateTimeUTC
EndDateTimeUTC
ValueCount
CreationDateTime
Subscribed
UpdateDateTime
LastCheckedDateTime
Example DataSeriesAttributes:
BeginDateTime, EndDateTime, BeginDateTimeUTC, EndDateTimeUTC, ValueCount, CreationDateTime, Subscribed, UpdateDateTime, LastCheckedDateTime, InstrumentType, InstrumentID (FK), SensorID (FK), PlatformID (FK), AnalysisNote, UTCOffset
Suggested DataSeries table:
SeriesID (PK)
SiteID (FK)
SampleID (FK)
IsCategorical
MethodID (FK)
SourceID (FK)
QualityControlLevelID
CreationDateTime
Suggested DataValues table:
ValueID (PK)
SeriesID (FK)
DataValue
VariableID (FK)
CensorCode
LocalDateTime
DateTimeUTC
HydroDesktop DataValues table:
ValueID (PK)
SeriesID
DataValue
ValueAccuracy
LocalDateTime
UTCOffset
DateTimeUTC
OffsetValue
OffsetTypeID (FK)
CensorCode
QualifierID
SampleID (FK)
FileID (FK)
DataSeriesExtension
DataSeriesExtensionID
DataSeriesID (FK)
AttributeID (FK)
DataSeriesAttributeValue