3 FUNDAMENTAL DATA TYPES
Environmental Data Store: Basic Concepts and the Development of an
Extensive Controlled Vocabulary System
Abstract ID:IN21B-1478
PENG JI (1), MICHAEL PIASECKI (2)
(1): Environmental CrossRoads Initiative, City College of New York, 160 Convent Ave., New York, NY, 10031, USA
(2): Dept. of Civil Engineering, City College of New York, 160 Convent Ave., New York, NY, 10031, USA
1 ABSTRACT
With the rapid growth in data volume, data diversity, and data demand from multi-disciplinary research efforts, data management and exploitation pose increasingly significant challenges for the environmental science community. We describe the Environmental Data Store (EDS), a web-based, open-source system that we are developing to manage and exploit environmental data of multiple types. EDS provides repository services for six fundamental data types, which together meet the demands of multi-disciplinary environmental research: a) Time Series Data, b) Geospatial Data, c) Digital Data, d) Ex-Situ Sampling Data, e) Modeling Data, and f) Raster Data. Through its data portal, EDS allows efficient consumption of these six types of data, which are held in a data pool made up of data nodes corresponding to the different data types, including ODM, GeoServer, RAMADDA, ESSDB, THREDDS, etc. The EDS data portal offers a unified submission interface for these data types; provides fully integrated, scalable search across content from the underlying data systems; and features mapping, analysis, exporting, and visualization through integration with other software. EDS builds on a number of existing systems, follows widely used data standards, and emphasizes thematic, semantic, and syntactic support for submission and search in order to advance multi-disciplinary environmental research.
2 FUNDAMENTAL DATA TYPES
[Figure: EDS architecture. Labels recovered from the diagram:]
• Data Realm: Sampling, Time Series, Geospatial, Digital, Modeling, Raster
• Data nodes: ODM node, RAMADDA node, THREDDS node, GEOSERVER node, ESSDB node, RASTER node
• Search: Themes, Keywords, Temporal, Spatial, Facets, Gazetteer, …
• Also shown: ODM node, THREDDS node, HIS Central, Unidata's TDS
• EDS Central: Data Submission module, Data Retrieval module, Semantic module
• Metadata: Common elements, Special elements, …
• Vocabulary: Topic Category, Geodetic Datum, Site Type, Sample Type, Sample Medium, Variable Name, …
4 COMMON DATA ELEMENTS
Geospatial, Raster, Time Series, Ex-Situ Sampling, Modeling
Common elements: • Title • Topic Category • Abstract • Keywords • Temporal Coverage • Spatial Coverage • Project • Contributor
• THREDDS Dataset Inventory Catalog Specification
• ODM Design Specification • HIS Central Functionality Requirement • WaterML 2.0
• ISO 19115 • GeoServer Data Directory Structure
• ESSDB Design Specification • Water Quality Element • Environmental Sampling, Analysis, and Result Data Standards
Digital Data: RAMADDA allows users to add standard metadata elements or customized tags to the properties of a dataset.
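As an illustration only, the eight common elements listed in this section could be carried in a single record that every data type shares. The sketch below is not the EDS implementation: the class name, the field types (for example, a start/end pair for temporal coverage and a bounding box for spatial coverage), and the `missing_elements` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple


@dataclass
class CommonMetadata:
    """Hypothetical record holding the eight common elements from the poster.

    Field types are assumptions; the poster only names the elements.
    """
    title: str = ""
    topic_category: str = ""
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    # Assumed encoding: (start, end) as ISO 8601 date strings.
    temporal_coverage: Optional[Tuple[str, str]] = None
    # Assumed encoding: (west, south, east, north) in decimal degrees.
    spatial_coverage: Optional[Tuple[float, float, float, float]] = None
    project: str = ""
    contributor: str = ""


def missing_elements(record: CommonMetadata) -> List[str]:
    """Return the names of common elements that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = CommonMetadata(title="Stream temperature, site A", keywords=["temperature"])
print(missing_elements(record))
```

A submission form could call a check like `missing_elements` before accepting a dataset, whatever the data type, and then validate the type-specific elements separately.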
6 ACKNOWLEDGEMENTS
We would like to acknowledge the National Science Foundation, which has supported this work under grant numbers EAR0838307 and EAR0949196. We would also like to thank the City College of New York for its financial support of this project, and Paul Muzio of CUNY's High Performance Computing Center for his support in using the facility to install the EDS. Special thanks go to Kerstin Lehnert (Director, IEDA at Columbia University) for her support and assistance with the use of the petDB system. Lastly, we would like to thank Florian Lengyel of the Environmental CrossRoads Initiative for his continued help and support in establishing the first prototype installation of the EDS.
CONTROLLED VOCABULARIES
• Collect, analyze, compare, choose, and merge CVs that are widely used and recognized in the geoscience/environmental community.
• Identify and organize 16 CVs for EDS.
• Instantiate them in the Simple Knowledge Organization System (SKOS).
• The formal representation is based on TemaTres: http://sourceforge.net/projects/tematres/
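To make the SKOS instantiation concrete, here is a minimal sketch that writes a small hierarchical vocabulary as SKOS Turtle using plain string formatting. It is not how TemaTres or EDS produce their output: the `http://example.org/edscv/` namespace, the `to_skos_turtle` function, and the two sample terms are hypothetical, and labels are assumed to contain no quotation marks.

```python
from typing import Dict, Optional

BASE = "http://example.org/edscv/"  # hypothetical namespace, not the EDS one


def slug(label: str) -> str:
    """Turn a label into a simple local name (assumes plain ASCII labels)."""
    return label.strip().replace(" ", "_")


def to_skos_turtle(scheme: str, broader: Dict[str, Optional[str]]) -> str:
    """Serialize {term: parent term or None} as SKOS concepts in Turtle."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix eds: <{BASE}> .",
        "",
        f"eds:{slug(scheme)} a skos:ConceptScheme .",
    ]
    for label, parent in broader.items():
        lines.append("")
        lines.append(f"eds:{slug(label)} a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{label}"@en ;')
        if parent is None:
            # Terms without a parent are top concepts of the scheme.
            lines.append(f"    skos:topConceptOf eds:{slug(scheme)} .")
        else:
            lines.append(f"    skos:inScheme eds:{slug(scheme)} ;")
            lines.append(f"    skos:broader eds:{slug(parent)} .")
    return "\n".join(lines)


# Hypothetical two-level excerpt of a Sample Medium vocabulary.
print(to_skos_turtle("Sample Medium", {"Water": None, "Surface Water": "Water"}))
```

Each level of a vocabulary hierarchy maps to one `skos:broader` link, so a five-level vocabulary is a chain of at most four such links from a leaf term to its top concept.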
5 Top concept mapping between EDS and other data standards
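The relative overlap between different vocabularies (taking the Sample Medium vocabulary as a case). Each cell gives the number of overlapping terms, with the relative overlap in parentheses.

| | WaterML | ODM | NWIS | WQX | SWAMP | ENVO | SWEET |
| # of terms | 13 | 17 | 33 | 41 | 17 | 325 | 281 |
| WaterML2.0 | | 5 (33%) | 4 (17%) | 6 (27%) | 3 (20%) | 4 (2%) | 2 (1%) |
| ODM | | | 4 (16%) | 6 (21%) | 4 (24%) | 5 (3%) | 5 (3%) |
| NWIS | | | | 8 (22%) | 4 (16%) | 6 (3%) | 4 (3%) |
| WQX | | | | | 5 (17%) | 10 (5%) | 5 (3%) |
| SWAMP | | | | | | 3 (2%) | 4 (3%) |
| ENVO | | | | | | | 24 (8%) |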
• Topic Category: 2 levels, 51 kinds
• Site Types: 5 levels, 503 kinds
• Measurement Units: 4 levels, 718 kinds
• Geodetic Datum: 2 levels, 602 kinds
• Sample Mediums: 4 levels, 670 kinds
• Variable Name: 5 levels, 3999 kinds
• Miscellaneous: 3 levels, 379 kinds
• Sample Processing Method: 3 levels, 1123 kinds
• Sample Processing Equipment: 3 levels, 255 kinds
• Ambient Condition: 5 levels, 226 kinds
• … (the list is growing)
[Figure: EDS CVs and the vocabularies they draw on: ENVO, SWEET, NEMI, CF, EPSG, NWIS, ODM, CUAHSI Ontology, WQX, WaterML]
Relative overlap = (2 × number of overlapping terms) / (total number of terms in the pair of vocabularies) × 100%.
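The relative overlap formula can be checked with a few lines of code. This is a sketch under stated assumptions: the function names are ours, "total number of terms in the pair" is read as the sum of the two vocabulary sizes, and matching terms case-insensitively in `overlap_from_terms` is an assumption about how overlap might be counted, not a description of the procedure actually used.

```python
from typing import Iterable


def relative_overlap(n_overlap: int, n_terms_a: int, n_terms_b: int) -> float:
    """Relative overlap in percent: 2 * overlap / (size_a + size_b) * 100."""
    total = n_terms_a + n_terms_b
    if total == 0:
        raise ValueError("both vocabularies are empty")
    return 200.0 * n_overlap / total


def overlap_from_terms(terms_a: Iterable[str], terms_b: Iterable[str]) -> float:
    """Compute relative overlap from two term lists.

    Assumption: terms match if equal after trimming and lowercasing.
    """
    set_a = {t.strip().lower() for t in terms_a}
    set_b = {t.strip().lower() for t in terms_b}
    return relative_overlap(len(set_a & set_b), len(set_a), len(set_b))


# Counts from the poster: 5 shared terms between vocabularies of 13 and 17 terms.
print(round(relative_overlap(5, 13, 17)))  # 33

# Hypothetical term lists, for illustration only.
print(overlap_from_terms(["Water", "Soil"], ["water", "Air", "Sediment"]))  # 40.0
```

The first call uses term counts given on the poster (13 and 17 terms with 5 in common) and rounds to the 33% reported there. A value of 100% would mean the two vocabularies contain exactly the same terms.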
Columns: EDS Top Concept; Time Series Data (ODM 1.1, WaterML2.0); Ex-Situ Sampling Data (WQDE, ESAR); Geospatial Data (ISO 19115); Modeling Data (DICS)
Geodetic Datum Spatial References Vertical Datum
Horizontal Reference Datum Vertical Reference Datum
Horizontal Reference Datum Vertical Reference Datum
ReferenceSystem Vertical extent information
geospatialCoverage
Measurement Units Units Unit of Measure Name Measure Unit Code
Miscellaneous
Contributor Property RoleCode contributor role
Data Processing ValueType quality Detection Limit Type
Result Status Identifier Detection Limit Type Reporting Limit Type Measure Qualifier Code
ProgressCode SpatialRepresentationCode MediumFormatCode MediumNameCode MaintenanceFrequencyCode PresentationFormCode
dataType dataFormat collectionTypes
Time Series Related DataType interpolation type process type
Others
Sample Medium SampleMedium medium Media Sampled Sample Media Name Sample Media Sub-division Name
Sample Processing Method Sample Collection Method preservation method Analytical Method Number
Sample Preparation Method Sample Analytical Method Method Type
Sample Processing Equipment Analysis Equipment Equipment Type
Sample Type SampleType Sample Type
Site Type SiteType Sampling Station Type Monitoring Location Type Well Type Facility Site Type
Topic Category Topic Category General Category
TopicCategoryCode keyword
Variable Name Variable Name Analyte Name Substance Identification variable name