
EDS CVs - 2018 AGU Fall Meeting



Environmental Data Store: Basic Concepts and the Development of an Extensive Controlled Vocabulary System

Abstract ID:IN21B-1478

PENG JI (1), MICHAEL PIASECKI (2)

(1): Environmental CrossRoads Initiative, City College of New York, 160 Convent Ave., New York, NY, 10031, USA

(2): Dept. of Civil Engineering, City College of New York, 160 Convent Ave., New York, NY, 10031, USA

1 ABSTRACT

With the rapid growth in data volumes, data diversity, and data demands from multi-disciplinary research efforts, data management and exploitation pose increasingly significant challenges for the environmental science community. We describe the Environmental Data Store (EDS), a web-based, open-source system that we are developing to manage and exploit environmental data of multiple types. EDS provides repository services for six fundamental data types that meet the demands of multi-disciplinary environmental research: a) Time Series Data, b) Geospatial Data, c) Digital Data, d) Ex-Situ Sampling Data, e) Modeling Data, and f) Raster Data. Through its data portal, EDS lets users efficiently consume these six types of data, which are held in a data pool made up of data nodes corresponding to the different data types, including ODM, GeoServer, RAMADDA, ESSDB, THREDDS, etc. The EDS data portal offers a unified submission interface for these data types; provides fully integrated, scalable search across content from the underlying data systems; and features mapping, analysis, exporting, and visualization through integration with other software. EDS builds on a number of existing systems, follows widely used data standards, and emphasizes thematic, semantic, and syntactic support for submission and search, in order to advance multi-disciplinary environmental research.

2 FUNDAMENTAL DATA TYPES

[Figure: EDS architecture diagram. Recoverable labels:

• Data Realm: Sampling, Time Series, Geospatial, Digital, Modeling, Raster
• Data nodes: ODM node, RAMADDA node, THREDDS node, GEOSERVER node, ESSDB node, RASTER node
• Search: Themes, Keywords, Temporal, Spatial, Facets, Gazetteer, …
• Also shown: ODM node, THREDDS node, HIS Central, Unidata's TDS
• EDS Central: Data Submission module, Data Retrieval module, Semantic module
• Metadata: Common elements, Special elements
• Vocabulary: Topic Category, Geodetic Datum, Site Type, Sample Type, Sample Medium, Variable Name]
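The idea of a data pool with one node per data type, fronted by a unified submission and search interface, can be sketched in a few lines. This is only an illustration: the one-to-one pairing of data types and nodes below is our reading of the poster, and the names `DataType`, `NODE_FOR_TYPE`, `route_submission`, and `nodes_to_search` are hypothetical, not part of EDS.

```python
from enum import Enum


class DataType(Enum):
    """The six fundamental data types named in the abstract."""
    TIME_SERIES = "Time Series"
    GEOSPATIAL = "Geospatial"
    DIGITAL = "Digital"
    EX_SITU_SAMPLING = "Ex-Situ Sampling"
    MODELING = "Modeling"
    RASTER = "Raster"


# ASSUMPTION: a one-to-one pairing of data types and nodes. The poster lists
# both sets, but this exact mapping is inferred, not documented.
NODE_FOR_TYPE = {
    DataType.TIME_SERIES: "ODM",
    DataType.GEOSPATIAL: "GeoServer",
    DataType.DIGITAL: "RAMADDA",
    DataType.EX_SITU_SAMPLING: "ESSDB",
    DataType.MODELING: "THREDDS",
    DataType.RASTER: "RASTER",
}


def route_submission(data_type):
    """Return the name of the node that would store a dataset of this type."""
    return NODE_FOR_TYPE[data_type]


def nodes_to_search(data_types=None):
    """Return the sorted node names a unified search would fan out to.

    With no filter, every node in the pool is searched.
    """
    selected = list(DataType) if data_types is None else list(data_types)
    return sorted({NODE_FOR_TYPE[t] for t in selected})


print(route_submission(DataType.MODELING))
print(nodes_to_search([DataType.DIGITAL, DataType.RASTER]))
```

Keeping the routing table in one place means a new node type only needs one new entry, which is one plausible way to keep the portal's submission and search paths consistent.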

4 COMMON DATA ELEMENTS

Data types: Geospatial, Raster, Time Series, Ex-Situ Sampling, Modeling

Common elements:

• Title
• Topic Category
• Abstract
• Keywords
• Temporal Coverage
• Spatial Coverage
• Project
• Contributor

Standards and specifications referenced:

• THREDDS Dataset Inventory Catalog Specification
• ODM Design Specification
• HIS Central Functionality Requirement
• WaterML 2.0
• ISO 19115
• GeoServer Data Directory Structure
• ESSDB Design Specification
• Water Quality Element
• Environmental Sampling, Analysis, and Result Data Standards

Digital Data: RAMADDA allows users to add standard metadata elements or customized tags to the properties of a dataset.
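As a rough illustration of how the eight common elements might be carried alongside type-specific ("special") elements, here is a minimal sketch. The class name `CommonMetadata`, the field types, the bounding-box form of the spatial coverage, and the checks in `problems` are all assumptions made for the example; the poster names the elements but does not specify their representation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class CommonMetadata:
    """The eight common elements, plus a free-form slot for special elements."""
    title: str
    topic_category: str
    abstract: str
    keywords: List[str]
    temporal_coverage: Tuple[date, date]                 # assumed: (start, end)
    spatial_coverage: Tuple[float, float, float, float]  # assumed: (west, south, east, north)
    project: str
    contributor: str
    special: Dict[str, str] = field(default_factory=dict)  # data-type-specific elements

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the record passes."""
        issues = []
        for name in ("title", "topic_category", "abstract", "project", "contributor"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        start, end = self.temporal_coverage
        if start > end:
            issues.append("temporal coverage ends before it starts")
        west, south, east, north = self.spatial_coverage
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            issues.append("longitude out of range")
        if not (-90 <= south <= north <= 90):
            issues.append("latitude out of range or inverted")
        return issues


# Hypothetical record; every value here is made up for illustration.
record = CommonMetadata(
    title="Example stream gauge record",
    topic_category="inlandWaters",
    abstract="Illustrative record only.",
    keywords=["discharge"],
    temporal_coverage=(date(2010, 1, 1), date(2011, 1, 1)),
    spatial_coverage=(-74.1, 40.5, -73.7, 40.9),
    project="Example project",
    contributor="Example contributor",
    special={"SampleMedium": "Surface water"},
)
print(record.problems())
```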

6 ACKNOWLEDGEMENTS

We would like to acknowledge the National Science Foundation, which has supported this work under grant numbers EAR0838307 and EAR0949196. We would also like to thank the City College of New York for its financial support of this project, and Paul Muzio of CUNY's High Performance Computing Center for his support in using the facility to install the EDS. Special thanks go to Kerstin Lehnert (Director of IEDA at Columbia University) for her support and assistance with the use of the PetDB system. Lastly, we would like to thank Florian Lengyel of the Environmental CrossRoads Initiative for his continued help and support in establishing the first prototype installation of the EDS.

CONTROLLED VOCABULARIES

• Collect, analyze, compare, choose, and merge CVs that are widely used and recognized in the geoscience/environmental community.
• Identify and organize 16 CVs for EDS.
• Instantiate the CVs in the Simple Knowledge Organization System (SKOS).
• The formal representation is based on TemaTres: http://sourceforge.net/projects/tematres/

5 Top concept mapping between EDS and other data standards

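| | WaterML | ODM | NWIS | WQX | SWAMP | ENVO | SWEET |
|---|---|---|---|---|---|---|---|
| # of terms | 13 | 17 | 33 | 41 | 17 | 325 | 281 |
| WaterML2.0 | | 5 (33%) | 4 (17%) | 6 (27%) | 3 (20%) | 4 (2%) | 2 (1%) |
| ODM | | | 4 (16%) | 6 (21%) | 4 (24%) | 5 (3%) | 5 (3%) |
| NWIS | | | | 8 (22%) | 4 (16%) | 6 (3%) | 4 (3%) |
| WQX | | | | | 5 (17%) | 10 (5%) | 5 (3%) |
| SWAMP | | | | | | 3 (2%) | 4 (3%) |
| ENVO | | | | | | | 24 (8%) |

The relative overlap between different vocabularies, taking the Sample Medium vocabulary as an example. Each cell gives the number of overlapping terms, with the relative overlap in parentheses.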

• Topic Category: 2 levels, 51 kinds

• Site Types: 5 levels, 503 kinds

• Measurement Units: 4 levels, 718 kinds

• Geodetic Datum: 2 levels, 602 kinds

• Sample Mediums: 4 levels, 670 kinds

• Variable Name: 5 levels, 3999 kinds

• Miscellaneous: 3 levels, 379 kinds

• Sample Processing Method: 3 levels, 1123 kinds

• Sample Processing Equipment: 3 levels, 255 kinds

• Ambient Condition: 5 levels, 226 kinds

• … (the collection is still growing)
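The "levels" and "kinds" figures in the list above can be read as the depth of each hierarchy and its number of terms. A minimal sketch of that bookkeeping follows, assuming each vocabulary is stored as a term-to-parent map, that a top-level term counts as level 1, and that "kinds" counts every term; the function name `summarize` and the sample terms are hypothetical.

```python
def summarize(broader):
    """Return (levels, kinds) for a vocabulary.

    `broader` maps each term to its parent term, or to None for a top-level
    term. Levels are counted from 1 at the top (an assumption).
    """
    def level(term):
        depth = 1
        seen = {term}
        while broader[term] is not None:
            term = broader[term]
            if term in seen:
                raise ValueError(f"cycle in hierarchy at {term!r}")
            seen.add(term)
            depth += 1
        return depth

    levels = max((level(t) for t in broader), default=0)
    return levels, len(broader)


# Hypothetical fragment of a hierarchy, for illustration only.
fragment = {
    "Water": None,
    "Surface water": "Water",
    "River water": "Surface water",
    "Sediment": None,
}
print(summarize(fragment))
```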

[Figure: EDS CVs and the source vocabularies shown around them: ENVO, SWEET, NEMI, CF, EPSG, NWIS, ODM, CUAHSI Ontology, WQX, WaterML.]

Relative overlap = (2 × number of overlapping terms) / (total number of terms in the pair of vocabularies) × 100%.
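The relative overlap formula is straightforward to compute once two vocabularies are available as term sets. The sketch below is an illustration only: the poster does not say how two terms are judged to overlap, so the case-insensitive exact match used here is an assumption, and the function name `relative_overlap` is hypothetical.

```python
def relative_overlap(vocab_a, vocab_b):
    """Relative overlap in percent: 2 * |A ∩ B| / (|A| + |B|) * 100.

    ASSUMPTION: two terms overlap when they are equal after trimming
    whitespace and lowercasing. The matching rule actually used for the
    poster's table is not stated.
    """
    a = {term.strip().lower() for term in vocab_a}
    b = {term.strip().lower() for term in vocab_b}
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2 * len(a & b) / total * 100


# Synthetic term sets sized like one pair in the table: 13 and 17 terms
# with 5 shared, so 2 * 5 / (13 + 17) * 100 = 33.3...
shared = {f"shared-{i}" for i in range(5)}
vocab_13 = shared | {f"only-a-{i}" for i in range(8)}
vocab_17 = shared | {f"only-b-{i}" for i in range(12)}
print(round(relative_overlap(vocab_13, vocab_17)))
```

Because the denominator is the sum of both vocabulary sizes, a small vocabulary compared against a very large one can never score high, which matches the low percentages in the ENVO and SWEET columns.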

EDS top concepts and the corresponding elements in other data standards. The standards are grouped by data type: Time Series Data (ODM 1.1, WaterML2.0), Ex-Situ Sampling data (WQDE, ESAR), Geospatial data (ISO 19115), and Modeling Data (DICS). Elements are listed left to right in that column order; not every standard has an entry for every concept.

• Geodetic Datum: Spatial References; Vertical Datum; Horizontal Reference Datum, Vertical Reference Datum; Horizontal Reference Datum, Vertical Reference Datum; ReferenceSystem, Vertical extent information; geospatialCoverage
• Measurement Units: Units; Unit of Measure Name; Measure Unit Code
• Miscellaneous
    ◦ Contributor Property: RoleCode; contributor role
    ◦ Data Processing: ValueType; quality; Detection Limit Type; Result Status Identifier, Detection Limit Type, Reporting Limit Type, Measure Qualifier Code; ProgressCode, SpatialRepresentationCode, MediumFormatCode, MediumNameCode, MaintenanceFrequencyCode, PresentationFormCode; dataType, dataFormat, collectionTypes
    ◦ Time Series Related: DataType; interpolation type; process type
    ◦ Others
• Sample Medium: SampleMedium; medium; Media Sampled; Sample Media Name, Sample Media Sub-division Name
• Sample Processing Method: Sample Collection Method; preservation method; Analytical Method Number; Sample Preparation Method, Sample Analytical Method, Method Type
• Sample Processing Equipment: Analysis Equipment; Equipment Type
• Sample Type: SampleType; Sample Type
• Site Type: SiteType; Sampling Station Type; Monitoring Location Type, Well Type; Facility Site Type
• Topic Category: Topic Category, General Category; TopicCategoryCode; keyword
• Variable Name: Variable Name; Analyte Name; Substance Identification; variable name
