Science Environment for Ecological Knowledge: Ecogrid Interfaces
Dave Vieglais
The Natural History Museum and Biodiversity Research Center
University of Kansas
Science Environment for Ecological Knowledge
Research Objectives
• Access to ecological and environmental data
  • Enable data sharing & re-use
  • Enhance data discovery at global scales
• Scalable analysis and synthesis
  • Taxonomic, spatial, temporal, and conceptual integration of data
  • Enable communication and collaboration for analysis
  • Address data heterogeneity issues
  • Enable re-use of analytical components
Informatics Challenges for SEEK
• Data is heterogeneous
  • Syntax
  • Schema
  • Semantics
• From many disciplines
  • Biodiversity surveys, hydrology, atmospheric chemistry, spatial data, behavioral experiments, …
  • Data on economics, demographics, legal issues, …
• Data is distributed
SEEK Components
• EcoGrid
  • Ecological, biodiversity and environmental data
  • Computational access
• Analysis and Modeling System
  • Modeling scientific workflows
• Semantic Mediation System
  • “Smart” data discovery
  • Knowledge-based data integration
  • Knowledge-based analysis integration
• Knowledge Representation
  • Ontologies for describing ecology
Building the EcoGrid
[Figure: map of EcoGrid nodes. Legend: Metacat node, site node, SRB node, DiGIR node. Labeled nodes: AND, SEV, LUQ, VCR, HBR, NTL, NRS, PISCO1, PISCO2, OBFS, SDSC, NET, KU, NCEAS.]
• LTER Network (24)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)
SEEK EcoGrid
• Integrate diverse data networks from ecology, biodiversity, and environmental sciences
  • Metacat, DiGIR, SRB, Xanthoria, ...
• EML is the core for data documentation
• Access to computational resources via the Grid (OGSA)
Ecological Metadata Language (EML)
• Metadata: a means to manage ecological data
  • There is no universal data model for ecology
  • Accommodate heterogeneity and dispersion
• EML
  • Discovery information
    • Creator, Title, Abstract, Keyword, etc.
  • Coverage
    • Geographic, temporal, and taxonomic extent
  • Logical and physical data structure
    • Data semantics via unit definitions and typing
  • Protocols and methods
DiGIR Overview
• DiGIR = Distributed Generic Information Retrieval
• A DiGIR client may communicate with any number of data providers
• A DiGIR data provider may expose any number of resources (databases)
• A DiGIR resource is a collection of objects described by a single federation schema
[Diagram: DiGIR Client -> (1..n) DiGIR Provider -> (1..n) Data Resource]
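The client, provider, and resource relationships described above can be sketched as a small data model. This is only an illustration of the one-to-many structure, not DiGIR's actual implementation: the class names and the `resources_for_schema` helper are hypothetical, while the provider URL, resource name, and schema URI are taken from the result set example elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A collection of objects described by a single federation schema."""
    name: str
    federation_schema: str
    records: list = field(default_factory=list)


@dataclass
class Provider:
    """A data provider that exposes one or more resources."""
    url: str
    resources: list = field(default_factory=list)


@dataclass
class Client:
    """A client that may talk to any number of providers."""
    providers: list = field(default_factory=list)

    def resources_for_schema(self, schema):
        """List (provider URL, resource name) pairs that use the given schema."""
        return [
            (p.url, r.name)
            for p in self.providers
            for r in p.resources
            if r.federation_schema == schema
        ]


DARWIN = "http://digir.net/schema/conceptual/darwin/2003/1.0"
mammals = Resource("MammalsDwC2", DARWIN)
provider = Provider("http://speciesanalyst.net/digir/DiGIR.php", [mammals])
client = Client([provider])
found = client.resources_for_schema(DARWIN)
print(found)
```

Keying resources on their federation schema reflects the point that a client can only query across resources that share one.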
EcoGrid Interfaces
Registry: resolves references to objects
  • Interface definitions
  • Data structures
  • Service instances
Session: authentication
  • Details on session information
  • Coarse granularity of resource restriction
Query: search and retrieve metadata and data
  • Different levels of “conformance”
  • Low bar for participation in SEEK
Taxon: system to reduce ambiguity in scientific names
  • Commonly used to address synonymy
SMS: mechanism for relating and resolving data and metadata concepts
EcoGrid Query Interfaces
• Provides a mechanism for search and retrieval of metadata and federated data
• Supports third party interaction with search results: forwarding of result set identifiers to another service instance for retrieval
• Different levels of compliance
  • Low barrier for participation
  • Bulk of data will be accessible through Type I
Query Interfaces Implemented
• Initial requirement to support query and retrieval from:
  • SRB
  • Metacat
  • DiGIR
  • Xanthoria
• Federated data sets that subscribe to a small set of federation schemas
EcoGrid Query Level I
Basic, entry level exposure of data and metadata for EcoGrid and SEEK
Response contains data – intended for direct communications rather than 3rd party indirection
ResultsetType query(SessionID,QueryType)
byte[] get(SessionID,objectID)
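A minimal sketch of how a service exposing these two operations might behave, using an in-memory dictionary in place of a real data store. Everything here is an assumption made for illustration: the class name, the `open_session` helper (the slides do not say how a SessionID is obtained), and the simplification of the query argument to a single concept/value pair.

```python
import uuid


class LevelOneService:
    """Hypothetical in-memory stand-in for an EcoGrid Query Level I service."""

    def __init__(self, objects):
        # objects maps an object ID to (metadata dict, raw bytes)
        self._objects = objects
        self._sessions = set()

    def open_session(self):
        """Assumed helper that hands out a session identifier."""
        session_id = uuid.uuid4().hex
        self._sessions.add(session_id)
        return session_id

    def _check(self, session_id):
        if session_id not in self._sessions:
            raise PermissionError("unknown session")

    def query(self, session_id, concept, value):
        """Return matching records directly in the response."""
        self._check(session_id)
        return [
            {"objectID": oid, **meta}
            for oid, (meta, _) in self._objects.items()
            if meta.get(concept) == value
        ]

    def get(self, session_id, object_id):
        """Return the raw bytes of one object."""
        self._check(session_id)
        return self._objects[object_id][1]


service = LevelOneService({"mvz1": ({"Genus": "Peromyscus"}, b"raw record")})
sid = service.open_session()
hits = service.query(sid, "Genus", "Peromyscus")
data = service.get(sid, hits[0]["objectID"])
```

The response to `query` carries the records themselves, which matches the Level I intent of direct communication rather than handing an identifier to a third party.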
Query Example
<egq:query queryId="query-digir.1.1"
    system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
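To show how a receiving service might read such a query document, here is a short sketch using Python's standard `xml.etree.ElementTree`. The query text is the example above with the `xsi` schema-location attributes left out for brevity; the `summary` dictionary and its keys are illustrative names, not part of the EcoGrid specification.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """\
<egq:query queryId="query-digir.1.1"
    system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
"""

root = ET.fromstring(QUERY_XML)

# Only the root element carries the egq prefix; its children are unqualified,
# so they can be looked up by their plain tag names.
summary = {
    "query_id": root.get("queryId"),
    "title": root.findtext("title"),
    "return_fields": [rf.text for rf in root.findall("returnfield")],
    "conditions": [
        (c.get("concept"), c.get("operator"), c.text)
        for c in root.iter("condition")
    ],
}
print(summary)
```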
Query Structure
• Language-independent representation of a query structure
• Transformed into the appropriate native language of the data store
Example:
<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus man%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
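As an illustration of the transformation step, the sketch below turns the example structure into a parameterized SQL `WHERE` clause. It assumes a relational data store and that concept names map directly to column names; the operator table and the `to_sql` function are hypothetical and cover only what appears in the example, not the full EcoGrid query language.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from query operators to SQL; only the two operators
# that appear in the example are covered.
SQL_OPERATORS = {"LIKE": "LIKE", "NOT EQUALS": "<>"}


def to_sql(node):
    """Translate an <AND>/<condition> tree into (sql, parameters)."""
    if node.tag == "AND":
        parts, params = [], []
        for child in node:
            sql, child_params = to_sql(child)
            parts.append(sql)
            params.extend(child_params)
        return "(" + " AND ".join(parts) + ")", params

    if node.tag != "condition":
        raise ValueError(f"unsupported element: {node.tag}")
    concept = node.get("concept")
    if not concept.isidentifier():
        # Column names cannot be bound as parameters, so reject odd names.
        raise ValueError(f"bad concept name: {concept}")
    operator = node.get("operator")
    value = (node.text or "").strip()
    if value.upper() == "NULL":
        # SQL tests for NULL with IS / IS NOT rather than = / <>.
        if operator == "NOT EQUALS":
            return f"{concept} IS NOT NULL", []
        raise ValueError("NULL is only supported with NOT EQUALS here")
    return f"{concept} {SQL_OPERATORS[operator]} ?", [value]


tree = ET.fromstring(
    "<AND>"
    '<condition operator="LIKE" concept="ScientificName">peromyscus man%</condition>'
    '<condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>'
    "</AND>"
)
where_clause, parameters = to_sql(tree)
print(where_clause, parameters)
```

Values travel as bound parameters rather than being spliced into the SQL text, so a wrapper for a different store would swap out only the translation function.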
Specifying the Resultset
Specify the list of concepts (fields) to be returned in the resultset
Simple paths used to identify elements or document subtrees
Effectively flattens the structure of the records, but allows generic representation
Example:
<returnfield>/ScientificName</returnfield>
<returnfield>/Longitude</returnfield>
<returnfield>/Latitude</returnfield>
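The flattening effect can be sketched as follows, treating a record as a nested dictionary and each return field as a slash-separated path into it. The `select_fields` function and the `Collector` sub-record are hypothetical, added only to show that unrequested structure is dropped; the other values come from the result set example in these slides.

```python
def select_fields(record, return_fields):
    """Pick the requested paths out of a nested record as one flat row."""
    row = {}
    for path in return_fields:
        value = record
        for step in path.strip("/").split("/"):
            # A missing step, or a step into a non-dict, yields None.
            value = value.get(step) if isinstance(value, dict) else None
        row[path] = value
    return row


record = {
    "ScientificName": "PEROMYSCUS LEUCOPUS NOVEBORACENSIS",
    "Longitude": 121,
    "Latitude": 33,
    "Collector": {"Name": "unknown"},  # hypothetical nested element
}
row = select_fields(record, ["/ScientificName", "/Longitude", "/Latitude"])
nested = select_fields(record, ["/Collector/Name", "/Elevation"])
print(row)
```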
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1"
    system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
  </resultsetMetadata>
  <record number="1"
      system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
      identifier="mvz1"
      namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
      lastModifiedDate="2003-03-03T10:42:13"
      creationDate="2003-03-03T10:42:13">
    <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
    <darwin:Longitude>121</darwin:Longitude>
    <darwin:Latitude>33</darwin:Latitude>
  </record>
EcoGrid Query Level II
• More detailed handling of results
• Uses RSIDs to identify resultsets: handles that can be passed to a third party
Resultset retrieve(SessionID,RSID,start,numrecs)
RSID search(SessionID,query)
query decodeResultsetIdentifier(SessionID,RSID)
statusinfo getResultStatus(SessionID)
int transfer(SessionID,sourceURL,destURL,ObjectID)
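The separation between searching and retrieving can be sketched with an in-memory service that stores each resultset under an RSID. This is an illustration under several assumptions, not the EcoGrid implementation: the class name and RSID format are invented, the query is reduced to a concept/value pair, session IDs are accepted without checking, and `start` is taken to be 1-based because the result set example reports `startRecord` as 1. `getResultStatus` and `transfer` are not modeled.

```python
import itertools


class LevelTwoService:
    """Hypothetical in-memory stand-in for an EcoGrid Query Level II service."""

    def __init__(self, records):
        self._records = records
        self._resultsets = {}  # RSID -> (query, matching records)
        self._counter = itertools.count(1)

    def search(self, session_id, query):
        """Run the query and return only a resultset identifier (RSID)."""
        concept, value = query
        matches = [r for r in self._records if r.get(concept) == value]
        rsid = f"rs.{next(self._counter)}"
        self._resultsets[rsid] = (query, matches)
        return rsid

    def retrieve(self, session_id, rsid, start, numrecs):
        """Return one page of a stored resultset; start is 1-based."""
        _, matches = self._resultsets[rsid]
        return matches[start - 1:start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id, rsid):
        """Return the query that produced the resultset."""
        return self._resultsets[rsid][0]


service = LevelTwoService([{"Genus": "Peromyscus", "id": n} for n in range(1, 6)])
rsid = service.search("session-1", ("Genus", "Peromyscus"))
page = service.retrieve("session-1", rsid, start=3, numrecs=2)
original_query = service.decode_resultset_identifier("session-1", rsid)
```

Because `search` returns only the handle, the RSID string can be forwarded to another party, which then calls `retrieve` itself.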
EcoGrid Write
Used to push data back to sources (e.g. publishing EML documents)
Depends on the availability of an authentication system
put(sessionID, objectID, object, type)
delete(sessionID,objectID)
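A minimal sketch of why the write operations depend on authentication: both calls are refused unless the session is known to be authenticated. The `WritableStore` class, the way authenticated sessions are supplied, and the object identifier and type values are all hypothetical.

```python
class WritableStore:
    """Hypothetical store exposing put and delete to authenticated sessions."""

    def __init__(self, authenticated_sessions):
        self._sessions = set(authenticated_sessions)
        self._objects = {}

    def _require_auth(self, session_id):
        if session_id not in self._sessions:
            raise PermissionError("write access requires an authenticated session")

    def put(self, session_id, object_id, obj, type_):
        """Store or overwrite an object together with its declared type."""
        self._require_auth(session_id)
        self._objects[object_id] = (type_, obj)

    def delete(self, session_id, object_id):
        """Remove an object; raises KeyError if it does not exist."""
        self._require_auth(session_id)
        del self._objects[object_id]

    def __contains__(self, object_id):
        return object_id in self._objects


store = WritableStore({"session-1"})
store.put("session-1", "doc.1.1", b"<eml/>", "metadata")
try:
    store.delete("anonymous", "doc.1.1")
    rejected = False
except PermissionError:
    rejected = True
```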
Data Instance Query?
New requirement to support direct query and retrieval with arbitrary data sets
Generally no common schemas between different instances
Could either:
• Push the data instance to a service that can query the object (e.g. the SRB)
• Implement the interface at the data instance location
Simple JDBC / SQL interface?
dbSchema getDataSchema(sessionID,objectID)
dbResultset search(sessionID,objectID,SQL)
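One way to picture these two calls is with Python's built-in `sqlite3` module standing in for the JDBC layer. This is only a sketch of the idea: the session argument is omitted, the object ID is represented by a connection plus a table name, and the `specimens` table is invented for the example.

```python
import sqlite3


def get_data_schema(connection, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA statements cannot take bound parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError(f"bad table name: {table}")
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def search(connection, sql, parameters=()):
    """Run a SELECT statement against the data instance and return all rows."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are accepted")
    return connection.execute(sql, parameters).fetchall()


# Hypothetical data instance loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (name TEXT, latitude REAL, longitude REAL)")
conn.execute(
    "INSERT INTO specimens VALUES ('PEROMYSCUS LEUCOPUS NOVEBORACENSIS', 33, 121)"
)

schema = get_data_schema(conn, "specimens")
rows = search(conn, "SELECT name FROM specimens WHERE latitude > ?", (30,))
print(schema, rows)
```

Since data instances generally share no common schema, a caller would fetch the schema first and build the SQL from what it finds.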
Convergence with Globus?
EcoGrid originally intended to use Globus since it provided much of the infrastructure
Globus is not a viable infrastructure layer due to installation and reliability concerns
Should SEEK implement Globus infrastructure to support project requirements?
Likely to duplicate minimal service definitions and re-implement
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)