17
EarthCube Layered Architecture Concept Award Interoperability Mechanisms

EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Embed Size (px)

Citation preview

Page 1: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

EarthCube Layered Architecture Concept Award

Interoperability Mechanisms

Page 2: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Layered Architecture Concept Award

Reagan Moore (UNC-CH/DICE) Collaboration environmentsIlkay Altintas (UCSD) WorkflowsDavid Arctur (OGC) Web servicesLawrence Band (UNC-CH/IE) Eco-hydrology modelingLiping Di (GMU) Geospatial knowledge buildingJanet Fredericks (WHOI) Data qualityJeff Horsburgh (Utah State University) CUAHSI / DataONEYong Liu (UIUC / NCSA) Workflows / Cyberintegrator Chris MacDermaid (Colorado State Univ.) Physics model frameworksBrian Miles (UNC-CH/IE) Eco-hydrology workflowsMichael Schoffner (RENCI) Web service integrationAntoine de Torcy (UNC-CH/DICE) Workflow integrationWeiguo Han (GMU) GeoBrain

Page 3: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Research Environment - Applications , Workflows

Collaboration Environment – Data Grids, Portals

Protocols Web Services ---------------------

Protocols

Brokers / Messaging / Structured Object Manipulation

Protocols Web Services ---------------------

Protocols

Community Resources

Policies

Policies Policies Policies

Loosely Coupled – Layered Architecture

1

2

3

EarthCube Infrastructure

Page 4: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Interoperability Mechanisms• For interactions with a collaboration environment

– Register a remote file into the collaboration space• Collection of links Soft link

– Operations on a remote file• THREDDS, OpeNDAP, NetCDF, HDF5, FITS, … Posix I/O extensions

– Access (get, put) a remote file• Web service invoking remote protocol Micro-services

– Asynchronously post and read messages• Message queue forwarding (AMQP) Queuing

– Operations on aggregations of files• Operations associated with a collection Posix I/O extensions

– Operations on aggregations of procedures• Workflows and structured information exchange Rule exchange

– Policy enforcement• Policy-encoded objects Policy exchange

Page 5: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Use Case Collaborations– Register a remote file into the collaboration space

• DataNet Federation Consortium data grid (DFC)

– Operations on a remote file• THREDDS, OpeNDAP, NetCDF, HDF5, storage drivers for OOI

– Access (get, put) a remote file• DataONE, CUAHSI, (OGC, Data Conservancy)

– Asynchronously post and read messages• SEAD - VIVO

– Operations on aggregations of files• OOI time series archive

– Operations on aggregations of procedures• Kepler (Gulf of Mexico hypoxia), NCSA Cyberintegrator (Texas drought)

– Policy enforcement• Research Data Alliance policy sharing

Page 6: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Use Cases• Demonstrate reproducible science. A use case could include the registration,

storage, sharing, and re-execution of a workflow. The hypoxia use case from the Cross-Domain and Brokering Concept groups could be used as an example.

• Automate data retrieval. A use case could demonstrate remote access to a data collection, retrieval of desired data sets, transformation, and use in an analysis workflow. An eco-hydrology example that automates access to digital elevation maps and land use coverage is being built.

• Integrate community resources with collaboration environments. An example would be use of the DAB protocol to identify and cache local copies of relevant data sets for local analysis.

• Integrate multiple community resources. A use case could be demonstration of invocation of multiple workflow systems within the same analysis. An example is the integration of Cyberintegrator workflow with collaboration environments to support drought prediction.

Page 7: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Eco-Hydrology

Choose gauge or outlet (HIS)

Extract drainage area

(NHDPlus)

Digital Elevation

Model (DEM)

WorldfileFlowtable

RHESSys

Slope

Aspect

Streams (NHD)

Roads (DOT) Strata

Hillslope

Patch

Basin

Stream network

Nested watershed structure

Land Use

Leaf Area Index

Phenology

Soil Data

NLCD (EPA)

Landsat TM

MODIS

USDA

Soil and vegetation parameter files

RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state.

Page 8: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

iRODS Rule for RHESSys

main { getExtentForGageReachcode(*gageReachcode, *extentInNHD_Vect_Coords); convertExtentToNHD_DEM(*extentInNHD_Vect_Coords, *extentInNHD_DEM_Coords); extractTileFromNHD_DEM(trimr(*extentInNHD_DEM_Coords, "\n")); importDEMTileIntoNewGRASSLocationAsUTM(*extentInNHD_Vect_Coords, *newLocPhysPath, *newLocObjPath); delineateWatershedForNHDGage(*nhdStreamGageID, *newLocPhysPath, *newLocObjPath);}

Modular workflow composed by chaining basic transformationDefine input variablesCall functions to apply each transformation stepStore results in shared collection

Page 9: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

extractTileFromNHD_DEM(*extentCoords) {# Split path to object into collection and name msiSplitPath(*nhdDEMObjPath, *nhdDEMObjColl, *nhdDEMObjName); writeLine("serverLog", *nhdDEMObjColl); writeLine("serverLog", *nhdDEMObjName);# Build query to discover physical path msiAddSelectFieldToGenQuery("DATA_PATH", "null", *genQInp); msiAddConditionToGenQuery("DATA_NAME", "=", *nhdDEMObjName, *genQInp); msiAddConditionToGenQuery("COLL_NAME", "=", *nhdDEMObjColl, *genQInp); msiAddConditionToGenQuery("DATA_RESC_NAME", "=", *rescName, *genQInp);# Run query msiExecGenQuery(*genQInp, *genQOut);# Extract path from query result foreach (*genQOut) {msiGetValByKey(*genQOut, "DATA_PATH", *filePath); } writeLine("serverLog", *filePath);# Determine physical path of input directory msiSplitPath(*filePath, *inFileDir, *headerFileIgnore);# Generate physical path of output file msiSplitPath(*inFileDir, *inFileParentDir, *rasterDatasetName) *tileFileName = "SUBSET-"++*rasterDatasetName++".img" *tileFilePath = *inFileParentDir++"/"++*tileFileName;# Generate iRODS path of output msiSplitPath(*nhdDEMObjColl, *nhdDEMObjCollParent, *junk) *tileObjPath = *nhdDEMObjCollParent++"/"++*tileFileName *args = "-of HFA -projwin "++*extentCoords++" "++"'*inFileDir'"++" "++"'*tileFilePath'"; writeLine("serverLog", *args); msiExecCmd("gdal_translate", *args, "iren.renci.org", "null", "null", *cmd_out); writeLine("serverLog", *cmd_out);# Register tile file with iRODS msiPhyPathReg(*tileObjPath, *rescName, *tileFilePath, "null", *status);}

Page 10: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Event-Driven Real-Time Drought Analysis/Prediction Workflow

Data Grid – Collaboration Environment

RAPID (river routing model)

NASA NLDAS-2

Other data sources

Invoke Monitor

Output

Store

Visualization

http://rapid.ncsa.illinois.edu:8080/rapid/

NCSA Cyberintegrator

Page 11: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Management of Workflows

• Workflow components– File containing input parameters, input file names, output file

names– Input files– File containing workflow language– Output files

• Each invocation of the workflow generates versioned instance– Compare results across input file versions– Share workflows– Re-execute workflows

• Automatically associates input parameters with each workflow invocation and with resulting output files

Page 12: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Workflow Management

eCWkflow.mssWorkflow file

/earthCube/eCWkflow

Directory holding all input and output filesassociated with workflow file (mounted collection that is linked to the workflow file)

eCWkflow.mpfInput parameter file, lists parametersand input and output file names

/earthCube/eCWkflow/eCWkflow.runDir0

Directory holding all outputfiles generated for invocation of eCWkflow.run, the version number is incremented

eCWkflow.run Automatically generated run file forExecuting each input file

Outfile Output file created for eCWKflow.mpf

eCWkflow2.run

eCWkflow2.mpf

/earthCube/eCWkflow/eCWkflow2.runDir0

NewfileOutput file created for eCWKflow2.mpf

Page 13: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

Workflow Re-execution & SharingeCWkflow.mss

/earthCube/eCWkflow

eCWkflow.mpf

/earthCube/eCWkflow/eCWkflow.runDir0

eCWkflow.run

Outfile

/hydrology/myWkflow

myWkflow.mpf

/hydrology/myWkflow/myWkflow.runDir0

myWkflow.run

Outfile

….imcoll imcoll

/earthCube/eCWkflow/eCWkflow.runDir1

Outfile

/hydrology/myWkflow/myWkflow.runDir1

Outfile

Page 14: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

14

DFC + DataONE Interoperability

• Goal: support interoperability between a DFC data grid and DataONE

• Task: Retrieve a file from DataONE, load into a DFC collaboration environment and add metadata

Page 15: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

15

How It Works1. Query DataONE Coordinating Nodes with SOLR

query

2. Create iRODS collection with same name as query

3. Get list of identifiers for metadata files from search

4. Download the metadata file for each identifier

5. Store the metadata file in DFC data grid

Page 16: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

16

What the Demo Shows

REST APIs

Collection “rain”

Query: “rain”1

2Matching identifier list

3Get metadata file for each identifier

4File goes into collection Mercury Web portal

filefile

file

Page 17: EarthCube Layered Architecture Concept Award Interoperability Mechanisms

EarthCube Layered ArchitectureNSF EAGER 1239678

DataNet Federation ConsortiumNSF OCI-0940841

iRODS Policy-based data managementNSF SDCI 1032732