Upload
silas-booth
View
213
Download
1
Embed Size (px)
Citation preview
EarthCube Layered Architecture Concept Award
Interoperability Mechanisms
Layered Architecture Concept Award
Reagan Moore (UNC-CH/DICE) Collaboration environmentsIlkay Altintas (UCSD) WorkflowsDavid Arctur (OGC) Web servicesLawrence Band (UNC-CH/IE) Eco-hydrology modelingLiping Di (GMU) Geospatial knowledge buildingJanet Fredericks (WHOI) Data qualityJeff Horsburgh (Utah State University) CUAHSI / DataONEYong Liu (UIUC / NCSA) Workflows / Cyberintegrator Chris MacDermaid (Colorado State Univ.) Physics model frameworksBrian Miles (UNC-CH/IE) Eco-hydrology workflowsMichael Schoffner (RENCI) Web service integrationAntoine de Torcy (UNC-CH/DICE) Workflow integrationWeiguo Han (GMU) GeoBrain
Research Environment - Applications , Workflows
Collaboration Environment – Data Grids, Portals
Protocols Web Services ---------------------
Protocols
Brokers / Messaging / Structured Object Manipulation
Protocols Web Services ---------------------
Protocols
Community Resources
Policies
Policies Policies Policies
Loosely Coupled – Layered Architecture
1
2
3
EarthCube Infrastructure
Interoperability Mechanisms• For interactions with a collaboration environment
– Register a remote file into the collaboration space• Collection of links Soft link
– Operations on a remote file• THREDDS, OpeNDAP, NetCDF, HDF5, FITS, … Posix I/O extensions
– Access (get, put) a remote file• Web service invoking remote protocol Micro-services
– Asynchronously post and read messages• Message queue forwarding (AMQP) Queuing
– Operations on aggregations of files• Operations associated with a collection Posix I/O extensions
– Operations on aggregations of procedures• Workflows and structured information exchange Rule exchange
– Policy enforcement• Policy-encoded objects Policy exchange
Use Case Collaborations– Register a remote file into the collaboration space
• DataNet Federation Consortium data grid (DFC)
– Operations on a remote file• THREDDS, OpeNDAP, NetCDF, HDF5, storage drivers for OOI
– Access (get, put) a remote file• DataONE, CUAHSI, (OGC, Data Conservancy)
– Asynchronously post and read messages• SEAD - VIVO
– Operations on aggregations of files• OOI time series archive
– Operations on aggregations of procedures• Kepler (Gulf of Mexico hypoxia), NCSA Cyberintegrator (Texas drought)
– Policy enforcement• Research Data Alliance policy sharing
Use Cases• Demonstrate reproducible science. A use case could include the registration,
storage, sharing, and re-execution of a workflow. The hypoxia use case from the Cross-Domain and Brokering Concept groups could be used as an example.
• Automate data retrieval. A use case could demonstrate remote access to a data collection, retrieval of desired data sets, transformation, and use in an analysis workflow. An eco-hydrology example that automates access to digital elevation maps and land use coverage is being built.
• Integrate community resources with collaboration environments. An example would be use of the DAB protocol to identify and cache local copies of relevant data sets for local analysis.
• Integrate multiple community resources. A use case could be demonstration of invocation of multiple workflow systems within the same analysis. An example is the integration of Cyberintegrator workflow with collaboration environments to support drought prediction.
Eco-Hydrology
Choose gauge or outlet (HIS)
Extract drainage area
(NHDPlus)
Digital Elevation
Model (DEM)
WorldfileFlowtable
RHESSys
Slope
Aspect
Streams (NHD)
Roads (DOT) Strata
Hillslope
Patch
Basin
Stream network
Nested watershed structure
Land Use
Leaf Area Index
Phenology
Soil Data
NLCD (EPA)
Landsat TM
MODIS
USDA
Soil and vegetation parameter files
RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state.
iRODS Rule for RHESSys
main { getExtentForGageReachcode(*gageReachcode, *extentInNHD_Vect_Coords); convertExtentToNHD_DEM(*extentInNHD_Vect_Coords, *extentInNHD_DEM_Coords); extractTileFromNHD_DEM(trimr(*extentInNHD_DEM_Coords, "\n")); importDEMTileIntoNewGRASSLocationAsUTM(*extentInNHD_Vect_Coords, *newLocPhysPath, *newLocObjPath); delineateWatershedForNHDGage(*nhdStreamGageID, *newLocPhysPath, *newLocObjPath);}
Modular workflow composed by chaining basic transformationDefine input variablesCall functions to apply each transformation stepStore results in shared collection
extractTileFromNHD_DEM(*extentCoords) {# Split path to object into collection and name msiSplitPath(*nhdDEMObjPath, *nhdDEMObjColl, *nhdDEMObjName); writeLine("serverLog", *nhdDEMObjColl); writeLine("serverLog", *nhdDEMObjName);# Build query to discover physical path msiAddSelectFieldToGenQuery("DATA_PATH", "null", *genQInp); msiAddConditionToGenQuery("DATA_NAME", "=", *nhdDEMObjName, *genQInp); msiAddConditionToGenQuery("COLL_NAME", "=", *nhdDEMObjColl, *genQInp); msiAddConditionToGenQuery("DATA_RESC_NAME", "=", *rescName, *genQInp);# Run query msiExecGenQuery(*genQInp, *genQOut);# Extract path from query result foreach (*genQOut) {msiGetValByKey(*genQOut, "DATA_PATH", *filePath); } writeLine("serverLog", *filePath);# Determine physical path of input directory msiSplitPath(*filePath, *inFileDir, *headerFileIgnore);# Generate physical path of output file msiSplitPath(*inFileDir, *inFileParentDir, *rasterDatasetName) *tileFileName = "SUBSET-"++*rasterDatasetName++".img" *tileFilePath = *inFileParentDir++"/"++*tileFileName;# Generate iRODS path of output msiSplitPath(*nhdDEMObjColl, *nhdDEMObjCollParent, *junk) *tileObjPath = *nhdDEMObjCollParent++"/"++*tileFileName *args = "-of HFA -projwin "++*extentCoords++" "++"'*inFileDir'"++" "++"'*tileFilePath'"; writeLine("serverLog", *args); msiExecCmd("gdal_translate", *args, "iren.renci.org", "null", "null", *cmd_out); writeLine("serverLog", *cmd_out);# Register tile file with iRODS msiPhyPathReg(*tileObjPath, *rescName, *tileFilePath, "null", *status);}
Event-Driven Real-Time Drought Analysis/Prediction Workflow
Data Grid – Collaboration Environment
RAPID (river routing model)
NASA NLDAS-2
Other data sources
Invoke Monitor
Output
Store
Visualization
http://rapid.ncsa.illinois.edu:8080/rapid/
NCSA Cyberintegrator
Management of Workflows
• Workflow components– File containing input parameters, input file names, output file
names– Input files– File containing workflow language– Output files
• Each invocation of the workflow generates versioned instance– Compare results across input file versions– Share workflows– Re-execute workflows
• Automatically associates input parameters with each workflow invocation and with resulting output files
Workflow Management
eCWkflow.mssWorkflow file
/earthCube/eCWkflow
Directory holding all input and output filesassociated with workflow file (mounted collection that is linked to the workflow file)
eCWkflow.mpfInput parameter file, lists parametersand input and output file names
/earthCube/eCWkflow/eCWkflow.runDir0
Directory holding all outputfiles generated for invocation of eCWkflow.run, the version number is incremented
eCWkflow.run Automatically generated run file forExecuting each input file
Outfile Output file created for eCWKflow.mpf
eCWkflow2.run
eCWkflow2.mpf
/earthCube/eCWkflow/eCWkflow2.runDir0
NewfileOutput file created for eCWKflow2.mpf
Workflow Re-execution & SharingeCWkflow.mss
/earthCube/eCWkflow
eCWkflow.mpf
/earthCube/eCWkflow/eCWkflow.runDir0
eCWkflow.run
Outfile
/hydrology/myWkflow
myWkflow.mpf
/hydrology/myWkflow/myWkflow.runDir0
myWkflow.run
Outfile
….imcoll imcoll
/earthCube/eCWkflow/eCWkflow.runDir1
Outfile
/hydrology/myWkflow/myWkflow.runDir1
Outfile
14
DFC + DataONE Interoperability
• Goal: support interoperability between a DFC data grid and DataONE
• Task: Retrieve a file from DataONE, load into a DFC collaboration environment and add metadata
15
How It Works1. Query DataONE Coordinating Nodes with SOLR
query
2. Create iRODS collection with same name as query
3. Get list of identifiers for metadata files from search
4. Download the metadata file for each identifier
5. Store the metadata file in DFC data grid
16
What the Demo Shows
REST APIs
Collection “rain”
Query: “rain”1
2Matching identifier list
3Get metadata file for each identifier
4File goes into collection Mercury Web portal
filefile
file
EarthCube Layered ArchitectureNSF EAGER 1239678
DataNet Federation ConsortiumNSF OCI-0940841
iRODS Policy-based data managementNSF SDCI 1032732