Ryan Fraser, Nicholas Carr
Connecting RDSI, NeCTAR and ANDS via Provenance - VHIRL
Outline
● What are VLs?
● What is provenance?
● How do we represent VLs using standardised provenance?
What are VLs?
From https://nectar.org.au/virtual-laboratories-1, they are:
● data repositories and computational tools and streamlining research workflows
Connecting the commons with VHIRL and Provenance
What is provenance?
From http://en.wikipedia.org/wiki/Provenance#Computer_Science:
“Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”
Do you make decisions? Yes. Should someone remember how you made those decisions? Yes = PROV
Data Services
Data Layers discovered
Layers consist of numerous remote data services
PROV: a) Captures data service information (hosted on RDS)
PROV: b) Captures subset details of data selected
Subset Selected for processing
Compute/Storage Services
Flexibility in what compute provider to utilise
PROV: Captures job details, login info, where/what/when/how computed, etc.
Includes all relevant NeCTAR details for cloud processing
Available Toolboxes
TCRM – estimate wind speed from cyclone and severe wind
ANUGA – estimate inundation from riverine floods, tsunami, dam break and storm surge
PROV: Captures code utilised along with “how” it is used (template/input files)
Example for tsunami inundation
PROV: Captures location (PID) of where input files/scripts are persisted
Processing Services
The steps so far have been building an environment to run a processing script
Either write your own script...
...or build from existing templates
...when you’re done, it will be submitted for processing on the Cloud!
PROV: Captures location (PID) of where input files/scripts are persisted
PROV: Finalised outputs are persisted with PIDs on RDS and captured in prov information
PROV: After job is completed – finalised Prov record is published to provenance store
PROV record endpoints could be registered in ANDS RDA alongside output data!!!
Components of the Virtual Hazard Impact & Risk Laboratory (VHIRL)
[Diagram layers: Virtual Laboratories/Apps; Data Analytics; Data Services; Processing Services; Compute Services; Enablers]
[Diagram labels: Magnetics, Gravity, DEM, Landsat, Bathymetry, eScript, ANUGA, TCRM, Cyclone Wind Model, Surface Wave Propagation (earthquake), Coastal Inundation, Tsunami Inundation, Scenario, Cyclone Wind Path Calculation, NCI Petascale, NCI Cloud, NeCTAR Cloud, Amazon Cloud, Desktop, Service Orchestration, Provenance Metadata, Auth.]
Basic scientific data processing model - 1
Input Data → Process → Output Data
Background: How do we represent VLs using standardised provenance?
Basic scientific data processing model - 2
Code + Config + Input Data → Process → Output Data
(Code, Config and Input Data are the roles of the input items)
Basic scientific data processing model - 3, PROV
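[Diagram: Code, Config and Input Data (Entities) are used by the Process (Activity); Output Data (Entity) wasGeneratedBy the Process. The Agents ("Who / which system" and "Who") are linked by wasAssociatedWith and wasAttributedTo.]
PROV classes: Entity, Activity, Agent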
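The relations in this model can be written down as plain subject/predicate/object triples. The following is a minimal sketch, not the VHIRL implementation: the prov: terms are from the W3C PROV-O vocabulary, while every ex: identifier and both function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch of the PROV model as plain triples (no RDF library needed).
# Assumption: all "ex:" identifiers are made up; only the prov: terms are real.
PROV_NS = "http://www.w3.org/ns/prov#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_job_provenance(job, inputs, output, agent):
    """Return (subject, predicate, object) triples describing one job."""
    triples = [
        (job, "rdf:type", "prov:Activity"),
        (output, "rdf:type", "prov:Entity"),
        (agent, "rdf:type", "prov:Agent"),
        (output, "prov:wasGeneratedBy", job),
        (output, "prov:wasAttributedTo", agent),
        (job, "prov:wasAssociatedWith", agent),
    ]
    # Code, config and input data are all Entities used by the Activity.
    for item in inputs:
        triples.append((item, "rdf:type", "prov:Entity"))
        triples.append((job, "prov:used", item))
    return triples


def to_turtle(triples):
    """Serialise the triples as simple Turtle statements."""
    lines = [
        f"@prefix prov: <{PROV_NS}> .",
        f"@prefix rdf: <{RDF_NS}> .",
        "@prefix ex: <http://example.org/> .",
        "",
    ]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


triples = build_job_provenance(
    job="ex:job1",
    inputs=["ex:code", "ex:config", "ex:inputData"],
    output="ex:outputData",
    agent="ex:user",
)
print(to_turtle(triples))
```

A real deployment would more likely use an RDF library and persistent identifiers (PIDs) in place of the ex: names.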
Basic scientific data processing model - 4, PROMS
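[Diagram: the PROV model, extended with PROMS. Each Report (Report N) is linked to the Reporting System that produced it (Reporting System X) via reportingSystem, and to activities via hadStartingActivity / hadEndingActivity.]
PROV classes: Entity, Activity, Agent
PROMS classes: Reporting System (R.S.), Report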
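As a rough illustration of what a Report carries, the sketch below models it as a small record. The key names reportingSystem, hadStartingActivity and hadEndingActivity are taken from the diagram labels; the class name, the field names, the JSON layout and all ex: identifiers are assumptions, not the PROMS serialisation.

```python
import json
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical record for one provenance report."""
    report_id: str
    reporting_system: str   # the Reporting System that produced the report
    starting_activity: str  # first Activity covered by the report
    ending_activity: str    # last Activity covered by the report

    def to_json(self):
        # Keys follow the labels shown in the diagram.
        return json.dumps(
            {
                "id": self.report_id,
                "reportingSystem": self.reporting_system,
                "hadStartingActivity": self.starting_activity,
                "hadEndingActivity": self.ending_activity,
            },
            sort_keys=True,
        )


# A single-job report starts and ends with the same activity.
report = Report(
    report_id="ex:reportN",
    reporting_system="ex:reportingSystemX",
    starting_activity="ex:job1",
    ending_activity="ex:job1",
)
print(report.to_json())
```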
Basic scientific data processing model - 5, Storage
[Diagram: Reporting System X and Reporting System Y each produce Reports (Report N, Report M), which are reported to and stored in the Organisational Provenance Store.]
PROV classes: Entity, Activity, Agent
PROMS classes: Reporting System (R.S.), Report
Data Management
Categories shown in the diagram:
● managed data
● web service data
● user supplied data
● managed code
● user supplied code
● output data
Annotations shown in the diagram:
● VL ID’d and persisted
● cited using PROMS-O format
● soon to be VL ID’d and persisted, with minimal metadata recorded too
● SSSC ID’d and persisted
● perhaps SSSC ID’d and persisted, perhaps VL managed
● soon to be VL ID’d and persisted, if required, perhaps with time limits
Virtual Labs Service Citation Example
[{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}
[1] “Subset of elevation”
http://pid.csiro.au/service/anuga-thredds
“bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride”
“2014-12-15T13:15:11”
http://pid.csiro.au/dataset/abcd1234
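The citation template can be filled in mechanically from the details captured at query time. This is a minimal sketch using the example values above: the function names format_citation and bounding_box are hypothetical, and the one-field-per-line layout is an assumption about how the citation is rendered.

```python
from urllib.parse import parse_qs


def format_citation(ref, title, endpoint, query, time_queried, cached_copy_id):
    """Render the six template fields, one per line."""
    return "\n".join([
        f'[{ref}] "{title}"',
        endpoint,
        f'"{query}"',
        f'"{time_queried}"',
        cached_copy_id,
    ])


def bounding_box(query):
    """Extract the spatial subset (decimal degrees) from the service query."""
    _, _, params = query.partition("?")
    # Blank parameters (e.g. time_start=) are dropped by parse_qs by default.
    parsed = parse_qs(params)
    return {key: float(parsed[key][0]) for key in ("north", "south", "west", "east")}


query = (
    "bussleton.nc?var=elevation&spatial=bb"
    "&north=-33.06495205829679&south=-33.551573283840156"
    "&west=114.84967874597227&east=115.70661233971667"
    "&temporal=all&time_start=&time_end=&horizStride"
)

citation = format_citation(
    ref=1,
    title="Subset of elevation",
    endpoint="http://pid.csiro.au/service/anuga-thredds",
    query=query,
    time_queried="2014-12-15T13:15:11",
    cached_copy_id="http://pid.csiro.au/dataset/abcd1234",
)
print(citation)
print(bounding_box(query))
```

Keeping the query string verbatim in the citation means the same subset can be requested again, while the cached copy ID points at the data as it was when queried.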