Traceability, reproducibility, and scalability in Integrated Ecosystem Assessments

July 2013

ECO-OP is supported by NSF Grant #0955649
PIs: Peter Fox (RPI) and Andrew Maffei (WHOI)
NEFSC Collaborators: Jon Hare and Mike Fogarty
Software programmer: Massimo Di Stefano
Informatics and metadata: Stace Beaulieu
[email protected]

Page 4

Traceability, reproducibility, and scalability in Integrated Ecosystem Assessments:

Adopting a provenance model for a collaborative report

July 2013

Metadata for data and workflow provenance (i.e., the marine ecosystem indicators and the collaborative report)

Page 5

Use Case: Northeast Shelf Large Marine Ecosystem

Ecosystem Status Report

Goal: “traceability, repeatability, explanation, verification, and validation” for ecosystem data and information products in the NEFSC Ecosystem Status Report (ESR)

Page 7

Page from 2009 ESR

Section on Climate Forcing

Figures available for download as PDF or image files, but without access to data or metadata

Note: NOAA directive for ISO 19115 metadata, but these are not sufficient to describe time-series indicators

Page 9

Software design to track provenance

M. Di Stefano

Page 13

PROV Data Model: http://www.w3.org/TR/prov-dm/ (W3C Recommendation, 30 April 2013)

PROV-O: The PROV Ontology (expresses PROV-DM using OWL2): http://www.w3.org/TR/prov-o/

Core Structures (types and relations)

Entity may be a single data product, or a chapter containing several data products
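
To make the core types and relations concrete, here is a minimal sketch that writes them as plain (subject, predicate, object) triples in Python rather than with an RDF library. The predicate names follow PROV-O, but every `ex:` identifier is hypothetical, and modelling the chapter as a `prov:Collection` with `prov:hadMember` is only one assumed way to express "a chapter containing several data products"; the slides do not say how the project encoded it.

```python
# Sketch of PROV core structures as (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical; predicate names follow PROV-O.
TRIPLES = [
    # Types: entity, activity, agent
    ("ex:sst_csv", "rdf:type", "prov:Entity"),
    ("ex:sst_figure", "rdf:type", "prov:Entity"),
    ("ex:plot_run", "rdf:type", "prov:Activity"),
    ("ex:analyst", "rdf:type", "prov:Agent"),
    # Relations between them
    ("ex:plot_run", "prov:used", "ex:sst_csv"),
    ("ex:sst_figure", "prov:wasGeneratedBy", "ex:plot_run"),
    ("ex:sst_figure", "prov:wasDerivedFrom", "ex:sst_csv"),
    ("ex:plot_run", "prov:wasAssociatedWith", "ex:analyst"),
    ("ex:sst_figure", "prov:wasAttributedTo", "ex:analyst"),
    # Assumption: a report chapter modelled as a collection of data products
    ("ex:climate_chapter", "rdf:type", "prov:Collection"),
    ("ex:climate_chapter", "prov:hadMember", "ex:sst_figure"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trace_sources(entity, triples=TRIPLES):
    """Follow prov:wasDerivedFrom links back to all upstream entities."""
    seen = []
    stack = [entity]
    while stack:
        current = stack.pop()
        for source in objects(current, "prov:wasDerivedFrom", triples):
            if source not in seen:
                seen.append(source)
                stack.append(source)
    return seen


print(trace_sources("ex:sst_figure"))
print(objects("ex:climate_chapter", "prov:hadMember"))
```

Starting from a figure, `trace_sources` walks back to the data it was derived from, which is the kind of traceability question the report is meant to answer.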

Page 15

http://ipython.org/

Screenshot of IPython Notebook used to track both data and workflow provenance

Code in Python, Matlab, R, other
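
As an illustration of what tracking both data and workflow provenance in a notebook cell might involve, the sketch below records checksums of a step's input and output (data provenance) alongside the step name and language (workflow provenance). The `record_step` function, its field names, and the sample values are all hypothetical placeholders, not the schema or data used in the project.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a fingerprint of a data product."""
    return hashlib.sha256(data).hexdigest()


def record_step(name, input_bytes, output_bytes, language="Python"):
    """Build a provenance record for one analysis step (hypothetical schema)."""
    return {
        "step": name,
        "language": language,
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_bytes),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder input; these values are illustrative only, not real indicator data.
raw = b"year,value\n2001,1.0\n2002,2.0\n2003,3.0\n"

# The "analysis": mean of the value column.
values = [float(line.split(",")[1]) for line in raw.decode().splitlines()[1:]]
result = f"mean,{sum(values) / len(values):.2f}\n".encode()

record = record_step("annual_mean", raw, result)
print(json.dumps(record, indent=2))
```

Because the record stores digests rather than file names, a later reader can check whether the data they hold is the same data the step actually consumed.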

Page 17

http://ipython.org/

Screenshot of IPython Notebook used to track both data and workflow provenance

Notebook can be shared, or output as script, HTML, PDF, other
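
A notebook file is a JSON document, which is what makes these exports possible. The sketch below shows the idea for the "script" case using only the standard library. It assumes the current nbformat 4 layout (a top-level `cells` list), which may differ from the notebook format in use in 2013; `notebook_to_script` is a hypothetical helper, not the conversion tool that ships with IPython.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Turn notebook JSON into a script: code cells kept, markdown commented."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny hand-written notebook used only for this illustration.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["Load the indicator"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

print(notebook_to_script(example))
```

Keeping the markdown cells as comments preserves the narrative around each step, so the exported script still documents the workflow.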

Page 18

PDF output of IPython Notebook with clickable links to data and code

Page 21

Screenshot of CSV file at GitHub

Having access not only to the plotted data but also to provenance metadata increases the (re-)usability of the data
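
One way a data consumer could benefit from provenance metadata is sketched below: a CSV is loaded together with a small metadata record, and the loader refuses data that does not match the recorded checksum. The sidecar fields (`units`, `derived_from`, `sha256`), the `load_with_provenance` helper, and the sample rows are all hypothetical; the slides do not specify how the project attaches metadata to its CSV files.

```python
import csv
import hashlib
import io
import json


def load_with_provenance(csv_text: str, sidecar_json: str):
    """Parse a CSV only if it matches the checksum in its provenance record."""
    meta = json.loads(sidecar_json)
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    if digest != meta["sha256"]:
        raise ValueError("CSV does not match the checksum in its provenance record")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows, meta


# Placeholder data and metadata; the values are illustrative only.
csv_text = "year,value\n2001,1.0\n2002,2.0\n"
sidecar = json.dumps({
    "units": "placeholder units",
    "derived_from": "ex:raw_series",
    "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
})

rows, meta = load_with_provenance(csv_text, sidecar)
print(rows[0]["year"], rows[0]["value"], meta["derived_from"])
```

With the metadata in hand, a reader knows the units, where the series came from, and that the file has not changed since the record was written.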