Provenance and annotations. Stian Soiland-Reyes, myGrid, University of Manchester. HeRC CHIPSET meeting, Manchester, 2013-12-16. This work is licensed under a Creative Commons Attribution 3.0 Unported License.


DESCRIPTION

Presenting PROV-O, PAV, the Open Annotation Model and Research Objects (RO).

PowerPoint source: https://skydrive.live.com/view.aspx?cid=37935FEEE4DF1087&resid=37935FEEE4DF1087%21668&app=PowerPoint&wdo=1

See also:
http://practicalprovenance.wordpress.com/
http://www.w3.org/TR/prov-primer/
http://www.w3.org/TR/prov-o/
http://www.researchobject.org/
http://www.openannotation.org/spec/core/


Page 1: 2013 12-16 Provenance and annotations

Provenance and annotations

Stian Soiland-Reyes

myGrid, University of Manchester

HeRC CHIPSET meeting, Manchester, 2013-12-16

This work is licensed under a Creative Commons Attribution 3.0 Unported License

Page 2: 2013 12-16 Provenance and annotations

What is provenance?

By Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic: http://www.flickr.com/photos/stephendann/3375055368/

Derivation: how did it change?

Activity: what happens to it?

Origin: where is it from?

Page 3: 2013 12-16 Provenance and annotations

What is provenance?

By Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic: http://www.flickr.com/photos/stephendann/3375055368/

Attribution: who did it?

Licensing: can I use it?

Attributes: what is it?

Annotations: what do others say about it?

Aggregation: what is it part of?

Date and tool: when was it made? Using what?

Page 4: 2013 12-16 Provenance and annotations

Attribution

Who collected this sample? Who helped?

Which lab performed the sequencing?

Who did the data analysis?

Who curated the results?

Who produced the raw data this analysis is based on?

Who wrote the analysis workflow?

Why do I need this?

i. To be recognized for my work

ii. Who should I give credit to?

iii. Who should I complain to?

iv. Can I trust them?

v. Who should I make friends with?

prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...

Roles

Agent types: Person, Organization, SoftwareAgent

Diagram nodes: Data, Alice, The lab. Relations shown: wasAttributedTo, actedOnBehalfOf.

http://practicalprovenance.wordpress.com/
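The attribution relations on this slide can be sketched in a few lines of Python. This is only an illustration: the `http://example.org/` identifiers are made up, the edges (Data attributed to Alice, Alice acting on behalf of the lab) are an assumed reading of the diagram, and a real application would more likely use an RDF library than plain tuples.

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace, not from the slides

# Assumed reading of the diagram: Data -> Alice -> The lab
TRIPLES = [
    (EX + "data", PROV + "wasAttributedTo", EX + "alice"),
    (EX + "alice", PROV + "actedOnBehalfOf", EX + "the-lab"),
]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRI tuples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def responsible_agents(entity, triples):
    """Agents the entity is attributed to, then everyone they acted on behalf of."""
    queue = [o for s, p, o in triples
             if s == entity and p == PROV + "wasAttributedTo"]
    seen = []
    while queue:
        agent = queue.pop(0)
        if agent in seen:
            continue  # guard against delegation cycles
        seen.append(agent)
        queue.extend(o for s, p, o in triples
                     if s == agent and p == PROV + "actedOnBehalfOf")
    return seen


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(responsible_agents(EX + "data", TRIPLES))
```

Following `actedOnBehalfOf` is one way to answer "who should I give credit to?" and "who should I complain to?" with the same data.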

Page 5: 2013 12-16 Provenance and annotations

Derivation

Which sample was this metagenome sequenced from?

Which metagenomes was this sequence extracted from?

Which sequence was the basis for the results?

What is the previous revision of the new results?

Why do I need this?

i. To verify consistency (did I use the correct sequence?)

ii. To find the latest revision

iii. To backtrack to where a divergence appeared after a change

iv. To credit work I depend on

v. Auditing and defence for peer review

Diagram nodes: Sample, Metagenome, Sequence, Old results, New results. Relations shown: wasDerivedFrom, wasQuotedFrom, wasRevisionOf, wasInfluencedBy.
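Backtracking through derivations, as in use case iii, amounts to walking the derivation graph upstream. A minimal sketch follows; the short entity names and the exact edges are assumptions inferred from the questions on this slide, not a transcription of the diagram. In PROV-O, wasRevisionOf and wasQuotedFrom are sub-properties of wasDerivedFrom, so all three are followed.

```python
# entity -> list of (relation, upstream entity); edges are assumed for illustration
DERIVATIONS = {
    "new-results": [("wasRevisionOf", "old-results"),
                    ("wasDerivedFrom", "sequence")],
    "sequence": [("wasQuotedFrom", "metagenome")],
    "metagenome": [("wasDerivedFrom", "sample")],
}


def lineage(entity, derivations):
    """Return every upstream entity reachable from `entity`, nearest first."""
    found, queue = [], [entity]
    while queue:
        current = queue.pop(0)  # breadth-first: closest ancestors come first
        for _relation, source in derivations.get(current, []):
            if source != entity and source not in found:
                found.append(source)
                queue.append(source)
    return found


if __name__ == "__main__":
    print(lineage("new-results", DERIVATIONS))
```

The same walk supports crediting the work a result depends on (use case iv): every entity in the returned list is a candidate for attribution.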

Page 6: 2013 12-16 Provenance and annotations

Activities

What happened? When? Who?

What was used and generated?

Why was this workflow started?

Which workflow ran? Where?

Why do I need this?

i. To see which analysis was performed

ii. To find out who did what

iii. What was the metagenome used for?

iv. To understand the whole process: “make me a Methods section”

v. To track down inconsistencies

Diagram nodes: Sample, Metagenome, Results, Sequencing, Workflow run, Workflow definition, Workflow server, Alice, Lab technician. Relations shown: used, wasGeneratedBy, wasStartedAt "2012-06-21", wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan, hadRole.
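The "make me a Methods section" idea can be sketched by recording activities in a dictionary laid out loosely like PROV-JSON and summarising what each activity used, generated and was associated with. The `ex:` identifiers, the choice of which activity carries the 2012-06-21 start date, and the exact edges are assumptions made for this example.

```python
# Loosely PROV-JSON-shaped record; all "ex:" names are hypothetical
DOC = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:sample": {}, "ex:metagenome": {}, "ex:results": {}},
    "activity": {
        "ex:sequencing": {"prov:startTime": "2012-06-21T00:00:00"},
        "ex:workflow-run": {},
    },
    "agent": {"ex:alice": {}, "ex:workflow-server": {}},
    "used": {
        "_:u1": {"prov:activity": "ex:sequencing", "prov:entity": "ex:sample"},
        "_:u2": {"prov:activity": "ex:workflow-run", "prov:entity": "ex:metagenome"},
    },
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:metagenome", "prov:activity": "ex:sequencing"},
        "_:g2": {"prov:entity": "ex:results", "prov:activity": "ex:workflow-run"},
    },
    "wasAssociatedWith": {
        "_:a1": {"prov:activity": "ex:sequencing", "prov:agent": "ex:alice"},
        "_:a2": {"prov:activity": "ex:workflow-run", "prov:agent": "ex:workflow-server"},
    },
}


def related(doc, relation, activity, key):
    """Sorted values of `key` for all `relation` records that mention `activity`."""
    return sorted(record[key] for record in doc[relation].values()
                  if record["prov:activity"] == activity)


def methods_summary(doc):
    """One line per activity: what it used, what it generated, who was involved."""
    lines = []
    for activity in doc["activity"]:
        used = ", ".join(related(doc, "used", activity, "prov:entity"))
        made = ", ".join(related(doc, "wasGeneratedBy", activity, "prov:entity"))
        who = ", ".join(related(doc, "wasAssociatedWith", activity, "prov:agent"))
        lines.append(f"{activity}: used {used}; generated {made}; by {who}")
    return lines


if __name__ == "__main__":
    print("\n".join(methods_summary(DOC)))
```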

Page 7: 2013 12-16 Provenance and annotations

PROV model

http://www.w3.org/TR/prov-primer/

Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.

Provenance Working Group

Page 8: 2013 12-16 Provenance and annotations

PROV implementations

AERS-LD; agentSwitch; Amalgame; Annotation Inference Framework; APROVeD: Automatic Provenance Derivation; checker.pl; CollabMap; cProv; csv2rdf4lod-automation; D2R Server; DataFAQs; DBpedia; DeFacto; Dublin Core to PROV mapping; Earth System Science Server; Global Change Information System; Hedgehog; Human Computation ontology; Informed Rural Passenger Information Infrastructure; ISO_19115_Lineage; Music Ontology; OBIAMA; OECD Linked Data; Open Provenance Model for Workflows (OPMW); OpenUp Prov; Oracle Enterprise Transactions Controls Governor; PAV (Provenance, Authoring and Versioning); PML 3.0; Policy Reasoning Framework; PoN; P-plan; PROV Python library; prov-api; prov-check; Provenance Environment (ProvEn) Services; Provenance for Earth Science; Provenance server; Provenance Vocabulary; Prov-gen; PROV-N to Neo4J DB mapping; PROVoKing; ProvToolbox; Prov-Validator; provx2o; Pubby; PubFlow Provenance Archive; Quality Assessment Framework; QuerioCity research prototype; Raw2LD; recoprov; roevo; Semantic Proteomics Dashboard (SemPoD); SIGNA; StatJR eBook system; SysPro; Taverna; tavernaprov; Tinga Provenance Service; Triplify; TWC Healthdata; University of Southampton Open Data; WebLab-PROV; wfprov; Wings Provenance Export; Yanfeng Shu

http://dx.doi.org/10.6084/m9.figshare.878099

Legend: PROV-N, PROV-O, PROV-XML, PROV-JSON

Source (2013-04-16): http://www.w3.org/TR/prov-implementations/

Page 9: 2013 12-16 Provenance and annotations

Open Annotation Data Model

http://www.openannotation.org/spec/core/core.html

Copyright © 2012-2013 the Contributors to the Open Annotation Core Data Model Specification, published by the Open Annotation Community Group under the W3C Community Contributor License Agreement (CLA).

Page 10: 2013 12-16 Provenance and annotations

Example: David’s slides are about ClinicalCodes

http://dev.mygrid.org.uk/wiki/download/attachments/16384498/daspringate_clinicalcodes_HeRC.pdf

https://clinicalcodes.rss.mhs.man.ac.uk/

foaf:primaryTopic

Option 1: The FOAF vocabulary

The primaryTopic property relates a document to the main thing that the document is about.

Page 11: 2013 12-16 Provenance and annotations

Example: David’s slides are about ClinicalCodes

http://dev.mygrid.org.uk/wiki/download/attachments/16384498/daspringate_clinicalcodes_HeRC.pdf

https://clinicalcodes.rss.mhs.man.ac.uk/

annotation: oa:hasBody, oa:hasTarget

Option 2: Open Annotation Data Model
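As a rough illustration of Option 2, the annotation can be written as a small JSON-LD document using only the two properties named on the slide. The annotation identifier is made up, and treating the slides as the body and the ClinicalCodes site as the target is an assumption; the slide does not say which is which.

```python
import json

SLIDES = ("http://dev.mygrid.org.uk/wiki/download/attachments/16384498/"
          "daspringate_clinicalcodes_HeRC.pdf")
CLINICALCODES = "https://clinicalcodes.rss.mhs.man.ac.uk/"


def make_annotation(annotation_id, body, target):
    """Build a minimal Open Annotation as a JSON-LD dictionary."""
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body},
        "oa:hasTarget": {"@id": target},
    }


if __name__ == "__main__":
    # Hypothetical annotation IRI; body/target assignment is an assumption
    annotation = make_annotation("http://example.org/annotations/1",
                                 body=SLIDES, target=CLINICALCODES)
    print(json.dumps(annotation, indent=2))
```

Unlike the single foaf:primaryTopic triple of Option 1, the annotation is a resource in its own right, so further statements (who made it, when, why) can be attached to it.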

Page 12: 2013 12-16 Provenance and annotations

Annotations have provenance

annotation: oa:hasBody, oa:hasTarget

oa:annotatedBy

Stian Soiland-Reyes

foaf:name

pav:authoredBy

David A. Springate

foaf:name

© 2013 David A. Springate

pav:createdBy

David A. Springate

foaf:name

pav:retrievedBy

http://purl.org/pav/html

Who is the “creator” of the slides: David or Stian?

With PAV we can differentiate content authoring from upload.

Page 13: 2013 12-16 Provenance and annotations

Annotations have provenance

annotation: oa:hasBody, oa:hasTarget

http://orcid.org/0000-0001-9842-9718

oa:annotatedBy

Stian Soiland-Reyes

foaf:name

pav:authoredBy

David A. Springate

foaf:name

© 2013 David A. Springate

pav:createdBy

David A. Springate

foaf:name

pav:retrievedBy

http://purl.org/pav/html

Which David…? We need a common identifier: ORCID

Page 14: 2013 12-16 Provenance and annotations

Annotations as first-class citizens

annotation: oa:hasBody, oa:hasTarget

oa:motivatedBy

oa:bookmarking

oa:classifying

oa:commenting

oa:describing

oa:editing

oa:highlighting

oa:identifying

oa:linking

oa:moderating

oa:questioning

oa:replying

oa:tagging

JSON

Turtle
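A minimal sketch of the Turtle form, combining the annotator (identified by an ORCID URI) with a motivation, might look like the following. The annotation IRI is hypothetical, the choice of `describing` is arbitrary, the body/target assignment is an assumption, and the ORCID is assumed to identify the annotator named on the earlier slide.

```python
MOTIVATIONS = (
    "bookmarking", "classifying", "commenting", "describing", "editing",
    "highlighting", "identifying", "linking", "moderating", "questioning",
    "replying", "tagging",
)

SLIDES = ("http://dev.mygrid.org.uk/wiki/download/attachments/16384498/"
          "daspringate_clinicalcodes_HeRC.pdf")
CLINICALCODES = "https://clinicalcodes.rss.mhs.man.ac.uk/"
ORCID = "http://orcid.org/0000-0001-9842-9718"  # assumed to be the annotator


def annotation_turtle(annotation, body, target, annotator, annotator_name,
                      motivation):
    """Serialise one annotation, its annotator and its motivation as Turtle."""
    if motivation not in MOTIVATIONS:
        raise ValueError(f"not a motivation listed on the slide: {motivation}")
    return "\n".join([
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{annotation}> a oa:Annotation ;",
        f"    oa:hasBody <{body}> ;",
        f"    oa:hasTarget <{target}> ;",
        f"    oa:annotatedBy <{annotator}> ;",
        f"    oa:motivatedBy oa:{motivation} .",
        "",
        f'<{annotator}> foaf:name "{annotator_name}" .',
    ])


if __name__ == "__main__":
    print(annotation_turtle("http://example.org/annotations/1", SLIDES,
                            CLINICALCODES, ORCID, "Stian Soiland-Reyes",
                            "describing"))
```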

Page 15: 2013 12-16 Provenance and annotations

Provenance of what?

Who made the (content of) this data set? Who maintains it?

Who wrote this document? Who uploaded it?

Which CSV was this Excel file imported from?

Who wrote this description? When? How did we get it?

What is the state of these guidelines? Are they official?

What did the guidelines look like before? (Revisions) – are there newer versions?

What new resources have been derived from this data set?

Page 16: 2013 12-16 Provenance and annotations

RESEARCH OBJECT (RO)

http://www.researchobject.org/

The goal of Research Objects: openly share everything about your experiments, including how those things are related

Page 17: 2013 12-16 Provenance and annotations

What is in a research object?

A Research Object bundles and relates digital resources of a scientific experiment or investigation:

Data used and results produced in an experimental study

Methods employed to produce and analyse that data

Provenance and settings for the experiments

People involved in the investigation

Annotations about these resources, which are essential to the understanding and interpretation of the scientific outcomes captured by a research object

http://www.researchobject.org/

Page 18: 2013 12-16 Provenance and annotations

Gathering everything

Research Objects (RO) aggregate related resources, their provenance and annotations

Conveys “everything you need to know” about a study/experiment/analysis/dataset/workflow

Shareable, evolvable, contributable, citable

ROs have their own provenance and lifecycles

Page 19: 2013 12-16 Provenance and annotations

Research object model at a glance

Diagram nodes: Research Object, Manifest, Resources, Annotations, Annotation graph. Relations shown: ore:aggregates, oa:hasTarget, oa:hasBody.

Page 20: 2013 12-16 Provenance and annotations

Why Research Objects?

i. To share your research materials (RO as a social object)

ii. To facilitate reproducibility and reuse of methods

iii. To be recognized and cited (even for constituent resources)

iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)

Page 21: 2013 12-16 Provenance and annotations

A Research Object

http://alpha.myexperiment.org/packs/387

Page 22: 2013 12-16 Provenance and annotations
Page 23: 2013 12-16 Provenance and annotations

Annotations in research objects

Types: “This document contains a hypothesis”

Relations: “These datasets are consumed by that tool”

Provenance: “These results came from this workflow run”

Descriptions: “Purpose of this step is to filter out invalid data”

Comments: “This method looks useful, but how do I install it?”

Examples: “This is how you could use it”

Page 24: 2013 12-16 Provenance and annotations

Annotation guidelines – which properties?

Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject

Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt

Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom

Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy

Dependencies: dct:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset

Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource

Page 25: 2013 12-16 Provenance and annotations

Saving a research object: RO bundle

Single, transferable research object

Self-contained snapshot

Which files in ZIP, which are URIs? (Up to user/application)

Regular ZIP file, explored and unpacked with standard tools

JSON manifest is programmatically accessible without RDF understanding

Works offline and in desktop applications – no REST API access required

Basis for RO-enabled file formats, e.g. Taverna run bundle

Exchanged with myExperiment and RO tools

Page 26: 2013 12-16 Provenance and annotations

Workflow Results Bundle

ZIP folder structure (RO Bundle):

mimetype (application/vnd.wf4ever.robundle+zip)
.ro/manifest.json
workflowrun.prov.ttl (RDF)
inputA.txt
outputA.txt
outputB/ (1.txt, 2.txt, 3.txt)
outputC.jpg
intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt
workflow

Aggregating in Research Object: URI references, attribution, execution environment

https://w3id.org/bundle

Page 27: 2013 12-16 Provenance and annotations

RO Bundle

What is aggregated? File in ZIP or external URI

Who made the RO? When?

Who?

External URIs placed in folders

Embedded annotation

External annotation, e.g. blogpost

JSON-LD context RDF

RO provenance

.ro/manifest.json

Format

Note: JSON "quotes" not shown above for brevity

http://json-ld.org/

http://orcid.org/

https://w3id.org/bundle
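To make the bundle layout concrete, here is a hedged sketch that writes a ZIP with a `mimetype` entry and a `.ro/manifest.json` using only the Python standard library. The manifest keys (`createdOn`, `createdBy`, `aggregates`, `file`, `uri`) and the context URL are assumptions modelled loosely on the bundle specification at https://w3id.org/bundle and should be checked against it; the function and parameter names are invented for this example.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files, external_uris, created_by, created_on):
    """Return the bytes of a ZIP holding `files` plus a JSON manifest.

    files: dict of path-in-ZIP -> text content
    external_uris: list of URIs aggregated by reference only
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URL
        "id": "/",
        "createdOn": created_on,
        "createdBy": created_by,
        "aggregates": [{"file": "/" + name} for name in sorted(files)]
                      + [{"uri": uri} for uri in external_uris],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # Written first and uncompressed so tools can detect the media type
        bundle.writestr("mimetype", MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
        for name, content in sorted(files.items()):
            bundle.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_bundle(
        {"outputA.txt": "example output"},
        ["http://example.org/external-dataset"],  # hypothetical external URI
        {"name": "Alice"},                          # hypothetical creator
        "2013-12-16T00:00:00Z",
    )
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Because the result is a regular ZIP file, it can be unpacked with standard tools, and the manifest can be read as plain JSON without any RDF processing.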

Page 28: 2013 12-16 Provenance and annotations

http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/

<h3 property="dc:title">Common Motifs in Scientific Workflows:<br>An Empirical Analysis</h3>

<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">

Research Object as RDFa

http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/

<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li>

<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li>

<span property="dc:creator prov:wasAttributedTo" resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
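The aggregated resources can be pulled out of markup like this with the standard-library HTML parser. The sketch below is not a full RDFa processor: it does no prefix resolution and only looks for elements whose `property` attribute includes `ore:aggregates`. The class and function names are invented for illustration, and the sample markup is a cleaned-up copy of the excerpt above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/"

SAMPLE = """
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/"
      typeof="ore:Aggregation ro:ResearchObject">
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls"
       typeof="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx"
       typeof="ro:Resource">Analytics for Wings workflows</a></li>
</body>
"""


class AggregatesParser(HTMLParser):
    """Collect href values of elements marked with property="ore:aggregates"."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.aggregated = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # property may hold several space-separated terms, or be missing
        properties = (attributes.get("property") or "").split()
        href = attributes.get("href")
        if "ore:aggregates" in properties and href:
            self.aggregated.append(urljoin(self.base, href))


def aggregated_resources(html, base):
    """Return absolute URIs of the resources aggregated in the RDFa markup."""
    parser = AggregatesParser(base)
    parser.feed(html)
    parser.close()
    return parser.aggregated


if __name__ == "__main__":
    for uri in aggregated_resources(SAMPLE, BASE):
        print(uri)
```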