Alison Callahan and Michel Dumontier Carleton University Ovopubs: Modular data publication with minimal provenance Dumontier::Bio-ontologies 2013:Ovopubs 1

Ovopub: Modular data publication with minimal provenance


DESCRIPTION

With the growth of the Semantic Web as a medium for creating, consuming, mashing up and republishing data, our ability to trace any statement back to its origin is becoming ever more important. Several approaches have now been proposed to associate statements with provenance, with multiple applications in data publication, attribution and argumentation. Here, we describe the ovopub, a modular model for data publication that enables encapsulation, aggregation, integrity checking, and selective-source query answering. We describe the ovopub RDF specification, key design patterns and their application in the publication of, and reference to, data in the life sciences.

Paper: http://arxiv.org/abs/1305.6800

Presented at Bio-ontologies 2013: https://sites.google.com/site/bioontologies/home


Page 1: Ovopub: Modular data publication with minimal provenance

Dumontier::Bio-ontologies 2013:Ovopubs 1

Alison Callahan and Michel Dumontier

Carleton University

Ovopubs: Modular data publication with minimal provenance

Page 2: Ovopub: Modular data publication with minimal provenance


Data publication

• Emerging interest in publishing data on the web
• Microdata formats (RDFa, schema.org) and formal knowledge representation languages (RDF/OWL)
• Efforts to capture credit/provenance of assertions
  – PROV-O, OAG
  – nanopublications (data/statements - Groth, Kuhn)
  – microattributions (gene variation - Patrinos et al)
  – micropublications (discourse - Clark et al)

Page 3: Ovopub: Modular data publication with minimal provenance


Nanopublication

• A nanopublication claims to be the “smallest, unambiguous unit of thought”.
• A nanopublication is an RDF graph that links to two/three graphs:
  – A graph containing one or more assertions
  – A graph containing the provenance for the assertion(s)
  – A graph providing information about the nanopublication

[Figure: nanopublication graphs labelled assertion, provenance, publication]

Problems: indirection between an assertion and its provenance; what if no provenance is provided? A nanopub graph cannot fully contain other graphs; reasoning and ease of querying across nested graphs.
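The indirection problem can be illustrated with a small sketch. It assumes the linking predicates `np:hasAssertion`, `np:hasProvenance` and `np:hasPublicationInfo` from the nanopublication schema; all `ex:` URIs and the helper name `provenance_graph_for` are hypothetical and not taken from the slides.

```python
# Sketch: reaching the provenance of an assertion graph requires two hops
# through the nanopublication node. All ex: URIs are made up for illustration.
links = [
    ("ex:nanopub1", "np:hasAssertion", "ex:nanopub1-assertion"),
    ("ex:nanopub1", "np:hasProvenance", "ex:nanopub1-provenance"),
    ("ex:nanopub1", "np:hasPublicationInfo", "ex:nanopub1-pubinfo"),
    # A nanopublication that links no provenance graph at all.
    ("ex:nanopub2", "np:hasAssertion", "ex:nanopub2-assertion"),
]


def provenance_graph_for(assertion_graph, links):
    """Return the provenance graph linked to an assertion graph, or None."""
    for subject, predicate, obj in links:
        # Hop 1: find the nanopublication that owns this assertion graph.
        if predicate == "np:hasAssertion" and obj == assertion_graph:
            # Hop 2: find the provenance graph of that nanopublication.
            for s2, p2, o2 in links:
                if s2 == subject and p2 == "np:hasProvenance":
                    return o2
    return None


found = provenance_graph_for("ex:nanopub1-assertion", links)
absent = provenance_graph_for("ex:nanopub2-assertion", links)
```

Here `found` is the provenance graph of the first nanopublication, while `absent` is `None`, which is the "no provenance provided" case raised on the slide.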

Page 4: Ovopub: Modular data publication with minimal provenance


an Ovopub is an object that contains and links to data and the ovopub’s provenance

[Figure: an ovopub, with parts labelled data and provenance]

Page 5: Ovopub: Modular data publication with minimal provenance


an assertion ovopub contains one or more connected statements

This ovopub is good for capturing knowledge in the form of statements

Page 6: Ovopub: Modular data publication with minimal provenance


An ovopub also links itself to its content

rdfs:member <uri>

This explicit reification enables transitive closures over graph structures
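A minimal sketch of what such a transitive closure might look like, assuming ovopubs can be nested and that each one points at its content with `rdfs:member`. Triples are modelled as plain Python tuples; the `ex:` URIs and the function name `members_closure` are hypothetical.

```python
# Sketch: follow rdfs:member links transitively from one ovopub.
# Only the rdfs:member predicate comes from the slide; the URIs are made up.
RDFS_MEMBER = "rdfs:member"

triples = [
    ("ex:outer-ovopub", RDFS_MEMBER, "ex:inner-ovopub1"),
    ("ex:outer-ovopub", RDFS_MEMBER, "ex:inner-ovopub2"),
    ("ex:inner-ovopub1", RDFS_MEMBER, "ex:statement1"),
    ("ex:inner-ovopub2", RDFS_MEMBER, "ex:statement2"),
    ("ex:inner-ovopub2", RDFS_MEMBER, "ex:statement3"),
]


def members_closure(root, triples):
    """Return every URI reachable from root via one or more rdfs:member links."""
    direct = {}
    for subject, predicate, obj in triples:
        if predicate == RDFS_MEMBER:
            direct.setdefault(subject, set()).add(obj)

    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in direct.get(node, ()):
            if child not in seen:  # guards against cycles
                seen.add(child)
                stack.append(child)
    return seen


closure = members_closure("ex:outer-ovopub", triples)
```

Because membership is stated as ordinary triples, the closure can be computed with a simple graph traversal rather than by inspecting nested graph containers.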

Page 7: Ovopub: Modular data publication with minimal provenance


An ovopub contains and links to its own provenance

• dc:creator <uri> (creator)
• dc:created xsd:dateTime (timestamp)
• dc:license <uri> (license)
• rdf:type sio:assertion-ovopub or sio:collection-ovopub (ovopub type)
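The four provenance properties lend themselves to a simple completeness check. The sketch below assumes triples are held as Python tuples with prefixed names as strings; the function name `missing_provenance` and all `ex:` values are hypothetical, and this is not the authors' validation procedure.

```python
# Sketch: report which of the minimal provenance properties an ovopub lacks.
REQUIRED = ("dc:creator", "dc:created", "dc:license", "rdf:type")
OVOPUB_TYPES = {"sio:assertion-ovopub", "sio:collection-ovopub"}


def missing_provenance(ovopub, triples):
    """Return the required properties that are absent or invalid for ovopub."""
    props = {}
    for subject, predicate, obj in triples:
        if subject == ovopub:
            props.setdefault(predicate, set()).add(obj)

    missing = [p for p in REQUIRED if p not in props]
    # A type is present but is neither of the two ovopub types from the slide.
    if "rdf:type" in props and not (props["rdf:type"] & OVOPUB_TYPES):
        missing.append("rdf:type")
    return missing


# Illustrative values only.
triples = [
    ("ex:ovopub1", "dc:creator", "ex:some-author"),
    ("ex:ovopub1", "dc:created", "2013-07-20T09:00:00Z"),
    ("ex:ovopub1", "dc:license", "ex:some-license"),
    ("ex:ovopub1", "rdf:type", "sio:assertion-ovopub"),
    ("ex:ovopub2", "dc:creator", "ex:some-author"),
    ("ex:ovopub2", "rdf:type", "sio:collection-ovopub"),
]

complete = missing_provenance("ex:ovopub1", triples)
incomplete = missing_provenance("ex:ovopub2", triples)
```

An empty list means the ovopub carries all four properties; otherwise the list names what still has to be supplied.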

Page 8: Ovopub: Modular data publication with minimal provenance


a collection ovopub contains one or more unconnected items

Item types:
- object
- assertion ovopub
- collection ovopub

This ovopub is good for
- encapsulation and redistribution of selected content
- restriction of query execution / results
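Restricting a query to the content of a collection could be sketched as follows. The sketch assumes each statement is stored as a quad whose fourth element names the ovopub it came from; the data, the `ex:` URIs and the function name `query` are all invented for illustration.

```python
# Sketch: answer a triple pattern using only statements from selected ovopubs.
quads = [
    ("ex:proteinA", "ex:interacts-with", "ex:proteinB", "ex:ovopub1"),
    ("ex:proteinA", "ex:interacts-with", "ex:proteinC", "ex:ovopub2"),
    ("ex:proteinB", "ex:interacts-with", "ex:proteinC", "ex:ovopub3"),
]

# Members of a hypothetical collection ovopub chosen by the user.
trusted_collection = {"ex:ovopub1", "ex:ovopub3"}


def query(quads, sources, s=None, p=None, o=None):
    """Match a triple pattern (None is a wildcard) within the given sources."""
    return [
        (qs, qp, qo)
        for qs, qp, qo, source in quads
        if source in sources
        and (s is None or qs == s)
        and (p is None or qp == p)
        and (o is None or qo == o)
    ]


restricted = query(quads, trusted_collection, s="ex:proteinA")
unrestricted = query(quads, {q[3] for q in quads}, s="ex:proteinA")
```

With the restriction in place, the statement from `ex:ovopub2` is left out of the results because that ovopub is not a member of the chosen collection.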

Page 9: Ovopub: Modular data publication with minimal provenance


iRefIndex: Ovopub Case Study for Datasets, Records, Assertions

Page 10: Ovopub: Modular data publication with minimal provenance


Future work

• Actively develop the nanopublication as a community standard for provenance-based data publication
  – Assess the value of directly linking assertion & provenance graphs
  – Generate (revised) nanopublications in Bio2RDF
• Promote nanopublication-based design patterns for:
  – direct/indirect data/discourse assertions
  – Aggregation semantics
• Use of nanopublications for scientific research
  – Evidence gathering (HyQue)

Page 11: Ovopub: Modular data publication with minimal provenance


Michel [email protected]

Publications: http://dumontierlab.com

Presentations: http://slideshare.com/micheldumontier