Reproducibility Using Semantics: An Overview
Dagstuhl Seminar, Jan 2016
Daniel Garijo, Olga Giraldo, Idafen Santana-Pérez, Victor Rodriguez Doncel, Oscar Corcho
Ontology Engineering Group, Universidad Politécnica de Madrid
Madrid, Spain
The Research Method in different disciplines
[Figure: the research method across disciplines. In vivo/in vitro: input data, laboratory protocol, equipment. In silico: dataset, scientific workflow, infrastructure.]
Some problems in lab protocols
Some protocols present insufficient granularity, and their instructions can be imprecise or ambiguous because they are written in natural language. Examples:
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20C overnight.
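The ambiguity in the four instructions above can be made concrete with a small rule-based check. This is a minimal sketch and not part of the SMART Protocols tooling: the list of vague terms, the regular expressions, and the function name `find_issues` are assumptions chosen only to fit these examples.

```python
import re

# Hypothetical list of vague terms, chosen for the four example instructions only.
VAGUE_TERMS = ("briefly", "gentle", "overnight")

# A number followed by a Celsius marker, e.g. "-20C" or "37 °C".
TEMPERATURE = re.compile(r"-?\d+(?:\.\d+)?\s*°?\s*C\b")
# A number followed by a time unit, e.g. "5 min" or "2 hours".
DURATION = re.compile(r"\d+(?:\.\d+)?\s*(?:s|sec|min|h|hours?)\b", re.IGNORECASE)


def find_issues(instruction: str) -> list[str]:
    """Return human-readable problems found in a single instruction."""
    issues = []
    lowered = instruction.lower()
    for term in VAGUE_TERMS:
        if term in lowered:
            issues.append(f"vague term: {term}")
    # Assumption: an incubation step should state a temperature and a duration.
    if lowered.startswith("incubate"):
        if not TEMPERATURE.search(instruction):
            issues.append("missing temperature")
        if not DURATION.search(instruction):
            issues.append("missing duration")
    return issues


if __name__ == "__main__":
    for text in (
        "Incubate the centrifuge tubes in a water bath.",
        "Incubate the samples for 5 min with gentle shaking.",
        "Rinse DNA briefly in 1-2 ml of wash.",
        "Incubate at -20C overnight.",
    ):
        print(text, "->", find_issues(text))
```

A check like this only flags candidates for review. Deciding what "gentle" or "overnight" should mean still needs a domain expert or a controlled vocabulary.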
Currently…
Semi-structured information
Unstructured information
How to formalize the information from laboratory protocols as a knowledge base?
NLP tools + Ontologies
Semantic annotation
The SMART Protocols ontology is available at http://vocab.linkeddata.es/SMARTProtocols/
GATE Smart Protocols
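To illustrate what a semantic annotation of one instruction might look like, the sketch below serialises an annotated step as N-Triples. It is an assumption-laden illustration: the namespace `http://example.org/protocol#` and every term in it (`Step`, `action`, `durationInMinutes`, `temperatureInCelsius`) are placeholders, not terms from the SMART Protocols ontology, and the `Step` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical namespace: placeholder terms, NOT the SMART Protocols vocabulary.
EX = "http://example.org/protocol#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


@dataclass
class Step:
    """One annotated protocol instruction (illustrative structure)."""
    identifier: str
    action: str
    duration_min: Optional[float] = None
    temperature_c: Optional[float] = None


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(step: Step) -> list[str]:
    """Serialise a step as N-Triples lines, omitting values that are unknown."""
    subject = f"<{EX}{step.identifier}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{EX}Step> .",
        f'{subject} <{EX}action> "{escape_literal(step.action)}" .',
    ]
    if step.duration_min is not None:
        lines.append(
            f'{subject} <{EX}durationInMinutes> "{step.duration_min}"^^<{XSD_DECIMAL}> .'
        )
    if step.temperature_c is not None:
        lines.append(
            f'{subject} <{EX}temperatureInCelsius> "{step.temperature_c}"^^<{XSD_DECIMAL}> .'
        )
    return lines


if __name__ == "__main__":
    # "Incubate the samples for 5 min with gentle shaking": no temperature is stated,
    # so no temperature triple is emitted.
    for line in to_ntriples(Step("step2", "incubate", duration_min=5.0)):
        print(line)
```

Leaving out a triple when a value is missing keeps the gap visible in the knowledge base instead of hiding it behind a default.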
Vocabularies and methodologies for representing and publishing workflows
Methodology for workflow publishing
[Figure: workflow publishing architecture. Recoverable labels:]
• Stages: Wings workflow generation, OPM/PROV conversion, publication, share, reuse
• WINGS (Core and Portal) on a local laptop, on a shared host, or on a web server; each produces a Workflow Template, a Workflow Instance, and a PROV export
• Linked Data publication into an RDF triple store holding the workflow plan and the workflow provenance
• Access: interactive browsing (Pubby frontend) and programmatic access (external apps)
• Users and other workflow environments
Repository of linked workflows: http://www.opmw.org/sparql
P-Plan: http://purl.org/net/p-plan
OPMW: http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.
Daniel Garijo and Yolanda Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston.
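Programmatic access to the repository of linked workflows can be sketched as a SPARQL query sent over HTTP GET. The code below only builds the request URL and performs no network call. The class `opmw:WorkflowTemplate` and the property `opmw:isStepOfTemplate` are written as I understand the OPMW vocabulary and should be checked against the ontology documentation. The helper name `build_request_url` is hypothetical, and whether the endpoint is still online is not something this sketch verifies.

```python
from urllib.parse import urlencode

# SPARQL endpoint named on the slide.
ENDPOINT = "http://www.opmw.org/sparql"

# Assumed OPMW terms (verify against http://www.opmw.org/ontology/ before use).
QUERY = """\
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?template ?step ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  ?step opmw:isStepOfTemplate ?template .
  OPTIONAL { ?step rdfs:label ?label }
}
LIMIT 20
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query goes in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Print the URL instead of fetching it, so the sketch runs offline.
    print(build_request_url(ENDPOINT, QUERY))
```

An external application would fetch this URL with an `Accept` header such as `application/sparql-results+json` and then iterate over the returned bindings.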
Reproducibility of Computational Scientific Experiments
[Figure: former equipment → annotate → semantic annotations → reproduce → equivalent execution environment (cloud).]
Workflow systems and workflows shown:
• Pegasus: Montage, SoyKB, Epigenomics
• Dispel4Py: Internal Extinction, Seismic Cross Correlation
• Makeflow: Blast
Some results
• Pegasus Montage workflow
  • Astronomy workflow
  • Constructs large image mosaics of the sky
  • Montage software distribution
  • 59 binaries
• Target IaaS cloud providers
  • Amazon EC2 and FutureGrid
  • Vagrant
RO available at http://pegasus.isi.edu/publications/reppar
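The annotate-and-reproduce idea can be sketched as matching an annotated description of the former equipment against a catalogue of resources that can be provisioned today. This is a minimal illustration under stated assumptions, not the authors' system: the classes `Requirements` and `Offer`, the function names, and all figures in `EXAMPLE_CATALOGUE` are invented. Only the provider names come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirements:
    """Annotated description of the original (former) execution environment."""
    os_name: str
    min_cpus: int
    min_memory_gb: float
    packages: frozenset


@dataclass(frozen=True)
class Offer:
    """One resource that a provider can supply (hypothetical catalogue entry)."""
    provider: str
    os_name: str
    cpus: int
    memory_gb: float
    installable: frozenset


def satisfies(offer: Offer, req: Requirements) -> bool:
    """True if the offer meets every annotated requirement."""
    return (
        offer.os_name == req.os_name
        and offer.cpus >= req.min_cpus
        and offer.memory_gb >= req.min_memory_gb
        and req.packages <= offer.installable  # all required packages available
    )


def pick_equivalent(req: Requirements, catalogue: list) -> Optional[Offer]:
    """Return the smallest offer that satisfies req, or None if none does."""
    candidates = [offer for offer in catalogue if satisfies(offer, req)]
    return min(candidates, key=lambda o: (o.cpus, o.memory_gb), default=None)


# Invented figures for illustration only; not measurements from the experiments.
EXAMPLE_CATALOGUE = [
    Offer("Amazon EC2", "linux", 2, 4.0, frozenset({"montage", "gcc"})),
    Offer("FutureGrid", "linux", 8, 16.0, frozenset({"montage"})),
    Offer("Vagrant", "linux", 1, 1.0, frozenset({"montage"})),
]

if __name__ == "__main__":
    req = Requirements("linux", 2, 2.0, frozenset({"montage"}))
    print(pick_equivalent(req, EXAMPLE_CATALOGUE))
```

Choosing the smallest satisfying offer is a design choice for this sketch. A real selection could just as well rank by cost or by closeness to the original hardware.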
The Research Method in different disciplines
[Figure: the same research-method diagram, with "+ CONTEXT!" added.]
Research Objects
• ROs as web pages: http://rohub.linkeddata.es/
• ROs as part of a Linked Data Platform (alpha): http://purl.org/net/ldp4ro
How to preserve Workflows/Research Objects?
Three main ways/levels:
• Descriptive reproducibility
  • Documentation
• Workflow execution reproducibility
  • Can we run the workflow?
• Workflow results reproducibility
  • Can we get the same results?
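The three levels can be read as three independent checks on a preserved workflow. The sketch below is one hypothetical way to record them. The function `assess` and its parameters are assumptions, and comparing outputs by SHA-256 digest is a deliberately strict stand-in for "the same results". Workflows with non-deterministic or floating-point outputs would need a tolerance-based comparison instead.

```python
import hashlib
from typing import Optional


def sha256_of(data: bytes) -> str:
    """Hex digest used to compare outputs byte for byte."""
    return hashlib.sha256(data).hexdigest()


def assess(
    has_documentation: bool,
    rerun_output: Optional[bytes],
    original_digest: str,
) -> dict:
    """Report which of the three levels a workflow currently reaches.

    rerun_output is None when the workflow could not be executed.
    """
    ran = rerun_output is not None
    same = ran and sha256_of(rerun_output) == original_digest
    return {
        "descriptive": has_documentation,  # is it documented?
        "execution": ran,                  # can we run the workflow?
        "results": same,                   # can we get the same results?
    }


if __name__ == "__main__":
    original = b"example output"
    print(assess(True, b"example output", sha256_of(original)))
    print(assess(True, b"different output", sha256_of(original)))
    print(assess(False, None, sha256_of(original)))
```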
Checklists!
• Corcho et al.: Checklist for workflow conservation
  • http://dx.doi.org/10.6084/m9.figshare.1285011
  • 40 different aspects
    • Documentation
    • Goals
    • Results
    • Metadata
• Corcho et al.: Checklist for a workflow conservation plan
  • http://dx.doi.org/10.6084/m9.figshare.1285012
  • Based on the DCC's data management plan
Some examples
Levels of reproducibility
Workflow conservation Plan
Acknowledgements
• The Semantic e-Science team at UPM
  • Carlos Badenes
  • Daniel Garijo
  • Olga Giraldo
  • Rafael González-Cabero
  • Idafen Santana
  • Victor Rodriguez Doncel
• The Wf4Ever team
  • Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechhofer, Graham Klyne, Matt Gamble, and many others
• The Research Object community group
  • http://www.researchobject.org/