A Semantic Web Approach for the Third Provenance Challenge
Tetherless World Constellation @ Rensselaer Polytechnic Institute
James Michaelis, Li Ding, Rui Huang, Zhenning Shangguan, Deborah L. McGuinness
04/21/23
Introduction
• Our approach to the Third Provenance Challenge (called TetherlessPC3) is designed to leverage Semantic Web technologies
• These technologies support two capabilities useful for answering the provided queries:
  • Declarative inference: SPARQL + OWL syntax
  • Augmenting provenance data derived from the workflow execution with supplementary information: SPARQL
TetherlessPC3 Approach
[Figure: TetherlessPC3 architecture with three components: (1) Provenance Generator, (2) Query Front-End, (3) Import/Export Component. Elements shown: running TW's workflow code and other teams' workflow code; traces in OPM, OPM', OWL (PC3OPM), and PML; Normalization (OPM' to OPM); Translation (OPM to PC3OPM), Translation (PC3OPM to PML), Translation (English to SPARQL); English queries turned into SPARQL queries, run with Pellet/Jena, producing text results.]
TetherlessPC3 Approach: Provenance Generator
Produces provenance traces in Web Ontology Language (OWL) format using Jena, a Java-based Semantic Web framework
These traces are structured according to the PC3OPM ontology at http://www.cs.rpi.edu/~michaj6/Provenance/PC3OPM.owl
PC3OPM is designed to be compatible with the OPM Specification v1.01
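As a rough illustration of what the generator emits, the Python sketch below writes one reified "was generated by" edge as N-Triples. It is not the TetherlessPC3 code, which uses Jena and writes OWL. The namespaces are taken from the PREFIX declarations in this deck's SPARQL queries, and wgbSource/wgbTarget are the property names those queries use; the class name PC3OPM:WasGeneratedBy and any edge or process identifiers passed in are assumptions.

```python
# Sketch only: N-Triples output stands in for the Jena/OWL writer described above.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PC3OPM = "http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#"
PC3 = "http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#"


def was_generated_by(edge_id: str, artifact: str, process: str) -> list[tuple[str, str, str]]:
    """Reify 'artifact wasGeneratedBy process' as an edge node.

    As in this deck's SPARQL queries, wgbSource points at the artifact and
    wgbTarget at the process. The class name PC3OPM:WasGeneratedBy is assumed.
    """
    edge = PC3 + edge_id
    return [
        (edge, RDF_TYPE, PC3OPM + "WasGeneratedBy"),
        (edge, PC3OPM + "wgbSource", PC3 + artifact),
        (edge, PC3OPM + "wgbTarget", PC3 + process),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Render IRI-only triples as N-Triples lines (literals are out of scope here)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

For example, to_ntriples(was_generated_by("wgb_0", "provVarDbEntryP2ImageMeta_0", "hypotheticalProcess_0")) yields three lines, one per triple; "wgb_0" and "hypotheticalProcess_0" are made-up identifiers.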
TetherlessPC3 Approach: Provenance Generator
To capture provenance, we use a workflow execution service
This service runs a modified version of the workflow emulation code provided by Yogesh Simmhan (Microsoft Research)
The modified version contains injected code (in the section that executes the high-level workflow) to record provenance information based on PC3OPM
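The injected recording code itself is not shown here. A minimal sketch of the idea, assuming each workflow step can be wrapped as a function call: the hypothetical ProvenanceRecorder below runs a step unchanged, then logs an OPM "used" edge for each input artifact and a "wasGeneratedBy" edge for the output. All class, method, and identifier names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Artifact:
    id: str
    value: Any


@dataclass
class ProvenanceRecorder:
    """Hypothetical recorder: wraps each workflow step and logs OPM-style edges."""
    used: list[tuple[str, str]] = field(default_factory=list)       # (process, artifact)
    generated: list[tuple[str, str]] = field(default_factory=list)  # (artifact, process)
    _next_id: int = 0

    def run_step(self, name: str, func: Callable[..., Any], *inputs: Artifact) -> Artifact:
        process_id = f"{name}_{self._next_id}"
        self._next_id += 1
        # "used": the step consumed each input artifact.
        for artifact in inputs:
            self.used.append((process_id, artifact.id))
        # Run the original step on the plain values, unchanged.
        value = func(*(artifact.value for artifact in inputs))
        # "wasGeneratedBy": the step's output becomes a new artifact.
        output = Artifact(f"{process_id}_out", value)
        self.generated.append((output.id, process_id))
        return output
```

For example, running sorted and then sum on an Artifact("inputRows", [3, 1, 2]) through run_step records used = [("SortRows_0", "inputRows"), ("SumRows_1", "SortRows_0_out")], so the two steps are linked through the shared artifact id, which is what later lets process-to-process triggering be derived.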
Three properties of PC3OPM
• Provides direct mappings to OPM concepts
  • Example: PC3OPM:Artifact maps to the OPM concept "Artifact"
• Reifies OPM relations
  • Example: for the relationship (Process1, WasTriggeredBy, Process2), declare an instance of the class PC3OPM:WasTriggeredBy
• Extends the definitions in OPM with new concepts
  • Domain dependent: terminology specific to the Third Provenance Challenge workflow
    • Example: CSVFileEntry (subclass of OPM Artifact)
  • Domain independent: terminology from the Proof Markup Language (PML)
    • We added a new concept "Function" based on pmlp:inferenceRule, where an OPM process is an execution of a "Function"
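Reification turns each relationship into a node of its own, so the edge can later carry extra attributes (OPM edges can have roles and accounts) while the direct form stays recoverable. The sketch below converts (Process1, WasTriggeredBy, Process2) into such a node and back. PC3OPM:WasTriggeredBy and PC3OPM:opWasTriggeredBy are named in this deck; the link properties wtbSource and wtbTarget are assumed by analogy with wgbSource/wgbTarget and may not match the real ontology.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PC3OPM = "http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#"

WTB_CLASS = PC3OPM + "WasTriggeredBy"   # class named on the slide
DIRECT = PC3OPM + "opWasTriggeredBy"    # direct property used in this deck's queries
WTB_SOURCE = PC3OPM + "wtbSource"       # assumed, by analogy with wgbSource
WTB_TARGET = PC3OPM + "wtbTarget"       # assumed, by analogy with wgbTarget

Triple = tuple[str, str, str]


def reify(effect: str, cause: str, edge_id: str) -> set[Triple]:
    """Turn (effect, WasTriggeredBy, cause) into an edge node with source/target links."""
    return {
        (edge_id, RDF_TYPE, WTB_CLASS),
        (edge_id, WTB_SOURCE, effect),
        (edge_id, WTB_TARGET, cause),
    }


def dereify(graph: set[Triple]) -> set[Triple]:
    """Recover direct (effect opWasTriggeredBy cause) triples from the edge nodes."""
    edges = {s for s, p, o in graph if p == RDF_TYPE and o == WTB_CLASS}
    sources = {s: o for s, p, o in graph if p == WTB_SOURCE and s in edges}
    targets = {s: o for s, p, o in graph if p == WTB_TARGET and s in edges}
    return {(sources[e], DIRECT, targets[e]) for e in edges if e in sources and e in targets}
```

For example, dereify(reify("Process1", "Process2", "wtb_0")) returns {("Process1", DIRECT, "Process2")}; "wtb_0" is a made-up edge identifier.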
Proof Markup Language (PML)
WHAT IS IT?
• A provenance interlingua designed for representing and sharing explanations generated by various intelligent systems
• Originally designed to explain the activity of theorem provers
• Part of the Inference Web framework (which includes tools for browsing and validating PML)
THREE PARTS
• Justification: provides structure for describing how a conclusion was derived
• Provenance: metadata on information referenced in the justification
• Trust: metadata on trust for information referenced in the justification
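One way to picture the PC3OPM-to-PML translation, assuming the mapping suggested by the "Function" concept: each generated artifact becomes a conclusion, and the process that generated it becomes an inference step whose rule is the process's Function and whose antecedents are the artifacts it used. The classes below borrow PML's NodeSet/InferenceStep names only loosely; this is a structural sketch, not the PML API.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceStep:
    rule: str               # the "Function" the process executed
    antecedents: list[str]  # artifacts the process used


@dataclass
class NodeSet:
    conclusion: str         # the artifact being justified
    steps: list[InferenceStep] = field(default_factory=list)


def to_justifications(used, generated, function_of):
    """used: (process, artifact) pairs; generated: (artifact, process) pairs;
    function_of: maps each process to the Function it executed."""
    inputs: dict[str, list[str]] = {}
    for process, artifact in used:
        inputs.setdefault(process, []).append(artifact)
    node_sets: dict[str, NodeSet] = {}
    for artifact, process in generated:
        node_set = node_sets.setdefault(artifact, NodeSet(artifact))
        # Copy the antecedent list so steps never share a mutable list.
        node_set.steps.append(InferenceStep(function_of[process], list(inputs.get(process, []))))
    return node_sets
```

For example, with used = [("p1", "a0"), ("p2", "a1")], generated = [("a1", "p1"), ("a2", "p2")], and hypothetical Functions {"p1": "LoadFunction", "p2": "CheckFunction"}, artifact a2 is justified by one step applying CheckFunction to a1.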
TetherlessPC3 Approach: Query Front-End
What we have done
1. Reviewed the given English-language queries and wrote corresponding SPARQL queries
2. Updated the PC3OPM ontology to support (1) and regenerated the provenance trace
3. Ran the queries and collected the results
TetherlessPC3 Approach: Query Front-End
Technologies used
• SPARQL: RDF query language
• Pellet: an open-source OWL reasoner
Query Answering Example
• Provenance Challenge Core Query 3:
  • "Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?"
• Our interpretation:
  • Find the process X that generated the Image table
  • Look for the processes XT that (directly or indirectly) triggered X
  • Return X and XT as query results
• Handling this query:
  • Rather than writing a recursive program, we use OWL transitive properties to compute the answer
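For comparison, the sketch below computes procedurally the answer set that a reasoner such as Pellet would infer if opWasTriggeredBy is declared an owl:TransitiveProperty (an assumption about PC3OPM that this slide implies but does not show). X comes from the generated-by pairs; XT is the transitive closure of the triggered-by pairs, found by a breadth-first walk. The pair encodings and process names are hypothetical.

```python
from collections import deque


def necessary_processes(target_artifact, generated_by, triggered_by):
    """generated_by: (artifact, process) pairs; triggered_by: (effect, cause) pairs.
    Returns (X, XT): the processes that generated target_artifact, and every
    process that directly or indirectly triggered one of them."""
    x = {process for artifact, process in generated_by if artifact == target_artifact}
    causes: dict[str, set[str]] = {}
    for effect, cause in triggered_by:
        causes.setdefault(effect, set()).add(cause)
    xt: set[str] = set()
    queue = deque(x)
    # Breadth-first walk up the triggering chain: the transitive closure that an
    # OWL reasoner derives for a transitive property.
    while queue:
        process = queue.popleft()
        for cause in causes.get(process, ()):
            if cause not in xt:
                xt.add(cause)
                queue.append(cause)
    return x, xt
```

With generated_by = [("imageMeta", "p4")] and triggered_by = [("p4", "p3"), ("p3", "p2"), ("p2", "p1")], the function returns ({"p4"}, {"p1", "p2", "p3"}). The OWL version replaces this loop with a single transitivity axiom and lets the reasoner perform the walk.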
Enhancing Provenance Trace
• To keep our provenance trace simple and concise, we do not include transitive properties in it, since most of the queries do not need them
• When a query does need them, we generate additional RDF data with a SPARQL CONSTRUCT query
SPARQL SELECT Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/OurTrace.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>

SELECT ?fxn1 ?fxn2
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/OurTrace.owl#>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://tw.rpi.edu/proj/portal.wiki/images/3/36/MakeMoreTriples.sparql>
WHERE {
  ?wgb PC3OPM:wgbSource PC3:provVarDbEntryP2ImageMeta_0 .
  ?wgb PC3OPM:wgbTarget ?fxn1 .
  OPTIONAL { ?fxn1 PC3OPM:opWasTriggeredBy ?fxn2 . }
}
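To make the OPTIONAL clause concrete, the sketch below evaluates the SELECT's graph pattern over an in-memory list of triples written with prefixed names (a simplification; a real store matches full IRIs). Each ?fxn1 that generated the artifact is paired with every ?fxn2 that triggered it, and kept with ?fxn2 unbound (None) when there is none. Whether those ?fxn2 values include indirect triggers depends on the extra triples loaded through the third FROM clause and on the reasoner; this sketch simply matches the opWasTriggeredBy triples it is given.

```python
WGB_SOURCE = "PC3OPM:wgbSource"
WGB_TARGET = "PC3OPM:wgbTarget"
TRIGGERED_BY = "PC3OPM:opWasTriggeredBy"


def select_fxn_pairs(triples, artifact):
    """Evaluate the SELECT's WHERE pattern over (s, p, o) prefixed-name triples.

    Duplicate rows are collapsed (as with SELECT DISTINCT) and sorted for stable output."""
    rows = []
    wgb_nodes = {s for s, p, o in triples if p == WGB_SOURCE and o == artifact}
    fxn1s = sorted({o for s, p, o in triples if p == WGB_TARGET and s in wgb_nodes})
    for fxn1 in fxn1s:
        fxn2s = sorted({o for s, p, o in triples if s == fxn1 and p == TRIGGERED_BY})
        if fxn2s:
            rows.extend((fxn1, fxn2) for fxn2 in fxn2s)
        else:
            # OPTIONAL: keep the ?fxn1 row even when no ?fxn2 matches.
            rows.append((fxn1, None))
    return rows
```

The process names used with this function (for example "PC3:p4") are placeholders, not identifiers from the actual trace.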
SPARQL CONSTRUCT Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>

CONSTRUCT { ?FXN PC3OPM:opWasTriggeredBy ?FXN2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
  ?USD PC3OPM:usdSource ?FXN .
  ?USD PC3OPM:usdTarget ?VAR .
  ?WGB PC3OPM:wgbSource ?VAR .
  ?WGB PC3OPM:wgbTarget ?FXN2
}
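The CONSTRUCT query joins two reified relations on the shared artifact ?VAR: if process ?FXN used ?VAR and ?VAR was generated by process ?FXN2, it emits ?FXN opWasTriggeredBy ?FXN2. A hand-written equivalent of that join, sketched in Python under the assumption that triples use prefixed-name strings, might look like this:

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def objects_by_subject(triples: list[Triple], predicate: str) -> dict[str, set[str]]:
    """Index subject -> set of objects for one predicate."""
    index: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            index[s].add(o)
    return index


def construct_triggered_by(triples: list[Triple]) -> set[Triple]:
    """Same join as the CONSTRUCT query, returning new triples as a set (an RDF graph)."""
    usd_source = objects_by_subject(triples, "PC3OPM:usdSource")  # ?USD -> ?FXN
    usd_target = objects_by_subject(triples, "PC3OPM:usdTarget")  # ?USD -> ?VAR
    wgb_source = objects_by_subject(triples, "PC3OPM:wgbSource")  # ?WGB -> ?VAR
    wgb_target = objects_by_subject(triples, "PC3OPM:wgbTarget")  # ?WGB -> ?FXN2

    # ?VAR -> every ?FXN2 that generated it
    generators: dict[str, set[str]] = defaultdict(set)
    for wgb, variables in wgb_source.items():
        for var in variables:
            generators[var] |= wgb_target.get(wgb, set())

    result: set[Triple] = set()
    for usd, fxns in usd_source.items():
        for var in usd_target.get(usd, set()):
            for fxn in fxns:
                for fxn2 in generators.get(var, set()):
                    result.add((fxn, "PC3OPM:opWasTriggeredBy", fxn2))
    return result
```

The query yields only direct process-to-process edges; any longer chains come from the transitivity inference applied afterwards.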
TetherlessPC3 Approach: Import/Export Component
Can import: OPM graphs
Can export: OPM graphs, PML proofs
The import/export protocols for OPM are handled through the OPM API
Likewise, the import/export protocols for PML are handled through a PML API developed by our lab
Discussion: Importing From Other Teams
• Some OPM graphs generated by other teams were not parsable by the OPM API, so normalization was needed
• Our SPARQL queries (written for our own provenance trace) needed only slight modification to handle imported provenance (changing the URIs of artifacts)
• We observed some information loss when many teams exported their provenance traces to OPM
• Some teams did not capture control flow in their traces
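Changing artifact URIs in the queries can be mechanized. The hypothetical helper below (not part of TetherlessPC3) rewrites artifact names in a SPARQL query string in a single regular-expression pass, matching longest names first and only whole tokens. The mapping from our names to another team's IRIs is an assumed input.

```python
import re


def retarget_query(query: str, artifact_map: dict[str, str]) -> str:
    """Swap our artifact names for the imported trace's names, in one pass."""
    if not artifact_map:
        return query
    # Longest names first, and only whole tokens, so "..._0" never rewrites part of "..._01".
    names = sorted(artifact_map, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w:])(?:" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: artifact_map[m.group(0)], query)
```

For example, mapping "PC3:provVarDbEntryP2ImageMeta_0" to an invented IRI such as <http://example.org/otherTeam#imageMeta> changes only that token in the SELECT query's first triple pattern.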
Comparing with Other Teams: Answering Core Query 3
[Figure: Core Query 3 results from the Blue Team, our team, and the Green Team]
Conclusions
• Semantic Web technologies are useful for handling provenance data
  • Provenance generation: RDF/OWL helps clarify the domain-specific concepts/entities in provenance metadata
  • Provenance query: supported by SPARQL + OWL inference
    • We can capture both control flow and data flow
    • Using transitive inference rules, we do not need to write a program to implement a recursive query
  • Provenance integration: an RDF/OWL syntax of OPM (with references to domain terminology) will help avoid information loss when exporting OPM data
References
• OWL: http://www.w3.org/TR/owl-features/
• SPARQL: http://www.w3.org/TR/rdf-sparql-query/
• Pellet: http://clarkparsia.com/pellet/
• Jena: http://jena.sourceforge.net/
• PML API: http://inference-web.org/wiki/Tools_%26_Demos
• OPM API: http://openprovenance.org/java/maven-snapshots/org/openprovenance/
BACK
PC3OPM Ontology
[Figure: the PC3OPM ontology]