Architecture Tutorial Provenance: overview Professor Luc Moreau L.Moreau@ecs.soton.ac.uk University of Southampton www.ecs.soton.ac.uk/~lavm


Architecture Tutorial

Provenance: overview

Professor Luc Moreau
L.Moreau@ecs.soton.ac.uk
University of Southampton

www.ecs.soton.ac.uk/~lavm

Provenance & PASOA Teams

• University of Southampton
– Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen
• IBM UK (EU Project Coordinator)
– John Ibbotson, Neil Hardman, Alexis Biller
• University of Wales, Cardiff
– Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari
• Universitat Politècnica de Catalunya (UPC)
– Steven Willmott, Javier Vazquez
• SZTAKI
– Laszlo Varga, Arpad Andics, Tamas Kifor
• German Aerospace
– Andreas Schreiber, Guy Kloss, Frank Danneman

Contents

• Motivation

• Provenance Concepts

• Provenance Architecture

• Standardisation

• Conclusions

Motivation

Scientific Research

Academic Peer Review

Business Regulations

Audit (Sarbanes-Oxley)

Audit (Basel II)

Accounting

Banking

Health Care Management

European Recommendation R(97)5: on the protection of medical data

e-Science datasets

• How to undertake peer-reviewing and validation of e-Scientific results?

Compliance to Regulations

• The “next-compliance” problem
– Can we be certain that by ensuring compliance to a new regulation, we do not break previous compliance?

Current Solutions

• Proprietary, Monolithic
• Silos, Closed
• Do not inter-operate with other applications
• Not adaptable to new regulations

Provenance

• Oxford English Dictionary:
– the fact of coming from some particular source or quarter; origin, derivation
– the history or pedigree of a work of art, manuscript, rare book, etc.;
– concretely, a record of the passage of an item through its various owners.

• Concept vs representation

Provenance in Computer Systems

• Our definition of provenance in the context of applications for which process matters to end users:

The provenance of a piece of data is the process that led to that piece of data

• Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases

Our Approach

• Define core concepts pertaining to provenance

• Specify functionality required to become “provenance-aware”

• Define open data models and protocols that allow systems to inter-operate

• Standardise data models and protocols

• Provide a reference implementation

• Provide reasoning capability

Context (1)

Aerospace engineering: maintain a historical record of design processes, up to 99 years.

Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

Context (2)

High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)

Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

Provenance Concepts

Provenance “Lifecycle”

[Figure. Labels: Application; Data; Results; Provenance Store. Core interfaces to the provenance store: record documentation of execution; query and reason over provenance of data; administer store and its contents.]

Nature of Documentation

• We represent the provenance of some data by documenting the process that led to the data:
– documentation can be complete or partial;
– it can be accurate or inaccurate;
– it can present conflicting or consensual views of the actors involved;
– it can provide operational details of execution or it can be abstract.

p-assertion

• A given element of process documentation will be referred to as a p-assertion

– p-assertion: is an assertion that is made by an actor and pertains to a process.

Service Oriented Architecture

• Broad definition of service as a component that takes some inputs and produces some outputs.
• Services are brought together to solve a given problem, typically via a workflow definition that specifies their composition.
• Interactions with services take place with messages that are constructed according to the services' interface specifications.
• The term actor denotes either a client or a service in a SOA.
• A process is defined as the execution of a workflow.

Process Documentation (1)

[Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4]

Actor 1: I received M1, M4; I sent M2, M3

Actor 2: I received M3; I sent M4

From these p-assertions, we can derive that M3 was sent by Actor 1 and received by Actor 2 (and likewise for M4)

If actors are black boxes, these assertions are not very useful because we do not know dependencies between messages
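The derivation described on this slide can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the function name `match_interactions` are assumptions made for the example, not part of the PASOA or EU Provenance data model.

```python
from collections import defaultdict

# Interaction p-assertions from the slide, as (asserting actor, role, message).
# The tuple layout is an illustrative assumption, not the PASOA schema.
P_ASSERTIONS = [
    ("Actor 1", "received", "M1"),
    ("Actor 1", "received", "M4"),
    ("Actor 1", "sent", "M2"),
    ("Actor 1", "sent", "M3"),
    ("Actor 2", "received", "M3"),
    ("Actor 2", "sent", "M4"),
]


def match_interactions(assertions):
    """Pair up the sender and receiver views of each message.

    Returns {message: (sender, receiver)}; a side that nobody
    documented is reported as None.
    """
    views = defaultdict(dict)
    for actor, role, message in assertions:
        views[message][role] = actor
    return {
        message: (roles.get("sent"), roles.get("received"))
        for message, roles in views.items()
    }


interactions = match_interactions(P_ASSERTIONS)
```

Here `interactions["M3"]` pairs Actor 1 as sender with Actor 2 as receiver, while M1 and M2 each have only one documented side. Nothing in the result says which message depends on which, which is the limitation the slide points out.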

Process Documentation (2)

[Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4]

Actor 1: M2 is in reply to M1; M3 is caused by M1; M2 is caused by M4

Actor 2: M4 is in reply to M3

These assertions help identify order of messages, but not how data was computed

Process Documentation (3)

[Figure: Actor 1 (functions f1, f2) and Actor 2 (function f) exchange messages M1, M2, M3 and M4]

Actor 1: M3 = f1(M1); M2 = f2(M1,M4)

Actor 2: M4 = f(M3)

These assertions help identify how data is computed, but provide no information about non-functional characteristics of the computation (time, resources used, etc)

Process Documentation (4)

[Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4]

I used 386 cluster

Request sat in queue for 6 min

I used sparc processor

I used algorithm x version x.y.z

Types of p-assertions (1)

– Interaction p-assertion: is an assertion of the contents of a message by an actor that has sent or received that message

I received M1, M4
I sent M2, M3

Types of p-assertions (2)

– Relationship p-assertion: is an assertion, made by an actor, that describes how the actor obtained an output message sent in an interaction by applying some function to input messages from other interactions (likewise for data)

M2 is in reply to M1
M3 is caused by M1
M2 is caused by M4

M3 = f1(M1)
M2 = f2(M1,M4)

Types of p-assertions (3)

– Actor state p-assertion: assertion made by an actor about its internal state in the context of a specific interaction

I used sparc processor

I used algorithm x version x.y.z
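The three kinds of p-assertion could be modelled as plain record types. The sketch below is a hypothetical Python rendering: the class and field names are assumptions chosen for readability, and the slides do not say which actor made each actor state assertion, so the attribution in the sample data is also assumed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str   # actor making the assertion
    view: str       # "sender" or "receiver"
    message: str    # identifier of the message, e.g. "M3"
    content: str    # asserted contents of the message


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    effect: str               # output message, e.g. "M2"
    causes: Tuple[str, ...]   # input messages the output was obtained from
    relation: str             # e.g. "f2", "in reply to", "caused by"


@dataclass(frozen=True)
class ActorStatePAssertion:
    asserter: str
    interaction: str   # message identifying the interaction concerned
    state: str         # free-form description of the actor's state


# Sample documentation; the asserter of the actor state entry is assumed.
documentation = [
    InteractionPAssertion("Actor 1", "sender", "M3", "<contents of M3>"),
    RelationshipPAssertion("Actor 1", "M2", ("M1", "M4"), "f2"),
    ActorStatePAssertion("Actor 2", "M3", "I used sparc processor"),
]


def by_asserter(assertions, actor):
    """Select the p-assertions made by one actor."""
    return [a for a in assertions if a.asserter == actor]
```

Making the records frozen (immutable) is a design choice in this sketch: once an assertion has been made, it is not meant to be edited afterwards.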

Data flow

• Interaction p-assertions allow us to specify a flow of data between actors

• Relationship p-assertions allow us to characterise the flow of data “inside” an actor

• Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
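As a minimal sketch of the DAG view, the functional relationships from the earlier slides (M3 = f1(M1), M4 = f(M3), M2 = f2(M1,M4)) can be stored as a mapping from each output to its inputs. The names `DERIVED_FROM` and `provenance_of` are illustrative assumptions; `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Relationship p-assertions from the earlier slides: output -> inputs.
DERIVED_FROM = {
    "M3": {"M1"},        # M3 = f1(M1)
    "M4": {"M3"},        # M4 = f(M3)
    "M2": {"M1", "M4"},  # M2 = f2(M1, M4)
}


def provenance_of(item, derived_from):
    """Return every data item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        for parent in derived_from.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# static_order() lists inputs before the outputs computed from them and
# raises graphlib.CycleError if the data flow is not acyclic.
order = list(TopologicalSorter(DERIVED_FROM).static_order())
```

Walking the graph backwards from M2 reaches M1, M3 and M4, which is the process that led to M2 in this example.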

Provenance Architecture

Interfaces to Provenance Store

[Figure. Labels: Application; Results; Provenance Store. Interfaces: record documentation of execution; query and reason over provenance of data; administer store and its contents.]

P-Assertion schemas

The p-structure

• The p-structure is a common logical structure of the provenance store shared by all asserting and querying actors

• Hierarchical

• Indexed by interactions (interaction = 1 message exchange)
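One way to picture a hierarchical structure indexed by interactions is a nested mapping. The slide only states that the p-structure is hierarchical and indexed by interaction; the two lower levels used here (sender or receiver view, then kind of p-assertion) and the helper names are assumptions made for this sketch.

```python
from collections import defaultdict


def new_p_structure():
    """Create an empty store: interaction -> view -> kind -> p-assertions."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def record(store, interaction, view, kind, content):
    """File one p-assertion under its interaction, view and kind."""
    store[interaction][view][kind].append(content)


store = new_p_structure()
record(store, "M3", "sender", "interaction", "Actor 1 sent M3")
record(store, "M3", "sender", "relationship", "M3 = f1(M1)")
record(store, "M3", "receiver", "interaction", "Actor 2 received M3")
record(store, "M4", "sender", "interaction", "Actor 2 sent M4")
```

Because every asserting and querying actor uses the same interaction key, everything documented about one message exchange ends up under a single entry, whoever asserted it.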

Recording Protocol (Groth04-06)

• Abstract machines
• DS Properties
– Termination
– Liveness
– Safety
– Statelessness
• Documentation Properties
– Immutability
– Attribution
– Datatype safety

• Foundation for adding necessary cryptographic techniques

Querying Functionality (Miles06)

• Process Documentation Query Interface: allows for “navigation” of the documentation of execution
– Allows us to view the provenance store (i.e. the p-structure) as if containing XML data structures
– Independent of technology used for running application and internal store representation
– Seamless navigation of application dependent and application independent process documentation
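To give a feel for navigating the store as if it contained XML, the sketch below runs path expressions over a small hand-written document with Python's standard `xml.etree.ElementTree`. The element and attribute names are invented for illustration and are not the actual PASOA or EU Provenance schema, and the real query interface is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of a provenance store; all element and attribute
# names are invented for this example.
P_STRUCTURE_XML = """
<pstructure>
  <interactionRecord key="M3">
    <view role="sender" asserter="Actor 1">
      <relationshipPAssertion>M3 = f1(M1)</relationshipPAssertion>
    </view>
    <view role="receiver" asserter="Actor 2">
      <actorStatePAssertion>I used sparc processor</actorStatePAssertion>
    </view>
  </interactionRecord>
  <interactionRecord key="M4">
    <view role="sender" asserter="Actor 2">
      <relationshipPAssertion>M4 = f(M3)</relationshipPAssertion>
    </view>
  </interactionRecord>
</pstructure>
"""

root = ET.fromstring(P_STRUCTURE_XML)

# All views recorded for the interaction M3.
m3_views = root.findall("./interactionRecord[@key='M3']/view")

# Relationship p-assertions made by Actor 2, in any interaction.
actor2_relationships = [
    element.text
    for element in root.findall(".//view[@asserter='Actor 2']/relationshipPAssertion")
]
```

The same path expressions would apply whatever technology actually holds the data, which is the independence the slide refers to.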

Querying Functionality (Miles06)

• Provenance Query Interface: allows us to obtain the provenance of some specific data

• A recognition that there is not “one” provenance for a piece of data, but there may be different ones, depending on the end-user’s interest

• Hence, provenance is seen as the result of a query:

– Identify a piece of data at a specific execution point

– Scope of the process of interest:

• Filter in/out p-assertions according to actors, process, types of relationships, etc
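A scoped provenance query can be sketched as a backward walk over relationship p-assertions that only follows relationships inside the requested scope. The relationships are those from the Process Documentation (2) slide; the tuple layout, the attribution of each assertion to an actor, and the function `provenance_query` are assumptions made for this example.

```python
# (effect, cause, relation type, asserting actor); layout is illustrative.
RELATIONSHIPS = [
    ("M2", "M1", "in reply to", "Actor 1"),
    ("M3", "M1", "caused by", "Actor 1"),
    ("M2", "M4", "caused by", "Actor 1"),
    ("M4", "M3", "in reply to", "Actor 2"),
]


def provenance_query(start, relationships, relation_types=None, actors=None):
    """Walk back from `start`, following only relationships inside the scope.

    A scope argument left as None places no restriction on that dimension.
    """
    in_scope = [
        rel for rel in relationships
        if (relation_types is None or rel[2] in relation_types)
        and (actors is None or rel[3] in actors)
    ]
    result = []
    visited = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for rel in in_scope:
            effect, cause = rel[0], rel[1]
            if effect == current:
                result.append(rel)
                if cause not in visited:
                    visited.add(cause)
                    frontier.append(cause)
    return result


full = provenance_query("M2", RELATIONSHIPS)
causal_only = provenance_query("M2", RELATIONSHIPS, relation_types={"caused by"})
actor1_only = provenance_query("M2", RELATIONSHIPS, actors={"Actor 1"})
```

The three calls start from the same data item, M2, yet return different answers, which illustrates why there is no single provenance for a piece of data.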

Standardisation

Standardisation Options

• APIs: programmatic inter-op

• Recording and querying interfaces: service inter-op

• Provenance model: data inter-op

Purpose of Standardisation

[Figure: several applications record documentation of execution in provenance stores]

Allow for multiple applications to document their execution. Applications may be running in different institutions.

Purpose of Standardisation

[Figure: an application records documentation of execution in several provenance stores]

Allow for multiple stores from multiple IT providers

Purpose of Standardisation

[Figure: the provenance of data is queried from several provenance stores]

Allow for multiple stores from multiple IT providers

Purpose of Standardisation

Allow for legacy, monolithic applications to expose their contents (according to standard schema)

Convert into standard data format

Purpose of Standardisation

Allow third parties to host provenance stores, which are trusted by application owners but also auditors

[Figure: an application and a provenance store hosted by a third party]

Compliance Oriented Architectures

• Separate execution documentation from compliance verification

• Allows for multiple compliance verifications

• Allows for validation to take place across multiple applications, possibly run by different institutions (in particular, allows for outsourcing and subcontracting).

• Approach is suitable for e-scientific peer-reviewing and business compliance verification

[Figure: applications record documentation of execution in provenance stores; compliance verification components query the provenance of data from those stores]
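The separation between documenting execution and verifying compliance can be illustrated with a small sketch in which several independent checks run over the same recorded documentation. Everything here is hypothetical: the actor names are borrowed from the organ transplant scenario, while the actions, the `authorised` flag and both checks are invented for the example.

```python
# Hypothetical process documentation, recorded once by the applications.
# The actions and the "authorised" flag are invented for illustration.
STORE = [
    {"actor": "Hospital", "action": "request test", "authorised": True},
    {"actor": "Testing Lab", "action": "report result", "authorised": True},
    {"actor": "Hospital", "action": "take decision", "authorised": False},
]


def every_action_authorised(store):
    """First check: no documented action lacks authorisation."""
    return all(record["authorised"] for record in store)


def result_reported_before_decision(store):
    """Second check: the lab result precedes the decision."""
    actions = [record["action"] for record in store]
    return actions.index("report result") < actions.index("take decision")


def verify(store, checks):
    """Run independent compliance checks over the same documentation."""
    return {check.__name__: check(store) for check in checks}


report = verify(STORE, [every_action_authorised, result_reported_before_decision])
```

A new regulation would then be handled by adding another check function, leaving the applications and the recorded documentation untouched.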

Organ Transplant Scenario

Hospital

Electronic Healthcare Management Service

Testing Lab

Hospital Actors

User Interface

Donor Data Collector

Brain Death Manager

What’s on the CD

• Documents relating to both PASOA and EU Provenance projects

• All the talks presented today
• Handouts
• Software
– PReServ (Paul Groth & Simon Miles)
– The EU Provenance client side library

Conclusions

To Sum Up

[Figure. Labels: Standardising the documentation of Business Processes; Record; Provenance Store; Query; Apply; Healthcare; Distribution; Finance; Aerospace; Automobile; Pharmaceutical]

• Compliance check
• Rerun/Reproduce
• Analyse

• Provenance
– Architecture
– Methodology

Slide from John Ibbotson

Overview of Today’s Talks

• Provenance Data Structures

• Recording and Querying Provenance

– Break (30 minutes)

• Distribution and Scalability

• Security

• Methodology

Questions