From Models of Computation (MoCs) to Models of Provenance (MoPs)

Bertram Ludäscher
Dept. of Computer Science & Genome Center
University of California, Davis
[email protected]


From MoCs to MoPs, B. Ludäscher. eScience Theme 9, Oct 13-17, 2008, SLC

Pop Quiz Time!! How does this execute?

• It depends …
  – DAG(man)
  – SDF
  – PN
  – DDF
  – COMAD
  – Petri-Net:
    • actors = transitions
    • channels = places
• Different MoCs – different programming languages
• Different features:
  – Data-aware vs. data-agnostic
  – Parallelism:
    • Task-parallel, pipeline-parallel, streaming pipeline-parallel, data-parallel
  – Loops? Data transport, control flow, time, …

[Figure: example workflow graph with actors A, B, C, D]

Objectives, Goals

• Better understand different notions of “provenance”
  – … and “workflow”
• … and how they relate
• Cross-fertilize between
  – CS subdisciplines (databases, workflows, BPM, PL, concurrency, …)
  – basic research and applications
    • new research problems, informed by apps (but not only)
    • impact apps by knowledge transfer from basic research
• “In eigener Sache” (on a personal note) …
  – Curiosity/fun-driven research
    • Petri-net vs Kahn, COMAD vs Taverna, XML streaming vs NRC, …
    • A subset (B union C); Hamming; partial data-structures, …
  – … in addition to (or thereby) making the world better

So many models, so little time

[Figure labels: types of users, roles; use cases, queries, what they want to do; Wf models, MoCs, MoPs; expressiveness, complexity]

Notions, Terminology

• (Scientific) Workflow
  – A program? A specification? A partial one? Cannot be properly defined!? (cf. ‘family resemblance’ in classification)

Family resemblance (German: Familienähnlichkeit) is a philosophical idea proposed by Ludwig Wittgenstein, with the best-known exposition being given in the posthumously published book Philosophical Investigations (1953). The idea itself takes its name from Wittgenstein's metaphorical description of a type of relationship he argued was exhibited by language. Wittgenstein's point was that things which may be thought to be connected by one essential common feature may in fact be connected by a series of overlapping similarities, where no one feature is common to all. Games, which Wittgenstein used to explain the notion, have become the paradigmatic example of a group that is related by family resemblances.

In classification theory: polythetic vs. monothetic

Notions, Terminology

• Model of Computation (MoC)
  – Takes a wf W and a “domain” / “director” / model of computation M
  – Then for any input x, defines what y = M(W, x) is
  – Implies a set of observables
• Run
  – Representation of an execution in terms of “basic” observables, i.e., those implied by the MoC
• Trace
  – Representation (approximation) of an execution in terms of “relevant” observables (for a use case, query)
• Model of Provenance (MoP)
  – … make this precise (maybe for some MoCs)
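A minimal sketch of the point that y = M(W, x) depends on M as well as on W. The two "directors" below are hypothetical toys, not Kepler directors: one invokes each actor once on the whole input, the other fires the actor chain once per input token.

```python
def run_once(workflow, x):
    """DAG-like toy director: each actor is invoked once on the whole input."""
    for actor in workflow:
        x = actor(x)
    return x


def run_per_token(workflow, x):
    """Streaming-like toy director: the actor chain fires once per input token."""
    out = []
    for token in x:
        out.extend(run_once(workflow, [token]))
    return out


# Hypothetical workflow W: sort the input, then keep the first two items.
W = [sorted, lambda items: items[:2]]
x = [3, 1, 2]

y_dag = run_once(W, x)          # whole input sorted, then truncated
y_stream = run_per_token(W, x)  # each token sorted and truncated on its own
```

The same W and x give different outputs under the two directors, which is why the MoC has to be fixed before talking about runs and traces.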

Different types of “Provenance”

• Data provenance:
  – lineage, data dependencies
• Execution provenance:
  – other runtime observables

Querying lineage vs querying the execution

• Workflow evolution provenance
  – VisTrails
• Provenance is more important than the results! The Selfish P-Assertion / Selfish Provenance Graph! (cf. “The Selfish Gene”)

Provenance Uses for the Domain Scientists

• Query the lineage of a data product
  – from what data was this computed?
  – “real” dependencies please!!!
• Evaluate the results of a workflow
  – do I like how this result was computed?
• Reuse data products of one workflow run in another
  – (re-)attach prior data products to a new workflow
• Archive scientific results in a repository
• Replicate the results reported by another researcher
• Discover all results derived from a given dataset
  – … i.e. across all runs
• Explain unexpected results
  – … via parameter-, dataset-, object-dependencies in the scientist’s terms (yes, you may substitute “ontology” here …)

Provenance for the WF Engineer / “Plumber”

• A Workflow Engineer’s View
  – Monitor, benchmark, and optimize workflow performance
  – Record resource usage for a workflow execution
  – “Smart Re-run” of (variants of) previous executions
  – Checkpointing & restart (e.g. for crash recovery, load balancing)
  – Debug or troubleshoot a workflow run
  – Explain when, where, why a workflow crashed

And the “right level” of modeling is …

• Common approach: “Let’s record everything”
  – What does that mean???
• Say your workflow is implemented in Kepler:
  – Workflow invocation + input + output
  – Actor invocation + input + output
  – Everything that has been written to (read from) a port
  – Something else?
• And what about a trace of the JVM instructions?
• … the assembly-level instructions?
• … firmware code?
• … signals

What are the observables?

Summary: what people do with “provenance”

• Result validation (different in: science vs workflow “logic”)
• Result debugging (science vs wf logic)
• Reproducibility
• Repeatability
• Explanation (derivations, traces, proof trees)
• Runtime monitoring
  – Profiling, benchmarking
• Performance optimization (“smart re-run”)
• Fault-tolerance, crash-recovery
• Workflow design
• QUICK DEMO

Kepler/pPOD workflows

[Figure: annotated Kepler/pPOD screenshot. Labels: new director; data types, collections; assembly-line processing; provenance enabled; actor library; Cipres web services; local applications; format conversion; GUI components; workspace extension; access to workflows; access to run “traces”]

Kepler/pPOD Provenance Browser

• Reusable “widgets” for viewing different aspects of a trace
• Move “forward” and “backward” through execution
• Data dependencies, collection structure, actor invocations

Kepler/pPOD Provenance Browser

• Collection and invocation VIEW
• Incrementally step through execution history
• Actor invocation graph shows pipelining, implicit branches

• Complex SDM/CPES workflow in Kepler
  – 50+ composite actors (subworkflows)
  – 4 levels of hierarchy
  – 1000+ atomic (Java) actors
  – Model of Computation:
    • Dataflow (~Kahn-PN)
    • Task parallel
    • Pipeline parallel (streaming!)

[Figure: subworkflow screenshots annotated with their sizes, e.g. 43 actors, 3 levels; 196 actors, 4 levels; 206 actors, 4 levels; 243 actors, 4 levels]

Source: Norbert Podhorszki (UC Davis --> ORNL)

Workflow Framework: another MoC

[Figure labels: Provenance, Tracking & Meta-Data (DBs and Portals); Control Plane (light data flows); Execution Plane (“Heavy Lifting” computations and flows); Synchronous or Asynchronous; Kepler]

Streaming tokens & dealing with failure

[Figure: tokens 3, 2, 1 streaming through the stages transfer, convert, arch. Labels: transfer 1, convert 1, arch 1; failed 2; transfer 3, convert 3, arch 3]

Source: Norbert Podhorszki (UC Davis --> ORNL)

After a restart…

[Figure: the same tokens 3, 2, 1 after a restart. Each stage skips token 1, processes token 2 (transfer 2, convert 2, arch 2), and skips token 3]

Source: Norbert Podhorszki (UC Davis --> ORNL)
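The failure and restart behaviour on these two slides can be sketched as a pipeline that records completed (token, stage) pairs and skips them on a re-run. This is an illustrative toy, not the Kepler implementation: the names `run_pipeline` and `done`, the sequential (non-streaming) loop, and the assumption that token 2 failed at the transfer stage are all mine.

```python
STAGES = ["transfer", "convert", "arch"]


def run_pipeline(tokens, done, fails=frozenset()):
    """Push each token through STAGES, skipping (token, stage) pairs in `done`.

    `done` plays the role of the recorded provenance / checkpoint and is
    updated in place; `fails` lists (token, stage) pairs that fail this run.
    """
    log = []
    for tok in tokens:
        for stage in STAGES:
            if (tok, stage) in done:
                log.append(f"skip {tok}")
                continue
            if (tok, stage) in fails:
                log.append(f"failed {tok}")
                break  # downstream stages never see this token
            done.add((tok, stage))
            log.append(f"{stage} {tok}")
    return log


done = set()
first_run = run_pipeline([1, 2, 3], done, fails={(2, "transfer")})
restart = run_pipeline([1, 2, 3], done)  # same input, nothing fails this time
```

On the restart only token 2 is actually processed; tokens 1 and 3 are skipped at every stage because their work is already on record.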

Kepler + “Process Central” (Execution monitoring)

Faraaz Sareshwala (ECS-199 project)

Kepler + Weka (Data Mining Package)

Peter Reutemann, University of Waikato, NZ

Kepler Flex Client

Christopher Tuot, DFKI, Germany

Kepler on the Web

Tristan King, James Cook University, Australia

Taverna, MyExperiment

Types of Data Provenance

• Black-box
  – know (next to) nothing at compile-time
  – at runtime: keep data lineage: R/W/fire observables
• Grey-box
  1. can “look inside” (inside some black boxes)
  2. … or FP signatures: A :: t1, t2 → t3, t4
  3. … or semantic annotations (sem. types)
  4. … or dependency signatures! e.g. subworkflows, COMAD!
• White-box
  – statically (compile-time) analyzable
  – v(P1*P2, X,Z) :- r(P1, X,_,Z), r(P2, _,_,Z).
  – most database work seems to fall here

[Figure: actor A with input types t1, t2 and output types t3, t4; further labels f, q, X1, X2, Y1, Y2]
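To make the white-box rule concrete, here is a small sketch that evaluates v(P1*P2, X,Z) :- r(P1, X,_,Z), r(P2, _,_,Z) over an in-memory relation. Reading the first argument as a provenance annotation that is combined by a formal product is my interpretation of the rule, and the relation contents are invented for illustration.

```python
def derive_v(r):
    """Evaluate v(P1*P2, X, Z) :- r(P1, X, _, Z), r(P2, _, _, Z).

    `r` is a set of tuples (P, X, Y, Z); P is treated as an opaque
    provenance label (an assumption), and the product is kept symbolic.
    """
    out = set()
    for (p1, x, _, z1) in r:
        for (p2, _, _, z2) in r:
            if z1 == z2:  # both body atoms share the variable Z
                out.add((f"{p1}*{p2}", x, z1))
    return out


# Hypothetical base relation with provenance labels p and q.
r = {("p", "a", "b", "c"), ("q", "d", "e", "c")}
v = derive_v(r)
```

Because the rule is known statically, the annotation of each derived tuple names exactly the base tuples that were joined, which is the sense in which this case is white-box.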

Different kinds of “scenarios”, problems

• Given database D, output y = Q(D)
  – … find Q’ such that Q’(y) yields the part of D on which y depends
• Given a “runtime recording” / trace:
  – … query for lineage (scientist), performance (engineer), …
  – … and a modification, do a “smart-rerun”
  – … and a crash, do a “recovery”

From MoC to MoP via Observables

• Model of Computation MoC M
  – specification/algorithm to compute o = M(W,P,i)
  – a director or scheduler implements M
  – gives rise to formal notions of
    • computation (aka run) R; typically tree models
  – Formalisms to define M? Via a meta-interpreter?
• Model of Provenance MoP M’
  – approximation M’ of M
  – a trace T approximates a run R by inclusion/exclusion of observables
  – “T = R – Ignored-observables (Ignorables) + Model-observables”
• Observables (of a MoC M)
  – functional observables (may influence output o)
    • token rate, notions of firing, …
  – non-functional observables (not part of M, do not influence o)
    • token timestamp, size, … (unless the MoC cares about those)
  – Actors should not be able to observe anything!
    • Race conditions via “arrival times” …
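One possible reading of “T = R – Ignorables + Model-observables” as an operation on event lists is sketched below. The `Event` record, the event kinds, and the example run are hypothetical; the slide does not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "read", "write", "fire", "timestamp" (illustrative kinds)
    actor: str
    detail: str


def trace_from_run(run, ignorables, model_observables):
    """Build a trace T from a run R: drop ignored kinds, add model-level events."""
    kept = [e for e in run if e.kind not in ignorables]
    return kept + list(model_observables)


run = [
    Event("read", "A", "x1"),
    Event("timestamp", "A", "t=0.7"),  # non-functional observable
    Event("fire", "A", "1"),
    Event("write", "A", "y1"),
]
derived = [Event("depends", "A", "y1<-x1")]  # added by the provenance model
trace = trace_from_run(run, {"timestamp"}, derived)
```

Different choices of `ignorables` and `model_observables` over the same run would then correspond to different MoPs for one MoC.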

Models of Computation (A WF Engineer’s Issue)

Directors separate orchestration/scheduling concerns from conceptual design

– Synchronous Dataflow (SDF)
  • Statically analyzable: schedule, no deadlocks, fixed buffer requirements; executable as a single thread by the director.
– Process Networks (PN)
  • Generalizes SDF. Actors execute as separate threads/processes, with queues of unbounded size (Kahn/MacQueen networks).
– Directed Acyclic Graph (DAG)
  • Special case of SDF. No loops, no pipelining, no state (one invocation per actor)
– Continuous Time (CT)
  • Connections represent the value of a continuous time signal at some point in time ... Often used to model physical processes.
– Discrete Event (DE)
  • Actors communicate through a queue of events in time. Used for instantaneous reactions in physical systems.
– …
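As an illustration of why SDF is statically analyzable, the sketch below solves the standard SDF balance equations (firings of the producer times tokens produced equals firings of the consumer times tokens consumed, per channel) for a repetition vector. The function name, the edge encoding, and the example graph are assumptions for illustration; the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    `edges` holds (src, produced_per_firing, dst, consumed_per_firing).
    Returns integer firing counts per actor, or raises ValueError if the
    rates are inconsistent. Assumes a connected graph.
    """
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate rates along edges until every actor is reached
        changed = False
        for src, p, dst, c in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * p / c
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * c / p
                changed = True
    for src, p, dst, c in edges:
        if q[src] * p != q[dst] * c:
            raise ValueError("inconsistent rates: no bounded-buffer schedule")
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * scale) for a, f in q.items()}


# Hypothetical graph: A emits 2 tokens, B consumes 3 and emits 1, C consumes 2.
rv = repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
```

With these counts every channel returns to its initial fill level after one iteration, which is what makes a fixed schedule and fixed buffer sizes possible.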

Language & Abstractions; Modeling & Design

• Vanilla Process Network
• Functional Programming Dataflow Network
• XML Transformation Network
• Collection-oriented Modeling & Design (COMAD)

“The limitations of my modeling / wf language are the limitations of my design world.” – BL

Automatic Iteration in Kahn Networks

• Given f: x → y
• and input stream <x>
• Kahn process is F: <x> → <y>
  – i.e., big F is a kind of “stream-map” of some small f
  – Kahn doesn’t talk about the little f
  – Dennis Dataflow does (via firing rules)
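The “stream-map” idea can be sketched with Python generators: a per-token function f is lifted to a process F over a (possibly unbounded) stream. The name `lift` and the squaring example are illustrative only.

```python
from itertools import count, islice


def lift(f):
    """Turn a per-token function f into a stream process F = map(f)."""
    def F(stream):
        for x in stream:  # one "firing" of f per input token
            yield f(x)
    return F


square = lift(lambda x: x * x)
# The input stream is infinite; only the first five outputs are demanded.
first_five = list(islice(square(count(1)), 5))
```

Each output token is produced as soon as its input token is available, so F never needs to see the whole stream.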

Kahn Processes over streams: F monotone

Edward A. Lee and Eleftherios Matsikoudis, "The Semantics of Dataflow with Firing," Chapter in From Semantics to Computer Science: Essays in memory of Gilles Kahn. Gérard Huet, Gordon Plotkin, Jean-Jacques Lévy, Yves Bertot, editors, Preprint Version, March 07, 2008, Copyright (c) Cambridge University Press, 2008.

Kahn Processes over streams: F continuous

Dataflow WITH Firing

Dataflow processes: From little f to big F …

Dataflow with Firing: Dennis dataflow

Synchronous Dataflow (SDF)

Source: Edward Lee, http://ptolemy.eecs.berkeley.edu/

Let’s talk about observables …

• DAG model: … → [A] → [B] → …
  – Observables: start(job)@time, finish(job)@time
  – Correctness criterion: start(B) > finish(A)
• Variants of PN …
  – {x} → [A] → {y}
    • Observable: set of reads, set of writes
    • y_j may depend on a subset of the x’s
    • but no x_j depends on any y_i
  – {x1@t1, x2@t2, …} → [A] → {y1@t1’, y2@t2’, …}
    • Can draw inference: write y_i@t may depend on all x_j read prior to t
• More informed models:
  – Actor signatures (compile time: static analysis)
  – Actor assertions (runtime: richer provenance graph)
• Special case: RWS
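The two kinds of observables above can be sketched as checks over recorded events. The data layouts (a dict of start/finish times, lists of timestamped reads and writes) and the function names are assumptions made for this example.

```python
def dag_run_is_valid(events, edges):
    """Check start(B) > finish(A) for every DAG edge (A, B).

    `events` maps job -> (start_time, finish_time).
    """
    return all(events[b][0] > events[a][1] for a, b in edges)


def may_depend_on(reads, writes):
    """Timestamp-based inference for one actor.

    `reads` and `writes` are lists of (token, time). A written token may
    depend on every token read strictly before it; this over-approximates
    the real dependencies.
    """
    return {y: [x for x, tx in reads if tx < ty] for y, ty in writes}


ok = dag_run_is_valid({"A": (0, 2), "B": (3, 5)}, [("A", "B")])
bad = dag_run_is_valid({"A": (0, 4), "B": (3, 5)}, [("A", "B")])
deps = may_depend_on([("x1", 1), ("x2", 3)], [("y1", 2), ("y2", 4)])
```

Without timestamps, all that can be said is that each write may depend on any read; with them, reads that happen after a write can be ruled out.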

“Real” Data Dependencies over token streams

• Stateless actors, “firing” on each token
  – [ x1, x2, x3, … ] → F → [ f(x1), f(x2), f(x3), … ]
  – Generates dependencies: x_i →(f_i) y_i
  – Note: F = map(f)
• Stateful actor, “firing” on each token
  – [ x1, x2, x3, … ] → F → [ f(x1), f(x1,x2), f(x1,x2,x3), … ]
  – Kahn-MacQueen Process Networks
    • prefix monotonic, deterministic computations
  – Generates dependencies (here): x_j →(f_i) y_i, for all j ≤ i

[… | x3 | x2 | x1] → F → [… | y3 | y2 | y1]
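A small sketch contrasting the dependencies recorded for a stateless and a stateful actor. The helper names and the example functions are illustrative; the stateful actor follows the slide's pattern y_i = f(x_1, …, x_i).

```python
def run_stateless(f, xs):
    """F = map(f): output y_i depends only on input x_i."""
    ys, deps = [], []
    for i, x in enumerate(xs, start=1):
        ys.append(f(x))
        deps.append((f"y{i}", [f"x{i}"]))
    return ys, deps


def run_stateful(f, xs):
    """y_i = f(x_1, ..., x_i): output y_i depends on every x_j with j <= i."""
    ys, deps, seen = [], [], []
    for i, x in enumerate(xs, start=1):
        seen.append(x)
        ys.append(f(*seen))
        deps.append((f"y{i}", [f"x{j}" for j in range(1, i + 1)]))
    return ys, deps


ys_a, deps_a = run_stateless(lambda v: v * 10, [1, 2, 3])
ys_b, deps_b = run_stateful(lambda *vs: sum(vs), [1, 2, 3])  # running sum
```

A black-box recorder that only sees reads and writes cannot tell these two actors apart, which is the motivation for the richer dependency models.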

Behold the Beauty of Scientific Workflow Design

Author: Kristian Stevens, UC Davis

… Shimology Part 2: the ugly truth inside

Author: Kristian Stevens, UC Davis

COMAD: “Virtual Assembly Lines”

• Actors select parts of token stream, forward rest
• Special tokens denote collections, metadata, & parameters
• Actors insert tokens into and remove tokens from stream
• Some advantages of COMAD:
  – workflows with loops, branches, composition (subworkflows)
  – concurrency, pipelining (streaming)
  – resilient to change (data nesting, add/remove actors)
  – simpler workflow designs

[Figure: a token stream passing through a “Compute Consensus” actor. The stream is delimited by collection tokens such as <Proj>, <Seq>, <Aligns>, <Trees> and carries data tokens S1 … S10, A1, A2, T1 … T5; the output stream additionally contains T6]
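The assembly-line behaviour can be sketched as a generator that forwards every token, looks only at the collection it cares about, and inserts one derived token into the stream. This is a toy under stated assumptions, not the COMAD implementation: tokens are plain strings, the scope is a flat (non-nested) collection, and `consensus_actor` and its `compute` parameter are hypothetical names.

```python
def consensus_actor(stream, scope="Trees", compute=lambda items: "+".join(items)):
    """Forward all tokens; inside <scope>…</scope> also collect the data
    tokens and insert one derived token just before the closing delimiter."""
    inside, collected = False, []
    for tok in stream:
        if tok == f"<{scope}>":
            inside, collected = True, []
        elif tok == f"</{scope}>":
            yield compute(collected)  # token inserted by this actor
            inside = False
        elif inside:
            collected.append(tok)
        yield tok  # everything, selected or not, is passed downstream


stream = ["<Proj>", "<Seqs>", "S1", "</Seqs>",
          "<Trees>", "T1", "T2", "</Trees>", "</Proj>"]
out = list(consensus_actor(stream))
```

Tokens outside the selected collection pass through untouched, so adding or removing upstream collections does not require rewiring this actor.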

Input Change-Resilience (nested data types)

S. Bowers, Daniel Zinn (UC Davis)

Optimizing COMAD: User- vs. System View

Daniel Zinn (UC Davis)

X-CSR (“XML Scissor”): Cut-Ship-Reassemble

Daniel Zinn, Shawn Bowers, Timothy McPhillips, Bertram Ludäscher (UC Davis), ICDE 2009

What we (will) get: Change-Resilience

[Figure: “Original” vs. “Automatic Configuration” workflows over actors A, B, C, W, X with labels S, R, W•, and W• + X•; goal: infer configuration X• of X]

Daniel Zinn (UC Davis)

(Scientific) Workflow Modeling Paradigms & MoCs

• Vanilla Process Network
• Functional Programming Dataflow Network
• XML Transformation Network
• Collection-oriented Modeling & Design framework (COMAD)

Collection-Oriented Modeling & Design (COMAD)

Implicit iteration – compare Kahn vs COMAD