IPAW'14 talk for our paper: http://arxiv.org/abs/1406.1998
IPAW 2014 – P. Missier
ProvAbs: model, policy, and tooling for abstracting PROV graphs
Paolo Missier, Jeremy Bryans, Carl Gamble
School of Computing Science, Newcastle University
Vasa Curcin, Roxana Danger
Imperial College, London
IPAW’14
Koln, June 10th, 2014
Motivation: partial disclosure of provenance
Consumer:
• Motivated to acquire and act upon analysis
But: expects supporting evidence, to mitigate the risk of acting upon inaccurate information
Provider:
• Motivated to provide accurate analysis to Public Agencies
• Enhance communication using provenance metadata for evidence
But: cannot fully disclose sources, analysis methods, etc.
Provenance-enabled data exchanges
Provenance exchange as part of data exchange
Provenance abstraction
What:
• Abstraction model for PROV
• Policy model and language to drive the abstraction
• Implementation: the ProvAbs tool
Why:
• To enable data exchanges with partial disclosure of the data provenance
• To simplify understanding of provenance traces by humans
How:
• Graph rewriting, from valid PROV to valid PROV
• A node grouping operator
Provenance views
Motivation similar to the UserViews model (*):
• Heavily focused on workflows and their provenance
• Scenario: one (or more) workflows, multiple users/viewers
• Relies on “composite modules” (sub-workflow structuring):
• Real workflow induced workflow
Goals:
1. Construct relevant user views
2. The answer to a provenance query depends on the workflow view
In contrast, in our work:
No assumption on any process specification (formal or not) driving the views on provenance
(*) Biton, O, S Cohen Boulakia, S B Davidson, and C S Hara. “Querying and Managing Provenance through User Views in Scientific Workflows.” In ICDE, 1072–1081, 2008. doi:http://dx.doi.org/10.1109/ICDE.2008.4497516.
History of an analyst’s report
Document produced by the “incident room analysts”
1 – Define policy to assign sensitivity to graph nodes
list classifications[protect, restricted, confidential, secret, topSecret];
for all (activity used data) where (data.Status > confidential in classifications)
  setSensitivity(activity, 7);
for all (activity used data) where (data.Status <= confidential in classifications)
  setSensitivity(activity, 5);
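The two policy rules above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the ProvAbs policy engine: the function name, the input shapes, and the tie-breaking rule (keep the highest value when an activity uses several data items) are assumptions made for the example.

```python
# Ordered classification levels, lowest first (taken from the policy above).
CLASSIFICATIONS = ["protect", "restricted", "confidential", "secret", "topSecret"]


def assign_sensitivity(used_edges, status):
    """Return {activity: sensitivity} following the two policy rules.

    used_edges: iterable of (activity, data) pairs, one per 'activity used data'.
    status:     {data: classification name} (hypothetical input shape).
    """
    threshold = CLASSIFICATIONS.index("confidential")
    sensitivity = {}
    for activity, data in used_edges:
        level = CLASSIFICATIONS.index(status[data])
        value = 7 if level > threshold else 5
        # Assumption: when both rules match an activity, keep the higher value.
        sensitivity[activity] = max(sensitivity.get(activity, 0), value)
    return sensitivity


# Hypothetical example data.
used = [("a1", "d1"), ("a2", "d2"), ("a3", "d2"), ("a3", "d3")]
status = {"d1": "secret", "d2": "restricted", "d3": "topSecret"}
result = assign_sensitivity(used, status)
```

Here `a1` and `a3` use data classified above `confidential` and receive sensitivity 7, while `a2` receives 5.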
2 – Node selection
Select nodes for abstraction based on the receiver’s clearance level
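A minimal sketch of this selection step, using the sensitivity values (7, 7, 7, 5) and the clearance level (6) shown on this slide. The node names are hypothetical, and the rule "select a node when its sensitivity exceeds the receiver's clearance" is an assumption about how the comparison is made.

```python
def select_for_abstraction(sensitivity, clearance):
    """Return the nodes whose sensitivity is above the receiver's clearance."""
    return {node for node, level in sensitivity.items() if level > clearance}


# Hypothetical node names; values as on the slide.
sensitivity = {"n1": 7, "n2": 7, "n3": 7, "n4": 5}
selected = select_for_abstraction(sensitivity, clearance=6)
```

Under that assumption the three nodes with sensitivity 7 form the set to be abstracted, and the node with sensitivity 5 stays visible.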
[Figure: provenance graph with node sensitivities 7, 7, 7 and 5; each node is marked ✔ or ✗ against the clearance level]
Receiver’s clearance level: 6
3 – Abstraction
Apply abstraction operator
[Figure: the graph from step 2 (node sensitivities 7, 7, 7 and 5) with the abstraction operator applied]
Abstracting over sets of nodes
General abstraction idea: replace a group of (possibly non-contiguous) nodes with a new node
Naïve node group replacement: introducing cycles
Generation-usage cycles are legal in PROV
Note: initial focus on vanilla PROV: usage-generation/entity-activity
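To see how a naïve replacement can create a cycle, the sketch below rewires a small usage-generation chain. Everything here is hypothetical illustration: edges are plain `(source, target)` pairs pointing from effect to cause (`e wasGeneratedBy a` as `(e, a)`, `a used e` as `(a, e)`), and the helper names are not taken from ProvAbs.

```python
def replace_group(edges, group, new_node):
    """Naively redirect every edge endpoint in `group` to `new_node`."""
    def rename(node):
        return new_node if node in group else node
    rewired = {(rename(s), rename(t)) for s, t in edges}
    return {(s, t) for s, t in rewired if s != t}  # drop self-loops


def has_cycle(edges):
    """Depth-first search for a directed cycle."""
    graph = {}
    for s, t in edges:
        graph.setdefault(s, set()).add(t)
        graph.setdefault(t, set())
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "open"
        for nxt in graph[node]:
            if state.get(nxt) == "open" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# e3 wasGeneratedBy a2, a2 used e2, e2 wasGeneratedBy a1, a1 used e1
edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
naive = replace_group(edges, {"a1", "a2"}, "A")
```

Grouping the non-contiguous activities `a1` and `a2` leaves `e2` both used and generated by the new node `A`, so `has_cycle(naive)` is true although the original chain is acyclic.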
What’s wrong with cycles?
New cycles introduce new constraints on the temporal ordering of events
u’, g’ simultaneous
More generally: mapping concrete to abstract events
Abstract graph nodes should be characterised by abstract events
• Generation is the completion of production of a new entity (PROV-DM Sec. 5.1.3)
• Usage is the beginning of utilizing an entity (PROV-DM Sec. 5.1.4)
g’ = max { g1, g2 }
u’ = min { u3, u4 }
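The mapping from concrete to abstract events is a one-liner once event times are available. The sketch below assumes events are represented by comparable timestamps (plain integers here); the function name and the sample values are hypothetical.

```python
def abstract_events(generation_times, usage_times):
    """Return (g', u'): the latest generation and the earliest usage."""
    return max(generation_times), min(usage_times)


# Hypothetical timestamps for g1, g2 and u3, u4.
g_abs, u_abs = abstract_events([3, 5], [8, 6])
```

Taking the maximum matches "completion of production" for the whole group, and taking the minimum matches "beginning of utilizing" any member of it.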
Usage-follows-generation
Abstract graphs with abstract usage-generation events correspond to a specific class of base graphs with pattern:
<all generations> -- <all usages>
Given a grouping set of entities {e1…en} such that:
ei wasGeneratedBy a, or
a used ei:
All generation events for all ei must precede all usage events for all ei.
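The usage-follows-generation pattern can be checked directly on event times. This is a sketch under stated assumptions: timestamps are plain numbers, "precede" is read as non-strict (`<=`), and the input dictionaries are a hypothetical representation, not a PROV serialisation.

```python
def usage_follows_generation(generations, usages):
    """True if every generation event precedes every usage event.

    generations: {entity: generation time}
    usages:      {entity: [usage times]}
    """
    all_usages = [t for times in usages.values() for t in times]
    if not generations or not all_usages:
        return True  # nothing to compare
    # Assumption: "precede" is non-strict, so equal timestamps are allowed.
    return max(generations.values()) <= min(all_usages)


ok = usage_follows_generation({"e1": 1, "e2": 2}, {"e1": [3], "e2": [4, 5]})
bad = usage_follows_generation({"e1": 1, "e2": 4}, {"e1": [3], "e2": [5]})
```

In the second call each entity is still generated before it is used, yet the pattern fails because `e2` is generated (time 4) after `e1` is used (time 3).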
Naïve node group replacement -2: Type violations
Criteria for abstraction
1. No new generation-usage cycles
2. No new dependencies
3. Satisfy type constraints on relationship
but: ok to remove some dependencies
Convexity by closure
Extension
Replacement, rewiring
Convexity by path closure
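One way to read "path closure" is: extend the selected set with every node that lies on a directed path between two selected nodes. The sketch below implements that reading for a graph given as `(source, target)` edge pairs; it is an assumption about the step, not the ProvAbs implementation, and the helper names are hypothetical.

```python
def reachable(graph, sources):
    """Nodes reachable from `sources` by following one or more edges."""
    seen, stack = set(), list(sources)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def path_closure(edges, group):
    """Add every node that sits on a path between two nodes of `group`."""
    forward, backward = {}, {}
    for s, t in edges:
        forward.setdefault(s, set()).add(t)
        backward.setdefault(t, set()).add(s)
    downstream = reachable(forward, group)   # reachable from the group
    upstream = reachable(backward, group)    # can reach the group
    return set(group) | (downstream & upstream)


edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
closed = path_closure(edges, {"a1", "a2"})
```

For the chain above, closing `{a1, a2}` pulls in `e2`, the entity lying between the two activities, so replacing the closed set can no longer put `e2` on a cycle with the new node.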
Replacement, rewiring
Extension – restore type correctness
t-grouping
Nodes in the grouping set can be a mix of Entities or Activities
• When all boundary nodes are of the same type: grouping creates a node of that type
  • e-grouping: new Entity node
  • a-grouping: new Activity node
• Boundary nodes of mixed types: grouping can introduce a node of either type
t-grouping: creates new node of type t ∈ { En, Act }
Note: Grouping is commutative and closed wrt composition
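The typing rule above can be sketched as a function that returns the admissible types for the new node. The definition of a boundary node used here (a group member with at least one edge to or from a node outside the group) is an assumption made for the example, as are the function name and the input shapes.

```python
def admissible_types(edges, group, node_type):
    """Return the set of types ('En', 'Act') the new node may take."""
    # Assumption: a boundary node is a group member with an edge crossing
    # the group border.
    boundary = {
        n
        for s, t in edges
        for n in (s, t)
        if n in group and (s not in group or t not in group)
    }
    types = {node_type[n] for n in boundary}
    # Same-typed boundary: that type. Mixed (or no) boundary: either type.
    return types if len(types) == 1 else {"En", "Act"}


edges = {("e3", "a2"), ("a2", "e2"), ("e2", "a1"), ("a1", "e1")}
node_type = {"e1": "En", "e2": "En", "e3": "En", "a1": "Act", "a2": "Act"}
same = admissible_types(edges, {"a1", "a2", "e2"}, node_type)
mixed = admissible_types(edges, {"e2", "a1"}, node_type)
```

The first group has only activities on its boundary, so an a-grouping results; the second has one entity and one activity on its boundary, so either an e-grouping or an a-grouping is possible.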
t-grouping
a-grouping e-grouping
The ProvAbs tool
• A tool to let a policy designer explore partial disclosure options by experimenting with policy settings and clearance thresholds
• Accepts graphs in PROV-N format
• Policy specified interactively, or loaded from file
Demo available!
Summary
A model for abstracting PROV graphs by (recursively) replacing sets of nodes with new nodes
• Map valid PROV to valid PROV – ref.: PROV-CONSTRAINTS
• No false dependencies introduced
Abstract nodes are characterised by abstract events
Extended to Agents (see TechReport)
Need to extend to more PROV relationship types
See also: Missier, P., Gamble, C., Bryans, J.: Provenance graph abstraction by node grouping. Technical report, Newcastle University (2013) http://www.ncl.ac.uk/computing/research/publication/194432