photo: flickr.com/people/44073224@N04 (CC-A)
Privacy and Provenance in Environmental Impact Assessment
Brandon Kuczenski, Amr El Abbadi, Cetin Sahin
University of California, Santa Barbara
ISIE 2015 – University of Surrey
Discerning our Environmental Impacts
Kuczenski et al. ISIE 2015 – Surrey UK – 1 / 12
UNEP-SETAC 2011
Even the simplest query in life-cycle assessment (LCA) requires vast and varied data, all of which is “private” to some extent:
● Manufacturing: product composition, equipment operation, utility demand, waste;
● Use: individual consumption habits, product usage behavior, disposal decisions;
● Supply Chain: materials sourcing, supply contracts, logistics;
● Background: aggregated industrial-economic models.
Data privacy is at the heart of LCA practice:
● Averaging (Horizontal): similar processes operated in parallel;
● Aggregation (Vertical): grouping related activities;
● Background Aggregation (“roll-up”): cradle-to-gate computation.
Data privacy is at odds with the objectives of LCA:
● Attribution of impacts to specific activities;
● Identifying ways to improve environmental performance through operational changes.
How do we evaluate environmental claims while ensuring data privacy?
What is privacy in life-cycle assessment?
PROV Primer (W3C)
Privacy means confidentiality (a.k.a. secrecy):
● Companies don’t want to reveal sensitive details to competitors, regulators, or the public;
● Usually accomplished through roll-up or vertical aggregation.
Privacy means anonymity:
● Published results should not reveal the details of any contributor;
● Usually accomplished through horizontal aggregation among “at least three” contributors.
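The "at least three" rule can be illustrated with a small sketch. The function names, the threshold constant and the plant values below are hypothetical, chosen only for illustration. The second function shows why the threshold matters: anyone who already knows some of the inputs can back out the mean of the rest from the published average.

```python
from statistics import mean

MIN_CONTRIBUTORS = 3  # the "at least three" rule of thumb; assumed threshold


def horizontal_average(values, min_contributors=MIN_CONTRIBUTORS):
    """Return the mean of the contributed values, or None if there are too few."""
    if len(values) < min_contributors:
        return None  # refuse to publish: the result would expose a contributor
    return mean(values)


def residual_mean(published_mean, n, known_values):
    """Mean of the values an insider does NOT know, given the published mean."""
    unknown = n - len(known_values)
    if unknown <= 0:
        raise ValueError("no unknown contributors remain")
    return (published_mean * n - sum(known_values)) / unknown


# Hypothetical electricity demand per unit of output at three plants
plants = [2.0, 3.0, 4.0]
avg = horizontal_average(plants)

# One contributor learns only the mean of the other two ...
others = residual_mean(avg, len(plants), [plants[0]])
# ... but two colluding contributors recover the third value exactly.
third = residual_mean(avg, len(plants), plants[:2])
```

With only two contributors, each one could always recover the other's value, which is why the sketch declines to publish in that case.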
Privacy goes hand in hand with provenance:
● Data provenance is the attribution of results to specific observations and/or computations.
● Standardized by the W3C as a directed graph model linking agents, entities and activities.
● Rich parallels with LCI modeling.
In LCA publishing it is desirable to make two assurances:
● Assure data providers that their information cannot be discerned from published results;
● Assure data users that the results reflect an accurate model of the system under study.
● Needs a provenance framework!
Provenance Framework
● Linked observations form a tree called an Inventory fragment.
● Fragments are equivalent to provenance graphs.
Define an inventory foreground model on the basis of observations of flows between a parent node and child nodes and their directed implications:
[Figure: directed implications of a flow. Direction: Input: the parent node requires the flow, which is generated by the child node. Direction: Output: the parent node generates the flow, which is consumed by the child node. Below, an example fragment: a tree of nodes linked by exchanges, starting from a reference exchange.]
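An inventory fragment of this kind can be sketched as a small tree data structure. This is only an illustration under assumptions: the class name `FragmentNode`, its fields and the example process names are hypothetical and do not come from the slides. Each node stores the flow observed between it and its parent, together with the direction of that flow with respect to the parent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class FragmentNode:
    """A node plus the observed flow that links it to its parent (hypothetical schema)."""
    name: str
    flow: str
    direction: str  # "Input" or "Output", with respect to the parent node
    amount: float = 1.0
    children: List["FragmentNode"] = field(default_factory=list)

    def add_child(self, name: str, flow: str, direction: str, amount: float) -> "FragmentNode":
        if direction not in ("Input", "Output"):
            raise ValueError(f"unknown direction: {direction!r}")
        child = FragmentNode(name, flow, direction, amount)
        self.children.append(child)
        return child


def implication(parent: FragmentNode, child: FragmentNode) -> str:
    """Spell out the directed implication of one observed flow."""
    if child.direction == "Input":
        return f"{parent.name} requires {child.flow}, generated by {child.name}"
    return f"{parent.name} generates {child.flow}, consumed by {child.name}"


def walk(node: FragmentNode, depth: int = 0) -> Iterator[Tuple[int, FragmentNode]]:
    """Depth-first traversal of the fragment, yielding (depth, node) pairs."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Hypothetical fragment; the root carries the reference exchange.
root = FragmentNode("bottle filling", "filled bottle", "Output")
molding = root.add_child("bottle molding", "empty bottle", "Input", 1.0)
molding.add_child("resin supply", "PET resin", "Input", 0.03)
scrap = root.add_child("scrap handling", "rejected bottles", "Output", 0.01)
```

Because every node has exactly one parent, the structure is a tree, and the `implication` strings read like the edge labels of a provenance graph.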
Provenance Framework: Data Set Resolution
● Foreground nodes and background dependencies are mapped to specific datasets (public or private).
● Each reference must be resolved at time of computation.
[Figure: the inventory fragment tree (nodes linked by exchanges, starting from a reference exchange), with background dependencies (b) resolved to common background processes: Electricity (EU), Electricity (CN), ..., Thermal Energy from Gas, ..., Structural Steel, ..., Freight Transport (truck), ...]
Provenance Framework: LCI Publication
[Figure: inventory fragment tree with background dependencies (b) resolved to common background processes: Electricity (EU), Electricity (CN), ..., Thermal Energy from Gas, ..., Structural Steel, ..., Freight Transport (truck), ...]
LCI model publication is a serialization of the graph model:
● Fragment table describes the structure of the foreground tree;
● Foreground table describes node resolutions;
● Background table describes background resolutions.
These three pieces precisely describe an LCI model in a database-independent fashion, even if data sets themselves remain private.
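A three-table serialization along these lines might look like the following sketch. The JSON layout, the field names and the `private:` identifiers are assumptions made for illustration; the slides do not specify a file format. The point is that the tables carry only structure and data set references, never the contents of a private data set.

```python
import json

# Fragment table: structure of the foreground tree (hypothetical rows).
fragment_table = [
    {"node": 0, "parent": None, "flow": "filled bottle", "direction": "Output", "amount": 1.0},
    {"node": 1, "parent": 0, "flow": "empty bottle", "direction": "Input", "amount": 1.0},
    {"node": 2, "parent": 0, "flow": "electricity", "direction": "Input", "amount": 0.5},
    {"node": 3, "parent": 1, "flow": "heat", "direction": "Input", "amount": 0.2},
]

# Foreground table: node resolutions, by reference only (identifiers are made up).
foreground_table = [
    {"node": 0, "dataset": "private:filling-plant"},
    {"node": 1, "dataset": "private:molding-supplier"},
]

# Background table: background resolutions to named background processes.
background_table = [
    {"node": 2, "dataset": "Electricity (EU)"},
    {"node": 3, "dataset": "Thermal Energy from Gas"},
]


def serialize_model(fragment, foreground, background) -> str:
    """Serialize the three tables as one JSON document."""
    return json.dumps(
        {"fragment": fragment, "foreground": foreground, "background": background},
        indent=2,
        sort_keys=True,
    )


def unresolved_nodes(fragment, foreground, background) -> list:
    """Nodes with no data set reference; these would block a computation."""
    all_nodes = {row["node"] for row in fragment}
    resolved = {row["node"] for row in foreground + background}
    return sorted(all_nodes - resolved)


document = serialize_model(fragment_table, foreground_table, background_table)
missing = unresolved_nodes(fragment_table, foreground_table, background_table)
```

A recipient with access to the referenced data sets could rebuild the model from `document`; one without access still sees its structure.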
LCI model can be formulated as a foreground tree and a strongly connected background:
$$A_P = \begin{pmatrix} A_f & 0 \\ A_d & A^\star \end{pmatrix}; \quad B_P = \begin{pmatrix} B_f & B^\star \end{pmatrix}$$
● Af are links between foreground nodes (fragment table);
● Ad are the dependencies of the foreground on the background (fragment table);
● Bf includes foreground direct emissions (dereferenced node table);
● A⋆ and B⋆ are the background database (dereferenced background table).
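The block structure of AP and BP can be assembled mechanically from its four pieces. The sketch below uses plain nested lists and hypothetical numbers; the helper names (`zeros`, `hstack`, `assemble`) are assumptions for illustration. It assumes Af is nf × nf, Ad is nb × nf, A⋆ is nb × nb, and that Bf and B⋆ share the same number of rows (one per elementary flow).

```python
def zeros(rows, cols):
    """A rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def hstack(left, right):
    """Concatenate two matrices with equal row counts, side by side."""
    if len(left) != len(right):
        raise ValueError("row counts differ")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def assemble(A_f, A_d, A_star, B_f, B_star):
    """Build A_P = [[A_f, 0], [A_d, A_star]] and B_P = [B_f, B_star]."""
    n_f, n_b = len(A_f), len(A_star)
    top = hstack(A_f, zeros(n_f, n_b))   # the foreground does not feed the background
    bottom = hstack(A_d, A_star)         # foreground dependencies, then background
    return top + bottom, hstack(B_f, B_star)


# Hypothetical model: two foreground nodes, two background processes, one emission.
A_f = [[0.0, 0.0], [1.0, 0.0]]
A_d = [[0.5, 0.0], [0.0, 0.2]]
A_star = [[0.0, 0.5], [0.0, 0.0]]
B_f = [[1.0, 0.0]]
B_star = [[2.0, 1.0]]

A_P, B_P = assemble(A_f, A_d, A_star, B_f, B_star)
```

The zero block in the upper right is what makes the foreground a tree that sits on top of the background rather than inside it.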
LCI Publication Use Cases
Use Case 1: Secure Multiparty Computation
Secure Multiparty Computation: The Private Jet Problem
photo: flickr.com/photos/rodeime (CC-A)
I have some bad news, gentlemen.
Secure Multiparty Computation (SMC) and LCA
Setting: a group of untrusting parties with private inputs.
Goal: jointly compute a function of their inputs while maintaining secrecy of all private data.
SMC uses cryptographic techniques to collaboratively compute a function of private inputs (average, maximum, quantile, etc.) without any party revealing information to any other party.
● The output of the computation can be known to every party.
● The output can remain secret, reporting “flags” (high/low) or rank ordering to each contributor privately.
● No trusted party is required.
SMC can be used anywhere horizontal averaging is needed, to securely compute exchange coefficients (e.g. values in Ad, Af or Bf).
● Contributors must agree on a provenance model.
● Vulnerable to false information provided by a careless or malicious contributor.
● Audit mechanisms can be established (but require a trusted party).
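One textbook way to compute a horizontal average without a trusted party is additive secret sharing. The sketch below is a toy, single-process simulation of that idea and is not the authors' implementation: the modulus, the fixed-point scale and the function names are assumptions, inputs are assumed non-negative, and all parties are assumed to follow the protocol honestly.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (a Mersenne prime); assumed choice
SCALE = 1000       # fixed-point scaling, i.e. three decimal places; assumed choice


def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_mean(private_values):
    """Simulate an additive-sharing protocol that reveals only the mean."""
    n = len(private_values)
    encoded = [round(v * SCALE) for v in private_values]
    # Party i splits its input and sends share j to party j.
    distributed = [share(v, n) for v in encoded]
    # Party j adds up the shares it received; each single share is uniformly random.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they yield the total.
    total = sum(partial_sums) % PRIME
    return total / SCALE / n


# Hypothetical exchange coefficients reported by three contributors
result = secure_mean([2.0, 3.0, 4.0])
```

Nothing here checks whether an input is truthful, which is the vulnerability noted in the list above: a careless or malicious contributor can skew the average without being detected.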
LCI Publication Use Cases
Use Case 2: Secure Publication
Secure Publication: The Supplier Problem
A product designer wants to publish LCA results with as much detail as is permitted by data providers’ confidentiality policies.
An upstream supplier’s objectives:
● Conceal its impacts if they are “large” (may want complete anonymity);
● Publicize its impacts if they are “small” (wants credit for the low impacts attributed to it).
● “Private” data are the values of entries in Af, Ad and Bf.
● Other portions of the model may be public; an adversary may have partial information or use statistical methods.
How much information can be published without revealing private data?
[Figure: inventory fragment tree with background dependencies (b) resolved to common background processes (Electricity (EU), Electricity (CN), ..., Thermal Energy from Gas, ..., Structural Steel, ..., Freight Transport (truck), ...), with results reported in the impact categories GWP, AP, EP, Smog and ADP.]
Study Formulation and Obfuscation
The obfuscated study is constructed through graph transformation, e.g. grouping foreground nodes (vertical aggregation) and background LCI results (background aggregation):
$$A_f \to A'_f; \quad A_d \to A'_d; \quad B_f \to B'_f$$
The obfuscated study can be formulated as a linear equation:
$$s = E \cdot (B'_f + B_x A'_d) \cdot x$$
● x is derived from foreground traversal;
● $B_x = B^\star \cdot (I - A^\star)^{-1}$ is the aggregated background database;
● E is the characterization matrix.
● Privacy protection depends on the locations of nonzero elements in $A'_d$, $B'_f$ and $E$.
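The linear form of the obfuscated study can be evaluated directly once its pieces are known. The sketch below works through a tiny example with exact fractions and no external libraries. All matrix values are hypothetical, the helper names are assumptions, and x is simply given here, whereas in the framework it would come from a foreground traversal.

```python
from fractions import Fraction


def matmul(X, Y):
    """Matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Elementwise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(M):
    """Gauss-Jordan inverse using exact Fractions; raises if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + eye for row, eye in zip(M, identity(n))]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical data: two background processes, two foreground nodes, one emission.
A_star = [[0, Fraction(1, 2)], [0, 0]]   # background technology matrix
B_star = [[2, 1]]                        # background direct emissions
Ad_obf = [[1, 0], [0, 3]]                # A'_d: obfuscated background dependencies
Bf_obf = [[1, 0]]                        # B'_f: obfuscated foreground emissions
E = [[1]]                                # characterization matrix (one category)
x = [[1], [2]]                           # node activity levels (assumed given)

I_minus_A = [[i - a for i, a in zip(i_row, a_row)]
             for i_row, a_row in zip(identity(2), A_star)]
B_x = matmul(B_star, inverse(I_minus_A))          # aggregated background database
s = matmul(E, matmul(matadd(Bf_obf, matmul(B_x, Ad_obf)), x))
```

In this example the second column of B'_f is zero, so all of the second node's contribution to s arrives through the background term; this is the kind of sparsity pattern the last bullet above refers to.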
Operationalize the two competency questions of privacy-preserving LCA publication:
1. How closely can an adversary (possibly with partial information) estimate the values of private data?
2. How can a data user (or critical reviewer) be convinced of the accuracy of a computation that conceals private data?
Conclusions and Outlook
● LCA interpretation is greatly facilitated with an explicit provenance framework.
● LCI models can be published precisely without revealing private data. Two use cases:
1. Mutually untrusting peers wish to privately evaluate their collective performance (Secure multiparty computation);
2. Trusted party wishes to publicly reveal detailed results of a study that includes private data (Secure publication).
● Current work:
− Implement SMC for horizontal averaging;
− Develop minimal constraints on privacy-preserving aggregation for secure publication;
− Operationalize result validation with private data.
Thanks to:
● Co-PI Amr El Abbadi (UCSB CS); PhD student Cetin Sahin (UCSB CS)
● Omer Eğecioğlu; Roland Geyer, Pascal Lesage, Kyle Meisterling
● NSF CCF-1442966
Thank you!