Privacy and Provenance in Environmental Impact Assessment - ISIE 2015 Surrey UK

photo: flickr.com/people/44073224@N04 (CC-A)

Privacy and Provenance in Environmental Impact Assessment

Brandon Kuczenski, Amr El Abbadi, Cetin Sahin

University of California, Santa Barbara

ISIE 2015 – University of Surrey

Discerning our Environmental Impacts


UNEP-SETAC 2011

Even the simplest query in life-cycle assessment (LCA) requires vast and varied data, all of which is “private” to some extent:

● Manufacturing: product composition, equipment operation, utility demand, waste;

● Use: individual consumption habits, product usage behavior, disposal decisions;

● Supply Chain: materials sourcing, supply contracts, logistics;

● Background: aggregated industrial-economic models.

Data privacy is at the heart of LCA practice:

● Averaging (Horizontal): similar processes operated in parallel;

● Aggregation (Vertical): grouping related activities;

● Background Aggregation (“roll-up”): cradle-to-gate computation.

Data privacy is at odds with the objectives of LCA:

● Attribution of impacts to specific activities;

● Identifying ways to improve environmental performance through operational changes.

How do we evaluate environmental claims while ensuring data privacy?

What is privacy in life-cycle assessment?


PROV Primer (W3C)

Privacy means confidentiality (a.k.a. secrecy):

● Companies don’t want to reveal sensitive details to competitors, regulators, or the public;

● Usually accomplished through roll-up or vertical aggregation.

Privacy means anonymity:

● Published results should not reveal the details of any contributor;

● Usually accomplished through horizontal aggregation among “at least three” contributors.
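The “at least three” convention can be sketched as a guard on a horizontal average. This is a minimal illustration rather than a method from the talk: the function name `anonymized_average`, the `min_contributors` parameter, and the sample values are all hypothetical.

```python
from statistics import mean

def anonymized_average(values, min_contributors=3):
    """Return the mean of contributor values, or None if there are too few.

    The default threshold mirrors the "at least three" convention; the
    function name and parameter are hypothetical.
    """
    if len(values) < min_contributors:
        return None  # refuse to publish: too few contributors to mask any one of them
    return mean(values)

# Hypothetical utility demand reported by three plants, then by only two.
published = anonymized_average([2.0, 3.0, 4.0])
withheld = anonymized_average([2.0, 3.0])
```

A threshold alone is not a formal guarantee: two of three contributors who compare notes can subtract their own values from the published average and recover the third.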

Privacy goes hand in hand with provenance:

● Data provenance is the attribution of results to specific observations and/or computations.

● Standardized by the W3C as a directed graph model linking agents, entities and activities.

● Rich parallels with LCI modeling.
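As a rough sketch of the directed graph model, the snippet below stores typed nodes (entity, activity, agent) and labeled edges, then walks back from a result to the observations it depends on. The relation names `wasGeneratedBy`, `used`, and `wasAssociatedWith` come from the W3C PROV data model; the `ProvGraph` class, its methods, and the node names are hypothetical and not part of any PROV library.

```python
from dataclasses import dataclass, field

@dataclass
class ProvGraph:
    """Tiny directed graph with PROV-style typed nodes and labeled edges."""
    nodes: dict = field(default_factory=dict)  # node id -> "entity" | "activity" | "agent"
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add(self, node_id, kind):
        self.nodes[node_id] = kind

    def relate(self, source, relation, target):
        self.edges.append((source, relation, target))

    def sources_of(self, entity_id):
        """Follow wasGeneratedBy and used edges back to entities that no activity generated."""
        activities = [t for s, r, t in self.edges
                      if s == entity_id and r == "wasGeneratedBy"]
        if not activities:
            return {entity_id}
        found = set()
        for activity in activities:
            for s, r, t in self.edges:
                if s == activity and r == "used":
                    found |= self.sources_of(t)
        return found

g = ProvGraph()
for name, kind in [("meter_reading", "entity"), ("fuel_invoice", "entity"),
                   ("lci_result", "entity"), ("compute_inventory", "activity"),
                   ("analyst", "agent")]:
    g.add(name, kind)
g.relate("lci_result", "wasGeneratedBy", "compute_inventory")
g.relate("compute_inventory", "used", "meter_reading")
g.relate("compute_inventory", "used", "fuel_invoice")
g.relate("compute_inventory", "wasAssociatedWith", "analyst")

origins = g.sources_of("lci_result")
```

Here `origins` contains the two observations that the result is attributed to, which is the kind of question a provenance record is meant to answer.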

In LCA publishing it is desirable to make two assurances:

● Assure data providers that their information cannot be discerned from published results;

● Assure data users that the results reflect an accurate model of the system under study.

● Needs a provenance framework!

Provenance Framework


● Linked observations form a tree called an inventory fragment.

● Fragments are equivalent to provenance graphs.

Define an inventory foreground model on the basis of observations of flows between a parent node and child nodes and their directed implications:

Direction: Input. The parent node requires the flow; the flow is generated by the child node.

Direction: Output. The parent node generates the flow; the flow is consumed by the child node.

[Figure: an example inventory fragment, drawn as a tree of nodes linked by exchanges, starting from a reference exchange.]
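One way to picture an inventory fragment is as a tree in which every child records the flow that links it to its parent and the direction of that flow. The sketch below is an assumption-laden illustration: the `FragmentNode` class, the `amount` field (flow per unit of parent activity), the `traverse` function, and all node names and numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FragmentNode:
    name: str
    flow: str = ""        # flow linking this node to its parent (empty for the root)
    direction: str = ""   # "Input" or "Output", seen from the parent
    amount: float = 1.0   # flow per unit activity of the parent (hypothetical field)
    children: list = field(default_factory=list)

    def add_child(self, name, flow, direction, amount):
        child = FragmentNode(name, flow, direction, amount)
        self.children.append(child)
        return child

def traverse(node, scale=1.0):
    """Yield (node name, activity level) for every node, depth first."""
    yield node.name, scale
    for child in node.children:
        yield from traverse(child, scale * child.amount)

root = FragmentNode("assemble widget")
frame = root.add_child("cast frame", "steel frame", "Input", 2.0)
frame.add_child("melt scrap", "molten steel", "Input", 1.5)
root.add_child("landfill", "scrap", "Output", 0.1)

levels = dict(traverse(root))
```

Multiplying amounts along each path gives the activity level of every node per unit of the reference exchange.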

Provenance Framework: Data Set Resolution


● Foreground nodes and background dependencies are mapped to specific datasets (public or private).

● Each reference must be resolved at time of computation.

[Figure: the inventory fragment with its background dependencies (b) resolved to common background processes: Electricity (EU), Electricity (CN), Thermal Energy from Gas, Structural Steel, Freight Transport (truck), ...]
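Resolution at time of computation can be sketched as a lookup across the catalogs a given party has access to. Everything in this snippet is hypothetical: the `resolve` function, the `UnresolvedReference` error, the catalog layout, and the placeholder numbers, which are not real inventory data.

```python
class UnresolvedReference(KeyError):
    """Raised when a reference has no dataset in any available catalog."""

def resolve(reference, catalogs):
    """Return (catalog name, dataset) from the first catalog that holds the reference."""
    for catalog_name, datasets in catalogs.items():
        if reference in datasets:
            return catalog_name, datasets[reference]
    raise UnresolvedReference(reference)

# Placeholder values only; private catalogs are searched before public ones.
catalogs = {
    "private": {"cast frame": {"GWP": 0.2}},
    "public": {"Electricity (EU)": {"GWP": 0.4}, "Structural Steel": {"GWP": 1.8}},
}

source, dataset = resolve("Structural Steel", catalogs)

try:
    resolve("Freight Transport (truck)", catalogs)
    missing = False
except UnresolvedReference:
    missing = True  # this party cannot compute until the reference is resolved
```

A party without access to the private catalog would hit the same error for "cast frame", which is why the model and the data sets can be shared separately.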

Provenance Framework: LCI Publication


[Figure: the inventory fragment with background dependencies (b) resolved to processes including Electricity (EU), Electricity (CN), and Structural Steel.]

LCI model publication is a serialization of the graph model:

● Fragment table describes the structure of the foreground tree;

● Foreground table describes node resolutions;

● Background table describes background resolutions.

These three pieces precisely describe an LCI model in a database-independent fashion, even if data sets themselves remain private.
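The three tables could be serialized in many ways; the sketch below uses JSON purely as an illustration. The key names, the node names, and the `private:` / `public:` reference strings are hypothetical and do not reflect a format defined in the talk.

```python
import json

model = {
    # Fragment table: one row per exchange in the foreground tree.
    "fragment": [
        {"parent": "assemble widget", "flow": "steel frame",
         "direction": "Input", "child": "cast frame"},
        {"parent": "cast frame", "flow": "electricity",
         "direction": "Input", "child": "b1"},
    ],
    # Foreground table: node -> data set reference (the data set may stay private).
    "foreground": {
        "assemble widget": "private:plant-A/assembly",
        "cast frame": "private:supplier-B/casting",
    },
    # Background table: background dependency -> data set reference.
    "background": {
        "b1": "public:Electricity (EU)",
    },
}

serialized = json.dumps(model, indent=2, sort_keys=True)
restored = json.loads(serialized)

# Every child named in the fragment table should have a resolution in one of the tables.
unresolved = [row["child"] for row in restored["fragment"]
              if row["child"] not in restored["foreground"]
              and row["child"] not in restored["background"]]
```

The serialized text names data sets without containing any exchange values, so the structure can circulate while the private data sets stay with their owners.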

LCI model can be formulated as a foreground tree and a strongly connected background:

\[
A_P = \begin{pmatrix} A_f & 0 \\ A_d & A^\star \end{pmatrix}; \qquad
B_P = \begin{pmatrix} B_f & B^\star \end{pmatrix}
\]

● $A_f$ are links between foreground nodes (fragment table);

● $A_d$ are the dependencies of the foreground on the background (fragment table);

● $B_f$ includes foreground direct emissions (dereferenced node table);

● $A^\star$ and $B^\star$ are the background database (dereferenced background table).
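The block structure can be checked numerically on a toy system. The sketch below assumes that column j of each A block lists what process j requires per unit of output, so that activity levels solve (I − A_P) x = y. All numbers are made up, and the small `solve` helper is a generic exact Gauss-Jordan routine written here so that the example needs only the standard library.

```python
from fractions import Fraction as F

def solve(M, y):
    """Solve M x = y by Gauss-Jordan elimination on exact fractions."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(y[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Hypothetical blocks: two foreground nodes, one background process, one emission.
A_f = [[0, 0], [2, 0]]     # node 0 requires 2 units from node 1
A_d = [[1, 3]]             # background demand of node 0 and of node 1
A_star = [[F(1, 2)]]       # background process consumes half a unit of itself
B_f = [[5, 0]]             # direct emission of node 0
B_star = [[4]]             # emission per unit of background process

# Assemble A_P = [[A_f, 0], [A_d, A*]] and B_P = [B_f, B*].
n_b = len(A_star)
A_P = [row + [0] * n_b for row in A_f] + [d + s for d, s in zip(A_d, A_star)]
B_P = [f + s for f, s in zip(B_f, B_star)]

n = len(A_P)
I_minus_A = [[(1 if i == j else 0) - A_P[i][j] for j in range(n)] for i in range(n)]
x = solve(I_minus_A, [1, 0, 0])   # one unit demanded from node 0
g = [sum(b * xi for b, xi in zip(row, x)) for row in B_P]
```

The zero block in the upper right is what makes the foreground a tree: the background never depends on foreground nodes.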

LCI Publication Use Cases


Use Case 1: Secure Multiparty Computation

Secure Multiparty Computation: The Private Jet Problem


photo: flickr.com/photos/rodeime (CC-A)

I have some bad news, gentlemen.

Secure Multiparty Computation (SMC) and LCA


Setting: a group of untrusting parties with private inputs.

Goal: jointly compute a function of their inputs while maintaining secrecy of all private data.

SMC uses cryptographic techniques to collaboratively compute a function of private inputs (average, maximum, quantile, etc.) without any party revealing information to any other party.

● The output of the computation can be known to every party.

● Alternatively, the output can remain secret, with only “flags” (high/low) or a rank ordering reported privately to each contributor.

● No trusted party is required.
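One standard building block for this kind of computation is additive secret sharing, simulated below in a single process. This is a sketch under stated assumptions and not the authors' protocol: parties are assumed to follow the rules (semi-honest), inputs are non-negative integers smaller than the modulus, and the names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # shares are integers modulo this value (an arbitrary choice)

def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs):
    """Simulate the exchange: each party sends one share of its input to every party."""
    n = len(private_inputs)
    # Row i holds the shares created by party i; column j is what party j receives.
    distributed = [make_shares(v, n) for v in private_inputs]
    # Each party adds up the shares it received and announces only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS

inputs = [120, 95, 143]          # hypothetical private values, one per contributor
total = secure_sum(inputs)
average = total / len(inputs)
```

Each individual share is a uniformly random number, so no single message reveals an input; only the announced partial sums combine into the total. Real-valued coefficients would need fixed-point scaling before sharing.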

SMC can be used anywhere horizontal averaging is needed, to securely compute exchange coefficients (e.g. values in $A_d$, $A_f$, or $B_f$).

● Contributors must agree on a provenance model.

● Vulnerable to false information provided by a careless or malicious contributor.

● Audit mechanisms can be established (but require a trusted party).

LCI Publication Use Cases


Use Case 2: Secure Publication

Secure Publication: The Supplier Problem


Product designer wants to publish LCA results with as much detail as is permitted by data providers’ confidentiality policies.

Upstream supplier’s objectives:

● Conceal its impacts if they are “large” (may want complete anonymity);

● Publicize its impacts if they are “small” (wants credit for the low impacts attributed to it).

● “Private” data are values of entries in $A_f$, $A_d$, and $B_f$.

● Other portions of the model may be public; adversary may have partial information or use statistical methods.

How much information can be published without revealing private data?

[Figure: the inventory fragment with background dependencies (b) resolved to processes including Electricity (EU), Electricity (CN), and Structural Steel, with results reported in five impact categories: GWP, AP, EP, Smog, ADP.]

Study Formulation and Obfuscation


The obfuscated study is constructed through graph transformation, e.g. grouping foreground nodes (vertical aggregation) and background LCI results (background aggregation):

\[
A_f \to A'_f; \qquad A_d \to A'_d; \qquad B_f \to B'_f
\]

The obfuscated study can be formulated as a linear equation:

\[
s = E \cdot (B'_f + B_x A'_d) \cdot x
\]

● x is derived from foreground traversal;

● $B_x = B^\star \cdot (I - A^\star)^{-1}$ is the aggregated background database;

● E is the characterization matrix.

● Privacy protection depends on the locations of nonzero elements in $A'_d$, $B'_f$ and $E$.
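The linear form can be exercised on a toy study to see that one kind of obfuscation leaves the score unchanged. The sketch below assumes that vertical aggregation collapses all foreground nodes into a single node by weighting each column with its activity level; that reading, the helper functions, and every number are illustrative assumptions. The background is a single process, so the inverse reduces to a division.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def matadd(P, Q):
    """Add two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def score(E, B_f, B_x, A_d, x):
    """s = E (B_f + B_x A_d) x, with x given as a column vector."""
    return matmul(E, matmul(matadd(B_f, matmul(B_x, A_d)), x))

# Hypothetical study: two emissions, one impact category, two foreground nodes.
E = [[1, 25]]                  # characterization factors (made up)
B_f = [[5, 0], [0, 1]]         # direct emissions of each foreground node
A_d = [[1, 3]]                 # background demand of each foreground node
x = [[1], [2]]                 # activity levels from foreground traversal

# Aggregated background: B_x = B* (I - A*)^-1 with a single background process.
B_star, a_star = [[4], [0]], 0.5
B_x = [[b / (1 - a_star) for b in row] for row in B_star]

full = score(E, B_f, B_x, A_d, x)

# Vertical aggregation (assumed form): one combined node with activity level 1.
A_d_agg = matmul(A_d, x)
B_f_agg = matmul(B_f, x)
aggregated = score(E, B_f_agg, B_x, A_d_agg, [[1]])
```

Both formulations give the same score, while the aggregated matrices no longer show which foreground node is responsible for which dependency or emission.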

Operationalize the two competency questions of privacy-preserving LCA publication:

1. How closely can an adversary (possibly with partial information) estimate the values of private data?

2. How can a data user (or critical reviewer) be convinced of the accuracy of a computation that conceals private data?
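The first question can be made concrete with a toy attack. The sketch assumes a simplified study with no direct emissions, two private background dependencies, and a public matrix M = E · B_x; the adversary model, the function names, and all numbers are hypothetical and are not taken from the authors' analysis.

```python
from fractions import Fraction

# Public knowledge (hypothetical): M = E · B_x, one row per impact category.
M = [[8, 2],
     [1, 3]]

private_a_d = [7, 4]  # the supplier's secret dependencies; never published

def publish(rows):
    """Scores the study would publish for the selected impact categories."""
    return [sum(m * a for m, a in zip(M[r], private_a_d)) for r in rows]

def adversary_solve(s):
    """Recover both private values from two published scores (Cramer's rule)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    a0 = Fraction(s[0] * M[1][1] - M[0][1] * s[1], det)
    a1 = Fraction(M[0][0] * s[1] - M[1][0] * s[0], det)
    return [a0, a1]

both = publish([0, 1])               # two categories published
recovered = adversary_solve(both)    # matches the private values exactly

one = publish([0])                   # only one category published
decoy = [8, 0]                       # a different dependency vector
decoy_score = sum(m * a for m, a in zip(M[0], decoy))  # same score as the truth
```

With as many independent scores as unknowns the private values are fully determined; with fewer, several candidate vectors explain the published result, so the adversary's estimate is only as good as the side information available.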

Conclusions and Outlook


● LCA interpretation is greatly facilitated with an explicit provenance framework.

● LCI models can be published precisely without revealing private data. Two use cases:

1. Mutually untrusting peers wish to privately evaluate their collective performance (secure multiparty computation);

2. Trusted party wishes to publicly reveal detailed results of a study that includes private data (secure publication).

● Current work:

− Implement SMC for horizontal averaging;

− Develop minimal constraints on privacy-preserving aggregation for secure publication;

− Operationalize result validation with private data.

Thanks to:

● Co-PI Amr El Abbadi (UCSB CS); PhD student Cetin Sahin (UCSB CS)

● Omer Eğecioğlu; Roland Geyer, Pascal Lesage, Kyle Meisterling

● NSF CCF-1442966

Thank you!

