43
Privacy-preserving Information Sharing: Tools and Applications Emiliano De Cristofaro University College London https://emilianodc.com Koc University, Jan 2016

Privacy-preserving Information Sharing: Tools and Applications

Embed Size (px)

Citation preview

Page 1: Privacy-preserving Information Sharing: Tools and Applications

Privacy-preserving Information Sharing: Tools and Applications

Emiliano De CristofaroUniversity College Londonhttps://emilianodc.com

Koc University, Jan 2016

Page 2: Privacy-preserving Information Sharing: Tools and Applications

2

Prologue

Privacy-Enhancing Technologies (PETs):Increase privacy of users, groups, and/or organizations

PETs often respond to privacy threatsProtect personally identifiable informationSupport anonymous communicationsPrivacy-respecting data processing

Another angle: privacy as an enablerActively enabling scenarios otherwise impossible w/o clear privacy guarantees

Page 3: Privacy-preserving Information Sharing: Tools and Applications

3

Sharing Information w/ Privacy

When parties with limited mutual trust willing or required to share information

Only the required minimum amount of information should be disclosed in the process

Page 4: Privacy-preserving Information Sharing: Tools and Applications

4

Secure ComputationAlice (a) Bob (b)

f(a,b)

f(a,b)f(a,b)

Map information sharing to f(·,·)?Realize secure f(·,·) efficiently?Quantify information disclosure from output of f(·,·)?

Page 5: Privacy-preserving Information Sharing: Tools and Applications

5

Private Set Intersection (PSI)

Server Client

PrivateSet Intersection

Page 6: Privacy-preserving Information Sharing: Tools and Applications

6

PSI Cardinality (PSI-CA)

Server Client

PSI-CA

Page 7: Privacy-preserving Information Sharing: Tools and Applications

7

PSI w/ Data Transfer (PSI-DT)Server Client

PSI-DT

),(),...,,( 11 ww datasdatasS

Page 8: Privacy-preserving Information Sharing: Tools and Applications

8

PSI w/ Data Transfer

Client Server

Page 9: Privacy-preserving Information Sharing: Tools and Applications

9

Authorized Private Set Intersection

Server Client

PrivateSet Intersection

What if the client populates C with its best guesses for S?

Client needs to prove that inputs satisfy a policy or be authorized

Authorizations issued by appropriate authorityAuthorizations need to be verified implicitly

Page 10: Privacy-preserving Information Sharing: Tools and Applications

10

Size-Hiding Private Set Intersection

Server Client

PrivateSet Intersection

v w

Page 11: Privacy-preserving Information Sharing: Tools and Applications

11

Garbled-Circuit Based PSI

Very fast implementations [HKE12, DCW13, PSZ14], possibly more Based on pipeliningExploits advances in OT Extension

Do not support all functionalitiesStill high communication overhead

Page 12: Privacy-preserving Information Sharing: Tools and Applications

12

Special-purpose PSI

[DT10]: scales efficiently to very large setsFirst protocol with linear complexities and fast crypto

[DKT10]: extends to arbitrarily malicious adversaries

Works also for Authorized Private Set Intersection[DJLLT11]: PSI-based database querying

Won IARPA APP challenge, basis for IARPA SPAR[DT12]: optimized toolkit for PSI

Privately intersect sets – 2,000 items/sec[ADT11]: size-hiding PSI

Page 13: Privacy-preserving Information Sharing: Tools and Applications

13

Oblivious Pseudo-Random Functions

Server Client

OPRF

Page 14: Privacy-preserving Information Sharing: Tools and Applications

14

OPRF-based PSI

Server Client

OPRF

Unless sj is in the intersection fk(sj)indistinguishable from random

Page 15: Privacy-preserving Information Sharing: Tools and Applications

15

OPRF from Blind-RSA Signatures

RSA Signatures:

PRF:Server (d) Client (x)

(H one way function)

Page 16: Privacy-preserving Information Sharing: Tools and Applications

16

Authorized Private Set Intersection (APSI)

Server Client

Authorized Private

Set Intersection

Court

Page 17: Privacy-preserving Information Sharing: Tools and Applications

17

OPRF w/ Implicit Signature Verification

Server Client

OPRF with ISV

Page 18: Privacy-preserving Information Sharing: Tools and Applications

18

A simple OPRF-like with ISV

Court issues authorizations:

OPRF:

Server (k) Client (H(x)d)

(Implicit Verification)

Page 19: Privacy-preserving Information Sharing: Tools and Applications

19

OPRF with ISV – Malicious Security

OPRF:

Server (k) Client (H(x)d)

Page 20: Privacy-preserving Information Sharing: Tools and Applications

20

Other Building Blocks

[DGT12]: Private Set Intersection Cardinality-only[BDG12]: Private Sample Set Similarity [DFT13]: Private Substring/Pattern Matching

Page 21: Privacy-preserving Information Sharing: Tools and Applications

21

Cool! So what?

Page 22: Privacy-preserving Information Sharing: Tools and Applications

22

Genomics…

Page 23: Privacy-preserving Information Sharing: Tools and Applications

23

Page 24: Privacy-preserving Information Sharing: Tools and Applications

24

Page 25: Privacy-preserving Information Sharing: Tools and Applications

25

Page 26: Privacy-preserving Information Sharing: Tools and Applications

26

Page 27: Privacy-preserving Information Sharing: Tools and Applications

27

The Bad News

Sensitivity of human genome:Uniquely identifies an individualDiscloses ethnicity, disease predispositions (including mental)Progress aggravates fears of discriminationOnce leaked, it cannot be “revoked”

De-identification and obfuscation are not effectiveMore info:

[ADHT13] Chills and Thrills of Whole Genome Sequencing. IEEE Computer Magazine.

Page 28: Privacy-preserving Information Sharing: Tools and Applications

28

Secure Genomics?

Privacy:Individuals remain in control of their genomeAllow doctors/clinicians/labs to run genomic tests, while disclosing the required minimum amount of information, i.e.:

(1) Individuals don’t disclose their entire genome(2) Testing facilities keep test specifics (“secret sauce”) confidential

[BBDGT11]: Secure genomics via *-PSIMost personalized medicine tests in < 1 secondWorks on Android too

Page 29: Privacy-preserving Information Sharing: Tools and Applications

29

Private Set Intersection Cardinality

Test Result: (#fragments with same length)

Private RFLP-based Paternity Test

Page 30: Privacy-preserving Information Sharing: Tools and Applications

30

doctoror lab

genome

individual

test specifics

Secure Function

Evaluation

test result test result

• Private Set Intersection (PSI)• Authorized PSI• Private Pattern Matching• Homomorphic Encryption• Garbled Circtuis• […]

Output reveals nothing beyond test result

• Paternity/Ancestry Testing• Testing of SNPs/Markers • Compatibility Testing• Disease Predisposition […]

Page 31: Privacy-preserving Information Sharing: Tools and Applications

31

Open Problems

Where do we store genomes?Encryption can’t guarantee security past 30-50 yrsReliability and availability issues?

CryptographyEfficiency overhead Data representation assumptionsHow much understanding required from users?

Page 32: Privacy-preserving Information Sharing: Tools and Applications

32

Collaborative Anomaly Detection

Anomaly detection is hardSuspicious activities deliberately mimic normal behaviorBut, malevolent actors often use same resources

Wouldn’t it be better if organizations collaborated?

It’s a win-win, no?

“It is the policy of the United States Government to increase the volume, timelines, and quality of cyber threat information shared with U.S. private sector entities so that these entities may better protect and defend themselves against cyber attacks.”

Barack Obama2013 State of the Union Address

Page 33: Privacy-preserving Information Sharing: Tools and Applications

33

Problems with Collaborations

TrustWill others leak my data?

Legal LiabilityWill I be sued for sharing customer data? Will others find me negligible?

Competitive concernsWill my competitors outperform me?

Shared data quality Will data be reliable?

Page 34: Privacy-preserving Information Sharing: Tools and Applications

34

Solution Intuition [FDB15]

tnowSharing

Information w/ Privacy

Company 2

tnow

Company 1

Better Analytics

Securely assess the benefits of

sharingSecurely assess

the risks of sharing

Page 35: Privacy-preserving Information Sharing: Tools and Applications

35

Training Machine Learning Models

The Big Data “Hype”Large-scale collection of contextual information often essential to gather statistics, train machine learning models, and extract knowledge from data

Doing so privately…

Page 36: Privacy-preserving Information Sharing: Tools and Applications

36

Efficient Private Statistics [MDD16]

Real-world problems:1. Recommender systems for online streaming services2. Statistics about mass transport movements3. Traffic statistics for the Tor Network

Available tools for computing private statistics are impractical for large streams collectionIntuition: Approximate statistics are acceptable in some cases?

Page 37: Privacy-preserving Information Sharing: Tools and Applications

37

Preliminaries: Count-Min Sketch

An estimate of an item’s frequency in a stream

Mapping a stream of values (of length T) into a matrix of size O(logT) The sum of two sketches results in the sketch of the union of the two data streams

Page 38: Privacy-preserving Information Sharing: Tools and Applications

38

ItemKNN Recommender Systems

Predict favorite TV programs based on their own ratings and those of “similar” usersConsider N users, M programs and binary ratingsBuild a co-views matrix C, where Cab is the number of views for the pair of programs (a,b)Compute the Similarity MatrixIdentify K-Neighbors based on the Similarity Matrix

Page 39: Privacy-preserving Information Sharing: Tools and Applications

39

Private Recommender System

We build a global matrix of co-views for training ItemKNN in a privacy-friendly way by relying on:

Private data aggregation based on [Kursawe et al. 2011]Count-Min Sketch to reduce overhead

System ModelUsers (in groups) Tally Server (e.g, the BBC)

Page 40: Privacy-preserving Information Sharing: Tools and Applications

40

Security & Implementation

SecurityIn the honest-but-curious model under the CDH assumption

Prototype implementation:Tally as a Node.js web server

Users run in the browser or as a mobile cross-platform application (Apache Cordova)

Transparency, ease of use, ease of deployment

Page 41: Privacy-preserving Information Sharing: Tools and Applications

41

User side

Server side

Page 42: Privacy-preserving Information Sharing: Tools and Applications

42

Accuracy

Page 43: Privacy-preserving Information Sharing: Tools and Applications

43

Challenges Ahead…

This slide is intentionally left blank