Privacy-Preserving Schema Matching Using Mutual Information
Isabel F. Cruz, University of Illinois at Chicago
Roberto Tamassia and Danfeng Yao, Brown University
DBSec 2007, Redondo Beach, CA
Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553


Page 1: Privacy-Preserving Schema Matching Using Mutual Information

Privacy-Preserving Schema Matching Using Mutual Information

Isabel F. Cruz, University of Illinois at Chicago

Roberto Tamassia and Danfeng Yao, Brown University

DBSec 2007, Redondo Beach, CA

Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553

Page 2: Privacy-Preserving Schema Matching Using Mutual Information

Heterogeneous databases

Query: join patients’ records in medical databases A, B, and C

Page 3: Privacy-Preserving Schema Matching Using Mutual Information

The need for schema matching

Database A

Patient type   Blood pressure   Heart rate   Height   Weight   Age
Diabetic       120/75           70           6’ 0     110 LB   45
Cancer         115/65           60           5’ 4     170 LB   60
Obese          105/60           72           5’ 6     150 LB   75

Database B

Group ID   BP       HR   H      W        A
01         115/65   60   5’ 4   180 LB   66
02         120/70   70   6’ 1   170 LB   40
03         105/60   72   5’ 3   130 LB   69

How to find out the correspondence of attribute names in A and B?

Page 4: Privacy-Preserving Schema Matching Using Mutual Information

Privacy in schema matching

• Data interoperability requires schema matching
• However, data owners may consider schema sensitive
• Need to develop privacy-preserving schema matching methods
• Related work:
  • Privacy-preserving data sharing [Clifton Kantarcioglu Doan Schadow Vaidya Elmagarmid Suciu 04]
  • Privacy-preserving ontology matching [Mitra Liu Pan 05]
  • Privacy-preserving access control to heterogeneous databases [Mitra Liu Pan Atluri 06]
  • Privacy-preserving schema and data matching [Scannapieco Figotin Bertino Elmagarmid 07]

Page 5: Privacy-Preserving Schema Matching Using Mutual Information

Overview of our approach

• Key observation 1: same attributes have similar data distributions and correlate similarly to other attributes
  • Probability distribution of attributes (e.g., heart rate, age, height, blood pressure)
  • Mutual information (MI) captures the correlation of attributes (e.g., age and blood pressure)
• Key observation 2: we reduce private schema matching to 2-party private set intersection
  • Only intersected elements are returned and nothing else
• Our approach for private schema matching
  • View self and mutual information (MI) values of each schema as sets of numbers
  • Match the MIs of two schemas using private set intersection
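For reference, the quantities named above can be written out using the standard information-theoretic definitions. The slides do not give formulas, so the symbols below (X and Y for attributes, p for empirical probabilities) and the choice of base-2 logarithms are assumptions made for illustration.

```latex
% Entropy of attribute X (the "self MI" of X):
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Mutual information between attributes X and Y:
I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) + H(Y) - H(X,Y)

% Self information coincides with entropy:
I(X;X) = H(X)
```

The last identity is why the entropy of an attribute can be treated as its self MI and placed in the same set of numbers as its pair-wise MI values.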

Page 6: Privacy-Preserving Schema Matching Using Mutual Information

Building block: pair-wise mutual information (MI)

Patient type   Heart rate   Blood type
Diabetic       70           A
Cancer         60           O
Obese          77           A
Obese          76           O

1. Party A with schema A computes MI and constructs a graph

2. Party B with schema B constructs its graph

   Assume that schemas are not private info [Kang Naughton 03]

3. Both parties then find a correspondence of the two graphs

[Figure: MI graph of schema A with nodes Patient type (1.5), Heart rate (2.0), and Blood type (1.0) and edge weights 1.5, 1.0, and 1.0, shown next to the same graph with the attribute names replaced by Node A, Node B, and Node C.]
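Step 1 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes empirical (frequency-based) probabilities, base-2 logarithms, and the identity I(X;Y) = H(X) + H(Y) - H(X,Y); the function names are hypothetical. On the four example rows the node values come out as 1.5, 2.0, and 1.0 bits. Edge values computed from these four rows alone need not coincide with every weight drawn in the figure (patient type and blood type give 0.5 here), which may be based on other or illustrative data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from two aligned columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mi_graph(columns):
    """Return (nodes, edges): entropy per attribute and MI per attribute pair."""
    names = list(columns)
    nodes = {a: entropy(columns[a]) for a in names}
    edges = {
        (a, b): mutual_information(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return nodes, edges


# The example table from this slide, one tuple per row.
rows = [
    ("Diabetic", 70, "A"),
    ("Cancer", 60, "O"),
    ("Obese", 77, "A"),
    ("Obese", 76, "O"),
]
names = ["Patient type", "Heart rate", "Blood type"]
columns = dict(zip(names, (list(col) for col in zip(*rows))))

nodes, edges = mi_graph(columns)
for name, h in nodes.items():
    print(f"{name}: entropy = {h:.2f} bits")
for (a, b), mi in edges.items():
    print(f"MI({a}, {b}) = {mi:.2f} bits")
```

Each party would run this on its own data only; nothing in this step requires interaction.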

Page 7: Privacy-Preserving Schema Matching Using Mutual Information

Building block: Privacy-preserving set intersection

An efficient protocol based on homomorphic encryption was proposed by [Freedman Nissim Pinkas 04]

Input sets (one per party): {3, 7, 17, 20, 80} and {3, 6, 15, 20, 88}

Interactive protocol

Output: {3, 20}

• Alice and Bob only learn the intersected elements
• Secure against malicious adversaries
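The arithmetic behind a polynomial-based set intersection can be illustrated as follows. This is a plaintext sketch only: the homomorphic encryption that makes the real protocol private is omitted, so the code below offers no privacy at all and shows just why the masked evaluations reveal matching elements. The field modulus, the function names, and the assignment of the two example sets to Alice and Bob are assumptions for illustration.

```python
import secrets

# Field modulus: the Mersenne prime 2^61 - 1 (an arbitrary choice for this sketch).
P = 2**61 - 1


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod_i (x - r_i) modulo P."""
    coeffs = [1]
    for r in roots:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] = (new[i] - r * c) % P      # contribution of -r * c * x^i
            new[i + 1] = (new[i + 1] + c) % P  # contribution of c * x^(i+1)
        coeffs = new
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x modulo P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def bob_respond(coeffs, bob_set):
    """For each y, return r * poly(y) + y with a fresh random nonzero r.

    In a real protocol the coefficients would arrive encrypted and this
    computation would be done homomorphically on ciphertexts.
    """
    responses = []
    for y in bob_set:
        r = 1 + secrets.randbelow(P - 1)
        responses.append((r * evaluate(coeffs, y) + y) % P)
    return responses


def alice_intersect(alice_set, responses):
    """A response equals y exactly when poly(y) = 0, i.e. y is in Alice's set."""
    return sorted(set(alice_set) & set(responses))


alice_set = [3, 7, 17, 20, 80]
bob_set = [3, 6, 15, 20, 88]

coeffs = poly_from_roots(alice_set)          # Alice's polynomial
responses = bob_respond(coeffs, bob_set)     # Bob's masked evaluations
result = alice_intersect(alice_set, responses)
print(result)
```

For an element outside Alice's set, the random mask turns the response into an essentially uniform field element, so it matches one of her values only with negligible probability. In this sketch only Alice sees the output.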

Page 8: Privacy-Preserving Schema Matching Using Mutual Information

Our approach: Privacy-preserving schema mapping

• Two players: A and B, each with a private schema
• A and B each compute the MI of their schema attributes and build their graphs
  • Each attribute has an MI set: attribute entropy and pair-wise MI
• A and B each sort the entropies (self MIs) of their attributes
• For each attribute, A and B carry out private set intersection
  • If attributes match, then set intersection returns the entire MI set

[Figure: MI graph of schema A with nodes Patient type (1.5), Heart rate (2.0), and Blood type (1.0) and edge weights 1.5, 1.0, and 1.0, annotated with the example MI sets (1.5, 1.5, 1.0) and (1.0, 1.0, 1.0).]
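The matching rule above can be sketched as follows. This is a simplified illustration under several assumptions: the private set intersection is replaced by a plaintext multiset intersection (`intersect` is a stand-in, not a secure protocol), every attribute pair is compared rather than using the sorted entropies to prune comparisons, values are rounded to six decimals to make floating-point numbers comparable, and schema B with its attribute names is hypothetical. The node and edge values for schema A are read from the figure, with the assignment of edge weights to attribute pairs inferred from the two MI sets shown.

```python
from collections import Counter


def mi_sets(nodes, edges):
    """MI set per attribute: its entropy plus its MI with every other attribute."""
    sets = {a: Counter([round(h, 6)]) for a, h in nodes.items()}
    for (a, b), w in edges.items():
        sets[a][round(w, 6)] += 1
        sets[b][round(w, 6)] += 1
    return sets


def intersect(x, y):
    """Plaintext stand-in for the private set intersection building block."""
    return x & y


def match_schemas(sets_a, sets_b):
    """Map attributes of A to attributes of B whose entire MI set is returned."""
    mapping = {}
    for a, sa in sets_a.items():
        for b, sb in sets_b.items():
            if b in mapping.values():
                continue  # b is already matched to another attribute
            if intersect(sa, sb) == sa == sb:
                mapping[a] = b
                break
    return mapping


# Schema A: values taken from the figure on this slide.
nodes_a = {"Patient type": 1.5, "Heart rate": 2.0, "Blood type": 1.0}
edges_a = {
    ("Patient type", "Heart rate"): 1.5,
    ("Patient type", "Blood type"): 1.0,
    ("Heart rate", "Blood type"): 1.0,
}

# Schema B: hypothetical, same statistics under different attribute names.
nodes_b = {"BT": 1.0, "Group": 1.5, "HR": 2.0}
edges_b = {("Group", "HR"): 1.5, ("Group", "BT"): 1.0, ("HR", "BT"): 1.0}

sets_a = mi_sets(nodes_a, edges_a)
sets_b = mi_sets(nodes_b, edges_b)
mapping = match_schemas(sets_a, sets_b)
print(mapping)
```

A multiset (`Counter`) is used because an MI set such as (1.0, 1.0, 1.0) contains repeated values that a plain Python `set` would collapse.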

Page 9: Privacy-Preserving Schema Matching Using Mutual Information

Properties

• Support of three types of schema mappings
  • One-to-one, onto, partial mappings
• Security property
  • Basic method secure against semi-honest adversaries
  • Advanced method (uses zero knowledge) secure against malicious adversaries
• Complexity property
  • Assuming entropies of distinct attributes are different, we perform a linear (proportional to the number of attributes of A and B) number of privacy-preserving set intersections

[Figure: examples of one-to-one, onto, and partial mappings.]

Page 10: Privacy-Preserving Schema Matching Using Mutual Information

Theorems

Theorem 1 (Security): Assuming the existence of a private set intersection protocol against malicious adversaries, our privacy-preserving schema matching protocol for one-to-one, onto, and partial mappings is secure against malicious adversaries.

Definition: The multiplicity value $m_i$ of element $a_i$ in a list $L$ with $l$ elements and $k$ distinct elements is the number of times element $a_i$ ($1 \le i \le k$) appears in $L$. The multiplicity sequence of $L$ is $(m_1, m_2, \ldots, m_k)$, where $m_1 + m_2 + \cdots + m_k = l$.

Theorem 2 (Complexity): Consider a schema A with $m$ attributes and a schema B with $n$ attributes. Let $(m_1, m_2, \ldots, m_k)$ be the multiplicity sequence of the entropy list of A and let $(n_1, n_2, \ldots, n_k)$ be the multiplicity sequence of the entropy list of B obtained by removing the elements not present in the entropy list of A. The number of set intersections performed in our privacy-preserving schema matching protocol is at most

$$\sum_{i=1}^{k} m_i n_i$$
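The bound in Theorem 2 is straightforward to evaluate for given entropy lists. The sketch below is an illustration with hypothetical entropy values; the function name is an assumption, and entropies are compared by exact equality, so real floating-point values would need rounding first.

```python
from collections import Counter


def intersection_bound(entropies_a, entropies_b):
    """Upper bound sum_i m_i * n_i on the number of set intersections.

    m_i counts how often the i-th distinct entropy value occurs in A's list;
    n_i counts the same value in B's list. Values of B that do not occur in
    A's list are never looked up, which matches removing them.
    """
    count_a = Counter(entropies_a)
    count_b = Counter(entropies_b)
    return sum(m_i * count_b[value] for value, m_i in count_a.items())


# All entropies distinct: the bound is linear in the number of attributes.
distinct = intersection_bound([1.5, 2.0, 1.0], [1.0, 1.5, 2.0])

# Repeated entropies: m = (2, 1) and n = (1, 2); the value 3.0 is dropped from B.
repeated = intersection_bound([1.0, 1.0, 2.0], [1.0, 2.0, 2.0, 3.0])

print(distinct, repeated)
```

When every entropy value is distinct, each $m_i$ and $n_i$ is at most 1, which recovers the linear case stated under the complexity property.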