Privacy-Preserving Schema Matching Using Mutual Information
Isabel F. Cruz, University of Illinois at Chicago
Roberto Tamassia and Danfeng Yao, Brown University
DBSec 2007, Redondo Beach, CA
Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553

Slide 2: Heterogeneous databases
Query: join patients’ records in medical databases A, B, and C

Slide 3: The need for schema matching

Database A
Patient type  Blood pressure  Heart rate  Height  Weight  Age
Diabetic      120/75          70          6’ 0    110 LB  45
Cancer        115/65          60          5’ 4    170 LB  60
Obese         105/60          72          5’ 6    150 LB  75

Database B
Group ID  BP      HR  H     W       A
01        115/65  60  5’ 4  180 LB  66
02        120/70  70  6’ 1  170 LB  40
03        105/60  72  5’ 3  130 LB  69

How to find out the correspondence of attribute names in A and B?

Slide 4: Privacy in schema matching
Data interoperability requires schema matching
However, data owners may consider schema sensitive
Need to develop privacy-preserving schema matching methods
Related work:
  Privacy-preserving data sharing [Clifton Kantarcioglu Doan Schadow Vaidya Elmagarmid Suciu 04]
  Privacy-preserving ontology matching [Mitra Liu Pan 05]
  Privacy-preserving access control to heterogeneous databases [Mitra Liu Pan Atluri 06]
  Privacy-preserving schema and data matching [Scannapieco Figotin Bertino Elmagarmid 07]

Slide 5: Overview of our approach
Key observation 1: same attributes have similar data distributions and correlate similarly to other attributes
  Probability distribution of attributes (e.g., heart rate, age, height, blood pressure)
  Mutual information (MI) captures the correlation of attributes (e.g., age and blood pressure)
Key observation 2: we reduce private schema matching to 2-party private set intersection
  Only intersected elements are returned and nothing else
Our approach for private schema matching
  View self and mutual information (MI) values of each schema as sets of numbers
  Match the MIs of two schemas using private set intersection
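
For reference, the self and mutual information values mentioned above can be written with the standard Shannon definitions. The slides do not state the formulas, so the symbols below (X and Y for two attributes treated as random variables, p for their empirical distributions) and the base-2 logarithm are assumptions rather than the authors' notation.

```latex
\begin{align}
H(X)   &= -\sum_{x} p(x)\,\log_2 p(x) \\
I(X;Y) &= \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
        = H(X) + H(Y) - H(X,Y) \\
I(X;X) &= H(X)
\end{align}
```

The last identity is why the entropy of an attribute can be read as its "self MI".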

Slide 6: Building block: pair-wise mutual information (MI)

Patient type  Heart rate  Blood type
Diabetic      70          A
Cancer        60          O
Obese         77          A
Obese         76          O

1. Party A with schema A computes MI and constructs a graph
2. Party B with schema B constructs its graph
   Assume that schemas are not private info [Kang Naughton 03]
3. Both parties then find a correspondence of the two graphs

[Figure: Party A's MI graph over the attributes Patient type, Heart rate, and Blood type, and Party B's graph over Node A, Node B, and Node C; both graphs carry the labels 1.5, 2.0, 1.0 and 1.0, 1.5, 1.0.]
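
Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes empirical (frequency-based) distributions and base-2 logarithms, and the names `entropy`, `mutual_information`, `nodes`, and `edges` are hypothetical. It uses the four sample rows from this slide; the values it produces come from the standard definitions and are not guaranteed to coincide with every label drawn in the slide's figure.

```python
from collections import Counter
from itertools import combinations
from math import log2

# The four sample rows shown on the slide.
names = ("Patient type", "Heart rate", "Blood type")
rows = [
    ("Diabetic", 70, "A"),
    ("Cancer", 60, "O"),
    ("Obese", 77, "A"),
    ("Obese", 76, "O"),
]

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    return -sum(c / total * log2(c / total) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), all estimated from the same rows."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

# Graph nodes carry the entropy (self MI); edges carry the pair-wise MI.
nodes = {name: entropy(col) for name, col in columns.items()}
edges = {
    (a, b): mutual_information(columns[a], columns[b])
    for a, b in combinations(names, 2)
}

for name, h in nodes.items():
    print(f"H({name}) = {h:.2f}")
for (a, b), mi in edges.items():
    print(f"I({a}; {b}) = {mi:.2f}")
```

Party B would run the same computation on its own table; only the resulting numbers, not the attribute names, are needed for the graph comparison.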

Slide 7: Building block: Privacy-preserving set intersection
An efficient protocol based on homomorphic encryption was proposed by [Freedman Nissim Pinkas 04]
[Figure: the two parties hold the sets {3, 7, 17, 20, 80} and {3, 6, 15, 20, 88}; after the interactive protocol, the output is {3, 20}.]
Alice and Bob only learn the intersected elements
Secure against malicious adversaries
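
To make the idea concrete, here is a toy Python sketch of a set intersection built on additively homomorphic encryption, in the spirit of the polynomial-evaluation approach of the cited protocol. Several things are assumptions and should be read as such: the encryption scheme is a textbook Paillier instance with primes far too small for real use, all function names are hypothetical, the assignment of the two example sets to Alice and Bob is arbitrary, and only the honest-but-curious flow is shown (none of the machinery needed against malicious adversaries is included).

```python
import random
from math import gcd

# Toy Paillier parameters: two Mersenne primes. NOT secure, demo only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse (Python 3.8+)

def encrypt(m):
    """Paillier encryption with generator g = N + 1."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (x - a) modulo N."""
    coeffs = [1]
    for a in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % N
            nxt[i + 1] = (nxt[i + 1] + c) % N
        coeffs = nxt
    return coeffs

def alice_request(alice_set):
    """Alice sends the encrypted coefficients of her set's polynomial."""
    return [encrypt(c) for c in poly_from_roots(alice_set)]

def bob_respond(enc_coeffs, bob_set):
    """For each b, Bob returns Enc(r * poly(b) + b) without decrypting."""
    replies = []
    for b in bob_set:
        acc = enc_coeffs[-1]
        for c in reversed(enc_coeffs[:-1]):
            acc = pow(acc, b, N2) * c % N2  # Horner step on ciphertexts
        r = random.randrange(1, N)
        replies.append(pow(acc, r, N2) * encrypt(b) % N2)
    random.shuffle(replies)
    return replies

def alice_finish(replies, alice_set):
    """Shared elements decrypt to themselves; the rest look random."""
    members = set(alice_set)
    return sorted(m for m in map(decrypt, replies) if m in members)

alice_set = [3, 7, 17, 20, 80]
bob_set = [3, 6, 15, 20, 88]
result = alice_finish(bob_respond(alice_request(alice_set), bob_set), alice_set)
print(result)
```

When b is in Alice's set the polynomial evaluates to zero, so the random mask vanishes and Alice recovers b; otherwise she sees a value that is random modulo N and, except with negligible probability, not in her set.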

Slide 8: Our approach: Privacy-preserving schema mapping
Two players: A and B, each with a private schema
A and B each compute the MI of their schema attributes and build their graphs
  Each attribute has an MI set: the attribute entropy and its pair-wise MI values
A and B each sort the entropies (self MIs) of their attributes
For each attribute, A and B carry out private set intersection
  If attributes match, then set intersection returns the entire MI set
[Figure: MI graph over Patient type, Heart rate, and Blood type with the labels 1.5, 2.0, 1.0 and 1.0, 1.5, 1.0; example MI sets shown for A: (1.5, 1.5, 1.0) and (1.0, 1.0, 1.0).]
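
The matching loop described on this slide might look roughly as follows. This is a sketch under stated assumptions, not the authors' protocol: a plain multiset intersection (`intersect`, hypothetical) stands in for the private set intersection, entropies are compared in the clear to pick candidate pairs (the slide does not spell out how the sorted entropies are used), only the one-to-one case is handled, and the MI set for Heart rate and the attribute-to-node layout of schema B are assumed from the figure labels.

```python
from collections import Counter

def intersect(xs, ys):
    """Plain multiset intersection; a stand-in for the private protocol."""
    return Counter(xs) & Counter(ys)

def match_schemas(mi_sets_a, mi_sets_b):
    """Match attributes whose MI sets coincide.

    Each MI set is a tuple whose first entry is the attribute's entropy.
    Returns the mapping and the number of set intersections performed.
    """
    by_entropy = lambda item: item[1][0]
    matches, calls = {}, 0
    for name_a, set_a in sorted(mi_sets_a.items(), key=by_entropy):
        for name_b, set_b in sorted(mi_sets_b.items(), key=by_entropy):
            # Only attributes with equal entropy are candidate pairs.
            if set_a[0] != set_b[0] or name_b in matches.values():
                continue
            calls += 1
            # A match returns the entire MI set of the attribute.
            if intersect(set_a, set_b) == Counter(set_a):
                matches[name_a] = name_b
                break
    return matches, calls

# MI sets in the form (entropy, pair-wise MI values...).
mi_a = {
    "Patient type": (1.5, 1.5, 1.0),
    "Heart rate": (2.0, 1.5, 1.0),   # assumed from the figure labels
    "Blood type": (1.0, 1.0, 1.0),
}
mi_b = {
    "Node A": (1.5, 1.5, 1.0),       # node layout is an assumption
    "Node B": (2.0, 1.5, 1.0),
    "Node C": (1.0, 1.0, 1.0),
}

matches, calls = match_schemas(mi_a, mi_b)
print(matches, calls)
```

With three distinct entropies on each side, the sketch performs one intersection per attribute, which agrees with the linear behaviour claimed for distinct entropies.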

Slide 9: Properties
Support of three types of schema mappings
  One-to-one, onto, partial mappings
Security property
  Basic method secure against semi-honest adversaries
  Advanced method (uses zero knowledge) secure against malicious adversaries
Complexity property
  Assuming entropies of distinct attributes are different, we perform a linear (proportional to the number of attributes of A and B) number of privacy-preserving set intersections
[Figure: three mapping diagrams labeled One-to-one, Onto, and Partial.]

Slide 10: Theorems
Theorem 1 (Security): Assuming the existence of a private set intersection protocol secure against malicious adversaries, our privacy-preserving schema matching protocol for one-to-one, onto, and partial mappings is secure against malicious adversaries.
Definition: The multiplicity value $m_i$ of element $a_i$ in a list $L$ with $l$ elements and $k$ distinct elements is the number of times element $a_i$ ($1 \le i \le k$) appears in $L$. The multiplicity sequence of $L$ is $(m_1, m_2, \ldots, m_k)$, where $m_1 + m_2 + \cdots + m_k = l$.
Theorem 2 (Complexity): Consider a schema A with $m$ attributes and a schema B with $n$ attributes. Let $(m_1, m_2, \ldots, m_k)$ be the multiplicity sequence of the entropy list of A and let $(n_1, n_2, \ldots, n_k)$ be the multiplicity sequence of the entropy list of B obtained by removing the elements not present in the entropy list of A. The number of set intersections performed in our privacy-preserving schema matching protocol is at most
$\sum_{i=1}^{k} m_i n_i$
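
The bound in Theorem 2 is easy to evaluate for given entropy lists. The helper below is a hypothetical illustration (the function name and the sample lists are not from the slides); it counts, for each distinct entropy value of A, how often it occurs in A and in B and sums the products. Values that occur only in B are ignored, matching the removal step in the theorem.

```python
from collections import Counter

def max_set_intersections(entropies_a, entropies_b):
    """Upper bound from Theorem 2: sum over distinct entropies of m_i * n_i."""
    mult_a = Counter(entropies_a)   # multiplicity sequence of A's entropy list
    mult_b = Counter(entropies_b)   # missing values count as 0
    return sum(m_i * mult_b[value] for value, m_i in mult_a.items())

# All entropies distinct: the bound is linear in the number of attributes.
print(max_set_intersections([1.5, 2.0, 1.0], [1.0, 1.5, 2.0]))

# Repeated entropies: 1.0 occurs twice in A and three times in B,
# 2.0 occurs only in A, 3.0 only in B.
print(max_set_intersections([1.0, 1.0, 2.0], [1.0, 1.0, 1.0, 3.0]))
```

The first call gives 3 and the second gives 6, since only the shared value 1.0 contributes (2 × 3).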