experience-based access management & privacy-preserving record linkage
elizabeth ashley durham, thursday, november 11, 2010
TRUST 2010
roadmap
• experience-based access management
• privacy-preserving record linkage
  – definition
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage
access management
• Least Privilege: How can we limit provider access to only the information required to do their job?
• Identity and Access Management (IAM)
  – ex: role-based access controls
• IAM in health care organizations
  – complex workflow
  – routine emergencies
the problem with access controls
[figure: ideal model vs. enforced control, with the mismatch between them labeled “the problem”]
L. Røstad and Ø. Nytrø. Access control and integration of health care systems: an experience report and future challenges. Proc. Availability, Reliability & Security, 2007; 871-878.
study: 43% of providers accessed records for which they did not have permissions
the experience-based access management (EBAM) lifecycle
[figure: the EBAM lifecycle, relating the ideal model, the enforced control, the access log, and the expected model]
C. Gunter, D. Liebovitz, and B. Malin. “EBAM: Experience-Based Access Management for Healthcare”. USENIX HealthSec’10 workshop
For more information, see:
• USENIX Health Security workshop: http://www.usenix.org/event/healthsec10/
• Copy of the paper: http://seclab.uiuc.edu/pubs/GunterML10.pdf
• Video of the presentation: http://www.usenix.org/multimedia/healthsec10gunter
record linkage in surveillance
[figure: the name “Karen Lewis” appears in both the access logs and the human resources records, and the hospital privacy office must link the two]
privacy-preserving record linkage (pprl)
set of records from dataholder A:
FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
Karyn      Lewis     28        Sept        1990       F
Marty      Smith     19        Apr         1982       M
Jon        Smyth     04        Feb         1960       M
Joy        Beck      08        May         1980       F
Laura      Root      27        Aug         1945       F

set of records from dataholder B:
FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
John       Smith     01        Feb         1960       M
Bob        Beck      19        Mar         1980       M
Bob        Taylor    07        Jun         1972       M
Karen      Lewis     28        Sept        1990       F
Alice      Todd      27        Aug         1965       F
steps in record linkage
blocking → field comparison → record pair comparison → record pair classification → matches / non-matches *
* I assume a common schema and method of data standardization. I also assume that the records from each institution have been deduplicated (i.e., record linkage has been applied within each institution so that an individual is represented by only a single record within that institution).
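The four steps can be sketched end to end on the toy records from the pprl slide. This is a minimal, non-private sketch that only shows the order of the steps; the blocking key (first letter of last name), exact-equality field comparison, summing the comparison vector into a score, and the threshold of 4 are all hypothetical choices for illustration.

```python
from itertools import product

# Toy records from the pprl slide: (FirstName, LastName, BirthDay, BirthMonth, BirthYear, Gender).
A = [
    ("Karyn", "Lewis", "28", "Sept", "1990", "F"),
    ("Marty", "Smith", "19", "Apr", "1982", "M"),
    ("Jon", "Smyth", "04", "Feb", "1960", "M"),
    ("Joy", "Beck", "08", "May", "1980", "F"),
    ("Laura", "Root", "27", "Aug", "1945", "F"),
]
B = [
    ("John", "Smith", "01", "Feb", "1960", "M"),
    ("Bob", "Beck", "19", "Mar", "1980", "M"),
    ("Bob", "Taylor", "07", "Jun", "1972", "M"),
    ("Karen", "Lewis", "28", "Sept", "1990", "F"),
    ("Alice", "Todd", "27", "Aug", "1965", "F"),
]

def block(a, b):
    """Blocking (hypothetical key): keep pairs whose last names share a first letter."""
    return [(x, y) for x, y in product(a, b) if x[1][0] == y[1][0]]

def compare_fields(x, y):
    """Field comparison: 1.0 if a field agrees exactly, else 0.0."""
    return [1.0 if p == q else 0.0 for p, q in zip(x, y)]

def compare_pair(vector):
    """Record pair comparison: sum the comparison vector into one score."""
    return sum(vector)

def classify(score, threshold=4.0):
    """Record pair classification with a hypothetical threshold."""
    return "match" if score >= threshold else "non-match"

def link(a, b):
    results = []
    for x, y in block(a, b):
        score = compare_pair(compare_fields(x, y))
        results.append((x[0] + " " + x[1], y[0] + " " + y[1], score, classify(score)))
    return results

for row in link(A, B):
    print(row)
```

With exact comparison, Jon Smyth and John Smith agree on only 3 of 6 fields, which is why the later slides turn to approximate field comparison.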
field comparison
fields:             FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
record a:           John       Smith     04        Mar         1962       M
record b:           Jon        Smyth     04        Mar         1960       M
comparison vector:  0.75       0.8       1         1           0.975      1
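The comparison vector above can be reproduced with a short sketch. Assumptions: names are compared case-insensitively with edit similarity (1 − edit distance / max length, as defined on the secure edit similarity slide), day, month, and gender with exact equality, and birth year with a hypothetical numeric similarity 1 − |Δyear| / 80. The slides do not say how 0.975 was obtained; a span of 80 is simply one value that reproduces it.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard dynamic-programming table (row by row)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def edit_similarity(s: str, t: str) -> float:
    s, t = s.lower(), t.lower()
    if not s and not t:
        return 1.0
    return 1 - edit_distance(s, t) / max(len(s), len(t))

def exact(s: str, t: str) -> float:
    return 1.0 if s == t else 0.0

def year_similarity(y1: str, y2: str, span: int = 80) -> float:
    """Hypothetical numeric similarity; span=80 is an assumption, not from the slides."""
    return max(0.0, 1 - abs(int(y1) - int(y2)) / span)

# One comparator per field: FirstName, LastName, BirthDay, BirthMonth, BirthYear, Gender.
COMPARATORS = (edit_similarity, edit_similarity, exact, exact, year_similarity, exact)

def comparison_vector(a, b):
    return [f(x, y) for f, x, y in zip(COMPARATORS, a, b)]

record_a = ("John", "Smith", "04", "Mar", "1962", "M")
record_b = ("Jon", "Smyth", "04", "Mar", "1960", "M")
print(comparison_vector(record_a, record_b))
```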
privacy-preserving field comparison experiment
the dataset
• 1,000 records from the North Carolina Voter Registration database
• fields: Last Name, First Name, Middle Name, Birth State, City, State, Street Name, Street Type, Street Suffix, Race, Gender
• 1,000 “corrupted” records, produced by a data corrupter (e.g., KATHRYN MCMILLAN → KATHY MEMILLAN)
• repeated 100 times to examine statistical significance
P. Christen and A. Pudjijono, “Accurate Synthetic Generation of Realistic Personal Information.” Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2009.
privacy-preserving field comparison experiment
• option 1: hash & compare
• option 2: secure edit similarity
• option 3: bloom filter
privacy-preserving field comparison
option 1: hash & compare
record a:           John  Smith  04   Mar  1962  M
hashed:             xy9l  br3f   xt   ves  vr3d  ns
record b:           Jon   Smyth  04   Mar  1960  M
hashed:             nw2   vwer   xt   ves  xd6   ns
comparison vector:  0     0      1    1    0     1
SHA-1, with “salting” used to prevent a dictionary attack
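A minimal sketch of option 1, assuming each dataholder prepends a shared secret salt to every normalized (trimmed, lower-cased) field value before hashing with SHA-1 from Python's hashlib. The salt value and the normalization are assumptions, and the hex digests will not look like the illustrative tokens on the slide.

```python
import hashlib

SALT = b"shared-secret-salt"  # hypothetical secret agreed on by both dataholders

def hash_field(value: str, salt: bytes = SALT) -> str:
    """Salted SHA-1 digest of one normalized field value."""
    return hashlib.sha1(salt + value.strip().lower().encode("utf-8")).hexdigest()

def hash_record(record):
    return [hash_field(v) for v in record]

def compare_hashed(ha, hb):
    """Exact comparison of hashed fields: 1 if the digests are equal, else 0."""
    return [1 if x == y else 0 for x, y in zip(ha, hb)]

record_a = ("John", "Smith", "04", "Mar", "1962", "M")
record_b = ("Jon", "Smyth", "04", "Mar", "1960", "M")

# Each dataholder hashes locally; only the digests are compared.
print(compare_hashed(hash_record(record_a), hash_record(record_b)))  # [0, 0, 1, 1, 0, 1]
```

Because any difference in the input yields an unrelated digest, the one-character typos in John/Jon and Smith/Smyth become complete disagreements; that is the accuracy cost of this option.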
privacy-preserving field comparison experiment
option 2: secure edit similarity
• edit distance: the minimal number of insertions, deletions, and substitutions required to convert one string into another
• edit similarity:
$$\text{edit similarity}(\text{string}_1, \text{string}_2) = 1 - \frac{\text{edit distance}(\text{string}_1, \text{string}_2)}{\max\big(\text{length}(\text{string}_1), \text{length}(\text{string}_2)\big)}$$
• “secure” edit distance: calculated by iteratively using homomorphic encryption to compute the value of each cell of the matrix used in the dynamic programming algorithm to calculate edit distance
W. Du and M. J. Atallah, “Protocols for Secure Remote Database Access with Approximate Matching,” Technical Report, CERIAS, Purdue University, 2001.
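Secure edit distance relies on homomorphic encryption, which lets a party combine ciphertexts without decrypting them. The toy sketch below uses the Paillier cryptosystem (an assumption; the slide does not name the scheme) to show only the additive property that such a protocol can use when filling in dynamic-programming cells: multiplying Enc(x) and Enc(y) modulo n² decrypts to x + y. It is not the Du & Atallah protocol, the primes are far too small, and the random generator is not cryptographic.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n (Python 3.9+)
    mu = pow(lam, -1, n)           # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    while True:                    # pick a random r coprime to n
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, private, c: int) -> int:
    lam, mu = private
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n: int, c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)

rng = random.Random(2010)                 # NOT cryptographically secure
n, private = keygen(1009, 1013)           # toy primes for illustration only
c3, c4 = encrypt(n, 3, rng), encrypt(n, 4, rng)
print(decrypt(n, private, add_encrypted(n, c3, c4)))   # 7
print(decrypt(n, private, scale_encrypted(n, c3, 5)))  # 15
```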
privacy-preserving field comparison experiment
option 3: Bloom filters
record a: john → bigrams _j jo oh hn n_
record b: jon → bigrams _j jo on n_
[figure: each bigram is hashed with h1 and h2 into a bit array; α is the set of bits set for record a and β the set of bits set for record b]
$$\text{Dice coefficient} = \frac{2\,|\alpha \cap \beta|}{|\alpha| + |\beta|} = \frac{2 \cdot 5}{13} \approx 0.77$$
experiment: 1,000 bits & 30 hash functions (all variations of SHA-1, with “salting” used to prevent a dictionary attack)
Rainer Schnell, Tobias Bachteler, and Jorg Reiher. “Privacy-preserving record linkage using Bloom filters,” BMC Medical Informatics and Decision Making (9). 2009.
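A minimal sketch of Bloom-filter field comparison using the slide's parameters (padded bigrams, 1,000 bits, 30 hash functions). How the 30 hash functions are derived is an assumption here: SHA-1 over a hypothetical secret salt, the function index, and the bigram, reduced modulo the filter length. The construction in Schnell et al. may differ.

```python
import hashlib

M_BITS = 1000                  # filter length used in the experiment
K_HASHES = 30                  # number of hash functions used in the experiment
SALT = b"shared-secret-salt"   # hypothetical secret shared by the dataholders

def bigrams(value: str) -> list[str]:
    """Padded bigrams, e.g. 'john' -> ['_j', 'jo', 'oh', 'hn', 'n_']."""
    padded = "_" + value.lower() + "_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def bloom_encode(value: str, m: int = M_BITS, k: int = K_HASHES) -> set[int]:
    """Return the set of bit positions set to 1 (alpha or beta on the slide)."""
    bits = set()
    for gram in bigrams(value):
        for i in range(k):
            digest = hashlib.sha1(SALT + str(i).encode() + b"|" + gram.encode()).digest()
            bits.add(int.from_bytes(digest, "big") % m)
    return bits

def dice(alpha: set[int], beta: set[int]) -> float:
    """Dice coefficient 2|alpha & beta| / (|alpha| + |beta|)."""
    if not alpha and not beta:
        return 1.0
    return 2 * len(alpha & beta) / (len(alpha) + len(beta))

print(bigrams("john"))
print(round(dice(bloom_encode("john"), bloom_encode("jon")), 2))
```

On the raw bigram sets, john and jon share 3 of 5 + 4 bigrams, for a Dice coefficient of 2·3/9 ≈ 0.67; the Bloom filter approximates this, with hash collisions adding some noise, while never revealing the bigrams themselves.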
privacy-preserving field comparison experiment: run time
2.5 GHz quad core PC with 4GB of memory
Elizabeth Durham, Yuan Xue, Murat Kantarcioglu, and Bradley Malin. Submitted to Information Fusion. 2010.
[figure: run time in seconds (log scale, 100 to 100,000,000) by field comparison method: exact matching, Bloom filter, estimated edit similarity, embedding, Jaro-Winkler, phonetic filter, trigrams; labeled values: 323, 391, and 74,088,275 seconds]
privacy-preserving field comparison experiment: correctness
Elizabeth Durham, Yuan Xue, Murat Kantarcioglu, and Bradley Malin. Submitted to Information Fusion. 2010.
[figure: true positive rate (75% to 100%) by field comparison method: exact matching, Bloom filter, edit similarity, embedding, Jaro-Winkler, phonetic filter, trigrams; labeled values: 83.97%, 99.45%, and 98.54%]
$$\text{True Positive Rate} = \frac{\#\,\text{true positives}}{\#\,\text{true positives} + \#\,\text{false positives}}$$
conclusions
hash & compare bloom filter secure edit distance
accuracy:
speed:
security:
overall:
open research questions in record linkage
[figure: centralized vs. distributed record linkage]
thanks
NLM 2-T15LM07450-06
NIH R01 LM009989
NSF CNS-0964063 (EBAM)
NSF CCF-0424422 (TRUST)
ebam
privacy-preserving record linkage
blocking
dataholder B: John Smith, …; Bob Beck, …; Bob Taylor, …; Karen Lewis, …; Alice Todd, …
dataholder A: Jon Smyth, …; Joy Beck, …; Marty Smith, …; Karyn Lewis, …; Laura Root, …
no blocking: |A||B| = 25 record pair comparisons
blocking (first letter of last name): 4 record pair comparisons
(legend: blocked pair, match, non-match)
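The comparison counts on this slide can be checked with a small sketch that indexes dataholder B by a blocking key and compares only records that share a key. The key (first letter of last name) follows the slide; keeping only first and last names is a simplification, since the other fields are elided.

```python
from collections import defaultdict

# (first name, last name) only; the remaining fields are elided on the slide.
A = [("Jon", "Smyth"), ("Joy", "Beck"), ("Marty", "Smith"), ("Karyn", "Lewis"), ("Laura", "Root")]
B = [("John", "Smith"), ("Bob", "Beck"), ("Bob", "Taylor"), ("Karen", "Lewis"), ("Alice", "Todd")]

def blocking_key(record) -> str:
    """First letter of last name, as on the slide."""
    return record[1][0].upper()

def candidate_pairs(a, b, key=blocking_key):
    """Yield only the (a, b) pairs whose blocking keys agree."""
    index = defaultdict(list)
    for rb in b:
        index[key(rb)].append(rb)
    for ra in a:
        for rb in index.get(key(ra), []):
            yield ra, rb

no_blocking = len(A) * len(B)
pairs = list(candidate_pairs(A, B))
print(no_blocking, len(pairs))  # 25 4
for ra, rb in pairs:
    print(" ".join(ra), "<->", " ".join(rb))
```

Blocking trades completeness for speed: a true match whose last names begin with different letters (for example, because of a typo in the first character) would never be compared.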
continuous fellegi-sunter
* Note this assumes a uniform distribution of similarity scores.
Edward H. Porter and William E. Winkler, “Approximate String Comparison and its Effect on an Advanced Record Linkage System”, Research Report RR97/02, U.S. Census Bureau. 1997.
• conditional probability vectors:
  • m[i] = P(a[i] == b[i] | (a,b) is a match)*
  • u[i] = P(a[i] == b[i] | (a,b) is a non-match)*
  where i = 1, … , # fields
• weight vectors:
  • agreement weight: wa[i] = log(m[i] / u[i])
  • disagreement weight: wd[i] = log((1 - m[i]) / (1 - u[i]))
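• scoring:
$$\text{score}(a,b) = \sum_{i=1}^{k} \Big( \text{sim}(a[i], b[i])\, w_a[i] + \big(1 - \text{sim}(a[i], b[i])\big)\, w_d[i] \Big)$$
  where k is the number of fields and sim(a[i], b[i]) ∈ [0, 1] is the field comparison value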
Fellegi-Sunter (FS)
* The Expectation Maximization (EM) algorithm, or a subset of records for which the true match status is known, can be used to determine these conditional probabilities.
record pair comparison
[figure annotations: “calculated once per record linkage over all record pairs” and “calculated for each record pair”]
conditional probability vectors: m[i] = 0.90, u[i] = 0.50
weight vectors:
$$w_a[i] = \log\frac{0.90}{0.50}, \qquad w_d[i] = \log\frac{1 - 0.90}{1 - 0.50}$$
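A minimal sketch of the Fellegi-Sunter weights and a continuous score. Assumptions: the natural logarithm (the slides write only "log"), linear interpolation between agreement and disagreement weights according to each field's similarity (consistent with the uniform-distribution note), and hypothetical m and u values that reuse the 0.90 / 0.50 example for all six fields.

```python
import math

def weights(m: list[float], u: list[float]):
    """Agreement and disagreement weights for each field."""
    wa = [math.log(mi / ui) for mi, ui in zip(m, u)]
    wd = [math.log((1 - mi) / (1 - ui)) for mi, ui in zip(m, u)]
    return wa, wd

def score(similarities: list[float], wa: list[float], wd: list[float]) -> float:
    """Continuous score: interpolate each field's weight by its similarity in [0, 1]."""
    return sum(s * a + (1 - s) * d for s, a, d in zip(similarities, wa, wd))

# Hypothetical conditional probabilities: the 0.90 / 0.50 example for all six fields.
m = [0.90] * 6
u = [0.50] * 6
wa, wd = weights(m, u)
print(round(wa[0], 4), round(wd[0], 4))   # 0.5878 -1.6094

# Comparison vector from the field comparison slide.
print(round(score([0.75, 0.8, 1, 1, 0.975, 1], wa, wd), 4))
```

Because m and u are fixed for a linkage, wa and wd are computed once, while score is evaluated for every record pair.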
fellegi-sunter
I. Fellegi and A. Sunter, “A Theory for Record Linkage,” Journal of the American Statistical Association, 1969.
record pair classification
[figure: record pairs ordered by match score and labeled, in order: non-match, non-match, non-match, match, non-match, non-match, non-match, match, non-match]
record pair classification
match score for each record pair:
                          bill rogers 02.02.81 m  momo williams 01.01.81 f  william rogers 01.01.81 m
mimi williams 02.02.82 f  0                       +2                        0
bill rogers 01.01.81 m    +3                      +1                        +3
jack abbott 03.03.83 m    +1                      0                         +1
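The scores in this grid are consistent with a simple rule: +1 for each of the four fields (first name, last name, birth date, gender) that agrees exactly. The sketch below recomputes the grid under that assumed rule and then classifies pairs with a hypothetical threshold of 3. It labels bill rogers 01.01.81 m as a match for two different records, which a one-to-one linkage of deduplicated data cannot allow.

```python
from itertools import product

A = [("mimi", "williams", "02.02.82", "f"),
     ("bill", "rogers", "01.01.81", "m"),
     ("jack", "abbott", "03.03.83", "m")]
B = [("bill", "rogers", "02.02.81", "m"),
     ("momo", "williams", "01.01.81", "f"),
     ("william", "rogers", "01.01.81", "m")]

def agreement_score(a, b) -> int:
    """Assumed scoring rule: number of exactly agreeing fields."""
    return sum(x == y for x, y in zip(a, b))

def score_matrix(a, b):
    return [[agreement_score(ra, rb) for rb in b] for ra in a]

def classify(a, b, threshold=3):
    """Hypothetical threshold rule: every pair scoring >= threshold is a match."""
    return [(" ".join(ra), " ".join(rb)) for ra, rb in product(a, b)
            if agreement_score(ra, rb) >= threshold]

print(score_matrix(A, B))  # [[0, 2, 0], [3, 1, 3], [1, 0, 1]]
print(classify(A, B))
```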
open research questions in record linkage
[figure: the blocking example; no blocking requires |A||B| = 25 record pair comparisons, while blocking on the first letter of last name requires 4]
open research questions in record linkage
record set 1 (first name, last name, birth date, gender):
bill rogers 01.01.81 m
mimi williams 02.02.82 f
jack abbott 03.03.83 m
record set 2 (first name, last name, birth date, gender):
bill rogers 02.02.81 m
momo williams 01.01.81 f
william rogers 01.01.81 m
[figure: actual vs. predicted links between the two record sets]
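One way to enforce a one-to-one linkage is to choose the assignment of records with the highest total score. The brute-force sketch below reuses the score grid from the record pair classification slide. It is fine for three records; larger problems would use an assignment algorithm such as the Hungarian method. It shows that two different assignments tie for the best total, so the predicted links can differ from the actual ones.

```python
from itertools import permutations

A = ["mimi williams 02.02.82 f", "bill rogers 01.01.81 m", "jack abbott 03.03.83 m"]
B = ["bill rogers 02.02.81 m", "momo williams 01.01.81 f", "william rogers 01.01.81 m"]
SCORES = [[0, 2, 0],   # rows follow A, columns follow B
          [3, 1, 3],
          [1, 0, 1]]

def best_assignments(scores):
    """Return the best total score and every one-to-one assignment achieving it."""
    n = len(scores)
    best, winners = None, []
    for perm in permutations(range(n)):      # perm[i] = column assigned to row i
        total = sum(scores[i][perm[i]] for i in range(n))
        if best is None or total > best:
            best, winners = total, [perm]
        elif total == best:
            winners.append(perm)
    return best, winners

best, winners = best_assignments(SCORES)
print(best)  # 6
for perm in winners:
    print([(A[i], B[j]) for i, j in enumerate(perm)])
```

In both optimal assignments mimi williams links to momo williams; the tie is over whether bill rogers 01.01.81 m links to bill rogers 02.02.81 m or to william rogers 01.01.81 m.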